nreimers committed on
Commit
65b0f39
1 Parent(s): e018959

Update README.md

  1. README.md +53 -9
README.md CHANGED
@@ -4,13 +4,59 @@ tags:
4
  - sentence-transformers
5
  - feature-extraction
6
  - sentence-similarity
 
 
7
  ---
8
 
9
- # {MODEL_NAME}
10
 
11
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
12
 
13
- <!--- Describe your model here -->
14
 
15
  ## Usage (Sentence-Transformers)
16
 
@@ -26,28 +72,26 @@ Then you can use the model like this:
26
  from sentence_transformers import SentenceTransformer
27
  sentences = ["This is an example sentence", "Each sentence is converted"]
28
 
29
- model = SentenceTransformer('{MODEL_NAME}')
30
  embeddings = model.encode(sentences)
31
  print(embeddings)
32
  ```
33
 
34
 
 
35
 
36
- ## Evaluation Results
37
-
38
- <!--- Describe how your model was evaluated -->
39
 
40
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
41
 
42
 
43
- ## Training
44
  The model was trained with the parameters:
45
 
46
  **DataLoader**:
47
 
48
  `MultiDatasetDataLoader.MultiDatasetDataLoader` of length 5371 with parameters:
49
  ```
50
- {'batch_size': 'unknown'}
51
  ```
52
 
53
  **Loss**:
 
4
  - sentence-transformers
5
  - feature-extraction
6
  - sentence-similarity
7
+ datasets:
8
+ - code_search_net
9
  ---
10
 
11
+ # flax-sentence-embeddings/st-codesearch-distilroberta-base
12
 
13
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
14
 
15
+ It was trained on the [code_search_net](https://huggingface.co/datasets/code_search_net) dataset and can be used to search program code given text.
16
+
17
+ ## Usage:
18
+
19
+ ```python
20
+ from sentence_transformers import SentenceTransformer, util
21
+
22
+
23
+ # This list defines the code snippets to search over
24
+ code = ["""def sort_list(x):
25
+     return sorted(x)""",
26
+ """def count_above_threshold(elements, threshold=0):
27
+     counter = 0
28
+     for e in elements:
29
+         if e > threshold:
30
+             counter += 1
31
+     return counter""",
32
+ """def find_min_max(elements):
33
+     min_ele = 99999
34
+     max_ele = -99999
35
+     for e in elements:
36
+         if e < min_ele:
37
+             min_ele = e
38
+         if e > max_ele:
39
+             max_ele = e
40
+     return min_ele, max_ele"""]
41
+
42
+
43
+ model = SentenceTransformer("flax-sentence-embeddings/st-codesearch-distilroberta-base")
44
+
45
+ # Encode our code into the vector space
46
+ code_emb = model.encode(code, convert_to_tensor=True)
47
+
48
+ # Interactive demo: enter a query, and the best match among the
49
+ # 3 functions defined above is printed
50
+ while True:
51
+     query = input("Query: ")
52
+     query_emb = model.encode(query, convert_to_tensor=True)
53
+     hits = util.semantic_search(query_emb, code_emb)[0]
54
+     top_hit = hits[0]
55
+
56
+     print("Cossim: {:.2f}".format(top_hit['score']))
57
+     print(code[top_hit['corpus_id']])
58
+     print("\n\n")
59
+ ```
60
 
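For readers of this diff, the ranking step in the usage example can be illustrated without downloading the model. The following is a minimal sketch, assuming cosine similarity as the scoring function (the example prints the score as "Cossim"). The helper names `cos_sim` and `top_hits` and the toy 3-dimensional vectors are hypothetical stand-ins for the 768-dimensional embeddings; this is not the library's implementation of `util.semantic_search`.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_hits(query_emb, corpus_emb, top_k=1):
    """Return the top_k corpus entries ranked by cosine similarity (hypothetical helper)."""
    scored = [
        {"corpus_id": i, "score": cos_sim(query_emb, emb)}
        for i, emb in enumerate(corpus_emb)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for the encoded code snippets and the encoded query.
toy_code_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query_emb = [0.1, 0.9, 0.2]

hits = top_hits(toy_query_emb, toy_code_emb, top_k=2)
for hit in hits:
    print("Cossim: {:.2f} -> snippet {}".format(hit["score"], hit["corpus_id"]))
```

The returned dictionaries reuse the `corpus_id` and `score` keys that the usage example reads from each hit, so the toy result can be consumed the same way.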
61
  ## Usage (Sentence-Transformers)
62
 
 
72
  from sentence_transformers import SentenceTransformer
73
  sentences = ["This is an example sentence", "Each sentence is converted"]
74
 
75
+ model = SentenceTransformer('flax-sentence-embeddings/st-codesearch-distilroberta-base')
76
  embeddings = model.encode(sentences)
77
  print(embeddings)
78
  ```
79
 
80
 
81
+ ## Training
82
 
83
+ The model was trained from a DistilRoBERTa-base checkpoint for 10k training steps on the code_search_net dataset with a batch size of 256 and MultipleNegativesRankingLoss.
 
 
84
 
85
+ This is a preliminary model. It has not been evaluated, and the training setup was not very sophisticated.
86
 
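The training note names MultipleNegativesRankingLoss with a batch size of 256. As a rough illustration of the idea behind that kind of loss, the sketch below treats each (text, code) pair in a batch as a positive and every other code embedding in the same batch as a negative, then applies cross-entropy over scaled cosine similarities. The function name `mnr_loss`, the scale of 20, and the toy 2-dimensional vectors are assumptions made for illustration; this is not the training code used for this commit.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss (illustrative sketch).

    anchors[i] should match positives[i]; every positives[j] with j != i
    acts as a negative for anchors[i]. The scale value is an assumption.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Cross-entropy with the matching pair as the target class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs: aligned pairs should give a much lower loss
# than the same embeddings with the pairing swapped.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
code_emb_aligned = [[1.0, 0.0], [0.0, 1.0]]
code_emb_swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = mnr_loss(text_emb, code_emb_aligned)
swapped_loss = mnr_loss(text_emb, code_emb_swapped)
print(aligned_loss < swapped_loss)
```

With in-batch negatives, a larger batch gives each pair more negatives to be ranked against, which is one reason the batch size is worth recording alongside the loss.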
87
 
 
88
  The model was trained with the parameters:
89
 
90
  **DataLoader**:
91
 
92
  `MultiDatasetDataLoader.MultiDatasetDataLoader` of length 5371 with parameters:
93
  ```
94
+ {'batch_size': 256}
95
  ```
96
 
97
  **Loss**: