bobox committed
Commit f0c76d4
1 Parent(s): 35045af

AdaptiveLayerLoss(model=model,
                  loss=train_loss,
                  n_layers_per_step=1,
                  last_layer_weight=1,
                  prior_layers_weight=1,
                  kl_div_weight=1,
                  kl_temperature=1,
                  )
lr = 1e-6, batch = 42, schedule = cosine
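For context, the commit message is the `AdaptiveLayerLoss` configuration used for this run. A minimal sketch of how that configuration is typically assembled in sentence-transformers, assuming the inner `train_loss` is the `MultipleNegativesRankingLoss` listed in the README tags:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import AdaptiveLayerLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/deberta-v3-small")

# Inner loss: in-batch negatives ranking, per the README's loss tags (assumption).
train_loss = MultipleNegativesRankingLoss(model=model)

# Wrap it so earlier transformer layers are also trained to produce useful
# embeddings; all weights are set to 1, as in the commit message.
train_loss = AdaptiveLayerLoss(
    model=model,
    loss=train_loss,
    n_layers_per_step=1,
    last_layer_weight=1,
    prior_layers_weight=1,
    kl_div_weight=1,
    kl_temperature=1,
)
```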

Files changed (2)
  1. README.md +325 -109
  2. pytorch_model.bin +1 -1
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:314315
+ - dataset_size:67190
  - loss:AdaptiveLayerLoss
  - loss:MultipleNegativesRankingLoss
  base_model: microsoft/deberta-v3-small
@@ -49,34 +49,44 @@ metrics:
  - max_precision
  - max_recall
  - max_ap
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
  widget:
- - source_sentence: A man plays the violin.
+ - source_sentence: A worker peers out from atop a building under construction.
  sentences:
- - A man is playing violin.
- - The back of a pig under a tree with a cow in the background.
- - The plane is getting ready to take off.
- - source_sentence: A person drops a camera down an escelator.
+ - The man pleads for mercy.
+ - People and a baby crossing the street.
+ - A person is atop of a building.
+ - source_sentence: An aisle at Best Buy with an employee standing at the computer
+ and a Geek Squad sign in the background.
  sentences:
- - Something is bothering your cat and he does not like it.
- - A man tosses a bag down an escalator.
- - Two smiling women holding a baby.
- - source_sentence: One football player tries to tackle a player on the opposing team.
+ - the man is watching the stars
+ - The employee is wearing a blue shirt.
+ - A person balancing.
+ - source_sentence: A man with a long white beard is examining a camera and another
+ man with a black shirt is in the background.
  sentences:
- - I think Stephen King's comments are helpful in this regard.
- - Our interactions are merely depends on where we put our perception.
- - A football player attempts a tackle.
- - source_sentence: The two men are wearing jeans.
+ - a man is with another man
+ - Children in uniforms climb a tower.
+ - There are five children.
+ - source_sentence: A black dog with a blue collar is jumping into the water.
  sentences:
- - Four people eating dessert around a table.
- - Here are some things that worked with my son who started toilet training around
- 2.5 years.
- - The two men are wearing pants.
- - source_sentence: This may be overly obvious, but in American English, saying "you're
- welcome" is certainly polite and standard.
+ - The dog is playing tug of war with a stick.
+ - There is a woman painting.
+ - A black dog wearing a blue collar is chasing something into the water.
+ - source_sentence: A wet child stands in chest deep ocean water.
  sentences:
- - I'm not sure how "Not at all" sounds in response to "thank you".
- - As bikeboy389 said, you can learn a lot by looking at students' native languages.
- - A laptop and a PC at a workstation.
+ - A woman paints a portrait of her best friend.
+ - A person in red is cutting the grass on a riding mower
+ - The child s playing on the beach.
  pipeline_tag: sentence-similarity
  model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
@@ -89,110 +99,147 @@ model-index:
  type: unknown
  metrics:
  - type: cosine_accuracy
- value: 0.5397679884752445
+ value: 0.6583157259281618
  name: Cosine Accuracy
  - type: cosine_accuracy_threshold
- value: 0.9089176654815674
+ value: 0.6766541004180908
  name: Cosine Accuracy Threshold
  - type: cosine_f1
- value: 0.6834040429248815
+ value: 0.7049362860324137
  name: Cosine F1
  - type: cosine_f1_threshold
- value: 0.3752323389053345
+ value: 0.6017583012580872
  name: Cosine F1 Threshold
  - type: cosine_precision
- value: 0.5191082802547771
+ value: 0.6115046147241897
  name: Cosine Precision
  - type: cosine_recall
- value: 0.9998539506353147
+ value: 0.8320677570093458
  name: Cosine Recall
  - type: cosine_ap
- value: 0.5794582374804604
+ value: 0.6995030811464378
  name: Cosine Ap
  - type: dot_accuracy
- value: 0.5302903935097429
+ value: 0.6272260790824027
  name: Dot Accuracy
  - type: dot_accuracy_threshold
- value: 391.4422302246094
+ value: 163.25054931640625
  name: Dot Accuracy Threshold
  - type: dot_f1
- value: 0.6834040429248815
+ value: 0.6976381461675579
  name: Dot F1
  - type: dot_f1_threshold
- value: 175.07894897460938
+ value: 119.20779418945312
  name: Dot F1 Threshold
  - type: dot_precision
- value: 0.5191082802547771
+ value: 0.5639409221902018
  name: Dot Precision
  - type: dot_recall
- value: 0.9998539506353147
+ value: 0.914427570093458
  name: Dot Recall
  - type: dot_ap
- value: 0.5621671154600225
+ value: 0.643747511442345
  name: Dot Ap
  - type: manhattan_accuracy
- value: 0.5644855561452726
+ value: 0.6571083610021129
  name: Manhattan Accuracy
  - type: manhattan_accuracy_threshold
- value: 160.045654296875
+ value: 243.75453186035156
  name: Manhattan Accuracy Threshold
  - type: manhattan_f1
- value: 0.6834381551362683
+ value: 0.7055783910745744
  name: Manhattan F1
  - type: manhattan_f1_threshold
- value: 322.75946044921875
+ value: 295.95947265625
  name: Manhattan F1 Threshold
  - type: manhattan_precision
- value: 0.5191476454083567
+ value: 0.5900608917697898
  name: Manhattan Precision
  - type: manhattan_recall
- value: 0.9998539506353147
+ value: 0.8773364485981309
  name: Manhattan Recall
  - type: manhattan_ap
- value: 0.6033119142961784
+ value: 0.7072033306346501
  name: Manhattan Ap
  - type: euclidean_accuracy
- value: 0.5387064978391084
+ value: 0.6590703290069424
  name: Euclidean Accuracy
  - type: euclidean_accuracy_threshold
- value: 8.973075866699219
+ value: 12.141830444335938
  name: Euclidean Accuracy Threshold
  - type: euclidean_f1
- value: 0.6834065495207667
+ value: 0.7036813518406759
  name: Euclidean F1
  - type: euclidean_f1_threshold
- value: 24.51708221435547
+ value: 14.197540283203125
  name: Euclidean F1 Threshold
  - type: euclidean_precision
- value: 0.5191505498672734
+ value: 0.5996708496194199
  name: Euclidean Precision
  - type: euclidean_recall
- value: 0.9997079012706295
+ value: 0.8513434579439252
  name: Euclidean Recall
  - type: euclidean_ap
- value: 0.577277049262529
+ value: 0.7035256676322055
  name: Euclidean Ap
  - type: max_accuracy
- value: 0.5644855561452726
+ value: 0.6590703290069424
  name: Max Accuracy
  - type: max_accuracy_threshold
- value: 391.4422302246094
+ value: 243.75453186035156
  name: Max Accuracy Threshold
  - type: max_f1
- value: 0.6834381551362683
+ value: 0.7055783910745744
  name: Max F1
  - type: max_f1_threshold
- value: 322.75946044921875
+ value: 295.95947265625
  name: Max F1 Threshold
  - type: max_precision
- value: 0.5191505498672734
+ value: 0.6115046147241897
  name: Max Precision
  - type: max_recall
- value: 0.9998539506353147
+ value: 0.914427570093458
  name: Max Recall
  - type: max_ap
- value: 0.6033119142961784
+ value: 0.7072033306346501
  name: Max Ap
+ - task:
+ type: semantic-similarity
+ name: Semantic Similarity
+ dataset:
+ name: Unknown
+ type: unknown
+ metrics:
+ - type: pearson_cosine
+ value: 0.732169941341086
+ name: Pearson Cosine
+ - type: spearman_cosine
+ value: 0.7344587206087978
+ name: Spearman Cosine
+ - type: pearson_manhattan
+ value: 0.7537099624360986
+ name: Pearson Manhattan
+ - type: spearman_manhattan
+ value: 0.7550555196955944
+ name: Spearman Manhattan
+ - type: pearson_euclidean
+ value: 0.7468210439584286
+ name: Pearson Euclidean
+ - type: spearman_euclidean
+ value: 0.74849026008206
+ name: Spearman Euclidean
+ - type: pearson_dot
+ value: 0.6142835401925993
+ name: Pearson Dot
+ - type: spearman_dot
+ value: 0.6100201108417316
+ name: Spearman Dot
+ - type: pearson_max
+ value: 0.7537099624360986
+ name: Pearson Max
+ - type: spearman_max
+ value: 0.7550555196955944
+ name: Spearman Max
  ---

  # SentenceTransformer based on microsoft/deberta-v3-small
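The accuracy/F1/AP block above matches what sentence-transformers' `BinaryClassificationEvaluator` reports for each similarity function. A sketch of reproducing such numbers, with toy pairs standing in for the unnamed evaluation set used here:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2")

# Toy pairs only; the card's values come from a much larger, unnamed eval set.
evaluator = BinaryClassificationEvaluator(
    sentences1=["A man plays the violin.", "The two men are wearing jeans."],
    sentences2=["A man is playing violin.", "Four people eating dessert around a table."],
    labels=[1, 0],  # 1 = similar pair, 0 = dissimilar pair
)

# Reports cosine/dot/manhattan/euclidean accuracy, F1, precision, recall, and AP.
print(evaluator(model))
```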
@@ -245,9 +292,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2")
  # Run inference
  sentences = [
- 'This may be overly obvious, but in American English, saying "you\'re welcome" is certainly polite and standard.',
- 'I\'m not sure how "Not at all" sounds in response to "thank you".',
- "As bikeboy389 said, you can learn a lot by looking at students' native languages.",
+ 'A wet child stands in chest deep ocean water.',
+ 'The child s playing on the beach.',
+ 'A woman paints a portrait of her best friend.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
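The README's snippet stops at printing the embedding shape; a short sketch of the natural next step, scoring the sentences against each other with cosine similarity via the stock `util.cos_sim` helper:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2")
sentences = [
    'A wet child stands in chest deep ocean water.',
    'The child s playing on the beach.',
    'A woman paints a portrait of her best friend.',
]
embeddings = model.encode(sentences)

# Similarity of the first sentence to the other two; the related pair
# should score clearly higher than the unrelated one.
print(util.cos_sim(embeddings[0], embeddings[1:]))
```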
@@ -293,41 +340,58 @@ You can finetune this model on your own dataset.

  | Metric | Value |
  |:-----------------------------|:-----------|
- | cosine_accuracy | 0.5398 |
- | cosine_accuracy_threshold | 0.9089 |
- | cosine_f1 | 0.6834 |
- | cosine_f1_threshold | 0.3752 |
- | cosine_precision | 0.5191 |
- | cosine_recall | 0.9999 |
- | cosine_ap | 0.5795 |
- | dot_accuracy | 0.5303 |
- | dot_accuracy_threshold | 391.4422 |
- | dot_f1 | 0.6834 |
- | dot_f1_threshold | 175.0789 |
- | dot_precision | 0.5191 |
- | dot_recall | 0.9999 |
- | dot_ap | 0.5622 |
- | manhattan_accuracy | 0.5645 |
- | manhattan_accuracy_threshold | 160.0457 |
- | manhattan_f1 | 0.6834 |
- | manhattan_f1_threshold | 322.7595 |
- | manhattan_precision | 0.5191 |
- | manhattan_recall | 0.9999 |
- | manhattan_ap | 0.6033 |
- | euclidean_accuracy | 0.5387 |
- | euclidean_accuracy_threshold | 8.9731 |
- | euclidean_f1 | 0.6834 |
- | euclidean_f1_threshold | 24.5171 |
- | euclidean_precision | 0.5192 |
- | euclidean_recall | 0.9997 |
- | euclidean_ap | 0.5773 |
- | max_accuracy | 0.5645 |
- | max_accuracy_threshold | 391.4422 |
- | max_f1 | 0.6834 |
- | max_f1_threshold | 322.7595 |
- | max_precision | 0.5192 |
- | max_recall | 0.9999 |
- | **max_ap** | **0.6033** |
+ | cosine_accuracy | 0.6583 |
+ | cosine_accuracy_threshold | 0.6767 |
+ | cosine_f1 | 0.7049 |
+ | cosine_f1_threshold | 0.6018 |
+ | cosine_precision | 0.6115 |
+ | cosine_recall | 0.8321 |
+ | cosine_ap | 0.6995 |
+ | dot_accuracy | 0.6272 |
+ | dot_accuracy_threshold | 163.2505 |
+ | dot_f1 | 0.6976 |
+ | dot_f1_threshold | 119.2078 |
+ | dot_precision | 0.5639 |
+ | dot_recall | 0.9144 |
+ | dot_ap | 0.6437 |
+ | manhattan_accuracy | 0.6571 |
+ | manhattan_accuracy_threshold | 243.7545 |
+ | manhattan_f1 | 0.7056 |
+ | manhattan_f1_threshold | 295.9595 |
+ | manhattan_precision | 0.5901 |
+ | manhattan_recall | 0.8773 |
+ | manhattan_ap | 0.7072 |
+ | euclidean_accuracy | 0.6591 |
+ | euclidean_accuracy_threshold | 12.1418 |
+ | euclidean_f1 | 0.7037 |
+ | euclidean_f1_threshold | 14.1975 |
+ | euclidean_precision | 0.5997 |
+ | euclidean_recall | 0.8513 |
+ | euclidean_ap | 0.7035 |
+ | max_accuracy | 0.6591 |
+ | max_accuracy_threshold | 243.7545 |
+ | max_f1 | 0.7056 |
+ | max_f1_threshold | 295.9595 |
+ | max_precision | 0.6115 |
+ | max_recall | 0.9144 |
+ | **max_ap** | **0.7072** |
+
+ #### Semantic Similarity
+
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | pearson_cosine | 0.7322 |
+ | **spearman_cosine** | **0.7345** |
+ | pearson_manhattan | 0.7537 |
+ | spearman_manhattan | 0.7551 |
+ | pearson_euclidean | 0.7468 |
+ | spearman_euclidean | 0.7485 |
+ | pearson_dot | 0.6143 |
+ | spearman_dot | 0.61 |
+ | pearson_max | 0.7537 |
+ | spearman_max | 0.7551 |

  <!--
  ## Bias, Risks and Limitations
@@ -348,19 +412,19 @@ You can finetune this model on your own dataset.
  #### stanfordnlp/snli

  * Dataset: [stanfordnlp/snli](https://huggingface.co/datasets/stanfordnlp/snli) at [cdb5c3d](https://huggingface.co/datasets/stanfordnlp/snli/tree/cdb5c3d5eed6ead6e5a341c8e56e669bb666725b)
- * Size: 314,315 training samples
+ * Size: 67,190 training samples
  * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
- | | sentence1 | sentence2 | label |
- |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:-----------------------------|
- | type | string | string | int |
- | details | <ul><li>min: 5 tokens</li><li>mean: 16.62 tokens</li><li>max: 62 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.46 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
+ | | sentence1 | sentence2 | label |
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-----------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 21.19 tokens</li><li>max: 133 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 11.77 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
  * Samples:
- | sentence1 | sentence2 | label |
- |:---------------------------------------------------------------------------|:-------------------------------------------------|:---------------|
- | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> | <code>0</code> |
- | <code>Children smiling and waving at camera</code> | <code>There are children present</code> | <code>0</code> |
- | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> | <code>0</code> |
+ | sentence1 | sentence2 | label |
+ |:---------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------|:---------------|
+ | <code>Without a placebo group, we still won't know if any of the treatments are better than nothing and therefore worth giving.</code> | <code>It is necessary to use a controlled method to ensure the treatments are worthwhile.</code> | <code>0</code> |
+ | <code>It was conducted in silence.</code> | <code>It was done silently.</code> | <code>0</code> |
+ | <code>oh Lewisville any decent food in your cafeteria up there</code> | <code>Is there any decent food in your cafeteria up there in Lewisville?</code> | <code>0</code> |
  * Loss: [<code>AdaptiveLayerLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#adaptivelayerloss) with these parameters:
  ```json
  {
@@ -403,10 +467,162 @@ You can finetune this model on your own dataset.
  }
  ```

+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 42
+ - `per_device_eval_batch_size`: 22
+ - `learning_rate`: 3e-06
+ - `weight_decay`: 1e-08
+ - `num_train_epochs`: 2
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.5
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `hub_model_id`: bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2-tmp
+ - `hub_strategy`: checkpoint
+ - `hub_private_repo`: True
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 42
+ - `per_device_eval_batch_size`: 22
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 3e-06
+ - `weight_decay`: 1e-08
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.5
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2-tmp
+ - `hub_strategy`: checkpoint
+ - `hub_private_repo`: True
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
  ### Training Logs
- | Epoch | Step | loss | max_ap |
- |:-----:|:----:|:------:|:------:|
- | None | 0 | 4.6204 | 0.6033 |
+ | Epoch | Step | Training Loss | loss | max_ap | spearman_cosine |
+ |:-----:|:----:|:-------------:|:------:|:------:|:---------------:|
+ | 0.1 | 160 | 4.6003 | 4.8299 | 0.6017 | - |
+ | 0.2 | 320 | 4.0659 | 4.3436 | 0.6168 | - |
+ | 0.3 | 480 | 3.4886 | 4.0840 | 0.6339 | - |
+ | 0.4 | 640 | 3.0592 | 3.6422 | 0.6611 | - |
+ | 0.5 | 800 | 2.5728 | 3.1927 | 0.6773 | - |
+ | 0.6 | 960 | 2.184 | 2.8322 | 0.6893 | - |
+ | 0.7 | 1120 | 1.8744 | 2.4892 | 0.6954 | - |
+ | 0.8 | 1280 | 1.757 | 2.4453 | 0.7002 | - |
+ | 0.9 | 1440 | 1.5872 | 2.2565 | 0.7010 | - |
+ | 1.0 | 1600 | 1.446 | 2.1391 | 0.7046 | - |
+ | 1.1 | 1760 | 1.3892 | 2.1236 | 0.7058 | - |
+ | 1.2 | 1920 | 1.2567 | 1.9738 | 0.7053 | - |
+ | 1.3 | 2080 | 1.2233 | 1.8925 | 0.7063 | - |
+ | 1.4 | 2240 | 1.1954 | 1.8392 | 0.7075 | - |
+ | 1.5 | 2400 | 1.1395 | 1.9081 | 0.7065 | - |
+ | 1.6 | 2560 | 1.1211 | 1.8080 | 0.7074 | - |
+ | 1.7 | 2720 | 1.0825 | 1.8408 | 0.7073 | - |
+ | 1.8 | 2880 | 1.1358 | 1.7363 | 0.7073 | - |
+ | 1.9 | 3040 | 1.0628 | 1.8936 | 0.7072 | - |
+ | 2.0 | 3200 | 1.1412 | 1.7846 | 0.7072 | - |
+ | None | 0 | - | 3.0121 | 0.7072 | 0.7345 |


  ### Framework Versions
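The non-default hyperparameters in the diff above map directly onto `SentenceTransformerTrainingArguments`. A sketch under the assumption that the run used the standard sentence-transformers v3 trainer API (`output_dir` is a placeholder; it is not recorded in the card):

```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder; not recorded in the card
    eval_strategy="steps",
    per_device_train_batch_size=42,
    per_device_eval_batch_size=22,
    learning_rate=3e-6,
    weight_decay=1e-8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.5,
    save_safetensors=False,
    fp16=True,
    hub_model_id="bobox/DeBERTaV3-small-ST-AdaptiveLayers-ep2-tmp",
    hub_strategy="checkpoint",
    hub_private_repo=True,
    # no_duplicates avoids repeating a sentence within a batch, which matters
    # for in-batch-negatives losses like MultipleNegativesRankingLoss.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```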
 
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e75f9f0d0ccf1ea68d57e5e49eadbe854516a7a239c28fe45742d13c727c0aae
+ oid sha256:302073fb610aae136ce3650813dfe4e09b6216dbe2b7ded3d56cad2822c48514
  size 565251810
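The `pytorch_model.bin` change is a Git LFS pointer update: the sha256 oid changed while the size stayed identical, i.e. the weights were swapped for a retrained file of the same architecture. A stdlib-only sketch for checking a downloaded copy against the new pointer (the local path is an assumption):

```python
import hashlib

path = "pytorch_model.bin"  # assumed local download path

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)

# Compare against the oid in the updated LFS pointer above.
print(digest.hexdigest() == "302073fb610aae136ce3650813dfe4e09b6216dbe2b7ded3d56cad2822c48514")
```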