sileod committed
Commit 9ecb16f
1 Parent(s): ae80b39

Update README.md

Files changed (1):
  1. README.md +40 -28
README.md CHANGED
@@ -290,17 +290,41 @@ tags:

# Model Card for DeBERTa-v3-small-tasksource-nli

- This is [DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) fine-tuned with multi-task learning on 600+ tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
- This checkpoint has strong zero-shot validation performance on many tasks, and can be used for:
  - Zero-shot entailment-based classification for arbitrary labels [ZS].
  - Natural language inference [NLI].
  - Hundreds of previous tasks with tasksource-adapters [TA].
  - Further fine-tuning on a new task or tasksource task (classification, token classification, or multiple-choice) [FT].

# [ZS] Zero-shot classification pipeline
```python
from transformers import pipeline
- classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-small-tasksource-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
```
@@ -312,42 +336,26 @@ NLI training data of this model includes [label-nli](https://huggingface.co/data

```python
from transformers import pipeline
- pipe = pipeline("text-classification", model="sileod/deberta-v3-small-tasksource-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) pairs
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```

- # [TA] Tasksource-adapters: 1-line access to hundreds of tasks
-
- ```python
- # !pip install tasknet
- import tasknet as tn
- pipe = tn.load_pipeline('sileod/deberta-v3-small-tasksource-nli', 'glue/sst2')  # works for 500+ tasksource tasks
- pipe(['That movie was great !', 'Awful movie.'])
- # [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
- ```
- The list of tasks is available in the model's config.json.
- This is more efficient than ZS since it requires only one forward pass per example, but it is less flexible.
-
-
# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn
- hparams = dict(model_name='sileod/deberta-v3-small-tasksource-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```

- ## Evaluation
- The base equivalent of this model was ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation:
- https://ibm.github.io/model-recycling/

### Software and training details

- The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on an Nvidia A30 24GB GPU.
This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.

@@ -359,12 +367,16 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

More details in this [article](https://arxiv.org/abs/2301.05948):
```
- @article{sileo2023tasksource,
-   title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
-   author={Sileo, Damien},
-   url={https://arxiv.org/abs/2301.05948},
-   journal={arXiv preprint arXiv:2301.05948},
-   year={2023}
}
```

 
@@ -290,17 +290,41 @@ tags:

# Model Card for DeBERTa-v3-small-tasksource-nli

+
+ [DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) with a context length of 1680, fine-tuned on tasksource for 250k steps. I oversampled long NLI tasks (ConTRoL, doc-nli).
+ Training data includes HelpSteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI...), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.
+
+ This model is suitable for long-context NLI, or as a backbone for fine-tuning reward models or classifiers.
+
+ This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
  - Zero-shot entailment-based classification for arbitrary labels [ZS].
  - Natural language inference [NLI].
  - Hundreds of previous tasks with tasksource-adapters [TA].
  - Further fine-tuning on a new task or tasksource task (classification, token classification, or multiple-choice) [FT].

+
+ | test_name                   | accuracy (%) |
+ |:----------------------------|-------------:|
+ | anli/a1                     |         57.2 |
+ | anli/a2                     |         46.1 |
+ | anli/a3                     |         47.2 |
+ | nli_fever                   |         71.7 |
+ | FOLIO                       |         47.1 |
+ | ConTRoL-nli                 |         52.2 |
+ | cladder                     |         52.8 |
+ | zero-shot-label-nli         |         70.0 |
+ | chatbot_arena_conversations |         67.8 |
+ | oasst2_pairwise_rlhf_reward |         75.6 |
+ | doc-nli                     |         75.0 |
+
+ For comparison, zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning), and 56.4% on ConTRoL (long-context NLI).
+
# [ZS] Zero-shot classification pipeline
```python
from transformers import pipeline
+ classifier = pipeline("zero-shot-classification", model="tasksource/deberta-small-long-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
```
 
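For completeness, a minimal usage sketch of the `classifier` defined above (standard `transformers` zero-shot pipeline output format; not part of the diff):

```python
# Score the candidate labels against the text defined above.
output = classifier(text, candidate_labels)
print(output["labels"][0], output["scores"][0])  # top label and its score
```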
@@ -312,42 +336,26 @@ NLI training data of this model includes [label-nli](https://huggingface.co/data

```python
from transformers import pipeline
+ pipe = pipeline("text-classification", model="tasksource/deberta-small-long-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) pairs
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```
 
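For probabilities over all three NLI labels rather than just the top one, a hedged sketch using the raw model instead of the pipeline (standard `transformers` API; variable names are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "tasksource/deberta-small-long-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# Encode one (premise, hypothesis) pair and normalize the logits.
inputs = tokenizer("there is a cat", "there is a black cat", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze(0)
print({model.config.id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)})
```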
# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn
+ hparams = dict(model_name='tasksource/deberta-small-long-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```
 
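The same fine-tuning can be sketched without tasknet, using plain `transformers` and `datasets`; the learning rate mirrors the hparams above, while `num_labels=2` and `ignore_mismatched_sizes=True` are assumptions needed because RTE has two labels where the NLI head has three:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

name = "tasksource/deberta-small-long-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
# Replace the 3-way NLI head with a fresh 2-way head for RTE.
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2, ignore_mismatched_sizes=True)

rte = load_dataset("glue", "rte").map(
    lambda x: tokenizer(x["sentence1"], x["sentence2"], truncation=True),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rte-out", learning_rate=2e-5),
    train_dataset=rte["train"],
    eval_dataset=rte["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```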
### Software and training details

+ The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU.
This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.
 
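A minimal sketch of the task-specific CLS scheme described above (an illustration only, not the actual tasksource training code; the class and argument names are hypothetical):

```python
import torch
import torch.nn as nn

class TaskCLS(nn.Module):
    """Adds a learned per-task embedding at the CLS position, skipped
    10% of the time so the encoder also works without a task id."""
    def __init__(self, num_tasks: int, hidden_size: int, p_drop: float = 0.1):
        super().__init__()
        self.task_cls = nn.Embedding(num_tasks, hidden_size)
        self.p_drop = p_drop

    def forward(self, embeds: torch.Tensor, task_id: int) -> torch.Tensor:
        # embeds: (batch, seq_len, hidden); position 0 is the CLS slot.
        if self.training and torch.rand(()) < self.p_drop:
            return embeds  # train the model to cope without the task CLS
        out = embeds.clone()
        out[:, 0, :] = out[:, 0, :] + self.task_cls(torch.tensor(task_id))
        return out
```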
@@ -359,12 +367,16 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

More details in this [article](https://arxiv.org/abs/2301.05948):
```
+ @inproceedings{sileo-2024-tasksource,
+   title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
+   author = "Sileo, Damien",
+   booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+   month = may,
+   year = "2024",
+   address = "Torino, Italia",
+   publisher = "ELRA and ICCL",
+   url = "https://aclanthology.org/2024.lrec-main.1361",
+   pages = "15655--15684",
}
```