sileod committed
Commit 9ecb16f
1 Parent(s): ae80b39

Update README.md

Files changed (1):
  1. README.md +40 -28
README.md CHANGED
@@ -290,17 +290,41 @@ tags:

# Model Card for DeBERTa-v3-small-tasksource-nli

- This is [DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) fine-tuned with multi-task learning on 600+ tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
- This checkpoint has strong zero-shot validation performance on many tasks, and can be used for:
  - Zero-shot entailment-based classification for arbitrary labels [ZS].
  - Natural language inference [NLI].
  - Hundreds of previous tasks with tasksource-adapters [TA].
  - Further fine-tuning on a new task or tasksource task (classification, token classification, or multiple-choice) [FT].

# [ZS] Zero-shot classification pipeline
```python
from transformers import pipeline
- classifier = pipeline("zero-shot-classification", model="sileod/deberta-v3-small-tasksource-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
```
@@ -312,42 +336,26 @@ NLI training data of this model includes [label-nli](https://huggingface.co/data

```python
from transformers import pipeline
- pipe = pipeline("text-classification", model="sileod/deberta-v3-small-tasksource-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) pairs
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```

- # [TA] Tasksource-adapters: 1-line access to hundreds of tasks
-
- ```python
- # !pip install tasknet
- import tasknet as tn
- pipe = tn.load_pipeline('sileod/deberta-v3-small-tasksource-nli', 'glue/sst2')  # works for 500+ tasksource tasks
- pipe(['That movie was great !', 'Awful movie.'])
- # [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
- ```
- The list of tasks is available in the model's config.json.
- This is more efficient than ZS since it requires only one forward pass per example, but it is less flexible.
-
-
# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn
- hparams = dict(model_name='sileod/deberta-v3-small-tasksource-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```

- ## Evaluation
- The base equivalent of this model was ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation:
- https://ibm.github.io/model-recycling/

### Software and training details

- The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on an Nvidia A30 24GB GPU.
This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.

@@ -359,12 +367,16 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

More details in this [article](https://arxiv.org/abs/2301.05948):
```
- @article{sileo2023tasksource,
-   title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
-   author={Sileo, Damien},
-   url={https://arxiv.org/abs/2301.05948},
-   journal={arXiv preprint arXiv:2301.05948},
-   year={2023}
}
```

 
@@ -290,17 +290,41 @@ tags:

# Model Card for DeBERTa-v3-small-tasksource-nli

+
+ [DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) with a context length of 1680, fine-tuned on tasksource for 250k steps. I oversampled long NLI tasks (ConTRoL, doc-nli).
+ Training data includes HelpSteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI...), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.
+
+ This model is suitable for long-context NLI, or as a backbone for fine-tuning reward models or classifiers.
+
+ This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
  - Zero-shot entailment-based classification for arbitrary labels [ZS].
  - Natural language inference [NLI].
  - Hundreds of previous tasks with tasksource-adapters [TA].
  - Further fine-tuning on a new task or tasksource task (classification, token classification, or multiple-choice) [FT].

+
+ | test_name                   | accuracy (%) |
+ |:----------------------------|-------------:|
+ | anli/a1                     |         57.2 |
+ | anli/a2                     |         46.1 |
+ | anli/a3                     |         47.2 |
+ | nli_fever                   |         71.7 |
+ | FOLIO                       |         47.1 |
+ | ConTRoL-nli                 |         52.2 |
+ | cladder                     |         52.8 |
+ | zero-shot-label-nli         |         70.0 |
+ | chatbot_arena_conversations |         67.8 |
+ | oasst2_pairwise_rlhf_reward |         75.6 |
+ | doc-nli                     |         75.0 |
+
+ For comparison, zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning), and 56.4% on ConTRoL (long-context NLI).
+
# [ZS] Zero-shot classification pipeline
```python
from transformers import pipeline
+ classifier = pipeline("zero-shot-classification", model="tasksource/deberta-small-long-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
```
 
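For completeness, a minimal usage sketch of the `classifier` defined above (standard `transformers` zero-shot pipeline output format; not part of the diff):

```python
# Score the candidate labels against the text defined above.
output = classifier(text, candidate_labels)
print(output["labels"][0], output["scores"][0])  # top label and its score
```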
@@ -312,42 +336,26 @@ NLI training data of this model includes [label-nli](https://huggingface.co/data

```python
from transformers import pipeline
+ pipe = pipeline("text-classification", model="tasksource/deberta-small-long-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) pairs
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```
 
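For probabilities over all three NLI labels rather than just the top one, a hedged sketch using the raw model instead of the pipeline (standard `transformers` API; variable names are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "tasksource/deberta-small-long-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# Encode one (premise, hypothesis) pair and normalize the logits.
inputs = tokenizer("there is a cat", "there is a black cat", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(-1).squeeze(0)
print({model.config.id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)})
```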
# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn
+ hparams = dict(model_name='tasksource/deberta-small-long-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```
 
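The same fine-tuning can be sketched without tasknet, using plain `transformers` and `datasets`; the learning rate mirrors the hparams above, while `num_labels=2` and `ignore_mismatched_sizes=True` are assumptions needed because RTE has two labels where the NLI head has three:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

name = "tasksource/deberta-small-long-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
# Replace the 3-way NLI head with a fresh 2-way head for RTE.
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=2, ignore_mismatched_sizes=True)

rte = load_dataset("glue", "rte").map(
    lambda x: tokenizer(x["sentence1"], x["sentence2"], truncation=True),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="rte-out", learning_rate=2e-5),
    train_dataset=rte["train"],
    eval_dataset=rte["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```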
### Software and training details

+ The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU.
This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice tasks used the same classification layers. For classification tasks, models shared weights if their labels matched.
 
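A minimal sketch of the task-specific CLS scheme described above (an illustration only, not the actual tasksource training code; the class and argument names are hypothetical):

```python
import torch
import torch.nn as nn

class TaskCLS(nn.Module):
    """Adds a learned per-task embedding at the CLS position, skipped
    10% of the time so the encoder also works without a task id."""
    def __init__(self, num_tasks: int, hidden_size: int, p_drop: float = 0.1):
        super().__init__()
        self.task_cls = nn.Embedding(num_tasks, hidden_size)
        self.p_drop = p_drop

    def forward(self, embeds: torch.Tensor, task_id: int) -> torch.Tensor:
        # embeds: (batch, seq_len, hidden); position 0 is the CLS slot.
        if self.training and torch.rand(()) < self.p_drop:
            return embeds  # train the model to cope without the task CLS
        out = embeds.clone()
        out[:, 0, :] = out[:, 0, :] + self.task_cls(torch.tensor(task_id))
        return out
```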
@@ -359,12 +367,16 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

More details in this [article](https://arxiv.org/abs/2301.05948):
```
+ @inproceedings{sileo-2024-tasksource,
+   title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
+   author = "Sileo, Damien",
+   booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
+   month = may,
+   year = "2024",
+   address = "Torino, Italia",
+   publisher = "ELRA and ICCL",
+   url = "https://aclanthology.org/2024.lrec-main.1361",
+   pages = "15655--15684",
}
```