roberta-base distilled into tinyroberta
Overview
Language model: roberta-base
Language: English
Training data: The Pile
Infrastructure: 4x Tesla V100
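As a minimal usage sketch, assuming the checkpoint is published as deepset/tinyroberta-6l-768d (the name given in the Distillation section below) and loaded with the Hugging Face transformers library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepset/tinyroberta-6l-768d"  # assumed model id, see Distillation below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("TinyRoBERTa is a 6-layer distilled RoBERTa.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```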
Hyperparameters
batch_size = 96
n_epochs = 4
max_seq_len = 384
learning_rate = 1e-4
lr_schedule = LinearWarmup
warmup_proportion = 0.2
teacher = "deepset/roberta-base"
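Roughly, these hyperparameters correspond to the PyTorch / transformers setup below. This is an illustrative sketch only: the actual run used Haystack, and the step count is a placeholder, not a value from this card.

```python
import torch
from transformers import AutoModel, get_linear_schedule_with_warmup

student = AutoModel.from_pretrained("deepset/tinyroberta-6l-768d")  # assumed model id

batch_size = 96
n_epochs = 4
learning_rate = 1e-4
warmup_proportion = 0.2

# Placeholder: the real step count depends on how much of The Pile was used.
steps_per_epoch = 10_000
num_training_steps = steps_per_epoch * n_epochs
num_warmup_steps = int(warmup_proportion * num_training_steps)

optimizer = torch.optim.AdamW(student.parameters(), lr=learning_rate)
# LinearWarmup: linear ramp-up over the first 20% of steps, then linear decay.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)
```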
Distillation
This model was distilled using the TinyBERT approach described in this paper and implemented in Haystack. We performed intermediate layer distillation with roberta-base as the teacher, which resulted in deepset/tinyroberta-6l-768d (sketched below). This model has not been distilled for any specific task. If you want to use distillation to improve its performance on a downstream task, you can take advantage of Haystack's distillation functionality. You can also check out deepset/tinyroberta-squad2 for a model that has already been distilled on an extractive QA downstream task.
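For illustration, the core of that intermediate-layer objective can be sketched with plain transformers code. This is a simplified sketch, not the Haystack implementation: it only matches hidden states (TinyBERT additionally distills embeddings and attention matrices), and the uniform 2-to-1 layer mapping is an assumption.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

teacher = AutoModel.from_pretrained("deepset/roberta-base")
student = AutoModel.from_pretrained("deepset/tinyroberta-6l-768d")
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base")

batch = tokenizer(
    ["Intermediate layer distillation matches student hidden states to the teacher's."],
    return_tensors="pt", truncation=True, max_length=384,
)

with torch.no_grad():
    teacher_out = teacher(**batch, output_hidden_states=True)
student_out = student(**batch, output_hidden_states=True)

# hidden_states[0] is the embedding output, followed by one tensor per layer.
# Assumed mapping: student layer i -> teacher layer 2i (6 vs. 12 layers).
# Both models use a hidden size of 768, so no projection matrix is needed.
layer_map = {i: 2 * i for i in range(1, 7)}

loss = torch.stack([
    F.mse_loss(student_out.hidden_states[s], teacher_out.hidden_states[t])
    for s, t in layer_map.items()
]).mean()
# During training, loss.backward() would update the student only.
```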
About us
deepset is the company behind the production-ready open-source AI framework Haystack.
Some of our other work:
- Distilled roberta-base-squad2 (aka "tinyroberta-squad2")
- German BERT, GermanQuAD and GermanDPR, German embedding model
- deepset Cloud, deepset Studio
Get in touch and join the Haystack community
For more info on Haystack, visit our GitHub repo and Documentation.
We also have a Discord community open to everyone!
Twitter | LinkedIn | Discord | GitHub Discussions | Website | YouTube
By the way: we're hiring!