Update README.md

README.md (CHANGED)

- [Licensing information](#licensing-information)
- [Funding](#funding)
- [Disclaimer](#disclaimer)

</details>

## Model description
The longformer-base-4096-bne-es model is the [Longformer](https://huggingface.co/allenai/longformer-base-4096) version of the [roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) masked language model for the Spanish language. The model started from the **roberta-base-bne** checkpoint and was pre-trained for masked language modelling (MLM) on long documents from our biomedical and clinical corpora.
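The snippet below is a usage sketch and is not part of the original card: it loads the checkpoint with the Hugging Face `transformers` fill-mask pipeline. The model identifier follows the Hub link used in the evaluation table further down; the example sentence is made up.

```python
# Minimal usage sketch (assumption: the `transformers` library is installed and the
# model ID below matches this repository). The example sentence is illustrative only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="PlanTL-GOB-ES/longformer-base-4096-bne-es")

# RoBERTa-style tokenizer, so the mask token is "<mask>".
for prediction in fill_mask("El paciente fue ingresado en el <mask> con fiebre alta."):
    print(prediction["token_str"], round(prediction["score"], 4))
```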
### Tokenization and pre-training
The training corpus has been tokenized using a byte-level version of Byte-Pair Encoding (BPE), as used in the original [RoBERTa](https://arxiv.org/abs/1907.11692) model, with a vocabulary size of 50,262 tokens. The RoBERTa-base-bne pre-training consists of masked language model training that follows the approach employed for RoBERTa base. The training lasted a total of 40 hours on 8 computing nodes, each with 2 AMD MI50 GPUs of 32 GB VRAM.
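As a brief illustration (not from the original card), the byte-level BPE tokenizer described above can be inspected directly; the model identifier and the sample sentence are assumptions.

```python
# Sketch: load and inspect the byte-level BPE tokenizer described above.
# Assumes the model ID matches this repository; the sample sentence is arbitrary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/longformer-base-4096-bne-es")

print(len(tokenizer))  # should be close to the 50,262-token vocabulary reported above
print(tokenizer.tokenize("Informe clínico de ejemplo."))  # byte-level BPE pieces
```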
## Evaluation

When fine-tuned on downstream tasks, this model achieved the following performance:

| Dataset      | Metric   | [**Longformer-base**](https://huggingface.co/PlanTL-GOB-ES/longformer-base-4096-bne-es) |
|--------------|----------|--------|
| MLDoc        | F1       | 0.9608 |
| CoNLL-NERC   | F1       | 0.8757 |
| CAPITEL-NERC | F1       | 0.8985 |
| PAWS-X       | F1       | 0.8878 |
| UD-POS       | F1       | 0.9903 |
| CAPITEL-POS  | F1       | 0.9853 |
| SQAC         | F1       | 0.8026 |
| STS          | Combined | 0.8338 |
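The following sketch is not part of the original card: it shows how the checkpoint might be loaded for one of the downstream tasks above (an MLDoc-style document classification setup). The number of labels, the input text, and any hyperparameters are assumptions.

```python
# Sketch: load the checkpoint for MLDoc-style document classification.
# num_labels=4 is an assumed 4-class setup; the input text is illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "PlanTL-GOB-ES/longformer-base-4096-bne-es"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=4)

# Long inputs are the point of using Longformer: up to 4,096 tokens per document.
inputs = tokenizer(
    "Texto largo de ejemplo ...",
    truncation=True,
    max_length=4096,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels); the classification head is untrained until fine-tuning
```

Reproducing the scores in the table would of course require full fine-tuning on each dataset, for example with the `Trainer` API.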
## Additional information