Pablogps committed on
Commit
c45bad9
1 Parent(s): a7e4332

Update README.md

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -120,6 +120,8 @@ for split in ("random", "stepwise", "gaussian"):
 
 We then used the same setup as Liu et al. (2019) but trained for only half the steps (250k) at a sequence length of 128. Then we continued training the most promising model for a further 25k steps at sequence length 512.
 
+ **MENTION TWO WAYS TO CONTINUE TRAINING ON 512 AND SHOW DIFFERENCE IN PERFORMANCE, DO WE HAVE A GRAPH FOR THIS?**
+
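For orientation, the snippet below is a minimal sketch, not our actual training scripts, of the two-phase idea described above: pre-train on examples tokenized at a maximum length of 128 for most of the steps, then resume the most promising checkpoint and continue on examples tokenized at a maximum length of 512. The checkpoint path and step counts are placeholders.

```python
# Minimal sketch of the two-phase training setup (placeholder paths and step
# counts, not the exact scripts used for these models).
from transformers import AutoTokenizer, FlaxRobertaForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bertin-project/bertin-roberta-base-spanish")

def tokenize(batch, max_length):
    # Re-tokenize the corpus at the sequence length of the current phase.
    return tokenizer(batch["text"], truncation=True, max_length=max_length)

# Phase 1: train from scratch on examples of length 128 for ~250k steps.
# ... run the usual Flax MLM training loop with tokenize(batch, max_length=128) ...

# Phase 2: reload the most promising phase-1 checkpoint and continue for ~25k
# more steps on examples of length 512. RoBERTa-base position embeddings already
# cover 512 positions, so no resizing of the model is needed.
model = FlaxRobertaForMaskedLM.from_pretrained("path/to/best-128seq-checkpoint")
# ... continue the same MLM training loop, now with tokenize(batch, max_length=512) ...
```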
 ## Results
 
 Our first test, tagged `beta` in this repository, refers to an initial experiment using `stepwise` sampling at sequence length 128, but with a small `factor` to oversample everything. During the community event, the Barcelona Supercomputing Center (BSC), in association with the National Library of Spain, released RoBERTa base and large models trained on 200M documents (570GB) of high-quality data, cleaned using 100 nodes with 48 CPU cores each on MareNostrum 4 over 96 hours. At the end of that process they were left with 2TB of clean data at the document level, which was further cleaned up to the final 570GB. In all our experiments and procedures, we had access to 3xTPUv3-8 for 10 days to do cleaning, sampling, training, and evaluation. The BSC team evaluated our early release of the model `beta`, and the results can be seen in Table 1.
 
@@ -143,6 +145,32 @@ Our final models were trained on a different number of steps and sequence length
 <caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta).</caption>
 </figure>
 
+ We are currently in the process of applying our language models to downstream tasks.
+
+ **SQUAD-es**
+ Using sequence length 128, we have achieved an exact match score of 50.96 and an F1 score of 68.74.
+
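As a rough, non-authoritative sketch of how such an evaluation can be reproduced with `datasets` and `transformers` (the fine-tuned checkpoint path is a placeholder, the dataset config name `v1.1.0` is assumed, and the exact fine-tuning settings are not shown):

```python
# Rough sketch: score a fine-tuned QA checkpoint on SQuAD-es v1.1 with the
# standard SQuAD metric (exact match / F1). The model path is a placeholder.
from datasets import load_dataset, load_metric
from transformers import pipeline

squad_es = load_dataset("squad_es", "v1.1.0", split="validation")
qa = pipeline("question-answering", model="path/to/bertin-finetuned-squad-es")
metric = load_metric("squad")

predictions, references = [], []
for example in squad_es.select(range(100)):  # small sample to keep the sketch fast
    answer = qa(question=example["question"], context=example["context"])
    predictions.append({"id": example["id"], "prediction_text": answer["answer"]})
    references.append({"id": example["id"], "answers": example["answers"]})

print(metric.compute(predictions=predictions, references=references))
# -> {'exact_match': ..., 'f1': ...}
```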
+ **POS**
+
+ <figure>
+
+ | Model                                              | Metric     |
+ |----------------------------------------------------|------------|
+ | bert-base-multilingual-cased                       | 0.9629     |
+ | dccuchile/bert-base-spanish-wwm-cased              | 0.9642     |
+ | BSC-TeMU/roberta-base-bne                          | 0.9659     |
+ | flax-community/bertin-roberta-large-spanish        | 0.9646     |
+ | bertin-project/bertin-roberta-base-spanish         | 0.9638     |
+ | bertin-project/bertin-base-random                  | 0.9656     |
+ | bertin-project/bertin-base-stepwise                | 0.9656     |
+ | bertin-project/bertin-base-gaussian                | **0.9662** |
+ | bertin-project/bertin-base-random-exp-512seqlen    | 0.9660     |
+ | bertin-project/bertin-base-gaussian-exp-512seqlen  | **0.9662** |
+
+ <caption>Table 2. Results for POS **add details like number of epochs etc**.</caption>
+ </figure>
+
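For reference, a minimal sketch of loading one of the checkpoints above for token classification; `num_labels=17` is an assumption (a Universal-POS-sized tag set) and the fine-tuning loop on a Spanish POS dataset is omitted:

```python
# Minimal sketch: load a checkpoint from Table 2 for token classification (POS).
# num_labels=17 is an assumed tag-set size; the classification head is randomly
# initialized here and still needs fine-tuning before its predictions are useful.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "bertin-project/bertin-roberta-base-spanish"  # any row of Table 2
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)

# Once fine-tuned, the model can be served through the standard pipeline:
tagger = pipeline("token-classification", model=model, tokenizer=tokenizer)
print(tagger("El gato duerme sobre la alfombra."))
```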
 # Conclusions
 
 With roughly 10 days' worth of access to 3xTPUv3-8, we have achieved remarkable results, surpassing the previous state of the art in a few tasks and even improving document classification over models trained on massive supercomputers with very large, private, and highly curated datasets.