tgf-xlm-roberta-base-pt-br
This model is a fine-tuned version of xlm-roberta-base on the BrWac dataset.
Model description
This model is a version of xlm-roberta-base fine-tuned for Brazilian Portuguese. It was trained on the BrWac dataset, following the principles of the RoBERTa paper. The key strategies are:
Full-Sentences: Quoted from the paper: "Each input is packed with full sentences sampled contiguously from one or more documents, such that the total length is at most 512 tokens. Inputs may cross document boundaries. When we reach the end of one document, we begin sampling sentences from the next document and add an extra separator token between documents".
Tuned hyperparameters: adam_beta1=0.9, adam_beta2=0.98, adam_epsilon=1e-6 (as the paper suggests)
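The Full-Sentences strategy quoted above can be illustrated with a minimal sketch. This is not the training code used for this model. It assumes each document is already tokenized into sentences (lists of token ids), and the function name `pack_full_sentences` and the `sep_id` default are hypothetical placeholders. Special start and end tokens are left out for brevity.

```python
from typing import Iterable, List


def pack_full_sentences(
    documents: Iterable[List[List[int]]],
    max_len: int = 512,
    sep_id: int = 2,  # placeholder separator id (assumption, not taken from the model card)
) -> List[List[int]]:
    """Greedily pack whole sentences into inputs of at most max_len tokens.

    Each document is a list of sentences; each sentence is a list of token ids.
    Inputs may cross document boundaries, with sep_id inserted between documents.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for doc in documents:
        # Crossing a document boundary: add one separator if there is room,
        # otherwise close the current input and start a fresh one.
        if current:
            if len(current) + 1 > max_len:
                packed.append(current)
                current = []
            else:
                current.append(sep_id)
        for sentence in doc:
            sentence = sentence[:max_len]  # truncate sentences longer than one input
            if len(current) + len(sentence) > max_len:
                packed.append(current)
                current = []
            current.extend(sentence)
    if current:
        packed.append(current)
    return packed


docs = [[[1, 2, 3], [4, 5]], [[6, 7, 8, 9]]]
print(pack_full_sentences(docs, max_len=10, sep_id=0))
# [[1, 2, 3, 4, 5, 0, 6, 7, 8, 9]]
```

In this toy run both documents fit into a single input, so the separator id `0` marks the boundary between them.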
Availability
The source code is available here
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-4
- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 2
- mixed_precision_training: Native AMP
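To make the `linear` scheduler with 1000 warmup steps concrete, here is a small sketch of how the learning rate would evolve. It assumes the usual shape of such a schedule (linear ramp from 0 to the peak rate, then linear decay to 0). The total number of optimizer steps is not stated in this card, so `total_steps=10_000` below is purely illustrative, and `linear_warmup_lr` is a hypothetical helper name.

```python
def linear_warmup_lr(
    step: int,
    total_steps: int,
    peak_lr: float = 1e-4,     # learning_rate from the list above
    warmup_steps: int = 1000,  # lr_scheduler_warmup_steps from the list above
) -> float:
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps is an illustrative assumption, not a value reported for this model.
for s in (0, 500, 1000, 5500, 10_000):
    print(s, linear_warmup_lr(s, total_steps=10_000))
```

The peak rate of 1e-4 is reached exactly at step 1000 and decays to 0 at the final step.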
Framework versions
- Transformers 4.23.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.6.1
- Tokenizers 0.13.1
Environment
4xA100.88V NVIDIA
Special thanks to DataCrunch.io for their amazing and affordable GPUs.
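As a quick consistency check, the reported `total_train_batch_size` of 512 can be reproduced from the other values. This assumes that `train_batch_size` is the per-GPU batch size and that all four A100 GPUs took part in data-parallel training. Neither is stated explicitly in this card.

```python
per_device_batch_size = 16       # train_batch_size
gradient_accumulation_steps = 8  # gradient_accumulation_steps
num_gpus = 4                     # assumption: read from "4xA100" in the Environment section

# Sequences contributing to each optimizer update.
total_train_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(total_train_batch_size)  # 512
```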