mega-small-2048 on simple wikipedia
MEGA for masked language modeling, 'small' size (12 layers, hidden size 512, 2048-token context processed in chunks of 1024), trained on the pszemraj/simple_wikipedia_LM dataset.
It achieves the following results on the evaluation set:
- Loss: 3.4773
- Accuracy: 0.4591
Model description
See the config for architecture details. While not a polished, ready-to-use 'pretrained' model, it was trained from scratch.
This model uses the tokenizer from roberta-base.
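As a quick sanity check, here is a minimal fill-mask sketch. The repo id is taken from the model tree at the bottom of this card, `<mask>` is the roberta-base mask token, and a transformers version with MEGA support (e.g. the 4.33.1 listed below) is assumed:

```python
from transformers import pipeline

# Hypothetical example sentence; the model was trained on Simple English Wikipedia.
fill_mask = pipeline(
    "fill-mask",
    model="pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50",
)
print(fill_mask("Simple English Wikipedia is a free online <mask>."))
```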
Intended uses & limitations
More information needed
Training and evaluation data
Note: this was trained in bf16. The official recommendation is fp32; this is still being explored.
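For reference, a minimal sketch of toggling bf16 in the Trainer (output_dir is a placeholder; leaving bf16 at its default of False keeps the recommended fp32):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mega-small-2048-simplewiki",  # placeholder path
    bf16=True,  # used for this run; False (default) trains in fp32
)
```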
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 1
- seed: 3208
- gradient_accumulation_steps: 64
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
Additionally:
- mask rate of 50% (See paper for details)
- whole-word masking
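A rough sketch of how the settings above could be expressed with the Trainer API and a whole-word-masking collator; the actual training script is not included in this card, so treat the names and structure below as illustrative:

```python
from transformers import (
    AutoTokenizer,
    DataCollatorForWholeWordMask,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# 50% mask rate with whole-word masking; note that whole-word grouping
# behavior depends on the tokenizer (roberta-base uses BPE).
collator = DataCollatorForWholeWordMask(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.5,
)

args = TrainingArguments(
    output_dir="mega-small-2048-simplewiki",  # placeholder path
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=64,  # effective batch size of 64
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    seed=3208,
    bf16=True,  # see the precision note above
)
```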
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
7.2691 | 0.11 | 50 | 7.1000 | 0.0677 |
7.1597 | 0.22 | 100 | 6.8388 | 0.0794 |
6.5476 | 0.33 | 150 | 6.4004 | 0.1359 |
6.5335 | 0.44 | 200 | 6.1776 | 0.1708 |
5.7228 | 0.55 | 250 | 5.6106 | 0.2437 |
5.4574 | 0.66 | 300 | 5.1391 | 0.2884 |
5.2275 | 0.78 | 350 | 4.8626 | 0.3174 |
4.9589 | 0.89 | 400 | 4.6454 | 0.3374 |
4.6406 | 1.0 | 450 | 4.4498 | 0.3578 |
4.8251 | 1.11 | 500 | 4.3055 | 0.3706 |
4.4728 | 1.22 | 550 | 4.1877 | 0.3821 |
4.3975 | 1.33 | 600 | 4.0709 | 0.3955 |
4.4245 | 1.44 | 650 | 3.9909 | 0.4045 |
4.2613 | 1.55 | 700 | 3.8976 | 0.4128 |
4.1806 | 1.66 | 750 | 3.8515 | 0.4177 |
3.9469 | 1.77 | 800 | 3.7883 | 0.4227 |
3.9563 | 1.88 | 850 | 3.7314 | 0.4306 |
4.0063 | 1.99 | 900 | 3.6975 | 0.4336 |
3.9274 | 2.1 | 950 | 3.6561 | 0.4378 |
3.788 | 2.21 | 1000 | 3.6280 | 0.4410 |
3.8711 | 2.33 | 1050 | 3.5736 | 0.4467 |
3.8623 | 2.44 | 1100 | 3.5535 | 0.4496 |
3.8575 | 2.55 | 1150 | 3.5407 | 0.4521 |
4.0079 | 2.66 | 1200 | 3.5172 | 0.4543 |
3.8265 | 2.77 | 1250 | 3.4786 | 0.4591 |
3.9513 | 2.88 | 1300 | 3.4741 | 0.4578 |
3.554 | 2.99 | 1350 | 3.4773 | 0.4591 |
Framework versions
- Transformers 4.33.1
- Pytorch 2.2.0.dev20230907+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
Model tree for pszemraj/mega-small-2048-C1024-tk_id-simplewiki-MR50
- Base model: pszemraj/random-mega-small-2048