Hyperparameter | Value |
---|---|
Steps | 150k |
Max length | 256 |
LR | 1e-4 |
LR schedule | constant |
Optimizer | AdamW |
beta_1, beta_2 | 0.9, 0.95 |
Final eval loss | 2.245 |
Final eval perplexity | 9.44 |
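
The two evaluation rows are consistent with each other if perplexity is the exponential of the reported loss. The sketch below checks this. It assumes the eval loss is the mean per-token cross-entropy in nats (natural log), which the table does not state explicitly.

```python
import math

# Value copied from the table above.
# Assumption: this is the mean per-token cross-entropy measured in nats.
final_eval_loss = 2.245

# Perplexity = exp(mean per-token cross-entropy) when the loss uses the natural log.
final_eval_perplexity = math.exp(final_eval_loss)

print(f"{final_eval_perplexity:.2f}")  # prints 9.44
```

If the loss were instead reported in bits (log base 2), the matching perplexity would be `2 ** loss`, and the two rows would not agree.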