# llama_8b_lima_11_kto
This model is a fine-tuned version of OpenLeecher/llama_8b_lima_11 on an unspecified dataset (the dataset name was not recorded in the training config). It achieves the following results on the evaluation set:
- Loss: 0.1369
- Rewards/chosen: 9.4796
- Logps/chosen: -655.7220
- Rewards/rejected: -11.1496
- Logps/rejected: -294.5653
- Rewards/margins: 20.6292
- Kl: 4.1962
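
The metric names above (Rewards/chosen, Rewards/margins, Kl) are characteristic of KTO training. For context, this is the KTO objective as stated in Ethayarajh et al. (2024); it is background for reading the numbers, not something reported by this card:

$$
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad
z_0 = \mathrm{KL}\!\left(\pi_\theta(y' \mid x) \,\|\, \pi_{\mathrm{ref}}(y' \mid x)\right)
$$

$$
v(x, y) =
\begin{cases}
\lambda_D \, \sigma\!\big(\beta\,(r_\theta(x, y) - z_0)\big) & \text{if } y \text{ is desirable} \\
\lambda_U \, \sigma\!\big(\beta\,(z_0 - r_\theta(x, y))\big) & \text{if } y \text{ is undesirable}
\end{cases}
$$

$$
\mathcal{L}_{\mathrm{KTO}}(\pi_\theta; \pi_{\mathrm{ref}}) = \mathbb{E}_{x, y \sim D}\big[\lambda_y - v(x, y)\big]
$$

Under trl's logging conventions, Rewards/chosen and Rewards/rejected would be the $\beta$-scaled log-probability ratios $\beta\, r_\theta$ averaged over desirable and undesirable completions, Rewards/margins their difference, and Kl a running estimate of $z_0$. Treat this mapping as an assumption rather than documentation.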
## Model description
More information needed
## Intended uses & limitations
More information needed
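
In lieu of documented usage guidance, here is a minimal inference sketch. The repo id is an assumption inferred from the card title, and the generation settings are illustrative:

```python
# Minimal inference sketch; repo id is assumed from the card title, not confirmed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenLeecher/llama_8b_lima_11_kto"  # assumption: adjust to the actual repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # an 8B model in bf16 needs roughly 16 GB of VRAM
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Explain what KTO fine-tuning does in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```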
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 2e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 66
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 20
- total_train_batch_size: 40
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_steps: 30
- num_epochs: 1.0
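
The metric names in this card match trl's KTO logging, so the run plausibly used trl's `KTOTrainer`; that is an inference, not something the card states. Below is a hedged sketch of how the listed hyperparameters map onto `KTOConfig`. The dataset is a stand-in from the trl documentation, since the actual training data is undocumented, and `output_dir` is a placeholder:

```python
# Hedged reproduction sketch, assuming trl's KTOTrainer was used.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "OpenLeecher/llama_8b_lima_11"  # base model named in this card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Stand-in dataset in trl's unpaired KTO format (prompt/completion/label);
# the data actually used for this card is not documented.
train_dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

# Hyperparameters copied from the list above. With 2 GPUs, a per-device batch
# of 1, and 20 accumulation steps, the effective train batch size is 40.
# Adam betas (0.9, 0.999) and epsilon 1e-8 are the defaults, so they are not set.
config = KTOConfig(
    output_dir="llama_8b_lima_11_kto",  # placeholder
    learning_rate=2e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=20,
    lr_scheduler_type="polynomial",
    warmup_steps=30,
    num_train_epochs=1.0,
    seed=66,
)

trainer = KTOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl versions take `processing_class` instead
)
trainer.train()
```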
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | Kl     |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:----------------:|:--------------:|:---------------:|:------:|
| 0.3224        | 0.2973 | 90   | 0.2095          | 5.6903         | -664.1426    | -9.8058          | -291.5791      | 15.4961         | 3.0351 |
| 0.2716        | 0.5945 | 180  | 0.1669          | 8.8699         | -657.0769    | -11.5150         | -295.3773      | 20.3849         | 4.1776 |
| 0.2089        | 0.8918 | 270  | 0.1369          | 9.4796         | -655.7220    | -11.1496         | -294.5653      | 20.6292         | 4.1962 |
### Framework versions
- Transformers 4.45.0
- Pytorch 2.4.1+cu124
- Datasets 2.21.0
- Tokenizers 0.20.1