# zephyr-7b-dpo-full-prometheus-low-curriculum

This model is a DPO fine-tuned version of alignment-handbook/zephyr-7b-sft-full (the preference dataset used for training is not recorded in this card). It achieves the following results on the evaluation set:
- Loss: 0.4969
- Rewards/chosen: -1.0990
- Rewards/rejected: -1.9910
- Rewards/accuracies: 0.7328
- Rewards/margins: 0.8920
- Logps/rejected: -447.3745
- Logps/chosen: -369.8574
- Logits/rejected: 0.8388
- Logits/chosen: 0.0187
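
The reward columns above follow the standard DPO logging convention (as in TRL's `DPOTrainer`): the implicit reward of a response $y$ to a prompt $x$ is

$$r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},$$

so `Rewards/margins` is the mean of $r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})$, and `Rewards/accuracies` is the fraction of evaluation pairs where the chosen reward exceeds the rejected one. ($\beta$ is the DPO temperature; its value is not recorded in this card.)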
## Model description

More information needed

## Intended uses & limitations

More information needed
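
Since no official usage example is provided, the following is a minimal, untested inference sketch. It assumes the checkpoint is hosted at `sfulay/zephyr-7b-dpo-full-prometheus-low-curriculum` and that the tokenizer carries the chat template inherited from the SFT base model.

```python
# Hedged inference sketch; the repo id and chat template are assumptions.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="sfulay/zephyr-7b-dpo-full-prometheus-low-curriculum",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
# apply_chat_template renders the messages into the model's expected prompt format.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```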
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
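
As a rough illustration of how these settings could be wired up (this is not the actual training script), here is a hedged sketch using TRL's `DPOTrainer`. The preference dataset, the DPO `beta`, and the precision setting are all assumptions not recorded in this card.

```python
# Hedged sketch only: maps the listed hyperparameters onto TRL's DPOConfig.
# The dataset, beta, and bf16 setting are assumptions, not values from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder preference data; the actual dataset is not recorded in the card.
ds = load_dataset("trl-lib/ultrafeedback_binarized")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full-prometheus-low-curriculum",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 8 GPUs x 8 per device x 2 steps = 128 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    bf16=True,                       # assumption: typical for this recipe
)

trainer = DPOTrainer(
    model=model,                     # reference model is created internally when omitted
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    processing_class=tokenizer,
)
trainer.train()
```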
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6558 | 0.1143 | 50 | 0.6420 | -0.0440 | -0.1562 | 0.6724 | 0.1122 | -263.8943 | -264.3628 | -2.5310 | -2.5775 |
| 0.5602 | 0.2286 | 100 | 0.5737 | -0.9542 | -1.4308 | 0.6509 | 0.4765 | -391.3545 | -355.3837 | -1.4641 | -1.6090 |
| 0.533 | 0.3429 | 150 | 0.5293 | -0.6925 | -1.3308 | 0.6853 | 0.6382 | -381.3522 | -329.2125 | -0.9684 | -1.4379 |
| 0.5466 | 0.4571 | 200 | 0.5157 | -0.7144 | -1.4929 | 0.6983 | 0.7785 | -397.5714 | -331.4009 | -0.1901 | -0.8315 |
| 0.5044 | 0.5714 | 250 | 0.5078 | -0.9462 | -1.7789 | 0.6853 | 0.8327 | -426.1694 | -354.5842 | 0.4657 | -0.2137 |
| 0.4967 | 0.6857 | 300 | 0.5044 | -1.0772 | -1.9084 | 0.7371 | 0.8312 | -439.1140 | -367.6776 | 0.4821 | -0.2917 |
| 0.5055 | 0.8 | 350 | 0.4981 | -1.0311 | -1.9125 | 0.7414 | 0.8814 | -439.5230 | -363.0710 | 0.6801 | -0.1449 |
| 0.5131 | 0.9143 | 400 | 0.4969 | -1.0990 | -1.9910 | 0.7328 | 0.8920 | -447.3745 | -369.8574 | 0.8388 | 0.0187 |
### Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
## Model tree

sfulay/zephyr-7b-dpo-full-prometheus-low-curriculum is fine-tuned from alignment-handbook/zephyr-7b-sft-full, which is itself built on the base model mistralai/Mistral-7B-v0.1.