# zephyr-7b-dpo-qlora
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set (the DPO quantities behind these metrics are recapped after the list):
- Loss: 0.4757
- Rewards/chosen: -3.6825
- Rewards/rejected: -4.9601
- Rewards/accuracies: 0.7540
- Rewards/margins: 1.2776
- Logps/rejected: -740.5720
- Logps/chosen: -632.8636
- Logits/rejected: -1.1984
- Logits/chosen: -1.3150
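For context, these are the standard DPO quantities as logged by TRL: `Rewards/chosen` and `Rewards/rejected` are the implicit rewards (the β-scaled log-probability ratio between the policy and the reference model), and `Rewards/margins` is their difference, which the figures above confirm: -3.6825 - (-4.9601) = 1.2776. As a reminder (β itself is not recorded in this card):

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

where $y_w$ is the chosen and $y_l$ the rejected completion, and `Rewards/accuracies` is the fraction of pairs for which $r_\theta(x, y_w) > r_\theta(x, y_l)$.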
## Model description
More information needed
## Intended uses & limitations
More information needed
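Pending fuller documentation, a minimal inference sketch follows. It assumes the adapter uses the standard PEFT checkpoint layout (with `mistralai/Mistral-7B-v0.1` as the base model, per the model tree) and that the repo ships a tokenizer with a chat template; neither is guaranteed by this card.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "L1nkee/zephyr-7b-dpo-qlora"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model,
# and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Assumption: the adapter repo includes a tokenizer; if not, load it from
# the SFT model (alignment-handbook/zephyr-7b-sft-qlora) instead.
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain QLoRA in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```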
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
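As a rough guide, the list above maps onto TRL's `DPOConfig` as sketched below. Only the listed hyperparameters come from this card; `beta`, `output_dir`, and the precision flag are assumptions (β = 0.01 matches the alignment-handbook QLoRA DPO recipe this model appears to follow, but is not recorded here).

```python
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",  # assumption
    beta=0.01,                         # assumption; not recorded in this card
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # total train batch size: 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",               # Adam with betas=(0.9, 0.999), eps=1e-8
    bf16=True,                         # assumption
)
```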
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6809 | 0.0262 | 100 | 0.6807 | 0.0519 | 0.0257 | 0.6580 | 0.0262 | -241.9869 | -259.4206 | -2.0558 | -2.1488 |
0.6438 | 0.0523 | 200 | 0.6351 | -0.1905 | -0.3429 | 0.6800 | 0.1524 | -278.8497 | -283.6621 | -2.0145 | -2.1026 |
0.5829 | 0.0785 | 300 | 0.6072 | -0.4462 | -0.7133 | 0.6780 | 0.2671 | -315.8949 | -309.2386 | -2.0508 | -2.1342 |
0.6201 | 0.1047 | 400 | 0.5892 | -1.4907 | -1.9543 | 0.6845 | 0.4636 | -439.9887 | -413.6829 | -1.6374 | -1.7202 |
0.5798 | 0.1309 | 500 | 0.5667 | -1.3123 | -2.0041 | 0.7020 | 0.6918 | -444.9709 | -395.8432 | -1.2046 | -1.3376 |
0.5395 | 0.1570 | 600 | 0.5524 | -1.2157 | -1.8227 | 0.7030 | 0.6069 | -426.8258 | -386.1879 | -1.1445 | -1.2781 |
0.5278 | 0.1832 | 700 | 0.5336 | -3.1382 | -4.0509 | 0.7265 | 0.9127 | -649.6522 | -578.4380 | -0.6999 | -0.8394 |
0.4969 | 0.2094 | 800 | 0.5242 | -1.8373 | -2.6256 | 0.7245 | 0.7883 | -507.1189 | -448.3450 | -1.1250 | -1.2524 |
0.4794 | 0.2355 | 900 | 0.5246 | -2.0059 | -2.8266 | 0.7255 | 0.8207 | -527.2198 | -465.2022 | -0.8588 | -0.9944 |
0.5261 | 0.2617 | 1000 | 0.5109 | -2.8850 | -3.8029 | 0.7395 | 0.9179 | -624.8492 | -553.1188 | -0.6716 | -0.8193 |
0.6001 | 0.2879 | 1100 | 0.5050 | -2.4905 | -3.3317 | 0.7375 | 0.8412 | -577.7299 | -513.6636 | -0.6634 | -0.8245 |
0.5911 | 0.3141 | 1200 | 0.4983 | -2.2735 | -3.2228 | 0.7385 | 0.9493 | -566.8434 | -491.9688 | -0.9871 | -1.1192 |
0.5345 | 0.3402 | 1300 | 0.5001 | -3.5214 | -4.7330 | 0.7450 | 1.2115 | -717.8565 | -616.7566 | -0.8540 | -0.9911 |
0.5291 | 0.3664 | 1400 | 0.4987 | -2.7865 | -3.7479 | 0.7475 | 0.9614 | -619.3545 | -543.2670 | -1.0816 | -1.2062 |
0.4495 | 0.3926 | 1500 | 0.5144 | -2.4600 | -3.6484 | 0.7330 | 1.1884 | -609.4039 | -510.6184 | -1.1934 | -1.3216 |
0.5586 | 0.4187 | 1600 | 0.4937 | -2.4987 | -3.5027 | 0.7430 | 1.0040 | -594.8329 | -514.4847 | -1.1838 | -1.3066 |
0.4895 | 0.4449 | 1700 | 0.4948 | -3.6212 | -4.8051 | 0.7295 | 1.1839 | -725.0694 | -626.7305 | -0.9648 | -1.1064 |
0.4850 | 0.4711 | 1800 | 0.4885 | -4.0215 | -5.2285 | 0.7525 | 1.2070 | -767.4141 | -666.7680 | -1.0276 | -1.1613 |
0.4387 | 0.4973 | 1900 | 0.4897 | -3.8136 | -5.0345 | 0.7460 | 1.2208 | -748.0074 | -645.9786 | -1.1075 | -1.2419 |
0.4613 | 0.5234 | 2000 | 0.4941 | -4.5643 | -5.6977 | 0.7410 | 1.1334 | -814.3307 | -721.0457 | -0.9859 | -1.1242 |
0.4939 | 0.5496 | 2100 | 0.4877 | -4.6441 | -5.8517 | 0.7500 | 1.2077 | -829.7325 | -729.0210 | -1.1445 | -1.2699 |
0.4782 | 0.5758 | 2200 | 0.4813 | -3.2786 | -4.2916 | 0.7485 | 1.0130 | -673.7171 | -592.4716 | -1.2439 | -1.3665 |
0.4682 | 0.6019 | 2300 | 0.4885 | -4.1629 | -5.5525 | 0.7455 | 1.3897 | -799.8126 | -680.9020 | -1.0667 | -1.1952 |
0.4582 | 0.6281 | 2400 | 0.4859 | -3.7434 | -4.9841 | 0.7460 | 1.2407 | -742.9675 | -638.9534 | -1.0476 | -1.1735 |
0.4948 | 0.6543 | 2500 | 0.4817 | -3.6128 | -4.8362 | 0.7425 | 1.2234 | -728.1769 | -625.8918 | -1.0472 | -1.1781 |
0.4588 | 0.6805 | 2600 | 0.4854 | -3.5980 | -4.8557 | 0.7430 | 1.2577 | -730.1331 | -624.4171 | -1.1158 | -1.2400 |
0.5354 | 0.7066 | 2700 | 0.4857 | -4.1262 | -5.3649 | 0.7445 | 1.2387 | -781.0517 | -677.2343 | -1.0720 | -1.1950 |
0.4782 | 0.7328 | 2800 | 0.4822 | -3.8568 | -5.1115 | 0.7460 | 1.2547 | -755.7133 | -650.2979 | -1.1544 | -1.2733 |
0.5135 | 0.7590 | 2900 | 0.4807 | -3.9503 | -5.2306 | 0.7475 | 1.2804 | -767.6244 | -659.6406 | -1.1773 | -1.2961 |
0.4613 | 0.7851 | 3000 | 0.4783 | -3.6454 | -4.8177 | 0.7545 | 1.1723 | -726.3349 | -629.1588 | -1.1940 | -1.3123 |
0.4904 | 0.8113 | 3100 | 0.4787 | -3.8925 | -5.1623 | 0.7535 | 1.2698 | -760.7857 | -653.8602 | -1.1654 | -1.2847 |
0.4706 | 0.8375 | 3200 | 0.4755 | -3.4858 | -4.6973 | 0.7525 | 1.2116 | -714.2923 | -613.1915 | -1.2139 | -1.3301 |
0.5190 | 0.8636 | 3300 | 0.4762 | -3.6863 | -4.9393 | 0.7525 | 1.2530 | -738.4901 | -633.2412 | -1.1986 | -1.3147 |
0.4446 | 0.8898 | 3400 | 0.4762 | -3.8220 | -5.1066 | 0.7535 | 1.2847 | -755.2252 | -646.8135 | -1.1676 | -1.2864 |
0.5378 | 0.9160 | 3500 | 0.4759 | -3.7562 | -5.0452 | 0.7530 | 1.2890 | -749.0795 | -640.2327 | -1.1933 | -1.3106 |
0.4506 | 0.9422 | 3600 | 0.4759 | -3.7087 | -4.9945 | 0.7535 | 1.2857 | -744.0071 | -635.4867 | -1.1944 | -1.3115 |
0.4732 | 0.9683 | 3700 | 0.4758 | -3.6903 | -4.9695 | 0.7540 | 1.2792 | -741.5083 | -633.6405 | -1.1938 | -1.3109 |
0.5041 | 0.9945 | 3800 | 0.4758 | -3.6841 | -4.9619 | 0.7545 | 1.2778 | -740.7547 | -633.0256 | -1.1922 | -1.3094 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
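Expressed as a pip requirements file (assuming the usual PyPI package names, e.g. `torch` for PyTorch):

```
peft==0.12.0
transformers==4.44.0
torch==2.1.2
datasets==2.20.0
tokenizers==0.19.1
```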