zephyr-7b-dpo-qlora

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4757
  • Rewards/chosen: -3.6825
  • Rewards/rejected: -4.9601
  • Rewards/accuracies: 0.7540
  • Rewards/margins: 1.2776
  • Logps/rejected: -740.5720
  • Logps/chosen: -632.8636
  • Logits/rejected: -1.1984
  • Logits/chosen: -1.3150
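
This repository contains a QLoRA adapter rather than full model weights, so it is loaded on top of its base model via PEFT. A minimal sketch, assuming the adapter config points at a resolvable full-weights base model and that tokenizer files are present in this repo (otherwise load the tokenizer from alignment-handbook/zephyr-7b-sft-qlora); the prompt and generation settings are illustrative only:

```python
# Minimal sketch: load the DPO QLoRA adapter on top of its base model.
# Assumptions: the adapter's config points at a resolvable base model,
# tokenizer files live in this repo, and the tokenizer ships a chat template.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "L1nkee/zephyr-7b-dpo-qlora"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```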

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
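
The summary above names HuggingFaceH4/ultrafeedback_binarized as the training dataset. A small sketch for inspecting its preference pairs; the split and field names follow the public dataset card and should be verified against the version you use:

```python
# Sketch: peek at the binarized preference pairs used for DPO.
# Split and field names (train_prefs, prompt/chosen/rejected) follow the
# public dataset card for HuggingFaceH4/ultrafeedback_binarized.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example["prompt"])         # user prompt
print(example["chosen"][-1])     # preferred assistant message
print(example["rejected"][-1])   # dispreferred assistant message
```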

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
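
A rough sketch of how these values map onto trl's DPOConfig. Argument names follow recent trl releases; the DPO beta, sequence-length limits, and precision are not stated in the card, so they are left at their defaults or marked as assumptions:

```python
# Illustrative mapping of the hyperparameters above onto trl's DPOConfig
# (a TrainingArguments subclass). Values not listed in the card, such as
# the DPO beta or max sequence lengths, are deliberately left at their
# defaults; bf16 is an assumption, not taken from the card.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 = total train batch size 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: precision not stated in the card
)
```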

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6809 | 0.0262 | 100  | 0.6807 | 0.0519  | 0.0257  | 0.6580 | 0.0262 | -241.9869 | -259.4206 | -2.0558 | -2.1488 |
| 0.6438 | 0.0523 | 200  | 0.6351 | -0.1905 | -0.3429 | 0.6800 | 0.1524 | -278.8497 | -283.6621 | -2.0145 | -2.1026 |
| 0.5829 | 0.0785 | 300  | 0.6072 | -0.4462 | -0.7133 | 0.6780 | 0.2671 | -315.8949 | -309.2386 | -2.0508 | -2.1342 |
| 0.6201 | 0.1047 | 400  | 0.5892 | -1.4907 | -1.9543 | 0.6845 | 0.4636 | -439.9887 | -413.6829 | -1.6374 | -1.7202 |
| 0.5798 | 0.1309 | 500  | 0.5667 | -1.3123 | -2.0041 | 0.7020 | 0.6918 | -444.9709 | -395.8432 | -1.2046 | -1.3376 |
| 0.5395 | 0.1570 | 600  | 0.5524 | -1.2157 | -1.8227 | 0.7030 | 0.6069 | -426.8258 | -386.1879 | -1.1445 | -1.2781 |
| 0.5278 | 0.1832 | 700  | 0.5336 | -3.1382 | -4.0509 | 0.7265 | 0.9127 | -649.6522 | -578.4380 | -0.6999 | -0.8394 |
| 0.4969 | 0.2094 | 800  | 0.5242 | -1.8373 | -2.6256 | 0.7245 | 0.7883 | -507.1189 | -448.3450 | -1.1250 | -1.2524 |
| 0.4794 | 0.2355 | 900  | 0.5246 | -2.0059 | -2.8266 | 0.7255 | 0.8207 | -527.2198 | -465.2022 | -0.8588 | -0.9944 |
| 0.5261 | 0.2617 | 1000 | 0.5109 | -2.8850 | -3.8029 | 0.7395 | 0.9179 | -624.8492 | -553.1188 | -0.6716 | -0.8193 |
| 0.6001 | 0.2879 | 1100 | 0.5050 | -2.4905 | -3.3317 | 0.7375 | 0.8412 | -577.7299 | -513.6636 | -0.6634 | -0.8245 |
| 0.5911 | 0.3141 | 1200 | 0.4983 | -2.2735 | -3.2228 | 0.7385 | 0.9493 | -566.8434 | -491.9688 | -0.9871 | -1.1192 |
| 0.5345 | 0.3402 | 1300 | 0.5001 | -3.5214 | -4.7330 | 0.7450 | 1.2115 | -717.8565 | -616.7566 | -0.8540 | -0.9911 |
| 0.5291 | 0.3664 | 1400 | 0.4987 | -2.7865 | -3.7479 | 0.7475 | 0.9614 | -619.3545 | -543.2670 | -1.0816 | -1.2062 |
| 0.4495 | 0.3926 | 1500 | 0.5144 | -2.4600 | -3.6484 | 0.7330 | 1.1884 | -609.4039 | -510.6184 | -1.1934 | -1.3216 |
| 0.5586 | 0.4187 | 1600 | 0.4937 | -2.4987 | -3.5027 | 0.7430 | 1.0040 | -594.8329 | -514.4847 | -1.1838 | -1.3066 |
| 0.4895 | 0.4449 | 1700 | 0.4948 | -3.6212 | -4.8051 | 0.7295 | 1.1839 | -725.0694 | -626.7305 | -0.9648 | -1.1064 |
| 0.485  | 0.4711 | 1800 | 0.4885 | -4.0215 | -5.2285 | 0.7525 | 1.2070 | -767.4141 | -666.7680 | -1.0276 | -1.1613 |
| 0.4387 | 0.4973 | 1900 | 0.4897 | -3.8136 | -5.0345 | 0.7460 | 1.2208 | -748.0074 | -645.9786 | -1.1075 | -1.2419 |
| 0.4613 | 0.5234 | 2000 | 0.4941 | -4.5643 | -5.6977 | 0.7410 | 1.1334 | -814.3307 | -721.0457 | -0.9859 | -1.1242 |
| 0.4939 | 0.5496 | 2100 | 0.4877 | -4.6441 | -5.8517 | 0.75   | 1.2077 | -829.7325 | -729.0210 | -1.1445 | -1.2699 |
| 0.4782 | 0.5758 | 2200 | 0.4813 | -3.2786 | -4.2916 | 0.7485 | 1.0130 | -673.7171 | -592.4716 | -1.2439 | -1.3665 |
| 0.4682 | 0.6019 | 2300 | 0.4885 | -4.1629 | -5.5525 | 0.7455 | 1.3897 | -799.8126 | -680.9020 | -1.0667 | -1.1952 |
| 0.4582 | 0.6281 | 2400 | 0.4859 | -3.7434 | -4.9841 | 0.7460 | 1.2407 | -742.9675 | -638.9534 | -1.0476 | -1.1735 |
| 0.4948 | 0.6543 | 2500 | 0.4817 | -3.6128 | -4.8362 | 0.7425 | 1.2234 | -728.1769 | -625.8918 | -1.0472 | -1.1781 |
| 0.4588 | 0.6805 | 2600 | 0.4854 | -3.5980 | -4.8557 | 0.7430 | 1.2577 | -730.1331 | -624.4171 | -1.1158 | -1.2400 |
| 0.5354 | 0.7066 | 2700 | 0.4857 | -4.1262 | -5.3649 | 0.7445 | 1.2387 | -781.0517 | -677.2343 | -1.0720 | -1.1950 |
| 0.4782 | 0.7328 | 2800 | 0.4822 | -3.8568 | -5.1115 | 0.7460 | 1.2547 | -755.7133 | -650.2979 | -1.1544 | -1.2733 |
| 0.5135 | 0.7590 | 2900 | 0.4807 | -3.9503 | -5.2306 | 0.7475 | 1.2804 | -767.6244 | -659.6406 | -1.1773 | -1.2961 |
| 0.4613 | 0.7851 | 3000 | 0.4783 | -3.6454 | -4.8177 | 0.7545 | 1.1723 | -726.3349 | -629.1588 | -1.1940 | -1.3123 |
| 0.4904 | 0.8113 | 3100 | 0.4787 | -3.8925 | -5.1623 | 0.7535 | 1.2698 | -760.7857 | -653.8602 | -1.1654 | -1.2847 |
| 0.4706 | 0.8375 | 3200 | 0.4755 | -3.4858 | -4.6973 | 0.7525 | 1.2116 | -714.2923 | -613.1915 | -1.2139 | -1.3301 |
| 0.519  | 0.8636 | 3300 | 0.4762 | -3.6863 | -4.9393 | 0.7525 | 1.2530 | -738.4901 | -633.2412 | -1.1986 | -1.3147 |
| 0.4446 | 0.8898 | 3400 | 0.4762 | -3.8220 | -5.1066 | 0.7535 | 1.2847 | -755.2252 | -646.8135 | -1.1676 | -1.2864 |
| 0.5378 | 0.9160 | 3500 | 0.4759 | -3.7562 | -5.0452 | 0.7530 | 1.2890 | -749.0795 | -640.2327 | -1.1933 | -1.3106 |
| 0.4506 | 0.9422 | 3600 | 0.4759 | -3.7087 | -4.9945 | 0.7535 | 1.2857 | -744.0071 | -635.4867 | -1.1944 | -1.3115 |
| 0.4732 | 0.9683 | 3700 | 0.4758 | -3.6903 | -4.9695 | 0.7540 | 1.2792 | -741.5083 | -633.6405 | -1.1938 | -1.3109 |
| 0.5041 | 0.9945 | 3800 | 0.4758 | -3.6841 | -4.9619 | 0.7545 | 1.2778 | -740.7547 | -633.0256 | -1.1922 | -1.3094 |
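
For reading the table: assuming the standard DPO formulation (as implemented in trl; the card does not restate it), the reward columns are the policy's implicit rewards relative to the reference model, where π_θ is the policy, π_ref the frozen reference (SFT) model, y_w the chosen and y_l the rejected completion, and β the DPO temperature:

```latex
% Standard DPO implicit rewards (assumed; not restated in this card)
\begin{aligned}
\text{Rewards/chosen}     &= \beta\,\bigl(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\bigr)\\
\text{Rewards/rejected}   &= \beta\,\bigl(\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\bigr)\\
\text{Rewards/margins}    &= \text{Rewards/chosen} - \text{Rewards/rejected}\\
\text{Rewards/accuracies} &= \Pr\bigl[\text{Rewards/chosen} > \text{Rewards/rejected}\bigr]
\end{aligned}
```

Under this reading, Logps/chosen and Logps/rejected are the policy's log-probabilities of the chosen and rejected completions; their drift toward more negative values while the margin grows and accuracy settles around 0.75 is a common pattern during DPO training.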

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1