Edit model card

zephyr-7b-lora-dpo-dibt-v0

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the argilla/10k_prompts_dpo dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1666
  • Rewards/chosen: -0.7428
  • Rewards/rejected: -5.5139
  • Rewards/accuracies: 0.9375
  • Rewards/margins: 4.7711
  • Logps/rejected: -387.5656
  • Logps/chosen: -341.2073
  • Logits/rejected: -2.1864
  • Logits/chosen: -2.2314

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6028 0.19 20 0.5286 0.8789 0.4471 0.8125 0.4318 -327.9556 -324.9910 -2.6143 -2.6401
0.3363 0.39 40 0.3232 0.5215 -1.1097 0.8594 1.6312 -343.5236 -328.5651 -2.5076 -2.5352
0.2458 0.58 60 0.2501 0.5738 -1.8685 0.9115 2.4423 -351.1114 -328.0413 -2.5602 -2.5924
0.2116 0.78 80 0.1991 -0.6755 -3.8274 0.9167 3.1519 -370.7006 -340.5351 -2.3129 -2.3427
0.1386 0.97 100 0.2002 0.2920 -3.0192 0.9375 3.3111 -362.6181 -330.8600 -2.3132 -2.3535
0.0458 1.17 120 0.1748 -1.3802 -5.8772 0.9479 4.4969 -391.1983 -347.5820 -2.2290 -2.2717
0.0426 1.36 140 0.1755 -0.0635 -4.3090 0.9375 4.2455 -375.5160 -334.4143 -2.1959 -2.2403
0.029 1.55 160 0.1692 -0.7990 -5.4881 0.9375 4.6891 -387.3076 -341.7697 -2.1893 -2.2329
0.0676 1.75 180 0.1676 -0.6944 -5.4513 0.9375 4.7569 -386.9397 -340.7238 -2.1864 -2.2314
0.0517 1.94 200 0.1666 -0.7428 -5.5139 0.9375 4.7711 -387.5656 -341.2073 -2.1864 -2.2314

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.2
Downloads last month
5
Inference API
Unable to determine this model’s pipeline type. Check the docs .

Model tree for plaguss/zephyr-7b-lora-adapter-dpo-dibt-v0

Adapter
(136)
this model