
zephyr-7b-dpo-full-magpi-high-bleu-3-epochs

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0050
  • Rewards/chosen: -1.4582
  • Rewards/rejected: -44.8746
  • Rewards/accuracies: 0.9960
  • Rewards/margins: 43.4164
  • Logps/rejected: -5128.2480
  • Logps/chosen: -512.8050
  • Logits/rejected: -3.4441
  • Logits/chosen: -3.5504
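The reward figures above are related by a simple identity. As a hedged sketch (assuming the standard DPO formulation, where each reward is a β-scaled log-probability ratio against the reference policy and the loss is the negative log-sigmoid of the margin), the reported margin is just chosen minus rejected, and a margin this large drives the per-pair loss toward zero. Note the reported eval loss (0.0050) is an average over the whole eval set, so it need not equal the single-pair value computed here:

```python
import math

# Sketch of how the DPO eval metrics above relate to each other.
# In DPO, reward(y) = beta * (log pi(y|x) - log pi_ref(y|x));
# margin = reward_chosen - reward_rejected; loss = -log(sigmoid(margin)).

rewards_chosen = -1.4582      # Rewards/chosen from the eval results above
rewards_rejected = -44.8746   # Rewards/rejected

margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # matches Rewards/margins: 43.4164

# Numerically stable -log(sigmoid(x)) = log(1 + exp(-x)) for x > 0.
per_pair_loss = math.log1p(math.exp(-margin))
print(f"per-pair loss = {per_pair_loss:.2e}")  # vanishingly small at this margin
```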

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 55
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
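The total batch sizes listed above follow from the per-device settings by the usual multi-GPU accounting (effective train batch = per-device batch × number of devices × gradient-accumulation steps; no accumulation applies at eval time). A quick check:

```python
# How the totals in the hyperparameter list follow from per-device settings.
train_batch_size = 8             # per device
eval_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices  # no accumulation at eval

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 64
```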

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0025        | 0.4739 | 50   | 0.0067          | -1.4454        | -37.1575         | 0.9940             | 35.7120         | -4356.5356     | -511.5262    | -3.0434         | -3.2142       |
| 0.0036        | 0.9479 | 100  | 0.0053          | -2.0303        | -41.2450         | 0.9940             | 39.2146         | -4765.2842     | -570.0164    | -3.2429         | -3.4104       |
| 0.0001        | 1.4218 | 150  | 0.0070          | -1.9459        | -45.2030         | 0.9940             | 43.2570         | -5161.0879     | -561.5757    | -3.5068         | -3.5867       |
| 0.0           | 1.8957 | 200  | 0.0047          | -1.4539        | -44.2686         | 0.9960             | 42.8147         | -5067.6450     | -512.3704    | -3.4229         | -3.5020       |
| 0.0           | 2.3697 | 250  | 0.0050          | -1.4525        | -44.7537         | 0.9960             | 43.3012         | -5116.1577     | -512.2269    | -3.4445         | -3.5510       |
| 0.0           | 2.8436 | 300  | 0.0050          | -1.4582        | -44.8746         | 0.9960             | 43.4164         | -5128.2480     | -512.8050    | -3.4441         | -3.5504       |
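The Epoch and Step columns above also imply an approximate training-set size. This is an inference from the logged numbers, not a figure stated in the card: roughly 105 optimizer steps per epoch, which at the effective batch size of 128 suggests on the order of 13.5k preference pairs.

```python
# Rough consistency check on the Epoch/Step columns (an inference from the
# logs above, not a dataset size stated in the card).
steps = 50
epoch_at_step = 0.4739
effective_batch = 128  # total_train_batch_size from the hyperparameters

steps_per_epoch = steps / epoch_at_step
approx_dataset_size = steps_per_epoch * effective_batch

print(round(steps_per_epoch, 1))   # ~105.5 optimizer steps per epoch
print(round(approx_dataset_size))  # ~13.5k preference pairs
```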

Framework versions

  • Transformers 4.44.0.dev0
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size

  • 7.24B parameters (safetensors, BF16)