---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
  - alignment-handbook
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6910
- Rewards/chosen: -3.9218
- Rewards/rejected: -8.2942
- Rewards/accuracies: 0.8125
- Rewards/margins: 4.3724
- Logps/rejected: -279.5480
- Logps/chosen: -293.9998
- Logits/rejected: -2.6725
- Logits/chosen: -2.7826
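
For reference, a minimal inference sketch is shown below. The repository id `alignment-handbook/zephyr-7b-dpo-full` and the availability of a chat template in the tokenizer are assumptions, not details stated in this card:

```python
# Minimal inference sketch (not part of the original card).
# Assumptions: the model lives at "alignment-handbook/zephyr-7b-dpo-full" and the
# tokenizer ships a chat template; device_map="auto" requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alignment-handbook/zephyr-7b-dpo-full"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise what DPO fine-tuning does in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```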

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
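
The card does not describe the data beyond the dataset name given above. As a minimal sketch, the preference pairs can be inspected with 🤗 Datasets; the `train_prefs`/`test_prefs` split names and the `chosen`/`rejected` column layout are assumptions about how `HuggingFaceH4/ultrafeedback_binarized` is organised:

```python
# Sketch for inspecting the preference data (assumed splits and columns).
from datasets import load_dataset

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_prefs = raw["train_prefs"]     # assumed split used for DPO training
print(train_prefs[0]["chosen"])      # assumed column: preferred conversation
print(train_prefs[0]["rejected"])    # assumed column: dispreferred conversation
```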

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
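
A hedged reconstruction of these settings as `transformers.TrainingArguments` is shown below. The alignment-handbook recipe drives DPO training through trl's `DPOTrainer`, and every option not listed above (output directory, precision, evaluation cadence) is an assumption:

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",   # assumed
    learning_rate=5e-7,
    per_device_train_batch_size=2,     # 2 per device x 32 GPUs = total train batch size 64
    per_device_eval_batch_size=4,      # 4 per device x 32 GPUs = total eval batch size 128
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                         # assumption: precision is not stated in the card
    evaluation_strategy="steps",
    eval_steps=100,                    # matches the evaluation cadence in the results table
    logging_steps=100,
)
```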

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5386 | 0.1 | 100 | 0.5208 | 0.0564 | -0.7521 | 0.7188 | 0.8085 | -204.1269 | -254.2179 | -3.0136 | -3.0550 |
| 0.4931 | 0.21 | 200 | 0.4882 | -0.0132 | -1.2683 | 0.7812 | 1.2551 | -209.2889 | -254.9136 | -3.1056 | -3.1407 |
| 0.479 | 0.31 | 300 | 0.5038 | -0.1035 | -1.4012 | 0.7812 | 1.2978 | -210.6186 | -255.8163 | -3.0809 | -3.1328 |
| 0.5052 | 0.41 | 400 | 0.5154 | -0.1923 | -1.8783 | 0.7969 | 1.6860 | -215.3891 | -256.7043 | -2.9104 | -2.9644 |
| 0.4513 | 0.52 | 500 | 0.4979 | 0.0207 | -1.6562 | 0.7969 | 1.6769 | -213.1682 | -254.5742 | -3.0061 | -3.0657 |
| 0.4905 | 0.62 | 600 | 0.4907 | -0.0944 | -1.5847 | 0.7656 | 1.4903 | -212.4527 | -255.7256 | -2.9374 | -3.0170 |
| 0.5609 | 0.72 | 700 | 0.4928 | -0.4249 | -1.7238 | 0.7656 | 1.2989 | -213.8441 | -259.0304 | -2.9475 | -3.0128 |
| 0.5338 | 0.83 | 800 | 0.4767 | -0.1567 | -1.9114 | 0.8125 | 1.7547 | -215.7200 | -256.3484 | -2.8455 | -2.9183 |
| 0.5039 | 0.93 | 900 | 0.4854 | -0.0886 | -1.6900 | 0.75 | 1.6014 | -213.5057 | -255.6674 | -2.8295 | -2.9093 |
| 0.0776 | 1.03 | 1000 | 0.4938 | -0.4848 | -2.5287 | 0.7656 | 2.0438 | -221.8927 | -259.6300 | -2.7580 | -2.8437 |
| 0.0901 | 1.14 | 1100 | 0.5071 | -1.0800 | -3.2419 | 0.7812 | 2.1619 | -229.0247 | -265.5817 | -2.8036 | -2.8858 |
| 0.0828 | 1.24 | 1200 | 0.5159 | -0.9682 | -3.4087 | 0.7812 | 2.4406 | -230.6935 | -264.4635 | -2.7961 | -2.8708 |
| 0.0916 | 1.34 | 1300 | 0.5222 | -1.0832 | -3.5535 | 0.7969 | 2.4703 | -232.1411 | -265.6135 | -2.8019 | -2.8754 |
| 0.0965 | 1.44 | 1400 | 0.5204 | -1.1951 | -3.5681 | 0.7969 | 2.3731 | -232.2874 | -266.7324 | -2.8058 | -2.8884 |
| 0.0716 | 1.55 | 1500 | 0.5381 | -1.6588 | -4.0838 | 0.7188 | 2.4250 | -237.4441 | -271.3697 | -2.7979 | -2.8862 |
| 0.0957 | 1.65 | 1600 | 0.5151 | -1.1746 | -3.7477 | 0.75 | 2.5731 | -234.0834 | -266.5278 | -2.7960 | -2.8976 |
| 0.0645 | 1.75 | 1700 | 0.5393 | -1.7591 | -4.6011 | 0.8125 | 2.8419 | -242.6167 | -272.3728 | -2.7483 | -2.8592 |
| 0.0838 | 1.86 | 1800 | 0.5385 | -1.6606 | -4.4648 | 0.7656 | 2.8042 | -241.2545 | -271.3875 | -2.7311 | -2.8383 |
| 0.1106 | 1.96 | 1900 | 0.5322 | -1.5621 | -3.9779 | 0.7969 | 2.4158 | -236.3850 | -270.4025 | -2.8194 | -2.9133 |
| 0.0174 | 2.06 | 2000 | 0.5921 | -2.4968 | -5.9514 | 0.7969 | 3.4546 | -256.1199 | -279.7498 | -2.7579 | -2.8631 |
| 0.0134 | 2.17 | 2100 | 0.6247 | -2.9002 | -6.4277 | 0.7969 | 3.5275 | -260.8829 | -283.7838 | -2.7316 | -2.8319 |
| 0.0148 | 2.27 | 2200 | 0.6402 | -3.2520 | -7.0627 | 0.7812 | 3.8106 | -267.2330 | -287.3020 | -2.6991 | -2.8064 |
| 0.0142 | 2.37 | 2300 | 0.6563 | -3.2715 | -7.1303 | 0.8281 | 3.8588 | -267.9088 | -287.4962 | -2.6871 | -2.7992 |
| 0.011 | 2.48 | 2400 | 0.6605 | -3.2996 | -7.2258 | 0.7969 | 3.9262 | -268.8643 | -287.7776 | -2.6555 | -2.7717 |
| 0.0065 | 2.58 | 2500 | 0.6935 | -3.6399 | -8.0232 | 0.8125 | 4.3832 | -276.8377 | -291.1808 | -2.6780 | -2.7902 |
| 0.0089 | 2.68 | 2600 | 0.6773 | -3.4822 | -7.8182 | 0.8125 | 4.3360 | -274.7881 | -289.6033 | -2.6885 | -2.7994 |
| 0.0102 | 2.79 | 2700 | 0.6813 | -3.5909 | -7.8097 | 0.8281 | 4.2187 | -274.7028 | -290.6908 | -2.6877 | -2.7970 |
| 0.0136 | 2.89 | 2800 | 0.6892 | -3.8236 | -8.1490 | 0.8125 | 4.3254 | -278.0957 | -293.0175 | -2.6765 | -2.7862 |
| 0.0091 | 2.99 | 2900 | 0.6913 | -3.9199 | -8.3004 | 0.8125 | 4.3806 | -279.6104 | -293.9802 | -2.6728 | -2.7830 |
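
For reading the table, `Rewards/margins` is simply the gap between the chosen and rejected rewards (for example, $0.0564 - (-0.7521) \approx 0.8085$ in the first row). Assuming the standard trl DPO bookkeeping, the reward columns are the implicit DPO rewards:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

where $\pi_{\mathrm{ref}}$ is the SFT reference model and $\beta$ is the DPO temperature (its value is not stated in this card).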

### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1