openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset is not specified. It achieves the following results on the evaluation set at the final checkpoint (step 200):

  • Loss: 0.6751
  • Rewards/chosen: 0.0215
  • Rewards/rejected: -0.0002
  • Rewards/accuracies: 0.4375
  • Rewards/margins: 0.0217
  • Logps/rejected: -132.4150
  • Logps/chosen: -333.1984
  • Logits/rejected: -2.7074
  • Logits/chosen: -2.3899
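
The reward metrics above are related by simple arithmetic. The sketch below checks the reported margin against the chosen and rejected rewards. It assumes the metrics follow the convention used by TRL's DPOTrainer (margin = chosen reward minus rejected reward, averaged over the evaluation set), which this card does not state explicitly; the variable names are illustrative and not taken from the training code.

```python
# Final evaluation metrics as reported in this card (step 200).
rewards_chosen = 0.0215
rewards_rejected = -0.0002
reported_margin = 0.0217

# Assumption: margin = chosen reward - rejected reward (mean over the eval set).
computed_margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
margin_matches = abs(computed_margin - reported_margin) < 1e-4
print(f"computed margin: {computed_margin:.4f}, matches reported: {margin_matches}")
```

Under the same assumption, a Rewards/accuracies value of 0.4375 means the chosen response received the higher implicit reward in fewer than half of the evaluation pairs, so the small positive margin should be read with caution.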

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 200
  • mixed_precision_training: Native AMP
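
To make the learning-rate settings concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above. It assumes the schedule behaves like `get_linear_schedule_with_warmup` in Transformers (linear ramp from 0 over the warmup steps, then linear decay to 0 at the final step); the function name `linear_lr` is hypothetical and not part of the training code.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=200):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Sample a few points along the 200-step run.
for step in (0, 1, 2, 101, 200):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the run reaches the full learning rate of 0.0002 almost immediately and spends the remaining steps decaying.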

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7508 | 0.01 | 10 | 0.7479 | -0.3566 | -0.2195 | 0.25 | -0.1371 | -351.7834 | -711.3196 | -1.6251 | -1.4448 |
| 0.982 | 0.01 | 20 | 0.7765 | -0.5130 | -0.3405 | 0.25 | -0.1726 | -472.7075 | -867.7224 | -1.1628 | -1.0511 |
| 0.6985 | 0.01 | 30 | 0.6899 | 0.0062 | 0.0027 | 0.375 | 0.0036 | -129.5716 | -348.4551 | -2.7357 | -2.3605 |
| 0.6959 | 0.02 | 40 | 0.6935 | 0.0008 | 0.0022 | 0.25 | -0.0014 | -130.0675 | -353.8832 | -2.7275 | -2.3561 |
| 0.6944 | 0.03 | 50 | 0.6892 | 0.0073 | 0.0040 | 0.4375 | 0.0033 | -128.2573 | -347.3910 | -2.7124 | -2.3589 |
| 0.7785 | 0.03 | 60 | 0.7361 | -0.4130 | -0.2866 | 0.375 | -0.1264 | -418.8091 | -767.6629 | -1.3320 | -1.2310 |
| 0.7009 | 0.04 | 70 | 0.7892 | -0.5637 | -0.3765 | 0.3125 | -0.1872 | -508.7737 | -918.3933 | -1.1171 | -1.0132 |
| 0.7886 | 0.04 | 80 | 0.7862 | -0.5738 | -0.3892 | 0.3125 | -0.1845 | -521.4880 | -928.4485 | -1.1127 | -1.0064 |
| 0.7059 | 0.04 | 90 | 0.7127 | -0.0370 | -0.0108 | 0.4375 | -0.0263 | -143.0086 | -391.7115 | -2.6542 | -2.3045 |
| 0.6793 | 0.05 | 100 | 0.6981 | -0.0357 | -0.0284 | 0.375 | -0.0073 | -160.6859 | -390.4216 | -2.5199 | -2.2133 |
| 0.7085 | 0.06 | 110 | 0.7039 | -0.0251 | -0.0089 | 0.3125 | -0.0162 | -141.1216 | -379.7617 | -2.6806 | -2.3312 |
| 0.6959 | 0.06 | 120 | 0.6974 | -0.0162 | -0.0077 | 0.375 | -0.0085 | -139.9174 | -370.8595 | -2.6925 | -2.3406 |
| 0.6897 | 0.07 | 130 | 0.6948 | -0.0122 | -0.0069 | 0.3125 | -0.0053 | -139.1202 | -366.9146 | -2.6971 | -2.3477 |
| 0.6897 | 0.07 | 140 | 0.6935 | -0.0104 | -0.0067 | 0.3125 | -0.0038 | -138.8917 | -365.1371 | -2.6948 | -2.3576 |
| 0.7015 | 0.07 | 150 | 0.6864 | 0.0011 | -0.0042 | 0.4375 | 0.0054 | -136.4684 | -353.5512 | -2.6973 | -2.3710 |
| 0.6497 | 0.08 | 160 | 0.6814 | 0.0099 | -0.0023 | 0.4375 | 0.0122 | -134.5819 | -344.8182 | -2.7048 | -2.3806 |
| 0.6893 | 0.09 | 170 | 0.6787 | 0.0147 | -0.0015 | 0.4375 | 0.0161 | -133.7108 | -340.0247 | -2.7106 | -2.3874 |
| 0.7002 | 0.09 | 180 | 0.6776 | 0.0168 | -0.0010 | 0.4375 | 0.0178 | -133.2137 | -337.8709 | -2.7120 | -2.3888 |
| 0.6875 | 0.1 | 190 | 0.6755 | 0.0209 | -0.0002 | 0.4375 | 0.0211 | -132.4327 | -333.8066 | -2.7093 | -2.3902 |
| 0.6781 | 0.1 | 200 | 0.6751 | 0.0215 | -0.0002 | 0.4375 | 0.0217 | -132.4150 | -333.1984 | -2.7074 | -2.3899 |
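
The validation loss in the training results above is not monotonic, so it can help to scan it programmatically. The sketch below copies the (step, validation loss) pairs from the results and finds the best checkpoint, then flags steps whose loss exceeds ln(2) ≈ 0.6931. Using ln(2) as a baseline is an assumption: it is the value of the standard sigmoid DPO loss when the reward margin is zero, and this card does not state which loss variant was used. The variable names are illustrative.

```python
import math

# (step, validation loss) pairs copied from the training results.
validation_loss = [
    (10, 0.7479), (20, 0.7765), (30, 0.6899), (40, 0.6935), (50, 0.6892),
    (60, 0.7361), (70, 0.7892), (80, 0.7862), (90, 0.7127), (100, 0.6981),
    (110, 0.7039), (120, 0.6974), (130, 0.6948), (140, 0.6935), (150, 0.6864),
    (160, 0.6814), (170, 0.6787), (180, 0.6776), (190, 0.6755), (200, 0.6751),
]

# Checkpoint with the lowest validation loss.
best_step, best_loss = min(validation_loss, key=lambda row: row[1])

# Assumption: ln(2) is the sigmoid DPO loss at zero reward margin.
baseline = math.log(2)
above_baseline = [step for step, loss in validation_loss if loss > baseline]

print(f"best step: {best_step} (validation loss {best_loss})")
print(f"steps above ln(2) baseline: {above_baseline}")
```

Read this way, the final checkpoint has the lowest validation loss, and the loss only settles below the baseline from step 150 onward.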

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.0

Model tree for LBusser/openhermes-mistral-dpo-gptq