SausageLM-7b-Instruct-v0.01-dpo-qlora

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4204
  • Rewards/chosen: -1.9644
  • Rewards/rejected: -3.5978
  • Rewards/accuracies: 0.8020
  • Rewards/margins: 1.6333
  • Logps/rejected: -778.7791
  • Logps/chosen: -552.1046
  • Logits/rejected: 1.3639
  • Logits/chosen: 0.3998
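
As a quick consistency check on the figures above, the reward margin can be recomputed from the two reward values. This is a minimal sketch that assumes the margin is defined as the chosen reward minus the rejected reward, which is the usual convention for preference-tuning metrics; the card itself does not spell out the definition. The variable names are illustrative only.

```python
# Reported final evaluation metrics, copied from the list above.
rewards_chosen = -1.9644
rewards_rejected = -3.5978
reported_margin = 1.6333

# Assumed definition: margin = reward of chosen response - reward of rejected response.
margin = rewards_chosen - rewards_rejected

# Each value is rounded to four decimals, so allow a small tolerance.
margin_matches = abs(margin - reported_margin) < 5e-4
print(f"recomputed margin: {margin:.4f}, matches report: {margin_matches}")
```

The recomputed value agrees with the reported margin up to rounding in the last decimal place.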

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
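
The total batch sizes in the list follow from the per-device values, and the learning-rate settings describe a warmup followed by a cosine decay. The sketch below reproduces the batch-size arithmetic and illustrates one common form of that schedule. The function `lr_at` and the step count of 1000 are hypothetical and chosen for illustration; the exact warmup rounding and decay floor used by the training framework may differ.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1              # per device
eval_batch_size = 2               # per device
num_devices = 2
gradient_accumulation_steps = 8

# Effective batch sizes across devices (and accumulation steps for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at(step, total_steps, peak_lr=5e-06, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, used only to show the shape of the schedule.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total_steps=1000))
```

With these inputs the totals come out to 16 and 4, matching the reported total_train_batch_size and total_eval_batch_size.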

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4906 | 0.08 | 300 | 0.5340 | -1.1814 | -1.8425 | 0.7310 | 0.6611 | -603.2533 | -473.8014 | -1.6234 | -1.7536 |
| 0.4794 | 0.16 | 600 | 0.4701 | -1.3882 | -2.4799 | 0.7700 | 1.0918 | -666.9945 | -494.4773 | 1.2460 | 0.4450 |
| 0.4519 | 0.24 | 900 | 0.4566 | -1.4239 | -2.6724 | 0.7730 | 1.2485 | -686.2431 | -498.0537 | 1.0803 | 0.1979 |
| 0.4034 | 0.31 | 1200 | 0.4487 | -1.9028 | -3.5170 | 0.7870 | 1.6142 | -770.7061 | -545.9451 | 1.7156 | 0.7244 |
| 0.4193 | 0.39 | 1500 | 0.4420 | -1.8864 | -3.4847 | 0.7840 | 1.5983 | -767.4712 | -544.3021 | 0.9998 | 0.0019 |
| 0.409 | 0.47 | 1800 | 0.4365 | -2.0591 | -3.7221 | 0.7920 | 1.6630 | -791.2130 | -561.5723 | 1.4876 | 0.5341 |
| 0.4037 | 0.55 | 2100 | 0.4334 | -2.1275 | -3.8835 | 0.7970 | 1.7560 | -807.3529 | -568.4110 | 1.9485 | 0.9489 |
| 0.3829 | 0.63 | 2400 | 0.4248 | -1.8791 | -3.4902 | 0.8010 | 1.6111 | -768.0193 | -543.5670 | 1.5421 | 0.5047 |
| 0.47 | 0.71 | 2700 | 0.4211 | -1.8565 | -3.4027 | 0.8030 | 1.5462 | -759.2699 | -541.3088 | 1.5152 | 0.5343 |
| 0.3769 | 0.79 | 3000 | 0.4205 | -1.9199 | -3.5317 | 0.8010 | 1.6119 | -772.1762 | -547.6463 | 1.5142 | 0.5326 |
| 0.3921 | 0.86 | 3300 | 0.4216 | -2.0430 | -3.7240 | 0.8050 | 1.6810 | -791.3992 | -559.9616 | 1.5287 | 0.5531 |
| 0.4249 | 0.94 | 3600 | 0.4204 | -1.9591 | -3.5883 | 0.8000 | 1.6292 | -777.8283 | -551.5704 | 1.3533 | 0.3917 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.0
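
Since PEFT is among the frameworks listed, the weights are presumably published as an adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption; the card does not document a loading procedure, so the adapter layout, the choice to take the tokenizer from the base model, and the helper name `load_model` are all assumptions. Calling the function downloads both the base model and the adapter.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
ADAPTER_ID = "floleuerer/SausageLM-7b-Instruct-v0.01-dpo-qlora"


def load_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights (assumed PEFT adapter)."""
    # Imported lazily so the constants above can be used without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

A typical call would be `model, tokenizer = load_model()`, after which the model can be used for generation in the same way as the base model.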