
gpt1B_DPO_model2

This model is a fine-tuned version of AI-Sweden-Models/gpt-sw3-1.3b, trained with Direct Preference Optimization (DPO) as a PEFT adapter, on an unknown dataset. It achieves the following results on the evaluation set (a consistency check of the reward margin follows the list):

  • Loss: 0.0068
  • Rewards/chosen: -0.0291
  • Rewards/rejected: -6.8852
  • Rewards/accuracies: 1.0
  • Rewards/margins: 6.8561
  • Logps/rejected: -290.5968
  • Logps/chosen: -127.3574
  • Logits/rejected: -2.7556
  • Logits/chosen: -2.9748
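
These metric names match those logged by TRL's DPOTrainer (an assumption based on the naming; the card itself does not state the training framework). Under DPO, the rewards are scaled log-probability ratios against the reference model, and the margin is simply the chosen reward minus the rejected reward, which the numbers above satisfy:

  rewards/margins = rewards/chosen - rewards/rejected = -0.0291 - (-6.8852) = 6.8561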

Model description

More information needed

Intended uses & limitations

More information needed
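
Although usage details are not documented, the repository appears to contain a PEFT adapter for the base model (see the framework versions below). A minimal loading sketch, assuming the standard transformers/peft APIs and the repository id thorirhrafn/gpt1B_DPO_model2:

```python
# Minimal sketch: load the base model and apply this DPO adapter via PEFT.
# Assumes standard transformers/peft APIs; model ids are taken from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "AI-Sweden-Models/gpt-sw3-1.3b"
adapter_id = "thorirhrafn/gpt1B_DPO_model2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# GPT-SW3 is a Swedish/Nordic model, so a Swedish prompt is a natural smoke test.
inputs = tokenizer("Hej, hur mår du?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```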

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch of the same configuration follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
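
These settings map one-to-one onto Hugging Face TrainingArguments. A hedged sketch of the equivalent configuration; the surrounding DPO-specific setup (TRL DPOTrainer wiring, beta, datasets) is not documented on this card and is left out:

```python
# Sketch: the reported hyperparameters expressed as transformers TrainingArguments.
# output_dir is a placeholder; the DPO-specific setup is not documented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt1B_DPO_model2",   # placeholder name
    learning_rate=1e-5,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=8,   # 1 * 8 = total train batch size of 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```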

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0.3903 | 0.1 | 25 | 0.2328 | 0.1244 | -1.3373 | 0.9933 | 1.4618 | -235.1181 | -125.8223 | -3.0887 | -3.2517 |
| 0.0561 | 0.2 | 50 | 0.0585 | 0.0159 | -3.4934 | 0.9933 | 3.5094 | -256.6789 | -126.9073 | -2.9094 | -3.1004 |
| 0.0267 | 0.3 | 75 | 0.0268 | -0.0626 | -4.9264 | 0.9967 | 4.8637 | -271.0085 | -127.6931 | -2.8143 | -3.0209 |
| 0.0141 | 0.4 | 100 | 0.0175 | -0.0535 | -5.4979 | 0.9967 | 5.4444 | -276.7235 | -127.6012 | -2.7755 | -2.9884 |
| 0.0105 | 0.5 | 125 | 0.0133 | -0.0686 | -5.9461 | 0.9967 | 5.8775 | -281.2056 | -127.7524 | -2.7592 | -2.9752 |
| 0.0093 | 0.6 | 150 | 0.0113 | -0.0582 | -6.1989 | 0.9967 | 6.1407 | -283.7333 | -127.6482 | -2.7644 | -2.9810 |
| 0.007 | 0.7 | 175 | 0.0097 | -0.0175 | -6.2570 | 1.0 | 6.2396 | -284.3148 | -127.2412 | -2.7683 | -2.9851 |
| 0.0085 | 0.79 | 200 | 0.0083 | 0.0050 | -6.4220 | 1.0 | 6.4270 | -285.9642 | -127.0162 | -2.7708 | -2.9884 |
| 0.0049 | 0.89 | 225 | 0.0079 | -0.0124 | -6.5942 | 1.0 | 6.5818 | -287.6865 | -127.1910 | -2.7644 | -2.9830 |
| 0.004 | 0.99 | 250 | 0.0076 | -0.0282 | -6.7093 | 1.0 | 6.6811 | -288.8376 | -127.3483 | -2.7587 | -2.9779 |
| 0.0028 | 1.09 | 275 | 0.0072 | -0.0372 | -6.7997 | 1.0 | 6.7625 | -289.7418 | -127.4389 | -2.7571 | -2.9763 |
| 0.005 | 1.19 | 300 | 0.0070 | -0.0326 | -6.8348 | 1.0 | 6.8022 | -290.0928 | -127.3927 | -2.7560 | -2.9754 |
| 0.0038 | 1.29 | 325 | 0.0069 | -0.0346 | -6.8482 | 1.0 | 6.8137 | -290.2270 | -127.4126 | -2.7557 | -2.9749 |
| 0.004 | 1.39 | 350 | 0.0069 | -0.0326 | -6.8612 | 1.0 | 6.8285 | -290.3561 | -127.3931 | -2.7556 | -2.9747 |
| 0.0032 | 1.49 | 375 | 0.0069 | -0.0328 | -6.8697 | 1.0 | 6.8370 | -290.4420 | -127.3942 | -2.7557 | -2.9750 |
| 0.0028 | 1.59 | 400 | 0.0069 | -0.0322 | -6.8743 | 1.0 | 6.8422 | -290.4877 | -127.3882 | -2.7558 | -2.9751 |
| 0.004 | 1.69 | 425 | 0.0067 | -0.0293 | -6.8746 | 1.0 | 6.8453 | -290.4905 | -127.3596 | -2.7557 | -2.9750 |
| 0.003 | 1.79 | 450 | 0.0067 | -0.0296 | -6.8840 | 1.0 | 6.8544 | -290.5845 | -127.3624 | -2.7553 | -2.9746 |
| 0.0028 | 1.89 | 475 | 0.0068 | -0.0285 | -6.8839 | 1.0 | 6.8554 | -290.5839 | -127.3521 | -2.7555 | -2.9748 |
| 0.0028 | 1.99 | 500 | 0.0068 | -0.0291 | -6.8852 | 1.0 | 6.8561 | -290.5968 | -127.3574 | -2.7556 | -2.9748 |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
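
When reproducing results, it may help to confirm that the local environment matches the versions above; a minimal check:

```python
# Print installed versions to compare against the ones reported above.
import peft, transformers, torch, datasets, tokenizers

for name, module in [("PEFT", peft), ("Transformers", transformers),
                     ("Pytorch", torch), ("Datasets", datasets),
                     ("Tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```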