# gpt1B_DPO_model2
This model is a fine-tuned version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.0068
- Rewards/chosen: -0.0291
- Rewards/rejected: -6.8852
- Rewards/accuracies: 1.0
- Rewards/margins: 6.8561
- Logps/rejected: -290.5968
- Logps/chosen: -127.3574
- Logits/rejected: -2.7556
- Logits/chosen: -2.9748
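For context, the reward and margin figures above follow the standard DPO definitions: each reward is β times the log-probability ratio between the policy and the frozen reference model, and the margin is the chosen reward minus the rejected reward. A minimal sketch of those relationships (the `beta` value and function name here are illustrative, not reported for this run):

```python
import math

def dpo_stats(logp_chosen, ref_logp_chosen, logp_rejected, ref_logp_rejected, beta=0.1):
    """Compute DPO rewards, margin, and loss from sequence log-probabilities.

    NOTE: beta=0.1 is only an illustrative default; the value used for this
    training run is not reported in the card.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference model
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log(sigmoid(margin)); approaches 0 as the margin grows
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# Sanity check against the final evaluation row of this card:
# Rewards/margins = Rewards/chosen - Rewards/rejected
margin = -0.0291 - (-6.8852)  # 6.8561, matching Rewards/margins above
```

A large positive margin together with a near-zero loss, as in the final evaluation here, indicates the policy almost always assigns higher relative likelihood to the chosen responses.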
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
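The total train batch size above is the per-device batch size multiplied by the gradient-accumulation steps, and the linear scheduler decays the learning rate from 1e-05 toward zero over training. A small sketch of both relationships (the `total_steps` value is illustrative; the results table below shows roughly 500 optimizer steps for this run):

```python
# Effective batch size: per-device batch * gradient accumulation (1 device here)
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

def linear_lr(step, total_steps, base_lr=1e-05, warmup_steps=0):
    """Linear schedule: optional warmup, then linear decay to zero.

    Mirrors the shape of transformers' 'linear' scheduler; total_steps
    is an assumed parameter, not a value reported in this card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```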
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.3903 | 0.1 | 25 | 0.2328 | 0.1244 | -1.3373 | 0.9933 | 1.4618 | -235.1181 | -125.8223 | -3.0887 | -3.2517 |
0.0561 | 0.2 | 50 | 0.0585 | 0.0159 | -3.4934 | 0.9933 | 3.5094 | -256.6789 | -126.9073 | -2.9094 | -3.1004 |
0.0267 | 0.3 | 75 | 0.0268 | -0.0626 | -4.9264 | 0.9967 | 4.8637 | -271.0085 | -127.6931 | -2.8143 | -3.0209 |
0.0141 | 0.4 | 100 | 0.0175 | -0.0535 | -5.4979 | 0.9967 | 5.4444 | -276.7235 | -127.6012 | -2.7755 | -2.9884 |
0.0105 | 0.5 | 125 | 0.0133 | -0.0686 | -5.9461 | 0.9967 | 5.8775 | -281.2056 | -127.7524 | -2.7592 | -2.9752 |
0.0093 | 0.6 | 150 | 0.0113 | -0.0582 | -6.1989 | 0.9967 | 6.1407 | -283.7333 | -127.6482 | -2.7644 | -2.9810 |
0.007 | 0.7 | 175 | 0.0097 | -0.0175 | -6.2570 | 1.0 | 6.2396 | -284.3148 | -127.2412 | -2.7683 | -2.9851 |
0.0085 | 0.79 | 200 | 0.0083 | 0.0050 | -6.4220 | 1.0 | 6.4270 | -285.9642 | -127.0162 | -2.7708 | -2.9884 |
0.0049 | 0.89 | 225 | 0.0079 | -0.0124 | -6.5942 | 1.0 | 6.5818 | -287.6865 | -127.1910 | -2.7644 | -2.9830 |
0.004 | 0.99 | 250 | 0.0076 | -0.0282 | -6.7093 | 1.0 | 6.6811 | -288.8376 | -127.3483 | -2.7587 | -2.9779 |
0.0028 | 1.09 | 275 | 0.0072 | -0.0372 | -6.7997 | 1.0 | 6.7625 | -289.7418 | -127.4389 | -2.7571 | -2.9763 |
0.005 | 1.19 | 300 | 0.0070 | -0.0326 | -6.8348 | 1.0 | 6.8022 | -290.0928 | -127.3927 | -2.7560 | -2.9754 |
0.0038 | 1.29 | 325 | 0.0069 | -0.0346 | -6.8482 | 1.0 | 6.8137 | -290.2270 | -127.4126 | -2.7557 | -2.9749 |
0.004 | 1.39 | 350 | 0.0069 | -0.0326 | -6.8612 | 1.0 | 6.8285 | -290.3561 | -127.3931 | -2.7556 | -2.9747 |
0.0032 | 1.49 | 375 | 0.0069 | -0.0328 | -6.8697 | 1.0 | 6.8370 | -290.4420 | -127.3942 | -2.7557 | -2.9750 |
0.0028 | 1.59 | 400 | 0.0069 | -0.0322 | -6.8743 | 1.0 | 6.8422 | -290.4877 | -127.3882 | -2.7558 | -2.9751 |
0.004 | 1.69 | 425 | 0.0067 | -0.0293 | -6.8746 | 1.0 | 6.8453 | -290.4905 | -127.3596 | -2.7557 | -2.9750 |
0.003 | 1.79 | 450 | 0.0067 | -0.0296 | -6.8840 | 1.0 | 6.8544 | -290.5845 | -127.3624 | -2.7553 | -2.9746 |
0.0028 | 1.89 | 475 | 0.0068 | -0.0285 | -6.8839 | 1.0 | 6.8554 | -290.5839 | -127.3521 | -2.7555 | -2.9748 |
0.0028 | 1.99 | 500 | 0.0068 | -0.0291 | -6.8852 | 1.0 | 6.8561 | -290.5968 | -127.3574 | -2.7556 | -2.9748 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2