---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: AI-Sweden-Models/gpt-sw3-1.3b
model-index:
- name: gpt1B_DPO_model2
  results: []
---

# gpt1B_DPO_model2

This model is a fine-tuned version of [AI-Sweden-Models/gpt-sw3-1.3b](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0068
- Rewards/chosen: -0.0291
- Rewards/rejected: -6.8852
- Rewards/accuracies: 1.0
- Rewards/margins: 6.8561
- Logps/rejected: -290.5968
- Logps/chosen: -127.3574
- Logits/rejected: -2.7556
- Logits/chosen: -2.9748

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.3903 | 0.1 | 25 | 0.2328 | 0.1244 | -1.3373 | 0.9933 | 1.4618 | -235.1181 | -125.8223 | -3.0887 | -3.2517 |
| 0.0561 | 0.2 | 50 | 0.0585 | 0.0159 | -3.4934 | 0.9933 | 3.5094 | -256.6789 | -126.9073 | -2.9094 | -3.1004 |
| 0.0267 | 0.3 | 75 | 0.0268 | -0.0626 | -4.9264 | 0.9967 | 4.8637 | -271.0085 | -127.6931 | -2.8143 | -3.0209 |
| 0.0141 | 0.4 | 100 | 0.0175 | -0.0535 | -5.4979 | 0.9967 | 5.4444 | -276.7235 | -127.6012 | -2.7755 | -2.9884 |
| 0.0105 | 0.5 | 125 | 0.0133 | -0.0686 | -5.9461 | 0.9967 | 5.8775 | -281.2056 | -127.7524 | -2.7592 | -2.9752 |
| 0.0093 | 0.6 | 150 | 0.0113 | -0.0582 | -6.1989 | 0.9967 | 6.1407 | -283.7333 | -127.6482 | -2.7644 | -2.9810 |
| 0.007 | 0.7 | 175 | 0.0097 | -0.0175 | -6.2570 | 1.0 | 6.2396 | -284.3148 | -127.2412 | -2.7683 | -2.9851 |
| 0.0085 | 0.79 | 200 | 0.0083 | 0.0050 | -6.4220 | 1.0 | 6.4270 | -285.9642 | -127.0162 | -2.7708 | -2.9884 |
| 0.0049 | 0.89 | 225 | 0.0079 | -0.0124 | -6.5942 | 1.0 | 6.5818 | -287.6865 | -127.1910 | -2.7644 | -2.9830 |
| 0.004 | 0.99 | 250 | 0.0076 | -0.0282 | -6.7093 | 1.0 | 6.6811 | -288.8376 | -127.3483 | -2.7587 | -2.9779 |
| 0.0028 | 1.09 | 275 | 0.0072 | -0.0372 | -6.7997 | 1.0 | 6.7625 | -289.7418 | -127.4389 | -2.7571 | -2.9763 |
| 0.005 | 1.19 | 300 | 0.0070 | -0.0326 | -6.8348 | 1.0 | 6.8022 | -290.0928 | -127.3927 | -2.7560 | -2.9754 |
| 0.0038 | 1.29 | 325 | 0.0069 | -0.0346 | -6.8482 | 1.0 | 6.8137 | -290.2270 | -127.4126 | -2.7557 | -2.9749 |
| 0.004 | 1.39 | 350 | 0.0069 | -0.0326 | -6.8612 | 1.0 | 6.8285 | -290.3561 | -127.3931 | -2.7556 | -2.9747 |
| 0.0032 | 1.49 | 375 | 0.0069 | -0.0328 | -6.8697 | 1.0 | 6.8370 | -290.4420 | -127.3942 | -2.7557 | -2.9750 |
| 0.0028 | 1.59 | 400 | 0.0069 | -0.0322 | -6.8743 | 1.0 | 6.8422 | -290.4877 | -127.3882 | -2.7558 | -2.9751 |
| 0.004 | 1.69 | 425 | 0.0067 | -0.0293 | -6.8746 | 1.0 | 6.8453 | -290.4905 | -127.3596 | -2.7557 | -2.9750 |
| 0.003 | 1.79 | 450 | 0.0067 | -0.0296 | -6.8840 | 1.0 | 6.8544 | -290.5845 | -127.3624 | -2.7553 | -2.9746 |
| 0.0028 | 1.89 | 475 | 0.0068 | -0.0285 | -6.8839 | 1.0 | 6.8554 | -290.5839 | -127.3521 | -2.7555 | -2.9748 |
| 0.0028 | 1.99 | 500 | 0.0068 | -0.0291 | -6.8852 | 1.0 | 6.8561 | -290.5968 | -127.3574 | -2.7556 | -2.9748 |

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
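
## How to use

This repository holds a PEFT adapter rather than full model weights, so the base model must be loaded first and the adapter applied on top of it. A minimal inference sketch, assuming the adapter is published under the hypothetical repo id `your-username/gpt1B_DPO_model2`:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "AI-Sweden-Models/gpt-sw3-1.3b"
ADAPTER = "your-username/gpt1B_DPO_model2"  # hypothetical repo id; replace with the adapter's actual location

# Load the base model, then attach the DPO-trained adapter weights.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

# gpt-sw3 is primarily a Swedish-language model, so a Swedish prompt is used here.
prompt = "Berätta kort om Sverige:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For adapter-free deployment, `model.merge_and_unload()` can fold the adapter into the base weights after loading.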
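
## Training setup sketch

The hyperparameters above correspond to a TRL `DPOTrainer` run. The sketch below shows how they map onto a training script, assuming the TRL API contemporaneous with the framework versions listed above; the preference dataset, LoRA configuration, and `beta` are illustrative assumptions, since the card does not record them:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/gpt-sw3-1.3b")

# Toy preference dataset; the actual training data is not documented.
train_dataset = Dataset.from_dict({
    "prompt": ["Fråga: Vad är Sveriges huvudstad?\nSvar:"],
    "chosen": [" Stockholm."],
    "rejected": [" Oslo."],
})

# Illustrative LoRA settings; the adapter config actually used is not recorded.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

# Mirrors the reported hyperparameters: lr 1e-05, per-device batch size 1,
# gradient accumulation 8 (effective batch size 8), linear schedule, 2 epochs, seed 42.
training_args = TrainingArguments(
    output_dir="gpt1B_DPO_model2",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    seed=42,
    remove_unused_columns=False,  # DPOTrainer builds its own collator from prompt/chosen/rejected
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, TRL uses the frozen base model as the reference
    args=training_args,
    beta=0.1,        # assumed TRL default; the beta actually used is not recorded
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```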