
phi-1_5-dpo

This model is a fine-tuned version of rasyosef/phi-1_5-sft, trained with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5013
  • Rewards/chosen: -1.0250
  • Rewards/rejected: -2.3893
  • Rewards/accuracies: 0.7283
  • Rewards/margins: 1.3643
  • Logps/rejected: -162.0916
  • Logps/chosen: -128.1033
  • Logits/rejected: 5.3082
  • Logits/chosen: 5.1890
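The "Rewards/*" metrics above are the standard DPO quantities: each reward is β times the log-probability ratio between the policy and the reference model, and the per-example loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal sketch of how these logged values relate (the helper name is illustrative; β is already folded into the logged rewards, as in TRL):

```python
import math

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)).

    Assumes the rewards already include the beta scaling, as in the
    "rewards/*" metrics logged by the TRL DPO trainer.
    """
    margin = reward_chosen - reward_rejected  # corresponds to rewards/margins
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Final evaluation values from the list above.
reward_chosen, reward_rejected = -1.0250, -2.3893
print(round(reward_chosen - reward_rejected, 4))  # 1.3643, the reported margin
print(round(dpo_loss(reward_chosen, reward_rejected), 4))
```

Note that the reported evaluation loss (0.5013) is the mean of per-example losses over the whole evaluation set, which is not the same as the loss evaluated at the mean rewards, so the second printed value is not expected to match it.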

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 3
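The effective batch size and learning-rate schedule implied by these hyperparameters can be sketched as follows. This is a sketch under two assumptions: the total step count of 3312 is read off the final row of the training-results table below, and the schedule shape is the usual linear warmup followed by cosine decay to zero used by the Transformers cosine scheduler:

```python
import math

train_batch_size = 8
gradient_accumulation_steps = 2
# Effective optimizer batch size, matching total_train_batch_size above.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

learning_rate = 2e-05
warmup_steps = 300
total_steps = 3312  # final step in the training-results table

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)  # 16
print(lr_at(warmup_steps))     # peak: 2e-05
print(lr_at(total_steps))      # ~0.0 at the end of training
```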

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6899 | 0.1241 | 138 | 0.6769 | -0.0153 | -0.0504 | 0.625 | 0.0351 | -138.7025 | -118.0066 | 4.5710 | 4.4532 |
| 0.6309 | 0.2482 | 276 | 0.6035 | -0.2012 | -0.5586 | 0.7120 | 0.3575 | -143.7850 | -119.8655 | 4.5167 | 4.3940 |
| 0.5756 | 0.3723 | 414 | 0.5669 | -0.3693 | -0.9842 | 0.7174 | 0.6149 | -148.0405 | -121.5467 | 4.6242 | 4.5060 |
| 0.5715 | 0.4964 | 552 | 0.5446 | -0.4109 | -1.1855 | 0.7283 | 0.7745 | -150.0534 | -121.9633 | 4.7324 | 4.6143 |
| 0.5449 | 0.6205 | 690 | 0.5331 | -0.4666 | -1.3090 | 0.7446 | 0.8424 | -151.2884 | -122.5196 | 4.8229 | 4.7080 |
| 0.5536 | 0.7446 | 828 | 0.5136 | -0.4885 | -1.3825 | 0.7446 | 0.8940 | -152.0234 | -122.7389 | 4.8867 | 4.7737 |
| 0.5253 | 0.8687 | 966 | 0.5057 | -0.5613 | -1.5446 | 0.7554 | 0.9832 | -153.6442 | -123.4672 | 4.9287 | 4.8080 |
| 0.5249 | 0.9928 | 1104 | 0.5054 | -0.5101 | -1.4656 | 0.75 | 0.9555 | -152.8544 | -122.9549 | 4.8704 | 4.7521 |
| 0.4631 | 1.1169 | 1242 | 0.5067 | -0.6889 | -1.7678 | 0.75 | 1.0789 | -155.8768 | -124.7426 | 4.8470 | 4.7276 |
| 0.4524 | 1.2410 | 1380 | 0.5006 | -0.7467 | -1.9049 | 0.7446 | 1.1582 | -157.2474 | -125.3205 | 4.9447 | 4.8239 |
| 0.424 | 1.3651 | 1518 | 0.5036 | -0.7638 | -2.0144 | 0.7337 | 1.2505 | -158.3425 | -125.4923 | 4.9235 | 4.8002 |
| 0.4428 | 1.4892 | 1656 | 0.5004 | -0.7790 | -2.0132 | 0.7446 | 1.2342 | -158.3307 | -125.6437 | 4.9576 | 4.8375 |
| 0.4424 | 1.6133 | 1794 | 0.4944 | -0.8220 | -2.0517 | 0.7391 | 1.2297 | -158.7152 | -126.0739 | 4.9736 | 4.8553 |
| 0.4358 | 1.7374 | 1932 | 0.5022 | -0.8091 | -1.9993 | 0.7228 | 1.1902 | -158.1918 | -125.9447 | 5.0894 | 4.9702 |
| 0.4426 | 1.8615 | 2070 | 0.4992 | -0.8254 | -2.0308 | 0.7228 | 1.2054 | -158.5065 | -126.1077 | 5.0943 | 4.9780 |
| 0.4226 | 1.9856 | 2208 | 0.4971 | -0.8701 | -2.1434 | 0.7283 | 1.2733 | -159.6329 | -126.5553 | 5.1222 | 5.0011 |
| 0.3684 | 2.1097 | 2346 | 0.5032 | -0.9201 | -2.2281 | 0.7228 | 1.3081 | -160.4799 | -127.0545 | 5.2209 | 5.1031 |
| 0.3695 | 2.2338 | 2484 | 0.5022 | -0.9332 | -2.2651 | 0.7228 | 1.3319 | -160.8495 | -127.1860 | 5.2170 | 5.0977 |
| 0.3693 | 2.3579 | 2622 | 0.5022 | -0.9418 | -2.2839 | 0.7283 | 1.3421 | -161.0379 | -127.2717 | 5.2390 | 5.1169 |
| 0.3659 | 2.4820 | 2760 | 0.5037 | -0.9820 | -2.3392 | 0.7228 | 1.3572 | -161.5908 | -127.6742 | 5.2392 | 5.1148 |
| 0.3557 | 2.6061 | 2898 | 0.5031 | -1.0001 | -2.3531 | 0.7228 | 1.3529 | -161.7294 | -127.8552 | 5.2704 | 5.1488 |
| 0.3491 | 2.7302 | 3036 | 0.5053 | -1.0242 | -2.3803 | 0.7228 | 1.3562 | -162.0017 | -128.0954 | 5.2880 | 5.1693 |
| 0.3512 | 2.8543 | 3174 | 0.5036 | -1.0265 | -2.3833 | 0.7174 | 1.3568 | -162.0320 | -128.1190 | 5.2965 | 5.1768 |
| 0.3458 | 2.9784 | 3312 | 0.5013 | -1.0250 | -2.3893 | 0.7283 | 1.3643 | -162.0916 | -128.1033 | 5.3082 | 5.1890 |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model tree for rasyosef/phi-1_5-dpo

  • Base model: microsoft/phi-1_5
  • This model: a PEFT adapter fine-tuned on top of the base model
