---
license: apache-2.0
base_model: princeton-nlp/Sheared-LLaMA-1.3B
tags:
  - generated_from_trainer
model-index:
  - name: Sheared-LLaMA-1.3B-dpo-full-3-epoch-hydrox-safe
    results: []
---

Sheared-LLaMA-1.3B-dpo-full-3-epoch-hydrox-safe

This model is a fine-tuned version of princeton-nlp/Sheared-LLaMA-1.3B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0041
  • Rewards/chosen: 1.7270
  • Rewards/rejected: -15.3712
  • Rewards/accuracies: 0.9983
  • Rewards/margins: 17.0982
  • Logps/rejected: -656.3423
  • Logps/chosen: -371.7201
  • Logits/rejected: 2.3459
  • Logits/chosen: 0.3641
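
Under DPO, the reward columns are the β-scaled log-probability ratios between the policy and the frozen reference model, the margin is chosen minus rejected, and accuracy is the fraction of pairs with a positive margin. A minimal pure-Python sketch of these relationships (the β value and log-probabilities in the example call are illustrative assumptions; only the chosen/rejected rewards in the margin check come from the list above):

```python
import math

def dpo_stats(beta, logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    """Per-example DPO quantities: implicit rewards, margin, and loss.

    reward   = beta * (log pi(y|x) - log pi_ref(y|x))
    loss     = -log(sigmoid(margin)), which shrinks toward 0 as the
               chosen reward pulls ahead of the rejected reward.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -logsigmoid(margin)
    return r_chosen, r_rejected, margin, loss

# The reported margin is exactly Rewards/chosen - Rewards/rejected:
margin = 1.7270 - (-15.3712)
print(round(margin, 4))  # 17.0982, matching Rewards/margins above
```

A margin this large drives the per-pair loss toward zero, which is consistent with the very small final evaluation loss.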

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
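
These values compose as follows: the effective train batch is 8 per device × 8 GPUs = 64 (likewise 4 × 8 = 32 for eval), and the linear schedule ramps the learning rate up over the first 10% of steps, then decays it to zero. A small pure-Python sketch of that warmup/decay rule (the total-step count is an assumption chosen to roughly match the last logged step in the table below):

```python
def linear_warmup_decay(step, total_steps, base_lr=5e-7, warmup_ratio=0.1):
    """Linear LR schedule: ramp 0 -> base_lr over the warmup span, then decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# Effective batch sizes from the per-device settings:
train_batch = 8 * 8  # train_batch_size x num_devices = 64
eval_batch = 4 * 8   # eval_batch_size x num_devices  = 32

total = 8800  # illustrative; the training log ends near this step
print(linear_warmup_decay(0, total))     # 0.0 at the very first step
print(linear_warmup_decay(880, total))   # peak of 5e-07 at the end of warmup
print(linear_warmup_decay(total, total)) # 0.0 at the final step
```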

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6612 | 0.03 | 100 | 0.6499 | 0.0765 | -0.0151 | 0.8300 | 0.0916 | -502.7813 | -388.2253 | 3.2379 | 0.9032 |
| 0.4585 | 0.07 | 200 | 0.4458 | 0.5224 | -0.1242 | 0.9301 | 0.6466 | -503.8723 | -383.7663 | 3.2494 | 0.9081 |
| 0.2519 | 0.1 | 300 | 0.2540 | 1.2036 | -0.4814 | 0.9470 | 1.6851 | -507.4445 | -376.9535 | 3.2790 | 0.9127 |
| 0.17 | 0.14 | 400 | 0.1751 | 1.5794 | -1.0033 | 0.9562 | 2.5827 | -512.6629 | -373.1959 | 3.3007 | 0.9173 |
| 0.1179 | 0.17 | 500 | 0.1215 | 1.8423 | -2.0791 | 0.9588 | 3.9214 | -523.4217 | -370.5673 | 3.2925 | 0.9104 |
| 0.1032 | 0.2 | 600 | 0.1078 | 2.0902 | -2.7647 | 0.9596 | 4.8549 | -530.2773 | -368.0876 | 3.2574 | 0.9180 |
| 0.0614 | 0.24 | 700 | 0.0881 | 2.2830 | -3.4190 | 0.9638 | 5.7021 | -536.8207 | -366.1595 | 3.2243 | 0.9190 |
| 0.0666 | 0.27 | 800 | 0.0751 | 2.3690 | -4.0591 | 0.9689 | 6.4281 | -543.2214 | -365.2995 | 3.1788 | 0.9025 |
| 0.0706 | 0.31 | 900 | 0.0662 | 2.4002 | -4.5254 | 0.9722 | 6.9257 | -547.8843 | -364.9874 | 3.1624 | 0.9102 |
| 0.0711 | 0.34 | 1000 | 0.0577 | 2.4230 | -4.9179 | 0.9764 | 7.3409 | -551.8096 | -364.7598 | 3.1467 | 0.9093 |
| 0.0623 | 0.37 | 1100 | 0.0572 | 2.4840 | -5.3620 | 0.9773 | 7.8459 | -556.2499 | -364.1504 | 3.1186 | 0.9011 |
| 0.0443 | 0.41 | 1200 | 0.0526 | 2.4237 | -5.4784 | 0.9798 | 7.9021 | -557.4146 | -364.7530 | 3.1196 | 0.8961 |
| 0.0416 | 0.44 | 1300 | 0.0477 | 2.3874 | -6.2247 | 0.9823 | 8.6120 | -564.8768 | -365.1163 | 3.0683 | 0.8720 |
| 0.0365 | 0.48 | 1400 | 0.0448 | 2.2887 | -6.8360 | 0.9806 | 9.1246 | -570.9899 | -366.1031 | 3.0491 | 0.8667 |
| 0.0341 | 0.51 | 1500 | 0.0442 | 2.2795 | -6.9547 | 0.9848 | 9.2343 | -572.1777 | -366.1945 | 3.0299 | 0.8500 |
| 0.0406 | 0.54 | 1600 | 0.0414 | 2.0896 | -7.0003 | 0.9848 | 9.0899 | -572.6334 | -368.0941 | 3.0437 | 0.8442 |
| 0.0427 | 0.58 | 1700 | 0.0387 | 2.0380 | -7.1141 | 0.9857 | 9.1521 | -573.7712 | -368.6102 | 3.0458 | 0.8383 |
| 0.0225 | 0.61 | 1800 | 0.0421 | 2.2150 | -7.1052 | 0.9891 | 9.3203 | -573.6826 | -366.8395 | 3.0443 | 0.8362 |
| 0.0298 | 0.65 | 1900 | 0.0364 | 2.0854 | -7.7136 | 0.9882 | 9.7990 | -579.7668 | -368.1361 | 3.0306 | 0.8392 |
| 0.0255 | 0.68 | 2000 | 0.0353 | 2.1351 | -7.6852 | 0.9907 | 9.8203 | -579.4824 | -367.6387 | 3.0204 | 0.8292 |
| 0.019 | 0.71 | 2100 | 0.0296 | 2.1215 | -8.1790 | 0.9916 | 10.3005 | -584.4203 | -367.7745 | 3.0052 | 0.8412 |
| 0.0198 | 0.75 | 2200 | 0.0248 | 2.1218 | -8.4302 | 0.9907 | 10.5520 | -586.9324 | -367.7719 | 2.9878 | 0.8183 |
| 0.0192 | 0.78 | 2300 | 0.0238 | 2.0950 | -8.2588 | 0.9924 | 10.3538 | -585.2184 | -368.0402 | 2.9758 | 0.7942 |
| 0.0191 | 0.82 | 2400 | 0.0213 | 2.1701 | -8.6399 | 0.9941 | 10.8101 | -589.0295 | -367.2885 | 2.9719 | 0.8049 |
| 0.0215 | 0.85 | 2500 | 0.0224 | 2.1220 | -9.1960 | 0.9933 | 11.3180 | -594.5902 | -367.7695 | 2.9391 | 0.7799 |
| 0.0579 | 0.88 | 2600 | 0.0193 | 2.0368 | -9.3428 | 0.9933 | 11.3796 | -596.0587 | -368.6217 | 2.9297 | 0.7933 |
| 0.0163 | 0.92 | 2700 | 0.0180 | 1.9057 | -9.4956 | 0.9941 | 11.4013 | -597.5867 | -369.9328 | 2.9114 | 0.7628 |
| 0.019 | 0.95 | 2800 | 0.0194 | 1.9915 | -9.4265 | 0.9933 | 11.4179 | -596.8949 | -369.0752 | 2.9223 | 0.7736 |
| 0.0166 | 0.99 | 2900 | 0.0182 | 2.0770 | -9.1954 | 0.9958 | 11.2724 | -594.5848 | -368.2201 | 2.9186 | 0.7592 |
| 0.0121 | 1.02 | 3000 | 0.0180 | 1.9094 | -9.4964 | 0.9941 | 11.4059 | -597.5947 | -369.8957 | 2.8957 | 0.7557 |
| 0.011 | 1.05 | 3100 | 0.0150 | 2.0009 | -9.9345 | 0.9966 | 11.9354 | -601.9758 | -368.9812 | 2.8560 | 0.7294 |
| 0.0106 | 1.09 | 3200 | 0.0139 | 2.0861 | -9.6153 | 0.9966 | 11.7014 | -598.7830 | -368.1290 | 2.8565 | 0.7071 |
| 0.0095 | 1.12 | 3300 | 0.0134 | 1.9755 | -10.3936 | 0.9958 | 12.3691 | -606.5661 | -369.2344 | 2.8290 | 0.7083 |
| 0.0115 | 1.16 | 3400 | 0.0129 | 1.9719 | -10.3851 | 0.9949 | 12.3569 | -606.4811 | -369.2712 | 2.8212 | 0.7184 |
| 0.0152 | 1.19 | 3500 | 0.0124 | 2.0357 | -10.2131 | 0.9958 | 12.2488 | -604.7615 | -368.6329 | 2.8217 | 0.7140 |
| 0.01 | 1.22 | 3600 | 0.0116 | 2.0147 | -10.9243 | 0.9966 | 12.9390 | -611.8733 | -368.8428 | 2.7589 | 0.6517 |
| 0.0135 | 1.26 | 3700 | 0.0116 | 1.9527 | -10.8649 | 0.9966 | 12.8176 | -611.2795 | -369.4628 | 2.8017 | 0.7064 |
| 0.0078 | 1.29 | 3800 | 0.0112 | 1.7362 | -11.5598 | 0.9966 | 13.2960 | -618.2281 | -371.6281 | 2.7623 | 0.6879 |
| 0.0114 | 1.33 | 3900 | 0.0106 | 1.8313 | -11.3667 | 0.9983 | 13.1980 | -616.2969 | -370.6765 | 2.7616 | 0.6728 |
| 0.0077 | 1.36 | 4000 | 0.0101 | 1.9160 | -11.5484 | 0.9992 | 13.4645 | -618.1147 | -369.8296 | 2.7534 | 0.6694 |
| 0.0057 | 1.39 | 4100 | 0.0098 | 1.8898 | -11.3187 | 0.9983 | 13.2085 | -615.8172 | -370.0915 | 2.7553 | 0.6617 |
| 0.0056 | 1.43 | 4200 | 0.0091 | 2.0721 | -11.6748 | 0.9992 | 13.7469 | -619.3782 | -368.2689 | 2.7234 | 0.6265 |
| 0.006 | 1.46 | 4300 | 0.0088 | 1.8416 | -12.1884 | 0.9983 | 14.0300 | -624.5148 | -370.5739 | 2.7058 | 0.6225 |
| 0.0071 | 1.5 | 4400 | 0.0083 | 2.0151 | -11.7393 | 0.9983 | 13.7544 | -620.0233 | -368.8386 | 2.7124 | 0.6231 |
| 0.0101 | 1.53 | 4500 | 0.0083 | 2.0864 | -11.5153 | 0.9992 | 13.6016 | -617.7830 | -368.1264 | 2.7206 | 0.6407 |
| 0.0054 | 1.56 | 4600 | 0.0083 | 1.9930 | -11.3424 | 0.9975 | 13.3354 | -616.0542 | -369.0597 | 2.7246 | 0.6099 |
| 0.0116 | 1.6 | 4700 | 0.0080 | 1.9298 | -11.3167 | 0.9975 | 13.2464 | -615.7971 | -369.6923 | 2.7200 | 0.6008 |
| 0.0116 | 1.63 | 4800 | 0.0074 | 1.8809 | -11.4685 | 0.9975 | 13.3494 | -617.3154 | -370.1813 | 2.6917 | 0.5698 |
| 0.0087 | 1.67 | 4900 | 0.0073 | 1.8993 | -11.8845 | 0.9983 | 13.7838 | -621.4749 | -369.9968 | 2.6861 | 0.5798 |
| 0.0031 | 1.7 | 5000 | 0.0072 | 1.8755 | -12.3032 | 0.9975 | 14.1787 | -625.6624 | -370.2348 | 2.6435 | 0.5411 |
| 0.0115 | 1.73 | 5100 | 0.0076 | 1.9283 | -11.9068 | 0.9958 | 13.8351 | -621.6979 | -369.7066 | 2.6527 | 0.5393 |
| 0.0065 | 1.77 | 5200 | 0.0074 | 1.9870 | -11.9105 | 0.9949 | 13.8975 | -621.7357 | -369.1199 | 2.6790 | 0.5763 |
| 0.006 | 1.8 | 5300 | 0.0068 | 1.7994 | -12.4601 | 0.9958 | 14.2595 | -627.2310 | -370.9959 | 2.6264 | 0.5393 |
| 0.0076 | 1.84 | 5400 | 0.0064 | 2.0449 | -12.2057 | 0.9966 | 14.2506 | -624.6871 | -368.5407 | 2.6409 | 0.5465 |
| 0.0042 | 1.87 | 5500 | 0.0062 | 1.9941 | -12.4399 | 0.9983 | 14.4340 | -627.0295 | -369.0491 | 2.6332 | 0.5433 |
| 0.0079 | 1.9 | 5600 | 0.0061 | 1.9119 | -12.4000 | 0.9983 | 14.3118 | -626.6300 | -369.8711 | 2.6300 | 0.5377 |
| 0.0066 | 1.94 | 5700 | 0.0062 | 2.0544 | -12.1682 | 0.9983 | 14.2226 | -624.3120 | -368.4457 | 2.6248 | 0.5288 |
| 0.0071 | 1.97 | 5800 | 0.0061 | 2.0943 | -12.2702 | 0.9975 | 14.3645 | -625.3325 | -368.0468 | 2.6248 | 0.5422 |
| 0.0021 | 2.01 | 5900 | 0.0057 | 1.9195 | -12.9348 | 0.9983 | 14.8543 | -631.9785 | -369.7946 | 2.5712 | 0.5186 |
| 0.0029 | 2.04 | 6000 | 0.0057 | 1.8384 | -13.3904 | 0.9983 | 15.2288 | -636.5340 | -370.6057 | 2.5405 | 0.4960 |
| 0.0035 | 2.07 | 6100 | 0.0056 | 1.6150 | -14.2858 | 0.9975 | 15.9009 | -645.4886 | -372.8395 | 2.4718 | 0.4415 |
| 0.0053 | 2.11 | 6200 | 0.0053 | 1.8268 | -13.9429 | 0.9983 | 15.7696 | -642.0590 | -370.7222 | 2.4921 | 0.4576 |
| 0.0044 | 2.14 | 6300 | 0.0052 | 1.9443 | -13.8117 | 0.9975 | 15.7560 | -640.7470 | -369.5464 | 2.5079 | 0.4705 |
| 0.0026 | 2.18 | 6400 | 0.0053 | 2.0456 | -13.7455 | 0.9975 | 15.7911 | -640.0853 | -368.5343 | 2.5139 | 0.4823 |
| 0.0026 | 2.21 | 6500 | 0.0050 | 2.0028 | -13.6496 | 0.9983 | 15.6524 | -639.1260 | -368.9618 | 2.5135 | 0.4823 |
| 0.0029 | 2.24 | 6600 | 0.0050 | 1.8856 | -13.7926 | 0.9975 | 15.6782 | -640.5563 | -370.1337 | 2.4828 | 0.4459 |
| 0.0023 | 2.28 | 6700 | 0.0049 | 1.9422 | -14.0760 | 0.9983 | 16.0182 | -643.3903 | -369.5678 | 2.4698 | 0.4471 |
| 0.003 | 2.31 | 6800 | 0.0048 | 1.8633 | -14.4649 | 0.9983 | 16.3282 | -647.2790 | -370.3570 | 2.4646 | 0.4562 |
| 0.0058 | 2.35 | 6900 | 0.0049 | 1.8085 | -14.8512 | 0.9975 | 16.6597 | -651.1427 | -370.9051 | 2.4275 | 0.4292 |
| 0.0032 | 2.38 | 7000 | 0.0048 | 1.9006 | -14.6340 | 0.9983 | 16.5346 | -648.9703 | -369.9842 | 2.4387 | 0.4425 |
| 0.0018 | 2.41 | 7100 | 0.0047 | 1.8215 | -15.0376 | 0.9983 | 16.8592 | -653.0067 | -370.7746 | 2.4153 | 0.4296 |
| 0.001 | 2.45 | 7200 | 0.0046 | 1.8195 | -15.0112 | 0.9983 | 16.8307 | -652.7422 | -370.7950 | 2.4153 | 0.4248 |
| 0.0057 | 2.48 | 7300 | 0.0045 | 1.8920 | -14.4156 | 0.9983 | 16.3077 | -646.7868 | -370.0694 | 2.4336 | 0.4234 |
| 0.004 | 2.52 | 7400 | 0.0044 | 1.7826 | -14.6522 | 0.9983 | 16.4348 | -649.1526 | -371.1638 | 2.4101 | 0.4117 |
| 0.0025 | 2.55 | 7500 | 0.0044 | 1.8202 | -14.7043 | 0.9983 | 16.5245 | -649.6732 | -370.7875 | 2.4040 | 0.4069 |
| 0.0035 | 2.58 | 7600 | 0.0044 | 1.8712 | -14.7562 | 0.9983 | 16.6273 | -650.1921 | -370.2782 | 2.4019 | 0.4087 |
| 0.002 | 2.62 | 7700 | 0.0043 | 1.8406 | -14.8610 | 0.9983 | 16.7017 | -651.2407 | -370.5836 | 2.3996 | 0.4114 |
| 0.002 | 2.65 | 7800 | 0.0043 | 1.8042 | -15.0820 | 0.9992 | 16.8862 | -653.4503 | -370.9484 | 2.3936 | 0.4147 |
| 0.0046 | 2.69 | 7900 | 0.0042 | 1.8043 | -15.2990 | 0.9983 | 17.1033 | -655.6204 | -370.9472 | 2.3757 | 0.3993 |
| 0.0025 | 2.72 | 8000 | 0.0042 | 1.8289 | -15.3097 | 0.9983 | 17.1386 | -655.7274 | -370.7011 | 2.3634 | 0.3853 |
| 0.0023 | 2.75 | 8100 | 0.0041 | 1.7995 | -15.2380 | 0.9983 | 17.0375 | -655.0099 | -370.9947 | 2.3619 | 0.3779 |
| 0.0025 | 2.79 | 8200 | 0.0040 | 1.8013 | -15.2440 | 0.9983 | 17.0453 | -655.0703 | -370.9769 | 2.3668 | 0.3827 |
| 0.002 | 2.82 | 8300 | 0.0040 | 1.8040 | -15.2101 | 0.9983 | 17.0141 | -654.7317 | -370.9499 | 2.3660 | 0.3834 |
| 0.0023 | 2.86 | 8400 | 0.0040 | 1.7441 | -15.3132 | 0.9983 | 17.0572 | -655.7621 | -371.5493 | 2.3498 | 0.3680 |
| 0.002 | 2.89 | 8500 | 0.0040 | 1.7551 | -15.3278 | 0.9983 | 17.0828 | -655.9080 | -371.4393 | 2.3509 | 0.3714 |
| 0.004 | 2.92 | 8600 | 0.0040 | 1.7500 | -15.3290 | 0.9983 | 17.0790 | -655.9205 | -371.4897 | 2.3518 | 0.3701 |
| 0.0041 | 2.96 | 8700 | 0.0040 | 1.7294 | -15.3645 | 0.9983 | 17.0940 | -656.2756 | -371.6956 | 2.3478 | 0.3660 |
| 0.0029 | 2.99 | 8800 | 0.0040 | 1.7305 | -15.3609 | 0.9983 | 17.0914 | -656.2390 | -371.6845 | 2.3464 | 0.3647 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1