yihang7's picture
Model save
bb83397 verified
metadata
tags:
  - generated_from_trainer
model-index:
  - name: llama2-7b-chat-dpo-full-hydrox-safe
    results: []

llama2-7b-chat-dpo-full-hydrox-safe

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0013
  • Rewards/chosen: -0.0939
  • Rewards/rejected: -28.6036
  • Rewards/accuracies: 0.9992
  • Rewards/margins: 28.5097
  • Logps/rejected: -700.8997
  • Logps/chosen: -219.0951
  • Logits/rejected: -0.7196
  • Logits/chosen: -0.6433

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.6247 0.03 100 0.6155 0.0158 -0.1451 0.8577 0.1608 -416.3145 -217.9982 -0.2810 -0.5546
0.3738 0.07 200 0.3507 0.1699 -0.8998 0.9184 1.0697 -423.8618 -216.4572 -0.3464 -0.5597
0.2144 0.1 300 0.2152 0.3547 -1.8023 0.9394 2.1570 -432.8871 -214.6091 -0.3537 -0.5335
0.1567 0.14 400 0.1458 0.4781 -2.9967 0.9554 3.4748 -444.8309 -213.3745 -0.3343 -0.4947
0.1121 0.17 500 0.1250 0.5327 -4.1425 0.9689 4.6752 -456.2888 -212.8291 -0.3020 -0.4604
0.1003 0.2 600 0.0926 0.3744 -5.4423 0.9697 5.8167 -469.2868 -214.4121 -0.3227 -0.4643
0.0602 0.24 700 0.0769 0.3722 -6.6327 0.9739 7.0049 -481.1906 -214.4338 -0.3134 -0.4460
0.0584 0.27 800 0.0638 0.4037 -7.6613 0.9806 8.0650 -491.4773 -214.1189 -0.2857 -0.4235
0.0555 0.31 900 0.0557 0.4281 -8.2327 0.9848 8.6608 -497.1909 -213.8745 -0.2914 -0.4270
0.0471 0.34 1000 0.0472 0.4046 -9.7769 0.9891 10.1814 -512.6325 -214.1102 -0.3491 -0.4740
0.0673 0.37 1100 0.0383 0.3282 -11.0251 0.9949 11.3534 -525.1152 -214.8733 -0.3772 -0.4955
0.031 0.41 1200 0.0325 0.1923 -11.8461 0.9958 12.0384 -533.3251 -216.2326 -0.4115 -0.5142
0.0242 0.44 1300 0.0275 0.2059 -12.9425 0.9966 13.1485 -544.2894 -216.0965 -0.4212 -0.5150
0.0143 0.48 1400 0.0215 0.0180 -11.9692 0.9958 11.9872 -534.5560 -217.9758 -0.5405 -0.5538
0.0157 0.51 1500 0.0181 -0.3479 -14.0856 0.9958 13.7377 -555.7203 -221.6349 -0.5292 -0.5576
0.0155 0.54 1600 0.0286 -0.2238 -13.8665 0.9958 13.6427 -553.5294 -220.3942 -0.4943 -0.5256
0.0148 0.58 1700 0.0251 -0.2352 -15.8803 0.9975 15.6451 -573.6669 -220.5081 -0.4799 -0.5212
0.0094 0.61 1800 0.0163 -0.1817 -16.7316 0.9975 16.5499 -582.1795 -219.9725 -0.4976 -0.5385
0.0112 0.65 1900 0.0159 -0.3917 -18.6440 0.9949 18.2523 -601.3036 -222.0726 -0.5874 -0.6174
0.007 0.68 2000 0.0106 -0.1240 -16.5280 0.9975 16.4040 -580.1437 -219.3957 -0.5555 -0.5702
0.0083 0.71 2100 0.0167 -0.3388 -18.5238 0.9975 18.1849 -600.1016 -221.5440 -0.5802 -0.5848
0.0058 0.75 2200 0.0166 0.1875 -16.4876 0.9975 16.6751 -579.7398 -216.2812 -0.5300 -0.5517
0.0031 0.78 2300 0.0167 -0.4853 -19.1077 0.9966 18.6224 -605.9405 -223.0087 -0.5945 -0.5932
0.0041 0.82 2400 0.0148 -0.1266 -19.3544 0.9983 19.2278 -608.4083 -219.4222 -0.5528 -0.5695
0.0129 0.85 2500 0.0277 -0.6526 -21.0389 0.9983 20.3863 -625.2532 -224.6820 -0.6317 -0.6223
0.0169 0.88 2600 0.0158 -0.6507 -22.0352 0.9983 21.3845 -635.2158 -224.6625 -0.6147 -0.6148
0.005 0.92 2700 0.0148 -0.7455 -22.5637 0.9983 21.8181 -640.5008 -225.6113 -0.6401 -0.6379
0.0075 0.95 2800 0.0429 -0.6179 -21.7587 0.9983 21.1408 -632.4512 -224.3349 -0.6580 -0.6338
0.0053 0.99 2900 0.0452 -0.3093 -21.3611 0.9983 21.0518 -628.4748 -221.2488 -0.6473 -0.6362
0.0033 1.02 3000 0.0399 -0.4299 -20.6185 0.9992 20.1886 -621.0488 -222.4544 -0.6812 -0.6500
0.1239 1.05 3100 0.0098 -0.4156 -21.6528 0.9992 21.2371 -631.3915 -222.3120 -0.6612 -0.6328
0.0029 1.09 3200 0.0041 -0.4823 -24.1370 0.9992 23.6547 -656.2342 -222.9791 -0.6460 -0.6310
0.0015 1.12 3300 0.0037 -0.6250 -25.4442 0.9975 24.8192 -669.3063 -224.4059 -0.6623 -0.6482
0.002 1.16 3400 0.0039 -0.1881 -23.5637 0.9983 23.3756 -650.5010 -220.0367 -0.6331 -0.6142
0.0027 1.19 3500 0.0039 -0.3251 -24.0619 0.9992 23.7368 -655.4830 -221.4067 -0.6644 -0.6402
0.0015 1.22 3600 0.0031 -0.4337 -26.8013 0.9983 26.3676 -682.8770 -222.4931 -0.6421 -0.6330
0.0067 1.26 3700 0.0030 -0.1107 -22.8513 0.9992 22.7406 -643.3767 -219.2624 -0.6412 -0.6162
0.002 1.29 3800 0.0029 -0.4330 -24.7254 0.9992 24.2925 -662.1182 -222.4855 -0.6750 -0.6447
0.004 1.33 3900 0.0026 -0.5258 -25.6407 0.9992 25.1150 -671.2714 -223.4133 -0.6613 -0.6319
0.0032 1.36 4000 0.0025 -0.8592 -27.4389 0.9975 26.5797 -689.2528 -226.7478 -0.6796 -0.6569
0.0293 1.39 4100 0.0032 -0.6286 -26.4388 0.9992 25.8102 -679.2518 -224.4421 -0.6657 -0.6341
0.002 1.43 4200 0.0026 -0.6449 -26.1156 0.9992 25.4707 -676.0200 -224.6045 -0.6907 -0.6546
0.0007 1.46 4300 0.0026 -0.4135 -25.3743 0.9992 24.9609 -668.6074 -222.2907 -0.6704 -0.6348
0.001 1.5 4400 0.0025 -0.1706 -25.4135 0.9992 25.2428 -668.9984 -219.8623 -0.6670 -0.6312
0.0018 1.53 4500 0.0026 -0.3368 -23.9768 0.9992 23.6400 -654.6318 -221.5240 -0.6866 -0.6345
0.0035 1.56 4600 0.0025 0.0146 -23.9455 0.9992 23.9602 -654.3195 -218.0095 -0.6725 -0.6253
0.003 1.6 4700 0.0024 0.0616 -23.3292 0.9992 23.3908 -648.1558 -217.5395 -0.6644 -0.6168
0.0028 1.63 4800 0.0026 -0.5134 -26.9070 0.9992 26.3937 -683.9343 -223.2894 -0.7161 -0.6634
0.0047 1.67 4900 0.0025 -0.0916 -24.8206 0.9992 24.7290 -663.0701 -219.0718 -0.6444 -0.6038
0.0003 1.7 5000 0.0025 0.1584 -23.8425 0.9992 24.0009 -653.2887 -216.5716 -0.6169 -0.5785
0.0074 1.73 5100 0.0026 0.4581 -22.1966 0.9992 22.6546 -636.8298 -213.5752 -0.6477 -0.5976
0.002 1.77 5200 0.0023 0.1663 -23.7774 0.9983 23.9437 -652.6381 -216.4931 -0.6778 -0.6312
0.0005 1.8 5300 0.0021 0.0885 -24.4639 0.9983 24.5525 -659.5032 -217.2705 -0.6907 -0.6445
0.0009 1.84 5400 0.0020 0.3259 -23.8153 0.9983 24.1412 -653.0168 -214.8967 -0.6674 -0.6177
0.0004 1.87 5500 0.0027 0.0547 -25.4516 0.9992 25.5063 -669.3798 -217.6091 -0.7239 -0.6630
0.0078 1.9 5600 0.0027 -0.2841 -27.2416 0.9992 26.9575 -687.2796 -220.9968 -0.7328 -0.6718
0.0053 1.94 5700 0.0031 0.3394 -23.3205 0.9992 23.6599 -648.0685 -214.7619 -0.7018 -0.6326
0.0028 1.97 5800 0.0022 0.3456 -23.6389 0.9992 23.9845 -651.2528 -214.7000 -0.6865 -0.6247
0.0003 2.01 5900 0.0022 0.0137 -25.1376 0.9992 25.1513 -666.2399 -218.0188 -0.7179 -0.6544
0.0003 2.04 6000 0.0022 -0.0273 -25.5899 0.9992 25.5627 -670.7634 -218.4287 -0.7175 -0.6559
0.0005 2.07 6100 0.0021 -0.0506 -26.3022 0.9992 26.2516 -677.8860 -218.6621 -0.7035 -0.6425
0.0033 2.11 6200 0.0020 -0.1977 -27.1231 0.9992 26.9254 -686.0947 -220.1329 -0.6936 -0.6406
0.0048 2.14 6300 0.0018 0.1836 -25.2467 0.9992 25.4303 -667.3306 -216.3201 -0.6888 -0.6298
0.0007 2.18 6400 0.0018 -0.0446 -26.3563 0.9992 26.3117 -678.4270 -218.6022 -0.7075 -0.6494
0.0003 2.21 6500 0.0018 -0.1020 -27.0392 0.9992 26.9372 -685.2560 -219.1755 -0.7007 -0.6418
0.0013 2.24 6600 0.0017 -0.0434 -26.1507 0.9992 26.1073 -676.3707 -218.5897 -0.7076 -0.6401
0.0002 2.28 6700 0.0018 0.1488 -25.4695 0.9992 25.6182 -669.5585 -216.6682 -0.6911 -0.6184
0.0003 2.31 6800 0.0018 -0.0762 -26.7830 0.9992 26.7068 -682.6938 -218.9181 -0.7238 -0.6530
0.0095 2.35 6900 0.0018 -0.2520 -27.9261 0.9992 27.6741 -694.1253 -220.6760 -0.7267 -0.6572
0.0012 2.38 7000 0.0017 -0.1979 -27.7144 0.9992 27.5165 -692.0080 -220.1350 -0.7207 -0.6516
0.0004 2.41 7100 0.0017 -0.2063 -28.2831 0.9992 28.0768 -697.6947 -220.2186 -0.7147 -0.6448
0.0002 2.45 7200 0.0017 -0.2423 -28.5426 0.9992 28.3004 -700.2905 -220.5785 -0.7291 -0.6572
0.0049 2.48 7300 0.0017 -0.0938 -27.3084 0.9992 27.2146 -687.9479 -219.0937 -0.7313 -0.6487
0.0024 2.52 7400 0.0016 -0.0596 -27.3730 0.9992 27.3134 -688.5939 -218.7520 -0.7289 -0.6467
0.0013 2.55 7500 0.0016 0.0102 -27.3445 0.9992 27.3547 -688.3093 -218.0539 -0.7271 -0.6462
0.0014 2.58 7600 0.0016 -0.1696 -28.7332 0.9992 28.5636 -702.1956 -219.8516 -0.7393 -0.6604
0.0002 2.62 7700 0.0015 -0.1083 -28.2952 0.9992 28.1869 -697.8158 -219.2384 -0.7264 -0.6502
0.0001 2.65 7800 0.0015 -0.0892 -28.2958 0.9992 28.2066 -697.8219 -219.0480 -0.7246 -0.6479
0.0025 2.69 7900 0.0015 -0.1066 -28.4335 0.9992 28.3270 -699.1990 -219.2214 -0.7196 -0.6447
0.0002 2.72 8000 0.0015 -0.1453 -28.8184 0.9992 28.6731 -703.0482 -219.6090 -0.7264 -0.6518
0.0002 2.75 8100 0.0015 -0.1058 -28.6632 0.9992 28.5575 -701.4964 -219.2135 -0.7190 -0.6438
0.0003 2.79 8200 0.0015 -0.1407 -28.8865 0.9992 28.7459 -703.7291 -219.5624 -0.7227 -0.6488
0.0002 2.82 8300 0.0014 -0.1528 -28.9232 0.9992 28.7704 -704.0963 -219.6839 -0.7272 -0.6534
0.0001 2.86 8400 0.0013 -0.1196 -28.7573 0.9992 28.6377 -702.4371 -219.3522 -0.7244 -0.6492
0.0003 2.89 8500 0.0013 -0.1542 -28.9522 0.9992 28.7980 -704.3861 -219.6977 -0.7276 -0.6518
0.0016 2.92 8600 0.0013 -0.0885 -28.6082 0.9992 28.5197 -700.9456 -219.0408 -0.7181 -0.6426
0.0014 2.96 8700 0.0013 -0.0904 -28.5887 0.9992 28.4983 -700.7510 -219.0594 -0.7190 -0.6429
0.0005 2.99 8800 0.0013 -0.0898 -28.5857 0.9992 28.4959 -700.7208 -219.0538 -0.7194 -0.6430

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1