sfulay committed on
Commit 2eaa039
1 Parent(s): e6a9deb

Model save

README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: apache-2.0
+ base_model: alignment-handbook/zephyr-7b-sft-full
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-7b-dpo-full-gpt_consistent-high-curriculum
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-full-gpt_consistent-high-curriculum
+
+ This model is a DPO fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full); the preference dataset used for training is not recorded in this card.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4943
+ - Rewards/chosen: -0.9906
+ - Rewards/rejected: -1.9162
+ - Rewards/accuracies: 0.7328
+ - Rewards/margins: 0.9257
+ - Logps/rejected: -438.1469
+ - Logps/chosen: -384.1485
+ - Logits/rejected: 1.5457
+ - Logits/chosen: 0.3365
+
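+ A minimal inference sketch is shown below. It assumes the model is published under the repo id `sfulay/zephyr-7b-dpo-full-gpt_consistent-high-curriculum` (inferred from the model name above) and that the tokenizer inherits the Zephyr chat template from the SFT base model.
+
+ ```python
+ # Hedged usage sketch; the repo id and chat-template availability are assumptions.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo_id = "sfulay/zephyr-7b-dpo-full-gpt_consistent-high-curriculum"  # assumed repo id
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+
+ output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+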
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a hedged `trl` setup sketch follows the list):
+ - learning_rate: 5e-07
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 55
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
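+ The sketch below shows one way these values might map onto a `trl` DPO setup; it is an illustration under assumptions, not the original training script. The dataset id and the `bf16` flag are placeholders, and keyword names (e.g. `tokenizer` vs. `processing_class`) vary between `trl` releases.
+
+ ```python
+ # Hedged sketch of a DPO run with the hyperparameters listed above.
+ # The dataset id is a placeholder; the actual preference data is not recorded in this card.
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ base = "alignment-handbook/zephyr-7b-sft-full"
+ tokenizer = AutoTokenizer.from_pretrained(base)
+ model = AutoModelForCausalLM.from_pretrained(base)
+
+ args = DPOConfig(
+     output_dir="zephyr-7b-dpo-full-gpt_consistent-high-curriculum",
+     learning_rate=5e-7,
+     per_device_train_batch_size=8,   # train_batch_size
+     per_device_eval_batch_size=8,    # eval_batch_size
+     gradient_accumulation_steps=2,   # 8 GPUs * 8 * 2 = total_train_batch_size 128
+     num_train_epochs=1,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     seed=55,
+     bf16=True,                       # assumption; precision is not recorded in this card
+ )
+
+ train_dataset = load_dataset("your-org/preference-dataset", split="train")  # placeholder
+
+ trainer = DPOTrainer(
+     model=model,
+     ref_model=None,          # trl builds a frozen reference copy when None is passed
+     args=args,
+     train_dataset=train_dataset,
+     tokenizer=tokenizer,     # `processing_class` in newer trl releases
+ )
+ trainer.train()
+ ```
+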
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6617 | 0.1147 | 50 | 0.6429 | -0.0319 | -0.1521 | 0.6983 | 0.1202 | -261.7313 | -288.2777 | -2.4839 | -2.5612 |
+ | 0.5671 | 0.2294 | 100 | 0.5741 | -0.6263 | -1.0517 | 0.6897 | 0.4254 | -351.6960 | -347.7197 | -1.7192 | -1.9024 |
+ | 0.5328 | 0.3440 | 150 | 0.5237 | -0.5529 | -1.2131 | 0.7155 | 0.6602 | -367.8349 | -340.3821 | 0.6797 | -0.1069 |
+ | 0.5339 | 0.4587 | 200 | 0.5135 | -0.8694 | -1.6358 | 0.7284 | 0.7664 | -410.0984 | -372.0303 | 1.1593 | 0.1333 |
+ | 0.5206 | 0.5734 | 250 | 0.5051 | -1.1478 | -2.0209 | 0.7457 | 0.8731 | -448.6093 | -399.8708 | 2.1252 | 1.0597 |
+ | 0.5161 | 0.6881 | 300 | 0.4995 | -1.0692 | -1.9303 | 0.7414 | 0.8611 | -439.5535 | -392.0123 | 1.6077 | 0.4690 |
+ | 0.5113 | 0.8028 | 350 | 0.4953 | -0.9905 | -1.8979 | 0.7284 | 0.9073 | -436.3081 | -384.1431 | 1.4599 | 0.2827 |
+ | 0.5006 | 0.9174 | 400 | 0.4943 | -0.9906 | -1.9162 | 0.7328 | 0.9257 | -438.1469 | -384.1485 | 1.5457 | 0.3365 |
+
+ ### Framework versions
+
+ - Transformers 4.44.0.dev0
+ - Pytorch 2.1.2
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
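+
+ A quick way to check that a local environment roughly matches these versions (4.44.0.dev0 is a development build; a released 4.44.x should behave equivalently):
+
+ ```python
+ # Print locally installed versions for comparison with the list above.
+ import datasets, tokenizers, torch, transformers
+
+ print("transformers", transformers.__version__)  # card: 4.44.0.dev0
+ print("torch", torch.__version__)                # card: 2.1.2
+ print("datasets", datasets.__version__)          # card: 2.20.0
+ print("tokenizers", tokenizers.__version__)      # card: 0.19.1
+ ```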
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 1.0,
+   "total_flos": 0.0,
+   "train_loss": 0.5444534902178914,
+   "train_runtime": 11753.731,
+   "train_samples": 55758,
+   "train_samples_per_second": 4.744,
+   "train_steps_per_second": 0.037
+ }
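The derived throughput figures in `all_results.json` follow directly from the raw counts (436 optimizer steps at an effective batch size of 128); a minimal check:

```python
# Recompute the derived throughput fields reported above.
train_samples = 55758
train_runtime = 11753.731   # seconds
global_steps = 436          # from trainer_state.json; 436 * 128 ≈ 55758 samples in one epoch

print(round(train_samples / train_runtime, 3))  # 4.744 samples/second
print(round(global_steps / train_runtime, 3))   # 0.037 steps/second
```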
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.44.0.dev0"
+ }
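This generation config only pins the BOS/EOS token ids (1 and 2, consistent with the Mistral-based Zephyr tokenizer) and records the Transformers version; no sampling parameters are set, so `generate()` falls back to its defaults. A small sketch of loading it, assuming the same repo id as above:

```python
# Load the committed generation_config.json from the Hub (repo id assumed).
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("sfulay/zephyr-7b-dpo-full-gpt_consistent-high-curriculum")
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 1 2
```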
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 1.0,
+   "total_flos": 0.0,
+   "train_loss": 0.5444534902178914,
+   "train_runtime": 11753.731,
+   "train_samples": 55758,
+   "train_samples_per_second": 4.744,
+   "train_steps_per_second": 0.037
+ }
trainer_state.json ADDED
@@ -0,0 +1,815 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 50,
6
+ "global_step": 436,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.022935779816513763,
13
+ "grad_norm": 8.114700597292016,
14
+ "learning_rate": 1.1363636363636363e-07,
15
+ "logits/chosen": -2.630662441253662,
16
+ "logits/rejected": -2.588312864303589,
17
+ "logps/chosen": -251.37826538085938,
18
+ "logps/rejected": -245.56118774414062,
19
+ "loss": 0.6932,
20
+ "rewards/accuracies": 0.39375001192092896,
21
+ "rewards/chosen": 0.0003000783617608249,
22
+ "rewards/margins": 3.4782169677782804e-05,
23
+ "rewards/rejected": 0.00026529619935899973,
24
+ "step": 10
25
+ },
26
+ {
27
+ "epoch": 0.045871559633027525,
28
+ "grad_norm": 9.2292553026982,
29
+ "learning_rate": 2.2727272727272726e-07,
30
+ "logits/chosen": -2.6437935829162598,
31
+ "logits/rejected": -2.5945396423339844,
32
+ "logps/chosen": -305.3244323730469,
33
+ "logps/rejected": -288.41082763671875,
34
+ "loss": 0.692,
35
+ "rewards/accuracies": 0.5562499761581421,
36
+ "rewards/chosen": 0.0019361503655090928,
37
+ "rewards/margins": 0.0025582597590982914,
38
+ "rewards/rejected": -0.0006221095682121813,
39
+ "step": 20
40
+ },
41
+ {
42
+ "epoch": 0.06880733944954129,
43
+ "grad_norm": 8.362484362577634,
44
+ "learning_rate": 3.4090909090909085e-07,
45
+ "logits/chosen": -2.6338467597961426,
46
+ "logits/rejected": -2.5890305042266846,
47
+ "logps/chosen": -290.099365234375,
48
+ "logps/rejected": -311.11444091796875,
49
+ "loss": 0.6875,
50
+ "rewards/accuracies": 0.706250011920929,
51
+ "rewards/chosen": 0.012083572335541248,
52
+ "rewards/margins": 0.01316638570278883,
53
+ "rewards/rejected": -0.0010828140657395124,
54
+ "step": 30
55
+ },
56
+ {
57
+ "epoch": 0.09174311926605505,
58
+ "grad_norm": 8.520101675940221,
59
+ "learning_rate": 4.545454545454545e-07,
60
+ "logits/chosen": -2.6183691024780273,
61
+ "logits/rejected": -2.586637258529663,
62
+ "logps/chosen": -266.4432067871094,
63
+ "logps/rejected": -251.4285125732422,
64
+ "loss": 0.6773,
65
+ "rewards/accuracies": 0.6499999761581421,
66
+ "rewards/chosen": 0.022834882140159607,
67
+ "rewards/margins": 0.023432452231645584,
68
+ "rewards/rejected": -0.0005975713720545173,
69
+ "step": 40
70
+ },
71
+ {
72
+ "epoch": 0.11467889908256881,
73
+ "grad_norm": 9.078671253466315,
74
+ "learning_rate": 4.997110275491701e-07,
75
+ "logits/chosen": -2.6266233921051025,
76
+ "logits/rejected": -2.5520858764648438,
77
+ "logps/chosen": -304.1348571777344,
78
+ "logps/rejected": -262.32061767578125,
79
+ "loss": 0.6617,
80
+ "rewards/accuracies": 0.643750011920929,
81
+ "rewards/chosen": 0.002302716951817274,
82
+ "rewards/margins": 0.0754440575838089,
83
+ "rewards/rejected": -0.07314133644104004,
84
+ "step": 50
85
+ },
86
+ {
87
+ "epoch": 0.11467889908256881,
88
+ "eval_logits/chosen": -2.561208486557007,
89
+ "eval_logits/rejected": -2.4839041233062744,
90
+ "eval_logps/chosen": -288.2777099609375,
91
+ "eval_logps/rejected": -261.73126220703125,
92
+ "eval_loss": 0.6429266333580017,
93
+ "eval_rewards/accuracies": 0.6982758641242981,
94
+ "eval_rewards/chosen": -0.031876176595687866,
95
+ "eval_rewards/margins": 0.12021197378635406,
96
+ "eval_rewards/rejected": -0.15208815038204193,
97
+ "eval_runtime": 95.5992,
98
+ "eval_samples_per_second": 19.017,
99
+ "eval_steps_per_second": 0.303,
100
+ "step": 50
101
+ },
102
+ {
103
+ "epoch": 0.13761467889908258,
104
+ "grad_norm": 11.596073380531864,
105
+ "learning_rate": 4.979475034558115e-07,
106
+ "logits/chosen": -2.547043561935425,
107
+ "logits/rejected": -2.4718384742736816,
108
+ "logps/chosen": -296.81402587890625,
109
+ "logps/rejected": -309.17828369140625,
110
+ "loss": 0.6408,
111
+ "rewards/accuracies": 0.625,
112
+ "rewards/chosen": -0.10427121073007584,
113
+ "rewards/margins": 0.10880544036626816,
114
+ "rewards/rejected": -0.2130766659975052,
115
+ "step": 60
116
+ },
117
+ {
118
+ "epoch": 0.16055045871559634,
119
+ "grad_norm": 14.345685973653804,
120
+ "learning_rate": 4.945923025551788e-07,
121
+ "logits/chosen": -2.57369065284729,
122
+ "logits/rejected": -2.5374178886413574,
123
+ "logps/chosen": -286.39215087890625,
124
+ "logps/rejected": -286.45330810546875,
125
+ "loss": 0.6181,
126
+ "rewards/accuracies": 0.6812499761581421,
127
+ "rewards/chosen": -0.11706429719924927,
128
+ "rewards/margins": 0.20492573082447052,
129
+ "rewards/rejected": -0.321990042924881,
130
+ "step": 70
131
+ },
132
+ {
133
+ "epoch": 0.1834862385321101,
134
+ "grad_norm": 15.011490045096238,
135
+ "learning_rate": 4.896669632591651e-07,
136
+ "logits/chosen": -2.5018064975738525,
137
+ "logits/rejected": -2.4190878868103027,
138
+ "logps/chosen": -328.02960205078125,
139
+ "logps/rejected": -310.5951843261719,
140
+ "loss": 0.6089,
141
+ "rewards/accuracies": 0.71875,
142
+ "rewards/chosen": -0.22072939574718475,
143
+ "rewards/margins": 0.22173993289470673,
144
+ "rewards/rejected": -0.44246941804885864,
145
+ "step": 80
146
+ },
147
+ {
148
+ "epoch": 0.20642201834862386,
149
+ "grad_norm": 12.064286648165384,
150
+ "learning_rate": 4.832031033425662e-07,
151
+ "logits/chosen": -2.316053628921509,
152
+ "logits/rejected": -2.2170588970184326,
153
+ "logps/chosen": -309.4535217285156,
154
+ "logps/rejected": -293.6477966308594,
155
+ "loss": 0.6033,
156
+ "rewards/accuracies": 0.6937500238418579,
157
+ "rewards/chosen": -0.34011608362197876,
158
+ "rewards/margins": 0.24611127376556396,
159
+ "rewards/rejected": -0.5862273573875427,
160
+ "step": 90
161
+ },
162
+ {
163
+ "epoch": 0.22935779816513763,
164
+ "grad_norm": 18.008124319426113,
165
+ "learning_rate": 4.752422169756047e-07,
166
+ "logits/chosen": -2.297001838684082,
167
+ "logits/rejected": -2.1643242835998535,
168
+ "logps/chosen": -329.64617919921875,
169
+ "logps/rejected": -325.0845947265625,
170
+ "loss": 0.5671,
171
+ "rewards/accuracies": 0.7749999761581421,
172
+ "rewards/chosen": -0.3444996774196625,
173
+ "rewards/margins": 0.4821701645851135,
174
+ "rewards/rejected": -0.8266698718070984,
175
+ "step": 100
176
+ },
177
+ {
178
+ "epoch": 0.22935779816513763,
179
+ "eval_logits/chosen": -1.9024137258529663,
180
+ "eval_logits/rejected": -1.719163417816162,
181
+ "eval_logps/chosen": -347.71966552734375,
182
+ "eval_logps/rejected": -351.6959533691406,
183
+ "eval_loss": 0.5741031765937805,
184
+ "eval_rewards/accuracies": 0.6896551847457886,
185
+ "eval_rewards/chosen": -0.6262954473495483,
186
+ "eval_rewards/margins": 0.4254392087459564,
187
+ "eval_rewards/rejected": -1.0517346858978271,
188
+ "eval_runtime": 94.9312,
189
+ "eval_samples_per_second": 19.151,
190
+ "eval_steps_per_second": 0.305,
191
+ "step": 100
192
+ },
193
+ {
194
+ "epoch": 0.25229357798165136,
195
+ "grad_norm": 18.209810501109782,
196
+ "learning_rate": 4.658354083558188e-07,
197
+ "logits/chosen": -1.37660813331604,
198
+ "logits/rejected": -1.3109476566314697,
199
+ "logps/chosen": -306.15673828125,
200
+ "logps/rejected": -364.8440856933594,
201
+ "loss": 0.5638,
202
+ "rewards/accuracies": 0.65625,
203
+ "rewards/chosen": -0.5968034267425537,
204
+ "rewards/margins": 0.4451308250427246,
205
+ "rewards/rejected": -1.0419342517852783,
206
+ "step": 110
207
+ },
208
+ {
209
+ "epoch": 0.27522935779816515,
210
+ "grad_norm": 30.619035412465948,
211
+ "learning_rate": 4.550430636492389e-07,
212
+ "logits/chosen": -0.5667544007301331,
213
+ "logits/rejected": -0.11496512591838837,
214
+ "logps/chosen": -352.6363830566406,
215
+ "logps/rejected": -321.98138427734375,
216
+ "loss": 0.5429,
217
+ "rewards/accuracies": 0.7124999761581421,
218
+ "rewards/chosen": -0.5322494506835938,
219
+ "rewards/margins": 0.4666845202445984,
220
+ "rewards/rejected": -0.9989339709281921,
221
+ "step": 120
222
+ },
223
+ {
224
+ "epoch": 0.2981651376146789,
225
+ "grad_norm": 25.16692077643166,
226
+ "learning_rate": 4.429344633468004e-07,
227
+ "logits/chosen": 0.31375107169151306,
228
+ "logits/rejected": 0.7534220814704895,
229
+ "logps/chosen": -333.3154602050781,
230
+ "logps/rejected": -383.91571044921875,
231
+ "loss": 0.5276,
232
+ "rewards/accuracies": 0.706250011920929,
233
+ "rewards/chosen": -0.756668746471405,
234
+ "rewards/margins": 0.7230855226516724,
235
+ "rewards/rejected": -1.479754090309143,
236
+ "step": 130
237
+ },
238
+ {
239
+ "epoch": 0.3211009174311927,
240
+ "grad_norm": 21.8628335306802,
241
+ "learning_rate": 4.2958733752443187e-07,
242
+ "logits/chosen": 0.12007780373096466,
243
+ "logits/rejected": 0.32281678915023804,
244
+ "logps/chosen": -341.0147705078125,
245
+ "logps/rejected": -368.9652404785156,
246
+ "loss": 0.5337,
247
+ "rewards/accuracies": 0.71875,
248
+ "rewards/chosen": -0.6602333188056946,
249
+ "rewards/margins": 0.6314884424209595,
250
+ "rewards/rejected": -1.2917217016220093,
251
+ "step": 140
252
+ },
253
+ {
254
+ "epoch": 0.3440366972477064,
255
+ "grad_norm": 26.496344548145927,
256
+ "learning_rate": 4.150873668617898e-07,
257
+ "logits/chosen": 0.6530567407608032,
258
+ "logits/rejected": 1.3690438270568848,
259
+ "logps/chosen": -375.08856201171875,
260
+ "logps/rejected": -408.30181884765625,
261
+ "loss": 0.5328,
262
+ "rewards/accuracies": 0.731249988079071,
263
+ "rewards/chosen": -0.8673305511474609,
264
+ "rewards/margins": 0.7217624187469482,
265
+ "rewards/rejected": -1.5890929698944092,
266
+ "step": 150
267
+ },
268
+ {
269
+ "epoch": 0.3440366972477064,
270
+ "eval_logits/chosen": -0.10693416744470596,
271
+ "eval_logits/rejected": 0.6796554327011108,
272
+ "eval_logps/chosen": -340.38214111328125,
273
+ "eval_logps/rejected": -367.83489990234375,
274
+ "eval_loss": 0.5237244367599487,
275
+ "eval_rewards/accuracies": 0.7155172228813171,
276
+ "eval_rewards/chosen": -0.5529204607009888,
277
+ "eval_rewards/margins": 0.6602039933204651,
278
+ "eval_rewards/rejected": -1.2131245136260986,
279
+ "eval_runtime": 95.5215,
280
+ "eval_samples_per_second": 19.032,
281
+ "eval_steps_per_second": 0.304,
282
+ "step": 150
283
+ },
284
+ {
285
+ "epoch": 0.3669724770642202,
286
+ "grad_norm": 22.090085860189564,
287
+ "learning_rate": 3.9952763262280397e-07,
288
+ "logits/chosen": -0.6242966651916504,
289
+ "logits/rejected": 0.1017322987318039,
290
+ "logps/chosen": -337.90997314453125,
291
+ "logps/rejected": -389.50592041015625,
292
+ "loss": 0.5376,
293
+ "rewards/accuracies": 0.7124999761581421,
294
+ "rewards/chosen": -0.4958871901035309,
295
+ "rewards/margins": 0.6162235140800476,
296
+ "rewards/rejected": -1.1121107339859009,
297
+ "step": 160
298
+ },
299
+ {
300
+ "epoch": 0.38990825688073394,
301
+ "grad_norm": 29.49090849875655,
302
+ "learning_rate": 3.8300801912883414e-07,
303
+ "logits/chosen": 0.3189530670642853,
304
+ "logits/rejected": 0.7786465287208557,
305
+ "logps/chosen": -349.08013916015625,
306
+ "logps/rejected": -422.4642639160156,
307
+ "loss": 0.5223,
308
+ "rewards/accuracies": 0.6875,
309
+ "rewards/chosen": -0.786384642124176,
310
+ "rewards/margins": 0.660692036151886,
311
+ "rewards/rejected": -1.4470767974853516,
312
+ "step": 170
313
+ },
314
+ {
315
+ "epoch": 0.41284403669724773,
316
+ "grad_norm": 21.619610594454734,
317
+ "learning_rate": 3.6563457256020884e-07,
318
+ "logits/chosen": 0.5451809167861938,
319
+ "logits/rejected": 1.3867518901824951,
320
+ "logps/chosen": -417.93450927734375,
321
+ "logps/rejected": -453.64581298828125,
322
+ "loss": 0.5262,
323
+ "rewards/accuracies": 0.7749999761581421,
324
+ "rewards/chosen": -1.0858782529830933,
325
+ "rewards/margins": 0.799088180065155,
326
+ "rewards/rejected": -1.8849666118621826,
327
+ "step": 180
328
+ },
329
+ {
330
+ "epoch": 0.43577981651376146,
331
+ "grad_norm": 22.879012626203956,
332
+ "learning_rate": 3.475188202022617e-07,
333
+ "logits/chosen": 0.8325613141059875,
334
+ "logits/rejected": 1.527114748954773,
335
+ "logps/chosen": -372.27227783203125,
336
+ "logps/rejected": -433.6874084472656,
337
+ "loss": 0.5396,
338
+ "rewards/accuracies": 0.71875,
339
+ "rewards/chosen": -0.9668887853622437,
340
+ "rewards/margins": 0.787070095539093,
341
+ "rewards/rejected": -1.7539589405059814,
342
+ "step": 190
343
+ },
344
+ {
345
+ "epoch": 0.45871559633027525,
346
+ "grad_norm": 19.71033831580365,
347
+ "learning_rate": 3.287770545059052e-07,
348
+ "logits/chosen": 0.25724777579307556,
349
+ "logits/rejected": 1.0539257526397705,
350
+ "logps/chosen": -339.7835388183594,
351
+ "logps/rejected": -395.4685363769531,
352
+ "loss": 0.5339,
353
+ "rewards/accuracies": 0.75,
354
+ "rewards/chosen": -0.7703949809074402,
355
+ "rewards/margins": 0.7419241070747375,
356
+ "rewards/rejected": -1.5123189687728882,
357
+ "step": 200
358
+ },
359
+ {
360
+ "epoch": 0.45871559633027525,
361
+ "eval_logits/chosen": 0.1333327293395996,
362
+ "eval_logits/rejected": 1.1592669486999512,
363
+ "eval_logps/chosen": -372.03033447265625,
364
+ "eval_logps/rejected": -410.098388671875,
365
+ "eval_loss": 0.5135313272476196,
366
+ "eval_rewards/accuracies": 0.7284482717514038,
367
+ "eval_rewards/chosen": -0.8694021701812744,
368
+ "eval_rewards/margins": 0.7663572430610657,
369
+ "eval_rewards/rejected": -1.6357594728469849,
370
+ "eval_runtime": 94.9597,
371
+ "eval_samples_per_second": 19.145,
372
+ "eval_steps_per_second": 0.305,
373
+ "step": 200
374
+ },
375
+ {
376
+ "epoch": 0.481651376146789,
377
+ "grad_norm": 24.166340722703165,
378
+ "learning_rate": 3.0952958655864954e-07,
379
+ "logits/chosen": -0.02151186764240265,
380
+ "logits/rejected": 0.7990398406982422,
381
+ "logps/chosen": -386.0631408691406,
382
+ "logps/rejected": -439.3416442871094,
383
+ "loss": 0.5165,
384
+ "rewards/accuracies": 0.7250000238418579,
385
+ "rewards/chosen": -0.859545111656189,
386
+ "rewards/margins": 0.791865348815918,
387
+ "rewards/rejected": -1.651410460472107,
388
+ "step": 210
389
+ },
390
+ {
391
+ "epoch": 0.5045871559633027,
392
+ "grad_norm": 29.831111320846208,
393
+ "learning_rate": 2.898999737583448e-07,
394
+ "logits/chosen": 0.28069013357162476,
395
+ "logits/rejected": 0.9275471568107605,
396
+ "logps/chosen": -352.3778076171875,
397
+ "logps/rejected": -410.13238525390625,
398
+ "loss": 0.5119,
399
+ "rewards/accuracies": 0.6937500238418579,
400
+ "rewards/chosen": -0.7737405300140381,
401
+ "rewards/margins": 0.7621678113937378,
402
+ "rewards/rejected": -1.5359084606170654,
403
+ "step": 220
404
+ },
405
+ {
406
+ "epoch": 0.5275229357798165,
407
+ "grad_norm": 27.12612279862232,
408
+ "learning_rate": 2.7001422664752333e-07,
409
+ "logits/chosen": 0.31911998987197876,
410
+ "logits/rejected": 1.4982731342315674,
411
+ "logps/chosen": -380.7137145996094,
412
+ "logps/rejected": -428.25030517578125,
413
+ "loss": 0.5326,
414
+ "rewards/accuracies": 0.78125,
415
+ "rewards/chosen": -0.8266562223434448,
416
+ "rewards/margins": 0.8594551086425781,
417
+ "rewards/rejected": -1.6861114501953125,
418
+ "step": 230
419
+ },
420
+ {
421
+ "epoch": 0.5504587155963303,
422
+ "grad_norm": 30.270434326846676,
423
+ "learning_rate": 2.5e-07,
424
+ "logits/chosen": 0.08341093361377716,
425
+ "logits/rejected": 0.8582345843315125,
426
+ "logps/chosen": -345.74591064453125,
427
+ "logps/rejected": -403.52093505859375,
428
+ "loss": 0.5195,
429
+ "rewards/accuracies": 0.699999988079071,
430
+ "rewards/chosen": -0.8471673727035522,
431
+ "rewards/margins": 0.6442986726760864,
432
+ "rewards/rejected": -1.4914662837982178,
433
+ "step": 240
434
+ },
435
+ {
436
+ "epoch": 0.573394495412844,
437
+ "grad_norm": 24.62506692170475,
438
+ "learning_rate": 2.2998577335247667e-07,
439
+ "logits/chosen": 0.6792846918106079,
440
+ "logits/rejected": 1.6594598293304443,
441
+ "logps/chosen": -357.1456298828125,
442
+ "logps/rejected": -407.39794921875,
443
+ "loss": 0.5206,
444
+ "rewards/accuracies": 0.706250011920929,
445
+ "rewards/chosen": -0.9902165532112122,
446
+ "rewards/margins": 0.7551018595695496,
447
+ "rewards/rejected": -1.7453181743621826,
448
+ "step": 250
449
+ },
450
+ {
451
+ "epoch": 0.573394495412844,
452
+ "eval_logits/chosen": 1.0596588850021362,
453
+ "eval_logits/rejected": 2.125216007232666,
454
+ "eval_logps/chosen": -399.8708190917969,
455
+ "eval_logps/rejected": -448.6092529296875,
456
+ "eval_loss": 0.5051079988479614,
457
+ "eval_rewards/accuracies": 0.7456896305084229,
458
+ "eval_rewards/chosen": -1.1478071212768555,
459
+ "eval_rewards/margins": 0.8730602860450745,
460
+ "eval_rewards/rejected": -2.020867347717285,
461
+ "eval_runtime": 95.375,
462
+ "eval_samples_per_second": 19.062,
463
+ "eval_steps_per_second": 0.304,
464
+ "step": 250
465
+ },
466
+ {
467
+ "epoch": 0.5963302752293578,
468
+ "grad_norm": 25.512119180845332,
469
+ "learning_rate": 2.1010002624165524e-07,
470
+ "logits/chosen": 1.0334103107452393,
471
+ "logits/rejected": 1.9036356210708618,
472
+ "logps/chosen": -393.27923583984375,
473
+ "logps/rejected": -442.31634521484375,
474
+ "loss": 0.5225,
475
+ "rewards/accuracies": 0.7437499761581421,
476
+ "rewards/chosen": -1.1329892873764038,
477
+ "rewards/margins": 0.6801810264587402,
478
+ "rewards/rejected": -1.8131701946258545,
479
+ "step": 260
480
+ },
481
+ {
482
+ "epoch": 0.6192660550458715,
483
+ "grad_norm": 22.529726863175615,
484
+ "learning_rate": 1.9047041344135043e-07,
485
+ "logits/chosen": 0.06802092492580414,
486
+ "logits/rejected": 0.6059505343437195,
487
+ "logps/chosen": -379.96771240234375,
488
+ "logps/rejected": -420.07244873046875,
489
+ "loss": 0.5157,
490
+ "rewards/accuracies": 0.6312500238418579,
491
+ "rewards/chosen": -0.902859091758728,
492
+ "rewards/margins": 0.6471339464187622,
493
+ "rewards/rejected": -1.5499929189682007,
494
+ "step": 270
495
+ },
496
+ {
497
+ "epoch": 0.6422018348623854,
498
+ "grad_norm": 29.61705869960123,
499
+ "learning_rate": 1.7122294549409482e-07,
500
+ "logits/chosen": 0.36484724283218384,
501
+ "logits/rejected": 1.1159727573394775,
502
+ "logps/chosen": -378.2865905761719,
503
+ "logps/rejected": -446.976318359375,
504
+ "loss": 0.5034,
505
+ "rewards/accuracies": 0.6875,
506
+ "rewards/chosen": -0.9187177419662476,
507
+ "rewards/margins": 0.7557224035263062,
508
+ "rewards/rejected": -1.6744401454925537,
509
+ "step": 280
510
+ },
511
+ {
512
+ "epoch": 0.6651376146788991,
513
+ "grad_norm": 32.55159666302868,
514
+ "learning_rate": 1.524811797977383e-07,
515
+ "logits/chosen": 0.4420185089111328,
516
+ "logits/rejected": 1.4995023012161255,
517
+ "logps/chosen": -382.71185302734375,
518
+ "logps/rejected": -434.3760681152344,
519
+ "loss": 0.5136,
520
+ "rewards/accuracies": 0.75,
521
+ "rewards/chosen": -0.9455305933952332,
522
+ "rewards/margins": 0.9165040254592896,
523
+ "rewards/rejected": -1.862034559249878,
524
+ "step": 290
525
+ },
526
+ {
527
+ "epoch": 0.6880733944954128,
528
+ "grad_norm": 23.09866796839295,
529
+ "learning_rate": 1.3436542743979125e-07,
530
+ "logits/chosen": 0.40113481879234314,
531
+ "logits/rejected": 1.5598547458648682,
532
+ "logps/chosen": -423.24615478515625,
533
+ "logps/rejected": -433.6683044433594,
534
+ "loss": 0.5161,
535
+ "rewards/accuracies": 0.7437499761581421,
536
+ "rewards/chosen": -1.1671854257583618,
537
+ "rewards/margins": 0.6783391237258911,
538
+ "rewards/rejected": -1.845524549484253,
539
+ "step": 300
540
+ },
541
+ {
542
+ "epoch": 0.6880733944954128,
543
+ "eval_logits/chosen": 0.468977689743042,
544
+ "eval_logits/rejected": 1.6077337265014648,
545
+ "eval_logps/chosen": -392.01226806640625,
546
+ "eval_logps/rejected": -439.55352783203125,
547
+ "eval_loss": 0.4995412528514862,
548
+ "eval_rewards/accuracies": 0.7413793206214905,
549
+ "eval_rewards/chosen": -1.0692216157913208,
550
+ "eval_rewards/margins": 0.8610891103744507,
551
+ "eval_rewards/rejected": -1.9303104877471924,
552
+ "eval_runtime": 95.9852,
553
+ "eval_samples_per_second": 18.94,
554
+ "eval_steps_per_second": 0.302,
555
+ "step": 300
556
+ },
557
+ {
558
+ "epoch": 0.7110091743119266,
559
+ "grad_norm": 23.93680772701649,
560
+ "learning_rate": 1.1699198087116588e-07,
561
+ "logits/chosen": 0.4794091284275055,
562
+ "logits/rejected": 1.1960773468017578,
563
+ "logps/chosen": -373.32708740234375,
564
+ "logps/rejected": -457.6255798339844,
565
+ "loss": 0.4872,
566
+ "rewards/accuracies": 0.768750011920929,
567
+ "rewards/chosen": -0.9246217012405396,
568
+ "rewards/margins": 0.8957231640815735,
569
+ "rewards/rejected": -1.8203446865081787,
570
+ "step": 310
571
+ },
572
+ {
573
+ "epoch": 0.7339449541284404,
574
+ "grad_norm": 33.75818947226454,
575
+ "learning_rate": 1.00472367377196e-07,
576
+ "logits/chosen": 0.50605309009552,
577
+ "logits/rejected": 1.3813035488128662,
578
+ "logps/chosen": -395.261474609375,
579
+ "logps/rejected": -462.6114196777344,
580
+ "loss": 0.4904,
581
+ "rewards/accuracies": 0.75,
582
+ "rewards/chosen": -1.1317824125289917,
583
+ "rewards/margins": 0.8041917085647583,
584
+ "rewards/rejected": -1.93597412109375,
585
+ "step": 320
586
+ },
587
+ {
588
+ "epoch": 0.7568807339449541,
589
+ "grad_norm": 26.64078079133333,
590
+ "learning_rate": 8.49126331382102e-08,
591
+ "logits/chosen": 0.7998399138450623,
592
+ "logits/rejected": 1.5950431823730469,
593
+ "logps/chosen": -358.0443420410156,
594
+ "logps/rejected": -450.14898681640625,
595
+ "loss": 0.4977,
596
+ "rewards/accuracies": 0.731249988079071,
597
+ "rewards/chosen": -1.0693919658660889,
598
+ "rewards/margins": 0.917527973651886,
599
+ "rewards/rejected": -1.9869201183319092,
600
+ "step": 330
601
+ },
602
+ {
603
+ "epoch": 0.7798165137614679,
604
+ "grad_norm": 25.809036872331053,
605
+ "learning_rate": 7.041266247556812e-08,
606
+ "logits/chosen": 0.4978507161140442,
607
+ "logits/rejected": 1.611681580543518,
608
+ "logps/chosen": -386.6365661621094,
609
+ "logps/rejected": -486.8155212402344,
610
+ "loss": 0.4861,
611
+ "rewards/accuracies": 0.800000011920929,
612
+ "rewards/chosen": -0.9396215677261353,
613
+ "rewards/margins": 1.1603189706802368,
614
+ "rewards/rejected": -2.099940538406372,
615
+ "step": 340
616
+ },
617
+ {
618
+ "epoch": 0.8027522935779816,
619
+ "grad_norm": 25.432021542271517,
620
+ "learning_rate": 5.706553665319955e-08,
621
+ "logits/chosen": 0.37401479482650757,
622
+ "logits/rejected": 1.426959753036499,
623
+ "logps/chosen": -393.84173583984375,
624
+ "logps/rejected": -441.216796875,
625
+ "loss": 0.5113,
626
+ "rewards/accuracies": 0.7562500238418579,
627
+ "rewards/chosen": -1.0162795782089233,
628
+ "rewards/margins": 0.8663312792778015,
629
+ "rewards/rejected": -1.8826109170913696,
630
+ "step": 350
631
+ },
632
+ {
633
+ "epoch": 0.8027522935779816,
634
+ "eval_logits/chosen": 0.28273966908454895,
635
+ "eval_logits/rejected": 1.4599332809448242,
636
+ "eval_logps/chosen": -384.1430969238281,
637
+ "eval_logps/rejected": -436.3081359863281,
638
+ "eval_loss": 0.4952593147754669,
639
+ "eval_rewards/accuracies": 0.7284482717514038,
640
+ "eval_rewards/chosen": -0.9905301332473755,
641
+ "eval_rewards/margins": 0.9073269963264465,
642
+ "eval_rewards/rejected": -1.8978571891784668,
643
+ "eval_runtime": 98.0468,
644
+ "eval_samples_per_second": 18.542,
645
+ "eval_steps_per_second": 0.296,
646
+ "step": 350
647
+ },
648
+ {
649
+ "epoch": 0.8256880733944955,
650
+ "grad_norm": 24.878023970166044,
651
+ "learning_rate": 4.4956936350761005e-08,
652
+ "logits/chosen": 0.011499330401420593,
653
+ "logits/rejected": 1.0976569652557373,
654
+ "logps/chosen": -381.95184326171875,
655
+ "logps/rejected": -447.99334716796875,
656
+ "loss": 0.4998,
657
+ "rewards/accuracies": 0.78125,
658
+ "rewards/chosen": -0.8815473318099976,
659
+ "rewards/margins": 0.9280586242675781,
660
+ "rewards/rejected": -1.8096059560775757,
661
+ "step": 360
662
+ },
663
+ {
664
+ "epoch": 0.8486238532110092,
665
+ "grad_norm": 27.48602383662752,
666
+ "learning_rate": 3.416459164418123e-08,
667
+ "logits/chosen": 0.2708785831928253,
668
+ "logits/rejected": 0.9552985429763794,
669
+ "logps/chosen": -397.5840148925781,
670
+ "logps/rejected": -466.92999267578125,
671
+ "loss": 0.4821,
672
+ "rewards/accuracies": 0.78125,
673
+ "rewards/chosen": -0.9426911473274231,
674
+ "rewards/margins": 0.8755933046340942,
675
+ "rewards/rejected": -1.8182843923568726,
676
+ "step": 370
677
+ },
678
+ {
679
+ "epoch": 0.8715596330275229,
680
+ "grad_norm": 22.77493294832679,
681
+ "learning_rate": 2.475778302439524e-08,
682
+ "logits/chosen": 0.3190244436264038,
683
+ "logits/rejected": 1.6189016103744507,
684
+ "logps/chosen": -393.47637939453125,
685
+ "logps/rejected": -460.20428466796875,
686
+ "loss": 0.5039,
687
+ "rewards/accuracies": 0.78125,
688
+ "rewards/chosen": -1.0399200916290283,
689
+ "rewards/margins": 0.9450514912605286,
690
+ "rewards/rejected": -1.9849714040756226,
691
+ "step": 380
692
+ },
693
+ {
694
+ "epoch": 0.8944954128440367,
695
+ "grad_norm": 21.22789089972862,
696
+ "learning_rate": 1.6796896657433805e-08,
697
+ "logits/chosen": 0.4111763834953308,
698
+ "logits/rejected": 1.2173652648925781,
699
+ "logps/chosen": -375.0528869628906,
700
+ "logps/rejected": -447.7088317871094,
701
+ "loss": 0.5199,
702
+ "rewards/accuracies": 0.8062499761581421,
703
+ "rewards/chosen": -0.9053140878677368,
704
+ "rewards/margins": 0.9649880528450012,
705
+ "rewards/rejected": -1.8703022003173828,
706
+ "step": 390
707
+ },
708
+ {
709
+ "epoch": 0.9174311926605505,
710
+ "grad_norm": 25.945198405726252,
711
+ "learning_rate": 1.0333036740834855e-08,
712
+ "logits/chosen": 0.3489355742931366,
713
+ "logits/rejected": 1.733727216720581,
714
+ "logps/chosen": -385.9686584472656,
715
+ "logps/rejected": -407.83514404296875,
716
+ "loss": 0.5006,
717
+ "rewards/accuracies": 0.7562500238418579,
718
+ "rewards/chosen": -0.9153203964233398,
719
+ "rewards/margins": 0.9544361233711243,
720
+ "rewards/rejected": -1.8697564601898193,
721
+ "step": 400
722
+ },
723
+ {
724
+ "epoch": 0.9174311926605505,
725
+ "eval_logits/chosen": 0.33653396368026733,
726
+ "eval_logits/rejected": 1.545719861984253,
727
+ "eval_logps/chosen": -384.1485290527344,
728
+ "eval_logps/rejected": -438.1469421386719,
729
+ "eval_loss": 0.49427667260169983,
730
+ "eval_rewards/accuracies": 0.732758641242981,
731
+ "eval_rewards/chosen": -0.9905844330787659,
732
+ "eval_rewards/margins": 0.9256603717803955,
733
+ "eval_rewards/rejected": -1.9162448644638062,
734
+ "eval_runtime": 95.1799,
735
+ "eval_samples_per_second": 19.101,
736
+ "eval_steps_per_second": 0.305,
737
+ "step": 400
738
+ },
739
+ {
740
+ "epoch": 0.9403669724770642,
741
+ "grad_norm": 29.414671515939542,
742
+ "learning_rate": 5.4076974448211685e-09,
743
+ "logits/chosen": 0.3652718663215637,
744
+ "logits/rejected": 1.4336962699890137,
745
+ "logps/chosen": -407.9559326171875,
746
+ "logps/rejected": -456.4903259277344,
747
+ "loss": 0.486,
748
+ "rewards/accuracies": 0.7875000238418579,
749
+ "rewards/chosen": -0.9434130787849426,
750
+ "rewards/margins": 0.8941701650619507,
751
+ "rewards/rejected": -1.837583303451538,
752
+ "step": 410
753
+ },
754
+ {
755
+ "epoch": 0.963302752293578,
756
+ "grad_norm": 23.283884603445415,
757
+ "learning_rate": 2.052496544188487e-09,
758
+ "logits/chosen": 0.2882634997367859,
759
+ "logits/rejected": 1.3258743286132812,
760
+ "logps/chosen": -375.098876953125,
761
+ "logps/rejected": -462.24639892578125,
762
+ "loss": 0.4823,
763
+ "rewards/accuracies": 0.731249988079071,
764
+ "rewards/chosen": -1.0388104915618896,
765
+ "rewards/margins": 0.9280673861503601,
766
+ "rewards/rejected": -1.9668779373168945,
767
+ "step": 420
768
+ },
769
+ {
770
+ "epoch": 0.9862385321100917,
771
+ "grad_norm": 32.32447531053772,
772
+ "learning_rate": 2.889724508297886e-10,
773
+ "logits/chosen": 0.6452657580375671,
774
+ "logits/rejected": 1.3607791662216187,
775
+ "logps/chosen": -378.0513916015625,
776
+ "logps/rejected": -468.52301025390625,
777
+ "loss": 0.504,
778
+ "rewards/accuracies": 0.75,
779
+ "rewards/chosen": -1.1194813251495361,
780
+ "rewards/margins": 0.9433539509773254,
781
+ "rewards/rejected": -2.062835216522217,
782
+ "step": 430
783
+ },
784
+ {
785
+ "epoch": 1.0,
786
+ "step": 436,
787
+ "total_flos": 0.0,
788
+ "train_loss": 0.5444534902178914,
789
+ "train_runtime": 11753.731,
790
+ "train_samples_per_second": 4.744,
791
+ "train_steps_per_second": 0.037
792
+ }
793
+ ],
794
+ "logging_steps": 10,
795
+ "max_steps": 436,
796
+ "num_input_tokens_seen": 0,
797
+ "num_train_epochs": 1,
798
+ "save_steps": 100,
799
+ "stateful_callbacks": {
800
+ "TrainerControl": {
801
+ "args": {
802
+ "should_epoch_stop": false,
803
+ "should_evaluate": false,
804
+ "should_log": false,
805
+ "should_save": true,
806
+ "should_training_stop": true
807
+ },
808
+ "attributes": {}
809
+ }
810
+ },
811
+ "total_flos": 0.0,
812
+ "train_batch_size": 8,
813
+ "trial_name": null,
814
+ "trial_params": null
815
+ }
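The `log_history` array in `trainer_state.json` interleaves training logs every 10 steps with evaluation logs every 50 steps (`logging_steps: 10`, `eval_steps: 50`). A stdlib-only sketch for extracting the evaluation-loss curve from a local copy of the file:

```python
# Pull (step, eval_loss) pairs out of a downloaded trainer_state.json.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

eval_points = [(entry["step"], entry["eval_loss"])
               for entry in state["log_history"]
               if "eval_loss" in entry]

for step, loss in eval_points:
    print(f"step {step:>3}: eval_loss {loss:.4f}")
# The last entry, step 400 with eval_loss 0.4943, matches the loss reported in the model card.
```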