sfulay committed
Commit 8e960b5
1 Parent(s): e8d5292

Model save

README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ base_model: alignment-handbook/zephyr-7b-sft-full
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-7b-dpo-full-ultrabin-reward-scale-05
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-full-ultrabin-reward-scale-05
+
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5419
+ - Rewards/chosen: -2.0657
+ - Rewards/rejected: -3.5528
+ - Rewards/accuracies: 0.7812
+ - Rewards/margins: 1.4871
+ - Logps/rejected: -617.9430
+ - Logps/chosen: -469.2008
+ - Logits/rejected: 3.0322
+ - Logits/chosen: 2.1926
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 55
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6746 | 0.1046 | 50 | 0.6514 | 0.0215 | -0.0844 | 0.6953 | 0.1059 | -271.1068 | -260.4849 | -2.5751 | -2.6121 |
+ | 0.5801 | 0.2092 | 100 | 0.5963 | -1.2413 | -2.0024 | 0.6914 | 0.7611 | -462.9021 | -386.7607 | 0.8478 | 0.5614 |
+ | 0.561 | 0.3138 | 150 | 0.5612 | -1.3516 | -2.3053 | 0.7422 | 0.9537 | -493.1910 | -397.7852 | 2.1227 | 1.6750 |
+ | 0.552 | 0.4184 | 200 | 0.5634 | -1.7910 | -3.0147 | 0.7539 | 1.2237 | -564.1274 | -441.7259 | 2.6771 | 2.0183 |
+ | 0.5367 | 0.5230 | 250 | 0.5404 | -1.6069 | -2.8715 | 0.7656 | 1.2646 | -549.8127 | -423.3247 | 2.8098 | 2.1736 |
+ | 0.5231 | 0.6276 | 300 | 0.5511 | -1.8243 | -3.2523 | 0.7656 | 1.4280 | -587.8877 | -445.0558 | 2.9864 | 2.2075 |
+ | 0.5092 | 0.7322 | 350 | 0.5402 | -1.9840 | -3.4024 | 0.7734 | 1.4184 | -602.9061 | -461.0307 | 2.8834 | 2.0946 |
+ | 0.5231 | 0.8368 | 400 | 0.5417 | -2.0950 | -3.5645 | 0.7812 | 1.4695 | -619.1116 | -472.1271 | 3.0542 | 2.2365 |
+ | 0.5232 | 0.9414 | 450 | 0.5419 | -2.0657 | -3.5528 | 0.7812 | 1.4871 | -617.9430 | -469.2008 | 3.0322 | 2.1926 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.0.dev0
+ - Pytorch 2.1.2
+ - Datasets 2.20.0
+ - Tokenizers 0.19.1
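
The card above lists the TRL DPO hyperparameters but not the training script itself. As a rough, hypothetical sketch only, this is how a run with those settings could be wired up with `trl`'s `DPOConfig`/`DPOTrainer`; the dataset, output directory, `beta`, and precision settings are assumptions, not values recovered from this commit.

```python
# Hypothetical sketch -- not the script that produced this checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)      # policy being optimized
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference

# The card only says "an unknown dataset"; any preference dataset with
# prompt/chosen/rejected columns would fit here (placeholder name).
dataset = load_dataset("your-org/your-preference-dataset")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full-ultrabin-reward-scale-05",
    beta=0.1,                       # placeholder; the actual reward scale is not stated in the card
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs x 2 accumulation steps = 128 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # newer trl releases take processing_class= instead
)
trainer.train()
```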
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.5571172007955767,
+     "train_runtime": 12724.9946,
+     "train_samples": 61134,
+     "train_samples_per_second": 4.804,
+     "train_steps_per_second": 0.038
+ }
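
As a quick sanity check (not part of the commit), the derived throughput figures above follow directly from the counts: 61134 samples over roughly 12725 seconds is about 4.80 samples/s, and 478 optimizer steps (see trainer_state.json below) over the same runtime is about 0.038 steps/s.

```python
# Reproduce the derived throughput figures reported in all_results.json.
train_samples = 61134
train_runtime = 12724.9946  # seconds
global_steps = 478          # max_steps from trainer_state.json

print(round(train_samples / train_runtime, 3))  # -> 4.804 (train_samples_per_second)
print(round(global_steps / train_runtime, 3))   # -> 0.038 (train_steps_per_second)
```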
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+     "_from_model_config": true,
+     "bos_token_id": 1,
+     "eos_token_id": 2,
+     "transformers_version": "4.44.0.dev0"
+ }
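
The generation config only pins the special-token ids and is picked up automatically by `generate` when the checkpoint is loaded from the Hub. A minimal, hypothetical inference sketch follows; the repo id is inferred from the model name above and may differ.

```python
# Hypothetical usage sketch; the repo id is assumed from the model name above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "sfulay/zephyr-7b-dpo-full-ultrabin-reward-scale-05"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=128)  # eos_token_id=2 comes from generation_config.json
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```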
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.5571172007955767,
+     "train_runtime": 12724.9946,
+     "train_samples": 61134,
+     "train_samples_per_second": 4.804,
+     "train_steps_per_second": 0.038
+ }
trainer_state.json ADDED
@@ -0,0 +1,891 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 50,
6
+ "global_step": 478,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02092050209205021,
13
+ "grad_norm": 5.176846146347138,
14
+ "learning_rate": 1.0416666666666667e-07,
15
+ "logits/chosen": -2.630842924118042,
16
+ "logits/rejected": -2.5769855976104736,
17
+ "logps/chosen": -288.64373779296875,
18
+ "logps/rejected": -275.88287353515625,
19
+ "loss": 0.6932,
20
+ "rewards/accuracies": 0.4937500059604645,
21
+ "rewards/chosen": 0.00047967396676540375,
22
+ "rewards/margins": 0.0007994862971827388,
23
+ "rewards/rejected": -0.0003198123595211655,
24
+ "step": 10
25
+ },
26
+ {
27
+ "epoch": 0.04184100418410042,
28
+ "grad_norm": 4.778746903378828,
29
+ "learning_rate": 2.0833333333333333e-07,
30
+ "logits/chosen": -2.6447551250457764,
31
+ "logits/rejected": -2.6132736206054688,
32
+ "logps/chosen": -293.56829833984375,
33
+ "logps/rejected": -259.22283935546875,
34
+ "loss": 0.6927,
35
+ "rewards/accuracies": 0.5687500238418579,
36
+ "rewards/chosen": 0.002531626494601369,
37
+ "rewards/margins": 0.0012841664720326662,
38
+ "rewards/rejected": 0.0012474602553993464,
39
+ "step": 20
40
+ },
41
+ {
42
+ "epoch": 0.06276150627615062,
43
+ "grad_norm": 4.484860702954145,
44
+ "learning_rate": 3.1249999999999997e-07,
45
+ "logits/chosen": -2.6660215854644775,
46
+ "logits/rejected": -2.589404821395874,
47
+ "logps/chosen": -294.7344970703125,
48
+ "logps/rejected": -287.273193359375,
49
+ "loss": 0.6908,
50
+ "rewards/accuracies": 0.625,
51
+ "rewards/chosen": 0.014913314953446388,
52
+ "rewards/margins": 0.008078203536570072,
53
+ "rewards/rejected": 0.006835112813860178,
54
+ "step": 30
55
+ },
56
+ {
57
+ "epoch": 0.08368200836820083,
58
+ "grad_norm": 4.460846213560835,
59
+ "learning_rate": 4.1666666666666667e-07,
60
+ "logits/chosen": -2.6350624561309814,
61
+ "logits/rejected": -2.5527660846710205,
62
+ "logps/chosen": -270.5862731933594,
63
+ "logps/rejected": -240.20895385742188,
64
+ "loss": 0.685,
65
+ "rewards/accuracies": 0.625,
66
+ "rewards/chosen": 0.033098004758358,
67
+ "rewards/margins": 0.02473345957696438,
68
+ "rewards/rejected": 0.008364550769329071,
69
+ "step": 40
70
+ },
71
+ {
72
+ "epoch": 0.10460251046025104,
73
+ "grad_norm": 5.258394523703879,
74
+ "learning_rate": 4.999733114418725e-07,
75
+ "logits/chosen": -2.5787136554718018,
76
+ "logits/rejected": -2.5705668926239014,
77
+ "logps/chosen": -264.2970886230469,
78
+ "logps/rejected": -246.346435546875,
79
+ "loss": 0.6746,
80
+ "rewards/accuracies": 0.6875,
81
+ "rewards/chosen": 0.01130986213684082,
82
+ "rewards/margins": 0.07325177639722824,
83
+ "rewards/rejected": -0.06194191053509712,
84
+ "step": 50
85
+ },
86
+ {
87
+ "epoch": 0.10460251046025104,
88
+ "eval_logits/chosen": -2.6120545864105225,
89
+ "eval_logits/rejected": -2.5751192569732666,
90
+ "eval_logps/chosen": -260.48486328125,
91
+ "eval_logps/rejected": -271.1068115234375,
92
+ "eval_loss": 0.6513926982879639,
93
+ "eval_rewards/accuracies": 0.6953125,
94
+ "eval_rewards/chosen": 0.021450327709317207,
95
+ "eval_rewards/margins": 0.10589740425348282,
96
+ "eval_rewards/rejected": -0.08444707095623016,
97
+ "eval_runtime": 104.2349,
98
+ "eval_samples_per_second": 19.187,
99
+ "eval_steps_per_second": 0.307,
100
+ "step": 50
101
+ },
102
+ {
103
+ "epoch": 0.12552301255230125,
104
+ "grad_norm": 6.387285250582947,
105
+ "learning_rate": 4.990398100856366e-07,
106
+ "logits/chosen": -2.5387370586395264,
107
+ "logits/rejected": -2.4989449977874756,
108
+ "logps/chosen": -267.24676513671875,
109
+ "logps/rejected": -259.34765625,
110
+ "loss": 0.6603,
111
+ "rewards/accuracies": 0.675000011920929,
112
+ "rewards/chosen": 0.0057476600632071495,
113
+ "rewards/margins": 0.13733819127082825,
114
+ "rewards/rejected": -0.13159053027629852,
115
+ "step": 60
116
+ },
117
+ {
118
+ "epoch": 0.14644351464435146,
119
+ "grad_norm": 10.713757122261297,
120
+ "learning_rate": 4.967775735898179e-07,
121
+ "logits/chosen": -2.6051666736602783,
122
+ "logits/rejected": -2.5439701080322266,
123
+ "logps/chosen": -302.8467712402344,
124
+ "logps/rejected": -307.2948303222656,
125
+ "loss": 0.6438,
126
+ "rewards/accuracies": 0.643750011920929,
127
+ "rewards/chosen": -0.17766565084457397,
128
+ "rewards/margins": 0.19495461881160736,
129
+ "rewards/rejected": -0.37262025475502014,
130
+ "step": 70
131
+ },
132
+ {
133
+ "epoch": 0.16736401673640167,
134
+ "grad_norm": 13.46181824781145,
135
+ "learning_rate": 4.931986719649298e-07,
136
+ "logits/chosen": -1.8741085529327393,
137
+ "logits/rejected": -1.8762668371200562,
138
+ "logps/chosen": -299.6753845214844,
139
+ "logps/rejected": -331.6051330566406,
140
+ "loss": 0.6154,
141
+ "rewards/accuracies": 0.59375,
142
+ "rewards/chosen": -0.40526852011680603,
143
+ "rewards/margins": 0.28667524456977844,
144
+ "rewards/rejected": -0.6919438242912292,
145
+ "step": 80
146
+ },
147
+ {
148
+ "epoch": 0.18828451882845187,
149
+ "grad_norm": 11.81447104504551,
150
+ "learning_rate": 4.883222001996351e-07,
151
+ "logits/chosen": -1.0113297700881958,
152
+ "logits/rejected": -0.8674372434616089,
153
+ "logps/chosen": -323.6530456542969,
154
+ "logps/rejected": -368.2422790527344,
155
+ "loss": 0.6023,
156
+ "rewards/accuracies": 0.6875,
157
+ "rewards/chosen": -0.5255545377731323,
158
+ "rewards/margins": 0.5487722158432007,
159
+ "rewards/rejected": -1.074326753616333,
160
+ "step": 90
161
+ },
162
+ {
163
+ "epoch": 0.20920502092050208,
164
+ "grad_norm": 20.253194674698854,
165
+ "learning_rate": 4.821741763807186e-07,
166
+ "logits/chosen": -0.3053513169288635,
167
+ "logits/rejected": 0.3383195400238037,
168
+ "logps/chosen": -372.4469299316406,
169
+ "logps/rejected": -375.91619873046875,
170
+ "loss": 0.5801,
171
+ "rewards/accuracies": 0.737500011920929,
172
+ "rewards/chosen": -0.784656286239624,
173
+ "rewards/margins": 0.7117811441421509,
174
+ "rewards/rejected": -1.4964375495910645,
175
+ "step": 100
176
+ },
177
+ {
178
+ "epoch": 0.20920502092050208,
179
+ "eval_logits/chosen": 0.5614331960678101,
180
+ "eval_logits/rejected": 0.8477897644042969,
181
+ "eval_logps/chosen": -386.7606506347656,
182
+ "eval_logps/rejected": -462.9020690917969,
183
+ "eval_loss": 0.5962740182876587,
184
+ "eval_rewards/accuracies": 0.69140625,
185
+ "eval_rewards/chosen": -1.241307258605957,
186
+ "eval_rewards/margins": 0.7610923647880554,
187
+ "eval_rewards/rejected": -2.0023996829986572,
188
+ "eval_runtime": 102.9081,
189
+ "eval_samples_per_second": 19.435,
190
+ "eval_steps_per_second": 0.311,
191
+ "step": 100
192
+ },
193
+ {
194
+ "epoch": 0.2301255230125523,
195
+ "grad_norm": 14.600851352900357,
196
+ "learning_rate": 4.747874028753375e-07,
197
+ "logits/chosen": 0.512942910194397,
198
+ "logits/rejected": 0.9520395398139954,
199
+ "logps/chosen": -399.2005920410156,
200
+ "logps/rejected": -453.3448791503906,
201
+ "loss": 0.5789,
202
+ "rewards/accuracies": 0.7250000238418579,
203
+ "rewards/chosen": -1.0446428060531616,
204
+ "rewards/margins": 0.7325394749641418,
205
+ "rewards/rejected": -1.7771823406219482,
206
+ "step": 110
207
+ },
208
+ {
209
+ "epoch": 0.2510460251046025,
210
+ "grad_norm": 18.76879266561917,
211
+ "learning_rate": 4.662012913161997e-07,
212
+ "logits/chosen": 0.7682480812072754,
213
+ "logits/rejected": 1.3250311613082886,
214
+ "logps/chosen": -378.8160095214844,
215
+ "logps/rejected": -432.3802795410156,
216
+ "loss": 0.5736,
217
+ "rewards/accuracies": 0.71875,
218
+ "rewards/chosen": -1.0223931074142456,
219
+ "rewards/margins": 0.9169514775276184,
220
+ "rewards/rejected": -1.9393446445465088,
221
+ "step": 120
222
+ },
223
+ {
224
+ "epoch": 0.2719665271966527,
225
+ "grad_norm": 15.069786999974797,
226
+ "learning_rate": 4.5646165232345103e-07,
227
+ "logits/chosen": 0.34132882952690125,
228
+ "logits/rejected": 0.8815134167671204,
229
+ "logps/chosen": -395.4140930175781,
230
+ "logps/rejected": -450.15667724609375,
231
+ "loss": 0.5599,
232
+ "rewards/accuracies": 0.706250011920929,
233
+ "rewards/chosen": -0.9794706106185913,
234
+ "rewards/margins": 0.8771657943725586,
235
+ "rewards/rejected": -1.856636643409729,
236
+ "step": 130
237
+ },
238
+ {
239
+ "epoch": 0.2928870292887029,
240
+ "grad_norm": 19.253522278201135,
241
+ "learning_rate": 4.456204510851956e-07,
242
+ "logits/chosen": 0.5900996327400208,
243
+ "logits/rejected": 1.196023941040039,
244
+ "logps/chosen": -394.09033203125,
245
+ "logps/rejected": -462.4150390625,
246
+ "loss": 0.5556,
247
+ "rewards/accuracies": 0.7250000238418579,
248
+ "rewards/chosen": -0.9378790855407715,
249
+ "rewards/margins": 0.9935140609741211,
250
+ "rewards/rejected": -1.931393027305603,
251
+ "step": 140
252
+ },
253
+ {
254
+ "epoch": 0.3138075313807531,
255
+ "grad_norm": 17.705226251621124,
256
+ "learning_rate": 4.337355301007335e-07,
257
+ "logits/chosen": 1.6807399988174438,
258
+ "logits/rejected": 2.5443928241729736,
259
+ "logps/chosen": -434.1851501464844,
260
+ "logps/rejected": -498.25689697265625,
261
+ "loss": 0.561,
262
+ "rewards/accuracies": 0.71875,
263
+ "rewards/chosen": -1.4313938617706299,
264
+ "rewards/margins": 0.8576656579971313,
265
+ "rewards/rejected": -2.289059638977051,
266
+ "step": 150
267
+ },
268
+ {
269
+ "epoch": 0.3138075313807531,
270
+ "eval_logits/chosen": 1.6750099658966064,
271
+ "eval_logits/rejected": 2.122723340988159,
272
+ "eval_logps/chosen": -397.78521728515625,
273
+ "eval_logps/rejected": -493.1910095214844,
274
+ "eval_loss": 0.5612272620201111,
275
+ "eval_rewards/accuracies": 0.7421875,
276
+ "eval_rewards/chosen": -1.351552963256836,
277
+ "eval_rewards/margins": 0.9537361860275269,
278
+ "eval_rewards/rejected": -2.3052892684936523,
279
+ "eval_runtime": 102.8554,
280
+ "eval_samples_per_second": 19.445,
281
+ "eval_steps_per_second": 0.311,
282
+ "step": 150
283
+ },
284
+ {
285
+ "epoch": 0.33472803347280333,
286
+ "grad_norm": 17.923487490122945,
287
+ "learning_rate": 4.2087030056579986e-07,
288
+ "logits/chosen": 1.7796026468276978,
289
+ "logits/rejected": 2.404552936553955,
290
+ "logps/chosen": -391.76666259765625,
291
+ "logps/rejected": -457.2339782714844,
292
+ "loss": 0.5577,
293
+ "rewards/accuracies": 0.7562500238418579,
294
+ "rewards/chosen": -1.2124366760253906,
295
+ "rewards/margins": 0.963293731212616,
296
+ "rewards/rejected": -2.1757304668426514,
297
+ "step": 160
298
+ },
299
+ {
300
+ "epoch": 0.35564853556485354,
301
+ "grad_norm": 22.93702368800413,
302
+ "learning_rate": 4.070934040463998e-07,
303
+ "logits/chosen": 2.2401363849639893,
304
+ "logits/rejected": 3.037454128265381,
305
+ "logps/chosen": -417.4873962402344,
306
+ "logps/rejected": -499.13262939453125,
307
+ "loss": 0.5345,
308
+ "rewards/accuracies": 0.71875,
309
+ "rewards/chosen": -1.4775826930999756,
310
+ "rewards/margins": 0.9278723001480103,
311
+ "rewards/rejected": -2.4054548740386963,
312
+ "step": 170
313
+ },
314
+ {
315
+ "epoch": 0.37656903765690375,
316
+ "grad_norm": 22.059850949763494,
317
+ "learning_rate": 3.9247834624635404e-07,
318
+ "logits/chosen": 2.308513879776001,
319
+ "logits/rejected": 3.129546642303467,
320
+ "logps/chosen": -479.5143127441406,
321
+ "logps/rejected": -558.447509765625,
322
+ "loss": 0.5486,
323
+ "rewards/accuracies": 0.7749999761581421,
324
+ "rewards/chosen": -1.7520792484283447,
325
+ "rewards/margins": 1.1397490501403809,
326
+ "rewards/rejected": -2.8918280601501465,
327
+ "step": 180
328
+ },
329
+ {
330
+ "epoch": 0.39748953974895396,
331
+ "grad_norm": 21.54003656099098,
332
+ "learning_rate": 3.7710310482256523e-07,
333
+ "logits/chosen": 1.515995979309082,
334
+ "logits/rejected": 2.3186123371124268,
335
+ "logps/chosen": -405.1570129394531,
336
+ "logps/rejected": -478.882080078125,
337
+ "loss": 0.531,
338
+ "rewards/accuracies": 0.6875,
339
+ "rewards/chosen": -1.5137742757797241,
340
+ "rewards/margins": 0.8394115567207336,
341
+ "rewards/rejected": -2.3531858921051025,
342
+ "step": 190
343
+ },
344
+ {
345
+ "epoch": 0.41841004184100417,
346
+ "grad_norm": 21.26002965858666,
347
+ "learning_rate": 3.610497133404795e-07,
348
+ "logits/chosen": 2.5295510292053223,
349
+ "logits/rejected": 3.3073439598083496,
350
+ "logps/chosen": -462.2782287597656,
351
+ "logps/rejected": -518.1025390625,
352
+ "loss": 0.552,
353
+ "rewards/accuracies": 0.6875,
354
+ "rewards/chosen": -2.019491672515869,
355
+ "rewards/margins": 0.858087420463562,
356
+ "rewards/rejected": -2.8775787353515625,
357
+ "step": 200
358
+ },
359
+ {
360
+ "epoch": 0.41841004184100417,
361
+ "eval_logits/chosen": 2.018301010131836,
362
+ "eval_logits/rejected": 2.6770527362823486,
363
+ "eval_logps/chosen": -441.72589111328125,
364
+ "eval_logps/rejected": -564.1273803710938,
365
+ "eval_loss": 0.563401997089386,
366
+ "eval_rewards/accuracies": 0.75390625,
367
+ "eval_rewards/chosen": -1.7909597158432007,
368
+ "eval_rewards/margins": 1.2236928939819336,
369
+ "eval_rewards/rejected": -3.0146522521972656,
370
+ "eval_runtime": 104.172,
371
+ "eval_samples_per_second": 19.199,
372
+ "eval_steps_per_second": 0.307,
373
+ "step": 200
374
+ },
375
+ {
376
+ "epoch": 0.4393305439330544,
377
+ "grad_norm": 21.451368111326985,
378
+ "learning_rate": 3.4440382358952115e-07,
379
+ "logits/chosen": 1.9598217010498047,
380
+ "logits/rejected": 2.9120290279388428,
381
+ "logps/chosen": -447.69476318359375,
382
+ "logps/rejected": -529.9307861328125,
383
+ "loss": 0.5413,
384
+ "rewards/accuracies": 0.731249988079071,
385
+ "rewards/chosen": -1.7580368518829346,
386
+ "rewards/margins": 0.9606950879096985,
387
+ "rewards/rejected": -2.7187318801879883,
388
+ "step": 210
389
+ },
390
+ {
391
+ "epoch": 0.4602510460251046,
392
+ "grad_norm": 20.160436671538378,
393
+ "learning_rate": 3.272542485937368e-07,
394
+ "logits/chosen": 1.119225025177002,
395
+ "logits/rejected": 1.9501537084579468,
396
+ "logps/chosen": -417.14422607421875,
397
+ "logps/rejected": -513.2694091796875,
398
+ "loss": 0.555,
399
+ "rewards/accuracies": 0.75,
400
+ "rewards/chosen": -1.3289070129394531,
401
+ "rewards/margins": 1.0918917655944824,
402
+ "rewards/rejected": -2.4207987785339355,
403
+ "step": 220
404
+ },
405
+ {
406
+ "epoch": 0.4811715481171548,
407
+ "grad_norm": 20.190647127947944,
408
+ "learning_rate": 3.096924887558854e-07,
409
+ "logits/chosen": 2.0996615886688232,
410
+ "logits/rejected": 3.000412940979004,
411
+ "logps/chosen": -453.54473876953125,
412
+ "logps/rejected": -568.1104125976562,
413
+ "loss": 0.5188,
414
+ "rewards/accuracies": 0.737500011920929,
415
+ "rewards/chosen": -1.7147760391235352,
416
+ "rewards/margins": 1.3763706684112549,
417
+ "rewards/rejected": -3.09114670753479,
418
+ "step": 230
419
+ },
420
+ {
421
+ "epoch": 0.502092050209205,
422
+ "grad_norm": 20.668039994089078,
423
+ "learning_rate": 2.9181224366319943e-07,
424
+ "logits/chosen": 3.0839335918426514,
425
+ "logits/rejected": 4.029969215393066,
426
+ "logps/chosen": -446.9358825683594,
427
+ "logps/rejected": -518.8782958984375,
428
+ "loss": 0.5411,
429
+ "rewards/accuracies": 0.699999988079071,
430
+ "rewards/chosen": -1.922851800918579,
431
+ "rewards/margins": 1.0858973264694214,
432
+ "rewards/rejected": -3.00874924659729,
433
+ "step": 240
434
+ },
435
+ {
436
+ "epoch": 0.5230125523012552,
437
+ "grad_norm": 21.91013779471945,
438
+ "learning_rate": 2.7370891215954565e-07,
439
+ "logits/chosen": 2.500542163848877,
440
+ "logits/rejected": 3.4184327125549316,
441
+ "logps/chosen": -480.619873046875,
442
+ "logps/rejected": -543.4830322265625,
443
+ "loss": 0.5367,
444
+ "rewards/accuracies": 0.75,
445
+ "rewards/chosen": -1.8631795644760132,
446
+ "rewards/margins": 1.1071960926055908,
447
+ "rewards/rejected": -2.9703755378723145,
448
+ "step": 250
449
+ },
450
+ {
451
+ "epoch": 0.5230125523012552,
452
+ "eval_logits/chosen": 2.173550844192505,
453
+ "eval_logits/rejected": 2.8097872734069824,
454
+ "eval_logps/chosen": -423.32470703125,
455
+ "eval_logps/rejected": -549.812744140625,
456
+ "eval_loss": 0.5404338836669922,
457
+ "eval_rewards/accuracies": 0.765625,
458
+ "eval_rewards/chosen": -1.6069477796554565,
459
+ "eval_rewards/margins": 1.2645587921142578,
460
+ "eval_rewards/rejected": -2.871506690979004,
461
+ "eval_runtime": 104.7771,
462
+ "eval_samples_per_second": 19.088,
463
+ "eval_steps_per_second": 0.305,
464
+ "step": 250
465
+ },
466
+ {
467
+ "epoch": 0.5439330543933054,
468
+ "grad_norm": 22.08171649508119,
469
+ "learning_rate": 2.55479083351317e-07,
470
+ "logits/chosen": 2.1905927658081055,
471
+ "logits/rejected": 3.057525873184204,
472
+ "logps/chosen": -468.181884765625,
473
+ "logps/rejected": -552.5390014648438,
474
+ "loss": 0.5252,
475
+ "rewards/accuracies": 0.793749988079071,
476
+ "rewards/chosen": -1.5366401672363281,
477
+ "rewards/margins": 1.3390202522277832,
478
+ "rewards/rejected": -2.8756604194641113,
479
+ "step": 260
480
+ },
481
+ {
482
+ "epoch": 0.5648535564853556,
483
+ "grad_norm": 18.888110325544,
484
+ "learning_rate": 2.3722002126275822e-07,
485
+ "logits/chosen": 2.343186140060425,
486
+ "logits/rejected": 3.283693790435791,
487
+ "logps/chosen": -480.1078186035156,
488
+ "logps/rejected": -560.9808349609375,
489
+ "loss": 0.5243,
490
+ "rewards/accuracies": 0.737500011920929,
491
+ "rewards/chosen": -1.8669660091400146,
492
+ "rewards/margins": 1.2250339984893799,
493
+ "rewards/rejected": -3.0920000076293945,
494
+ "step": 270
495
+ },
496
+ {
497
+ "epoch": 0.5857740585774058,
498
+ "grad_norm": 27.244776233826,
499
+ "learning_rate": 2.19029145890313e-07,
500
+ "logits/chosen": 2.048271656036377,
501
+ "logits/rejected": 2.768385887145996,
502
+ "logps/chosen": -433.47479248046875,
503
+ "logps/rejected": -546.029296875,
504
+ "loss": 0.5272,
505
+ "rewards/accuracies": 0.731249988079071,
506
+ "rewards/chosen": -1.7612041234970093,
507
+ "rewards/margins": 1.1764047145843506,
508
+ "rewards/rejected": -2.937608242034912,
509
+ "step": 280
510
+ },
511
+ {
512
+ "epoch": 0.606694560669456,
513
+ "grad_norm": 21.547725491962094,
514
+ "learning_rate": 2.0100351342479216e-07,
515
+ "logits/chosen": 2.24129056930542,
516
+ "logits/rejected": 2.8873372077941895,
517
+ "logps/chosen": -460.38690185546875,
518
+ "logps/rejected": -570.9521484375,
519
+ "loss": 0.523,
520
+ "rewards/accuracies": 0.7562500238418579,
521
+ "rewards/chosen": -1.6904579401016235,
522
+ "rewards/margins": 1.2296130657196045,
523
+ "rewards/rejected": -2.9200711250305176,
524
+ "step": 290
525
+ },
526
+ {
527
+ "epoch": 0.6276150627615062,
528
+ "grad_norm": 21.34579167577593,
529
+ "learning_rate": 1.8323929841460178e-07,
530
+ "logits/chosen": 2.917755603790283,
531
+ "logits/rejected": 3.5216128826141357,
532
+ "logps/chosen": -471.07965087890625,
533
+ "logps/rejected": -600.7293701171875,
534
+ "loss": 0.5231,
535
+ "rewards/accuracies": 0.737500011920929,
536
+ "rewards/chosen": -2.0175368785858154,
537
+ "rewards/margins": 1.2063273191452026,
538
+ "rewards/rejected": -3.2238643169403076,
539
+ "step": 300
540
+ },
541
+ {
542
+ "epoch": 0.6276150627615062,
543
+ "eval_logits/chosen": 2.2075393199920654,
544
+ "eval_logits/rejected": 2.986379384994507,
545
+ "eval_logps/chosen": -445.05584716796875,
546
+ "eval_logps/rejected": -587.8876953125,
547
+ "eval_loss": 0.5511458516120911,
548
+ "eval_rewards/accuracies": 0.765625,
549
+ "eval_rewards/chosen": -1.8242592811584473,
550
+ "eval_rewards/margins": 1.42799711227417,
551
+ "eval_rewards/rejected": -3.252256393432617,
552
+ "eval_runtime": 103.1559,
553
+ "eval_samples_per_second": 19.388,
554
+ "eval_steps_per_second": 0.31,
555
+ "step": 300
556
+ },
557
+ {
558
+ "epoch": 0.6485355648535565,
559
+ "grad_norm": 18.70711648745981,
560
+ "learning_rate": 1.6583128063291573e-07,
561
+ "logits/chosen": 1.9323720932006836,
562
+ "logits/rejected": 2.7933290004730225,
563
+ "logps/chosen": -479.4503479003906,
564
+ "logps/rejected": -588.09228515625,
565
+ "loss": 0.5303,
566
+ "rewards/accuracies": 0.7124999761581421,
567
+ "rewards/chosen": -1.8548892736434937,
568
+ "rewards/margins": 1.2552098035812378,
569
+ "rewards/rejected": -3.1100995540618896,
570
+ "step": 310
571
+ },
572
+ {
573
+ "epoch": 0.6694560669456067,
574
+ "grad_norm": 24.633583083411633,
575
+ "learning_rate": 1.488723393865766e-07,
576
+ "logits/chosen": 1.4087620973587036,
577
+ "logits/rejected": 2.465496778488159,
578
+ "logps/chosen": -487.93646240234375,
579
+ "logps/rejected": -566.53466796875,
580
+ "loss": 0.5063,
581
+ "rewards/accuracies": 0.7124999761581421,
582
+ "rewards/chosen": -1.9570667743682861,
583
+ "rewards/margins": 1.188623070716858,
584
+ "rewards/rejected": -3.1456902027130127,
585
+ "step": 320
586
+ },
587
+ {
588
+ "epoch": 0.6903765690376569,
589
+ "grad_norm": 20.927701473521733,
590
+ "learning_rate": 1.3245295796480788e-07,
591
+ "logits/chosen": 1.6705989837646484,
592
+ "logits/rejected": 2.787672758102417,
593
+ "logps/chosen": -478.430908203125,
594
+ "logps/rejected": -544.5813598632812,
595
+ "loss": 0.5178,
596
+ "rewards/accuracies": 0.6875,
597
+ "rewards/chosen": -1.948035478591919,
598
+ "rewards/margins": 1.0792946815490723,
599
+ "rewards/rejected": -3.027329921722412,
600
+ "step": 330
601
+ },
602
+ {
603
+ "epoch": 0.7112970711297071,
604
+ "grad_norm": 21.58725862881672,
605
+ "learning_rate": 1.1666074087171627e-07,
606
+ "logits/chosen": 2.2645626068115234,
607
+ "logits/rejected": 3.0073537826538086,
608
+ "logps/chosen": -433.1025390625,
609
+ "logps/rejected": -569.9879150390625,
610
+ "loss": 0.5331,
611
+ "rewards/accuracies": 0.7124999761581421,
612
+ "rewards/chosen": -1.8666340112686157,
613
+ "rewards/margins": 1.2376598119735718,
614
+ "rewards/rejected": -3.1042943000793457,
615
+ "step": 340
616
+ },
617
+ {
618
+ "epoch": 0.7322175732217573,
619
+ "grad_norm": 22.053584645202807,
620
+ "learning_rate": 1.0157994641835734e-07,
621
+ "logits/chosen": 2.3491084575653076,
622
+ "logits/rejected": 3.309586763381958,
623
+ "logps/chosen": -490.469970703125,
624
+ "logps/rejected": -593.7388916015625,
625
+ "loss": 0.5092,
626
+ "rewards/accuracies": 0.699999988079071,
627
+ "rewards/chosen": -2.0824320316314697,
628
+ "rewards/margins": 1.1687943935394287,
629
+ "rewards/rejected": -3.2512269020080566,
630
+ "step": 350
631
+ },
632
+ {
633
+ "epoch": 0.7322175732217573,
634
+ "eval_logits/chosen": 2.094608783721924,
635
+ "eval_logits/rejected": 2.88336443901062,
636
+ "eval_logps/chosen": -461.0306701660156,
637
+ "eval_logps/rejected": -602.9060668945312,
638
+ "eval_loss": 0.5401915311813354,
639
+ "eval_rewards/accuracies": 0.7734375,
640
+ "eval_rewards/chosen": -1.9840072393417358,
641
+ "eval_rewards/margins": 1.4184322357177734,
642
+ "eval_rewards/rejected": -3.402439594268799,
643
+ "eval_runtime": 104.739,
644
+ "eval_samples_per_second": 19.095,
645
+ "eval_steps_per_second": 0.306,
646
+ "step": 350
647
+ },
648
+ {
649
+ "epoch": 0.7531380753138075,
650
+ "grad_norm": 18.40926814309719,
651
+ "learning_rate": 8.729103716819111e-08,
652
+ "logits/chosen": 2.376570701599121,
653
+ "logits/rejected": 3.0961055755615234,
654
+ "logps/chosen": -463.7430725097656,
655
+ "logps/rejected": -592.4307861328125,
656
+ "loss": 0.5261,
657
+ "rewards/accuracies": 0.737500011920929,
658
+ "rewards/chosen": -2.168846368789673,
659
+ "rewards/margins": 1.2989672422409058,
660
+ "rewards/rejected": -3.467813491821289,
661
+ "step": 360
662
+ },
663
+ {
664
+ "epoch": 0.7740585774058577,
665
+ "grad_norm": 20.93465319965544,
666
+ "learning_rate": 7.387025063449081e-08,
667
+ "logits/chosen": 1.8760063648223877,
668
+ "logits/rejected": 3.0332190990448,
669
+ "logps/chosen": -485.4361267089844,
670
+ "logps/rejected": -611.39599609375,
671
+ "loss": 0.512,
672
+ "rewards/accuracies": 0.7124999761581421,
673
+ "rewards/chosen": -1.975916862487793,
674
+ "rewards/margins": 1.4137961864471436,
675
+ "rewards/rejected": -3.3897128105163574,
676
+ "step": 370
677
+ },
678
+ {
679
+ "epoch": 0.7949790794979079,
680
+ "grad_norm": 21.53735300502108,
681
+ "learning_rate": 6.138919252022435e-08,
682
+ "logits/chosen": 2.2272870540618896,
683
+ "logits/rejected": 2.9403111934661865,
684
+ "logps/chosen": -529.391845703125,
685
+ "logps/rejected": -598.6041870117188,
686
+ "loss": 0.5243,
687
+ "rewards/accuracies": 0.7124999761581421,
688
+ "rewards/chosen": -2.129279613494873,
689
+ "rewards/margins": 1.0328638553619385,
690
+ "rewards/rejected": -3.1621437072753906,
691
+ "step": 380
692
+ },
693
+ {
694
+ "epoch": 0.8158995815899581,
695
+ "grad_norm": 22.856314943850858,
696
+ "learning_rate": 4.991445467064689e-08,
697
+ "logits/chosen": 2.5150341987609863,
698
+ "logits/rejected": 3.433137893676758,
699
+ "logps/chosen": -523.1361694335938,
700
+ "logps/rejected": -647.6205444335938,
701
+ "loss": 0.5212,
702
+ "rewards/accuracies": 0.762499988079071,
703
+ "rewards/chosen": -2.1657605171203613,
704
+ "rewards/margins": 1.4086949825286865,
705
+ "rewards/rejected": -3.5744547843933105,
706
+ "step": 390
707
+ },
708
+ {
709
+ "epoch": 0.8368200836820083,
710
+ "grad_norm": 20.459584660793766,
711
+ "learning_rate": 3.9507259776993954e-08,
712
+ "logits/chosen": 2.373227596282959,
713
+ "logits/rejected": 3.0179686546325684,
714
+ "logps/chosen": -520.0594482421875,
715
+ "logps/rejected": -614.9906005859375,
716
+ "loss": 0.5231,
717
+ "rewards/accuracies": 0.71875,
718
+ "rewards/chosen": -2.2612462043762207,
719
+ "rewards/margins": 1.1028645038604736,
720
+ "rewards/rejected": -3.3641109466552734,
721
+ "step": 400
722
+ },
723
+ {
724
+ "epoch": 0.8368200836820083,
725
+ "eval_logits/chosen": 2.2364866733551025,
726
+ "eval_logits/rejected": 3.0541534423828125,
727
+ "eval_logps/chosen": -472.12713623046875,
728
+ "eval_logps/rejected": -619.1116333007812,
729
+ "eval_loss": 0.5417460799217224,
730
+ "eval_rewards/accuracies": 0.78125,
731
+ "eval_rewards/chosen": -2.0949723720550537,
732
+ "eval_rewards/margins": 1.4695227146148682,
733
+ "eval_rewards/rejected": -3.564495086669922,
734
+ "eval_runtime": 103.2544,
735
+ "eval_samples_per_second": 19.37,
736
+ "eval_steps_per_second": 0.31,
737
+ "step": 400
738
+ },
739
+ {
740
+ "epoch": 0.8577405857740585,
741
+ "grad_norm": 20.75509691260348,
742
+ "learning_rate": 3.022313472693447e-08,
743
+ "logits/chosen": 1.8815155029296875,
744
+ "logits/rejected": 2.752415657043457,
745
+ "logps/chosen": -530.4354248046875,
746
+ "logps/rejected": -645.202392578125,
747
+ "loss": 0.514,
748
+ "rewards/accuracies": 0.8062499761581421,
749
+ "rewards/chosen": -2.134329319000244,
750
+ "rewards/margins": 1.3350379467010498,
751
+ "rewards/rejected": -3.469367504119873,
752
+ "step": 410
753
+ },
754
+ {
755
+ "epoch": 0.8786610878661087,
756
+ "grad_norm": 21.72917985290927,
757
+ "learning_rate": 2.2111614344599684e-08,
758
+ "logits/chosen": 2.2683472633361816,
759
+ "logits/rejected": 3.545788526535034,
760
+ "logps/chosen": -509.83563232421875,
761
+ "logps/rejected": -617.6029052734375,
762
+ "loss": 0.4959,
763
+ "rewards/accuracies": 0.75,
764
+ "rewards/chosen": -2.1364758014678955,
765
+ "rewards/margins": 1.329615831375122,
766
+ "rewards/rejected": -3.4660911560058594,
767
+ "step": 420
768
+ },
769
+ {
770
+ "epoch": 0.899581589958159,
771
+ "grad_norm": 20.573623570397274,
772
+ "learning_rate": 1.521597710086439e-08,
773
+ "logits/chosen": 2.243511438369751,
774
+ "logits/rejected": 3.2753005027770996,
775
+ "logps/chosen": -517.85498046875,
776
+ "logps/rejected": -601.6973876953125,
777
+ "loss": 0.5284,
778
+ "rewards/accuracies": 0.7124999761581421,
779
+ "rewards/chosen": -2.218738317489624,
780
+ "rewards/margins": 1.2048834562301636,
781
+ "rewards/rejected": -3.4236221313476562,
782
+ "step": 430
783
+ },
784
+ {
785
+ "epoch": 0.9205020920502092,
786
+ "grad_norm": 17.236420967349105,
787
+ "learning_rate": 9.57301420397924e-09,
788
+ "logits/chosen": 2.407790422439575,
789
+ "logits/rejected": 3.659886598587036,
790
+ "logps/chosen": -511.99688720703125,
791
+ "logps/rejected": -602.4998168945312,
792
+ "loss": 0.5239,
793
+ "rewards/accuracies": 0.7124999761581421,
794
+ "rewards/chosen": -2.2972302436828613,
795
+ "rewards/margins": 1.267395257949829,
796
+ "rewards/rejected": -3.5646255016326904,
797
+ "step": 440
798
+ },
799
+ {
800
+ "epoch": 0.9414225941422594,
801
+ "grad_norm": 18.669490237353997,
802
+ "learning_rate": 5.212833302556258e-09,
803
+ "logits/chosen": 2.2820138931274414,
804
+ "logits/rejected": 3.1481757164001465,
805
+ "logps/chosen": -492.22052001953125,
806
+ "logps/rejected": -632.2754516601562,
807
+ "loss": 0.5232,
808
+ "rewards/accuracies": 0.768750011920929,
809
+ "rewards/chosen": -2.074744462966919,
810
+ "rewards/margins": 1.4727141857147217,
811
+ "rewards/rejected": -3.5474586486816406,
812
+ "step": 450
813
+ },
814
+ {
815
+ "epoch": 0.9414225941422594,
816
+ "eval_logits/chosen": 2.1926114559173584,
817
+ "eval_logits/rejected": 3.032156467437744,
818
+ "eval_logps/chosen": -469.2007751464844,
819
+ "eval_logps/rejected": -617.9429931640625,
820
+ "eval_loss": 0.5418744087219238,
821
+ "eval_rewards/accuracies": 0.78125,
822
+ "eval_rewards/chosen": -2.0657081604003906,
823
+ "eval_rewards/margins": 1.487100601196289,
824
+ "eval_rewards/rejected": -3.552809000015259,
825
+ "eval_runtime": 103.486,
826
+ "eval_samples_per_second": 19.326,
827
+ "eval_steps_per_second": 0.309,
828
+ "step": 450
829
+ },
830
+ {
831
+ "epoch": 0.9623430962343096,
832
+ "grad_norm": 21.564710597734962,
833
+ "learning_rate": 2.158697848236607e-09,
834
+ "logits/chosen": 2.6190943717956543,
835
+ "logits/rejected": 3.548638105392456,
836
+ "logps/chosen": -499.64208984375,
837
+ "logps/rejected": -623.9893188476562,
838
+ "loss": 0.5095,
839
+ "rewards/accuracies": 0.7250000238418579,
840
+ "rewards/chosen": -2.363645076751709,
841
+ "rewards/margins": 1.3193647861480713,
842
+ "rewards/rejected": -3.683009624481201,
843
+ "step": 460
844
+ },
845
+ {
846
+ "epoch": 0.9832635983263598,
847
+ "grad_norm": 17.345861357972396,
848
+ "learning_rate": 4.269029751107489e-10,
849
+ "logits/chosen": 2.4917562007904053,
850
+ "logits/rejected": 2.865358591079712,
851
+ "logps/chosen": -476.9666442871094,
852
+ "logps/rejected": -609.1448974609375,
853
+ "loss": 0.5372,
854
+ "rewards/accuracies": 0.7124999761581421,
855
+ "rewards/chosen": -2.138411045074463,
856
+ "rewards/margins": 1.2895748615264893,
857
+ "rewards/rejected": -3.427985668182373,
858
+ "step": 470
859
+ },
860
+ {
861
+ "epoch": 1.0,
862
+ "step": 478,
863
+ "total_flos": 0.0,
864
+ "train_loss": 0.5571172007955767,
865
+ "train_runtime": 12724.9946,
866
+ "train_samples_per_second": 4.804,
867
+ "train_steps_per_second": 0.038
868
+ }
869
+ ],
870
+ "logging_steps": 10,
871
+ "max_steps": 478,
872
+ "num_input_tokens_seen": 0,
873
+ "num_train_epochs": 1,
874
+ "save_steps": 100,
875
+ "stateful_callbacks": {
876
+ "TrainerControl": {
877
+ "args": {
878
+ "should_epoch_stop": false,
879
+ "should_evaluate": false,
880
+ "should_log": false,
881
+ "should_save": true,
882
+ "should_training_stop": true
883
+ },
884
+ "attributes": {}
885
+ }
886
+ },
887
+ "total_flos": 0.0,
888
+ "train_batch_size": 8,
889
+ "trial_name": null,
890
+ "trial_params": null
891
+ }
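
trainer_state.json is mostly a flat `log_history` list; in TRL's DPO logging the `rewards/*` fields are the implicit DPO rewards (beta-scaled policy-vs-reference log-probability differences) and `rewards/margins` is chosen minus rejected. A small stdlib-only sketch to pull out the evaluation checkpoints, mirroring the results table in the model card:

```python
# Summarize the evaluation checkpoints logged in trainer_state.json (stdlib only).
import json

with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:  # eval entries; training entries log "loss" instead
        print(
            f"step {entry['step']:>3}: "
            f"eval_loss={entry['eval_loss']:.4f}  "
            f"margin={entry['eval_rewards/margins']:.3f}  "
            f"acc={entry['eval_rewards/accuracies']:.3f}"
        )
```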