---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
  results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset. It achieves the following results on the evaluation set (the reward metrics are DPO's implicit rewards; a sketch of how they are computed follows the list):

- Loss: 0.6929
- Rewards/chosen: -2.2624
- Rewards/rejected: -5.6900
- Rewards/accuracies: 0.7619
- Rewards/margins: 3.4275
- Logps/rejected: -348.8656
- Logps/chosen: -389.8162
- Logits/rejected: -2.8188
- Logits/chosen: -2.8149
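For context, the reward columns above are DPO's *implicit* rewards rather than outputs of a separate reward model: each is `beta` times the log-probability gap between the policy and the frozen reference (SFT) model on the same completion. A minimal sketch, assuming the standard DPO formulation with `beta = 0.1` (the alignment-handbook default; the actual value is not recorded in this card) and reference log-probabilities back-solved purely for illustration:

```python
import torch

def implicit_dpo_reward(policy_logps: torch.Tensor, ref_logps: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """DPO's implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logps - ref_logps)

# Hypothetical per-sequence log-probs; the reference values are back-solved
# from the aggregates above under the assumed beta = 0.1.
policy_chosen, ref_chosen = torch.tensor(-389.82), torch.tensor(-367.19)
policy_rejected, ref_rejected = torch.tensor(-348.87), torch.tensor(-291.97)

r_chosen = implicit_dpo_reward(policy_chosen, ref_chosen)        # ~ -2.26 (Rewards/chosen)
r_rejected = implicit_dpo_reward(policy_rejected, ref_rejected)  # ~ -5.69 (Rewards/rejected)
margin = r_chosen - r_rejected                                   # ~  3.43 (Rewards/margins)
# Rewards/accuracies is the fraction of pairs where the chosen completion's
# implicit reward beats the rejected one's; 1.0 for this single pair.
accuracy = (r_chosen > r_rejected).float()
```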

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
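These settings map directly onto Hugging Face `TrainingArguments`. A minimal sketch of an equivalent configuration, assuming DPO training via TRL as in the alignment-handbook recipes; `output_dir` and `bf16` are assumptions not recorded in this card:

```python
from transformers import TrainingArguments

# Per-device settings: 8 GPUs x train_batch_size 8 = total_train_batch_size 64,
# which implies gradient_accumulation_steps = 1.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # assumption
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: mixed-precision mode is not recorded in this card
)
```

Note that with 8 devices at a per-device batch size of 8, the reported total train batch size of 64 leaves no room for gradient accumulation.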

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5504        | 0.1   | 100  | 0.5407          | 0.5287         | -0.1810          | 0.7579             | 0.7098          | -293.7762      | -361.9044    | -2.9360         | -2.9366       |
| 0.541         | 0.21  | 200  | 0.5221          | 0.6692         | -0.5569          | 0.7698             | 1.2261          | -297.5352      | -360.5003    | -2.9786         | -2.9802       |
| 0.6034        | 0.31  | 300  | 0.5459          | 0.7375         | -0.4578          | 0.7619             | 1.1953          | -296.5442      | -359.8170    | -3.0234         | -3.0360       |
| 0.5944        | 0.41  | 400  | 0.5573          | 0.4979         | -0.8938          | 0.7698             | 1.3917          | -300.9036      | -362.2126    | -2.9639         | -2.9621       |
| 0.5512        | 0.52  | 500  | 0.5257          | 0.4355         | -1.0167          | 0.7579             | 1.4522          | -302.1330      | -362.8364    | -3.0485         | -3.0406       |
| 0.5879        | 0.62  | 600  | 0.5288          | 0.4707         | -0.9291          | 0.7579             | 1.3998          | -301.2572      | -362.4848    | -2.9911         | -2.9869       |
| 0.6773        | 0.72  | 700  | 0.5853          | 0.0472         | -0.9185          | 0.7460             | 0.9657          | -301.1505      | -366.7194    | -3.0564         | -3.0418       |
| 0.5263        | 0.83  | 800  | 0.5151          | 0.2246         | -1.1914          | 0.7619             | 1.4160          | -303.8796      | -364.9458    | -2.9662         | -2.9637       |
| 0.5366        | 0.93  | 900  | 0.5134          | 0.2511         | -1.0873          | 0.75               | 1.3384          | -302.8385      | -364.6808    | -2.9824         | -2.9907       |
| 0.1034        | 1.03  | 1000 | 0.5107          | 0.3073         | -1.4321          | 0.7619             | 1.7394          | -306.2867      | -364.1185    | -2.9096         | -2.9202       |
| 0.1114        | 1.14  | 1100 | 0.5344          | 0.1332         | -1.8449          | 0.7460             | 1.9781          | -310.4148      | -365.8598    | -2.9561         | -2.9666       |
| 0.1338        | 1.24  | 1200 | 0.5350          | -0.0814        | -2.1418          | 0.7738             | 2.0604          | -313.3835      | -368.0058    | -2.9460         | -2.9508       |
| 0.0979        | 1.34  | 1300 | 0.5474          | -0.0945        | -2.2500          | 0.7659             | 2.1554          | -314.4657      | -368.1371    | -2.9172         | -2.9201       |
| 0.1366        | 1.44  | 1400 | 0.5440          | -0.4749        | -2.3968          | 0.7579             | 1.9219          | -315.9338      | -371.9403    | -2.9134         | -2.9144       |
| 0.1042        | 1.55  | 1500 | 0.5524          | -0.5014        | -2.6803          | 0.7698             | 2.1789          | -318.7686      | -372.2054    | -2.9361         | -2.9306       |
| 0.1313        | 1.65  | 1600 | 0.5333          | -0.2234        | -2.1867          | 0.75               | 1.9634          | -313.8333      | -369.4255    | -2.9060         | -2.8999       |
| 0.1629        | 1.75  | 1700 | 0.5655          | -0.3904        | -2.7591          | 0.75               | 2.3687          | -319.5572      | -371.0959    | -2.9182         | -2.9096       |
| 0.0993        | 1.86  | 1800 | 0.5605          | -0.7117        | -2.9701          | 0.7460             | 2.2584          | -321.6668      | -374.3084    | -2.8602         | -2.8477       |
| 0.1116        | 1.96  | 1900 | 0.5649          | -0.6379        | -2.7259          | 0.7540             | 2.0880          | -319.2250      | -373.5707    | -2.9277         | -2.9150       |
| 0.0193        | 2.06  | 2000 | 0.6122          | -0.9412        | -3.7861          | 0.7619             | 2.8449          | -329.8275      | -376.6041    | -2.8919         | -2.8825       |
| 0.0175        | 2.17  | 2100 | 0.6523          | -1.6027        | -4.6832          | 0.7659             | 3.0805          | -338.7977      | -383.2186    | -2.8474         | -2.8393       |
| 0.0131        | 2.27  | 2200 | 0.6702          | -1.8899        | -5.0304          | 0.7421             | 3.1406          | -342.2704      | -386.0904    | -2.8128         | -2.8069       |
| 0.0243        | 2.37  | 2300 | 0.6559          | -1.6715        | -4.7369          | 0.7698             | 3.0654          | -339.3347      | -383.9066    | -2.8547         | -2.8490       |
| 0.0142        | 2.48  | 2400 | 0.6734          | -1.9463        | -5.1224          | 0.7579             | 3.1761          | -343.1900      | -386.6547    | -2.8394         | -2.8352       |
| 0.0211        | 2.58  | 2500 | 0.6890          | -2.1114        | -5.5608          | 0.7698             | 3.4494          | -347.5744      | -388.3059    | -2.8369         | -2.8333       |
| 0.011         | 2.68  | 2600 | 0.6999          | -2.3020        | -5.8073          | 0.7659             | 3.5053          | -350.0389      | -390.2114    | -2.8299         | -2.8258       |
| 0.0114        | 2.79  | 2700 | 0.6951          | -2.2382        | -5.6885          | 0.7698             | 3.4503          | -348.8512      | -389.5739    | -2.8207         | -2.8172       |
| 0.0437        | 2.89  | 2800 | 0.6911          | -2.2294        | -5.6156          | 0.7659             | 3.3861          | -348.1217      | -389.4860    | -2.8151         | -2.8117       |
| 0.0109        | 2.99  | 2900 | 0.6909          | -2.2776        | -5.6932          | 0.7659             | 3.4156          | -348.8980      | -389.9677    | -2.8187         | -2.8148       |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
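A minimal inference sketch under these pinned versions. The repo id `yihang7/zephyr-7b-dpo-full` is inferred from this page, and the chat template is assumed to be inherited from the SFT base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yihang7/zephyr-7b-dpo-full"  # assumption: inferred repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
# apply_chat_template (available in Transformers 4.35) formats the
# conversation with the tokenizer's built-in chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```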