---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
  - alignment-handbook
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.6910
- Rewards/chosen: -3.9218
- Rewards/rejected: -8.2942
- Rewards/accuracies: 0.8125
- Rewards/margins: 4.3724
- Logps/rejected: -279.5480
- Logps/chosen: -293.9998
- Logits/rejected: -2.6725
- Logits/chosen: -2.7826
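
For reference, a minimal inference sketch is shown below. The repository id `alignment-handbook/zephyr-7b-dpo-full` and the availability of a chat template in the tokenizer are assumptions, not details stated in this card:

```python
# Minimal inference sketch (not part of the original card).
# Assumptions: the model lives at "alignment-handbook/zephyr-7b-dpo-full" and the
# tokenizer ships a chat template; device_map="auto" requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alignment-handbook/zephyr-7b-dpo-full"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise what DPO fine-tuning does in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```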

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
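
The card does not describe the data beyond the dataset name given above. As a minimal sketch, the preference pairs can be inspected with 🤗 Datasets; the `train_prefs`/`test_prefs` split names and the `chosen`/`rejected` column layout are assumptions about how `HuggingFaceH4/ultrafeedback_binarized` is organised:

```python
# Sketch for inspecting the preference data (assumed splits and columns).
from datasets import load_dataset

raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_prefs = raw["train_prefs"]     # assumed split used for DPO training
print(train_prefs[0]["chosen"])      # assumed column: preferred conversation
print(train_prefs[0]["rejected"])    # assumed column: dispreferred conversation
```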

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
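
A hedged reconstruction of these settings as `transformers.TrainingArguments` is shown below. The alignment-handbook recipe drives DPO training through trl's `DPOTrainer`, and every option not listed above (output directory, precision, evaluation cadence) is an assumption:

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",   # assumed
    learning_rate=5e-7,
    per_device_train_batch_size=2,     # 2 per device x 32 GPUs = total train batch size 64
    per_device_eval_batch_size=4,      # 4 per device x 32 GPUs = total eval batch size 128
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                         # assumption: precision is not stated in the card
    evaluation_strategy="steps",
    eval_steps=100,                    # matches the evaluation cadence in the results table
    logging_steps=100,
)
```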

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5386 | 0.1 | 100 | 0.5208 | 0.0564 | -0.7521 | 0.7188 | 0.8085 | -204.1269 | -254.2179 | -3.0136 | -3.0550 |
| 0.4931 | 0.21 | 200 | 0.4882 | -0.0132 | -1.2683 | 0.7812 | 1.2551 | -209.2889 | -254.9136 | -3.1056 | -3.1407 |
| 0.479 | 0.31 | 300 | 0.5038 | -0.1035 | -1.4012 | 0.7812 | 1.2978 | -210.6186 | -255.8163 | -3.0809 | -3.1328 |
| 0.5052 | 0.41 | 400 | 0.5154 | -0.1923 | -1.8783 | 0.7969 | 1.6860 | -215.3891 | -256.7043 | -2.9104 | -2.9644 |
| 0.4513 | 0.52 | 500 | 0.4979 | 0.0207 | -1.6562 | 0.7969 | 1.6769 | -213.1682 | -254.5742 | -3.0061 | -3.0657 |
| 0.4905 | 0.62 | 600 | 0.4907 | -0.0944 | -1.5847 | 0.7656 | 1.4903 | -212.4527 | -255.7256 | -2.9374 | -3.0170 |
| 0.5609 | 0.72 | 700 | 0.4928 | -0.4249 | -1.7238 | 0.7656 | 1.2989 | -213.8441 | -259.0304 | -2.9475 | -3.0128 |
| 0.5338 | 0.83 | 800 | 0.4767 | -0.1567 | -1.9114 | 0.8125 | 1.7547 | -215.7200 | -256.3484 | -2.8455 | -2.9183 |
| 0.5039 | 0.93 | 900 | 0.4854 | -0.0886 | -1.6900 | 0.75 | 1.6014 | -213.5057 | -255.6674 | -2.8295 | -2.9093 |
| 0.0776 | 1.03 | 1000 | 0.4938 | -0.4848 | -2.5287 | 0.7656 | 2.0438 | -221.8927 | -259.6300 | -2.7580 | -2.8437 |
| 0.0901 | 1.14 | 1100 | 0.5071 | -1.0800 | -3.2419 | 0.7812 | 2.1619 | -229.0247 | -265.5817 | -2.8036 | -2.8858 |
| 0.0828 | 1.24 | 1200 | 0.5159 | -0.9682 | -3.4087 | 0.7812 | 2.4406 | -230.6935 | -264.4635 | -2.7961 | -2.8708 |
| 0.0916 | 1.34 | 1300 | 0.5222 | -1.0832 | -3.5535 | 0.7969 | 2.4703 | -232.1411 | -265.6135 | -2.8019 | -2.8754 |
| 0.0965 | 1.44 | 1400 | 0.5204 | -1.1951 | -3.5681 | 0.7969 | 2.3731 | -232.2874 | -266.7324 | -2.8058 | -2.8884 |
| 0.0716 | 1.55 | 1500 | 0.5381 | -1.6588 | -4.0838 | 0.7188 | 2.4250 | -237.4441 | -271.3697 | -2.7979 | -2.8862 |
| 0.0957 | 1.65 | 1600 | 0.5151 | -1.1746 | -3.7477 | 0.75 | 2.5731 | -234.0834 | -266.5278 | -2.7960 | -2.8976 |
| 0.0645 | 1.75 | 1700 | 0.5393 | -1.7591 | -4.6011 | 0.8125 | 2.8419 | -242.6167 | -272.3728 | -2.7483 | -2.8592 |
| 0.0838 | 1.86 | 1800 | 0.5385 | -1.6606 | -4.4648 | 0.7656 | 2.8042 | -241.2545 | -271.3875 | -2.7311 | -2.8383 |
| 0.1106 | 1.96 | 1900 | 0.5322 | -1.5621 | -3.9779 | 0.7969 | 2.4158 | -236.3850 | -270.4025 | -2.8194 | -2.9133 |
| 0.0174 | 2.06 | 2000 | 0.5921 | -2.4968 | -5.9514 | 0.7969 | 3.4546 | -256.1199 | -279.7498 | -2.7579 | -2.8631 |
| 0.0134 | 2.17 | 2100 | 0.6247 | -2.9002 | -6.4277 | 0.7969 | 3.5275 | -260.8829 | -283.7838 | -2.7316 | -2.8319 |
| 0.0148 | 2.27 | 2200 | 0.6402 | -3.2520 | -7.0627 | 0.7812 | 3.8106 | -267.2330 | -287.3020 | -2.6991 | -2.8064 |
| 0.0142 | 2.37 | 2300 | 0.6563 | -3.2715 | -7.1303 | 0.8281 | 3.8588 | -267.9088 | -287.4962 | -2.6871 | -2.7992 |
| 0.011 | 2.48 | 2400 | 0.6605 | -3.2996 | -7.2258 | 0.7969 | 3.9262 | -268.8643 | -287.7776 | -2.6555 | -2.7717 |
| 0.0065 | 2.58 | 2500 | 0.6935 | -3.6399 | -8.0232 | 0.8125 | 4.3832 | -276.8377 | -291.1808 | -2.6780 | -2.7902 |
| 0.0089 | 2.68 | 2600 | 0.6773 | -3.4822 | -7.8182 | 0.8125 | 4.3360 | -274.7881 | -289.6033 | -2.6885 | -2.7994 |
| 0.0102 | 2.79 | 2700 | 0.6813 | -3.5909 | -7.8097 | 0.8281 | 4.2187 | -274.7028 | -290.6908 | -2.6877 | -2.7970 |
| 0.0136 | 2.89 | 2800 | 0.6892 | -3.8236 | -8.1490 | 0.8125 | 4.3254 | -278.0957 | -293.0175 | -2.6765 | -2.7862 |
| 0.0091 | 2.99 | 2900 | 0.6913 | -3.9199 | -8.3004 | 0.8125 | 4.3806 | -279.6104 | -293.9802 | -2.6728 | -2.7830 |
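
For reading the table, `Rewards/margins` is simply the gap between the chosen and rejected rewards (for example, $0.0564 - (-0.7521) \approx 0.8085$ in the first row). Assuming the standard trl DPO bookkeeping, the reward columns are the implicit DPO rewards:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

where $\pi_{\mathrm{ref}}$ is the SFT reference model and $\beta$ is the DPO temperature (its value is not stated in this card).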

### Framework versions

- Transformers 4.35.0
- PyTorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1