zephyr-7b-uf-rlced-conifer-group-dpo-2e

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the data/zephyr_uf_rlced_conifer_ref dataset. It achieves the following results on the evaluation set:

Loss: 0.2410
Rewards/chosen: -3.4514
Rewards/rejected: -8.7503
Rewards/accuracies: 0.8778
Rewards/margins: 5.2989
Logps/rejected: -1278.7679
Logps/chosen: -737.6100
Logits/rejected: 3.0512
Logits/chosen: 0.9415
Alpha0: 0.1957
Alpha1: 0.8043
Task Loss1: 0.1724
Task Excess Loss1: 0.0378
Excess Loss: 0.0340
Task Loss0: 0.5295
Task Excess Loss0: 0.0879
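
The reward figures above are related to each other. In DPO-style training the reported margin is the chosen reward minus the rejected reward. The two alpha values look like per-group weights that sum to one, but that reading is an assumption drawn from the numbers, since the card does not define them. A minimal consistency check, using only values copied from the list above:

```python
# Values copied from the evaluation metrics listed above.
rewards_chosen = -3.4514
rewards_rejected = -8.7503
reported_margin = 5.2989
alpha0, alpha1 = 0.1957, 0.8043

# The margin is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: Alpha0 and Alpha1 are weights over two groups/tasks.
alpha_sum = alpha0 + alpha1

print(f"margin = {margin:.4f} (reported {reported_margin})")
print(f"alpha0 + alpha1 = {alpha_sum:.4f}")
```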

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 256
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
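
The batch-size and scheduler settings above can be sketched as follows. The total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps. The schedule function is a plain linear-warmup plus cosine-decay sketch, not the trainer's own code; `total_steps = 1440` is a hypothetical value chosen for illustration, and the exact warmup rounding in the real run may differ.

```python
import math

# Hyperparameters copied from the list above.
train_batch_size = 8            # per device
num_devices = 8
gradient_accumulation_steps = 4
peak_lr = 5e-07
warmup_ratio = 0.1

# Effective number of examples per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)


def lr_at(step, total_steps, peak=peak_lr, ratio=warmup_ratio):
    """Linear warmup to `peak`, then cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical step count, for illustration only.
total_steps = 1440
for step in (0, 72, 144, 720, 1440):
    print(step, f"{lr_at(step, total_steps):.3e}")
```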

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen	Alpha0	Alpha1	Task Loss1	Task Excess Loss1	Excess Loss	Task Loss0	Task Excess Loss0
0.3541	0.1388	100	0.4194	-1.3743	-2.6267	0.8102	1.2524	-666.4093	-529.9026	-2.7580	-2.7843	0.8214	0.1786	0.3373	0.1973	0.1899	0.6883	0.2655
0.2214	0.2776	200	0.3480	-1.2450	-2.9488	0.8412	1.7038	-698.6146	-516.9692	0.1216	-0.2174	0.8786	0.1214	0.2866	0.1517	0.1250	0.5355	0.0929
0.2284	0.4164	300	0.3271	-1.7298	-3.6279	0.8515	1.8981	-766.5247	-565.4502	1.3769	0.5823	0.6417	0.3583	0.2721	0.1383	0.1130	0.5406	0.0794
0.1837	0.5552	400	0.3040	-1.7232	-4.0037	0.8553	2.2805	-804.1021	-564.7872	1.8300	0.7862	0.7891	0.2109	0.2517	0.1159	0.0949	0.5490	0.0796
0.1749	0.6940	500	0.2966	-1.7976	-4.1927	0.8637	2.3951	-823.0039	-572.2305	1.7164	0.5785	0.8057	0.1943	0.2448	0.1097	0.0856	0.5124	0.0570
0.1823	0.8328	600	0.3030	-1.7187	-3.9261	0.8647	2.2074	-796.3432	-564.3366	2.4921	1.3988	0.9053	0.0947	0.2541	0.1193	0.0922	0.5047	0.0596
0.1766	0.9715	700	0.2895	-1.6400	-4.2369	0.8647	2.5969	-827.4293	-556.4711	1.6749	0.1680	0.9622	0.0378	0.2417	0.1057	0.0812	0.5020	0.0532
0.1131	1.1103	800	0.2646	-2.7794	-6.7040	0.8647	3.9245	-1074.1326	-670.4117	2.3249	0.3844	0.0325	0.9675	0.1990	0.0653	0.0567	0.5372	0.0871
0.1006	1.2491	900	0.2490	-3.6465	-8.6692	0.8712	5.0227	-1270.6554	-757.1147	3.3211	1.0777	0.4760	0.5240	0.1852	0.0492	0.0420	0.5341	0.0967
0.0951	1.3879	1000	0.2470	-3.0354	-7.7369	0.8797	4.7015	-1177.4214	-696.0082	3.1614	0.9199	0.0150	0.9850	0.1756	0.0450	0.0382	0.5249	0.0834
0.0885	1.5267	1100	0.2435	-3.4543	-8.4740	0.8731	5.0197	-1251.1321	-737.8961	3.4589	1.3892	0.0151	0.9849	0.1747	0.0421	0.0368	0.5310	0.0887
0.1003	1.6655	1200	0.2416	-3.3615	-8.4285	0.8750	5.0670	-1246.5889	-728.6184	2.9341	0.9100	0.0721	0.9279	0.1730	0.0396	0.0352	0.5285	0.0863
0.0865	1.8043	1300	0.2412	-3.3114	-8.4737	0.8769	5.1623	-1251.1091	-723.6140	2.9432	0.8628	0.0755	0.9245	0.1734	0.0388	0.0343	0.5272	0.0847
0.0893	1.9431	1400	0.2410	-3.4515	-8.7505	0.8769	5.2990	-1278.7848	-737.6204	3.0507	0.9407	0.6369	0.3631	0.1726	0.0379	0.0341	0.5306	0.0889
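
As a reading aid for the table above, the sketch below picks out the checkpoints with the lowest validation loss and the highest reward accuracy. The step, Validation Loss, and Rewards/accuracies columns are copied from the table; the variable names are illustrative only.

```python
# (step, validation loss, rewards/accuracies) copied from the table above.
rows = [
    (100, 0.4194, 0.8102), (200, 0.3480, 0.8412), (300, 0.3271, 0.8515),
    (400, 0.3040, 0.8553), (500, 0.2966, 0.8637), (600, 0.3030, 0.8647),
    (700, 0.2895, 0.8647), (800, 0.2646, 0.8647), (900, 0.2490, 0.8712),
    (1000, 0.2470, 0.8797), (1100, 0.2435, 0.8731), (1200, 0.2416, 0.8750),
    (1300, 0.2412, 0.8769), (1400, 0.2410, 0.8769),
]

# Checkpoint with the lowest validation loss.
best_loss = min(rows, key=lambda r: r[1])
# Checkpoint with the highest reward accuracy.
best_acc = max(rows, key=lambda r: r[2])

print(f"lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
print(f"highest reward accuracy: {best_acc[2]} at step {best_acc[0]}")
```

The two criteria do not pick the same checkpoint: validation loss is lowest at the last logged step, while reward accuracy peaks earlier.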

Framework versions

Transformers 4.44.1
Pytorch 2.1.2+cu121
Datasets 2.21.0
Tokenizers 0.19.1
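
To compare a local environment against the versions listed above, a small check along these lines may help. It assumes the usual distribution names on PyPI (`torch` for Pytorch) and only reports differences; it does not install anything.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


report = compare_versions(EXPECTED)
for package, (wanted, installed, matches) in report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```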

Model repository: NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e
