
phi-1_5-dpo

This model is a fine-tuned version of rasyosef/phi-1_5-sft, trained with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5013
  • Rewards/chosen: -1.0250
  • Rewards/rejected: -2.3893
  • Rewards/accuracies: 0.7283
  • Rewards/margins: 1.3643
  • Logps/rejected: -162.0916
  • Logps/chosen: -128.1033
  • Logits/rejected: 5.3082
  • Logits/chosen: 5.1890
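The "Rewards/*" metrics above are the standard DPO quantities: each reward is β times the log-probability ratio between the policy and the reference model, and the per-example loss is the negative log-sigmoid of the chosen-minus-rejected margin. A minimal sketch of how these logged values relate (the helper name is illustrative; β is already folded into the logged rewards, as in TRL):

```python
import math

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)).

    Assumes the rewards already include the beta scaling, as in the
    "rewards/*" metrics logged by the TRL DPO trainer.
    """
    margin = reward_chosen - reward_rejected  # corresponds to rewards/margins
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Final evaluation values from the list above.
reward_chosen, reward_rejected = -1.0250, -2.3893
print(round(reward_chosen - reward_rejected, 4))  # 1.3643, the reported margin
print(round(dpo_loss(reward_chosen, reward_rejected), 4))
```

Note that the reported evaluation loss (0.5013) is the mean of per-example losses over the whole evaluation set, which is not the same as the loss evaluated at the mean rewards, so the second printed value is not expected to match it.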

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 3
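The effective batch size and learning-rate schedule implied by these hyperparameters can be sketched as follows. This is a sketch under two assumptions: the total step count of 3312 is read off the final row of the training-results table below, and the schedule shape is the usual linear warmup followed by cosine decay to zero used by the Transformers cosine scheduler:

```python
import math

train_batch_size = 8
gradient_accumulation_steps = 2
# Effective optimizer batch size, matching total_train_batch_size above.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

learning_rate = 2e-05
warmup_steps = 300
total_steps = 3312  # final step in the training-results table

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)  # 16
print(lr_at(warmup_steps))     # peak: 2e-05
print(lr_at(total_steps))      # ~0.0 at the end of training
```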

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6899 | 0.1241 | 138 | 0.6769 | -0.0153 | -0.0504 | 0.625 | 0.0351 | -138.7025 | -118.0066 | 4.5710 | 4.4532 |
| 0.6309 | 0.2482 | 276 | 0.6035 | -0.2012 | -0.5586 | 0.7120 | 0.3575 | -143.7850 | -119.8655 | 4.5167 | 4.3940 |
| 0.5756 | 0.3723 | 414 | 0.5669 | -0.3693 | -0.9842 | 0.7174 | 0.6149 | -148.0405 | -121.5467 | 4.6242 | 4.5060 |
| 0.5715 | 0.4964 | 552 | 0.5446 | -0.4109 | -1.1855 | 0.7283 | 0.7745 | -150.0534 | -121.9633 | 4.7324 | 4.6143 |
| 0.5449 | 0.6205 | 690 | 0.5331 | -0.4666 | -1.3090 | 0.7446 | 0.8424 | -151.2884 | -122.5196 | 4.8229 | 4.7080 |
| 0.5536 | 0.7446 | 828 | 0.5136 | -0.4885 | -1.3825 | 0.7446 | 0.8940 | -152.0234 | -122.7389 | 4.8867 | 4.7737 |
| 0.5253 | 0.8687 | 966 | 0.5057 | -0.5613 | -1.5446 | 0.7554 | 0.9832 | -153.6442 | -123.4672 | 4.9287 | 4.8080 |
| 0.5249 | 0.9928 | 1104 | 0.5054 | -0.5101 | -1.4656 | 0.75 | 0.9555 | -152.8544 | -122.9549 | 4.8704 | 4.7521 |
| 0.4631 | 1.1169 | 1242 | 0.5067 | -0.6889 | -1.7678 | 0.75 | 1.0789 | -155.8768 | -124.7426 | 4.8470 | 4.7276 |
| 0.4524 | 1.2410 | 1380 | 0.5006 | -0.7467 | -1.9049 | 0.7446 | 1.1582 | -157.2474 | -125.3205 | 4.9447 | 4.8239 |
| 0.424 | 1.3651 | 1518 | 0.5036 | -0.7638 | -2.0144 | 0.7337 | 1.2505 | -158.3425 | -125.4923 | 4.9235 | 4.8002 |
| 0.4428 | 1.4892 | 1656 | 0.5004 | -0.7790 | -2.0132 | 0.7446 | 1.2342 | -158.3307 | -125.6437 | 4.9576 | 4.8375 |
| 0.4424 | 1.6133 | 1794 | 0.4944 | -0.8220 | -2.0517 | 0.7391 | 1.2297 | -158.7152 | -126.0739 | 4.9736 | 4.8553 |
| 0.4358 | 1.7374 | 1932 | 0.5022 | -0.8091 | -1.9993 | 0.7228 | 1.1902 | -158.1918 | -125.9447 | 5.0894 | 4.9702 |
| 0.4426 | 1.8615 | 2070 | 0.4992 | -0.8254 | -2.0308 | 0.7228 | 1.2054 | -158.5065 | -126.1077 | 5.0943 | 4.9780 |
| 0.4226 | 1.9856 | 2208 | 0.4971 | -0.8701 | -2.1434 | 0.7283 | 1.2733 | -159.6329 | -126.5553 | 5.1222 | 5.0011 |
| 0.3684 | 2.1097 | 2346 | 0.5032 | -0.9201 | -2.2281 | 0.7228 | 1.3081 | -160.4799 | -127.0545 | 5.2209 | 5.1031 |
| 0.3695 | 2.2338 | 2484 | 0.5022 | -0.9332 | -2.2651 | 0.7228 | 1.3319 | -160.8495 | -127.1860 | 5.2170 | 5.0977 |
| 0.3693 | 2.3579 | 2622 | 0.5022 | -0.9418 | -2.2839 | 0.7283 | 1.3421 | -161.0379 | -127.2717 | 5.2390 | 5.1169 |
| 0.3659 | 2.4820 | 2760 | 0.5037 | -0.9820 | -2.3392 | 0.7228 | 1.3572 | -161.5908 | -127.6742 | 5.2392 | 5.1148 |
| 0.3557 | 2.6061 | 2898 | 0.5031 | -1.0001 | -2.3531 | 0.7228 | 1.3529 | -161.7294 | -127.8552 | 5.2704 | 5.1488 |
| 0.3491 | 2.7302 | 3036 | 0.5053 | -1.0242 | -2.3803 | 0.7228 | 1.3562 | -162.0017 | -128.0954 | 5.2880 | 5.1693 |
| 0.3512 | 2.8543 | 3174 | 0.5036 | -1.0265 | -2.3833 | 0.7174 | 1.3568 | -162.0320 | -128.1190 | 5.2965 | 5.1768 |
| 0.3458 | 2.9784 | 3312 | 0.5013 | -1.0250 | -2.3893 | 0.7283 | 1.3643 | -162.0916 | -128.1033 | 5.3082 | 5.1890 |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.42.4
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model tree for rasyosef/phi-1_5-dpo

  • Base model: microsoft/phi-1_5
  • This model: a PEFT adapter fine-tuned on top of the base model
