openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset is not specified (the auto-generated card records it as None). It achieves the following results on the evaluation set:

Loss: 0.6751
Rewards/chosen: 0.0215
Rewards/rejected: -0.0002
Rewards/accuracies: 0.4375
Rewards/margins: 0.0217
Logps/rejected: -132.4150
Logps/chosen: -333.1984
Logits/rejected: -2.7074
Logits/chosen: -2.3899
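
The reward metrics above resemble those logged by preference-optimization trainers such as TRL's DPOTrainer. The card does not confirm which trainer or loss was used, so the following is only a sketch under that assumption. It checks that the reported margin equals the chosen reward minus the rejected reward, and compares the evaluation loss with ln 2, the value the standard sigmoid DPO loss takes when the model does not separate chosen from rejected responses.

```python
import math

# Final evaluation metrics copied from the list above.
eval_loss = 0.6751
rewards_chosen = 0.0215
rewards_rejected = -0.0002
reported_margin = 0.0217

# Assumption: the margin is the mean chosen reward minus the mean rejected reward.
margin = rewards_chosen - rewards_rejected

# Assumption: the loss is the standard sigmoid DPO loss, -log(sigmoid(margin)),
# averaged over preference pairs. At zero margin it equals ln 2.
baseline_loss = -math.log(1.0 / (1.0 + math.exp(0.0)))

print(f"recomputed margin: {margin:.4f} (reported {reported_margin})")
print(f"zero-margin baseline loss: {baseline_loss:.4f}")
print(f"improvement over baseline: {baseline_loss - eval_loss:.4f}")
```

Under these assumptions the recomputed margin agrees with the reported 0.0217, and the evaluation loss sits only about 0.018 below ln 2, which is consistent with the reward accuracy of 0.4375 in suggesting a weak preference signal after 200 steps.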

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
training_steps: 200
mixed_precision_training: Native AMP
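
To make the schedule concrete, here is a minimal sketch of the learning rate implied by these settings. It assumes the linear scheduler behaves like the linear warmup and decay schedule in Transformers (`get_linear_schedule_with_warmup`): a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final training step. The function name `linear_schedule_lr` is hypothetical and used only for illustration.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=200):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Defaults mirror the hyperparameters listed above.
for step in (0, 1, 2, 101, 200):
    print(step, linear_schedule_lr(step))
```

With only 2 warmup steps, the peak rate of 0.0002 is reached almost immediately, so nearly the entire 200-step run is spent in the decay phase.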

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.7508	0.01	10	0.7479	-0.3566	-0.2195	0.25	-0.1371	-351.7834	-711.3196	-1.6251	-1.4448
0.982	0.01	20	0.7765	-0.5130	-0.3405	0.25	-0.1726	-472.7075	-867.7224	-1.1628	-1.0511
0.6985	0.01	30	0.6899	0.0062	0.0027	0.375	0.0036	-129.5716	-348.4551	-2.7357	-2.3605
0.6959	0.02	40	0.6935	0.0008	0.0022	0.25	-0.0014	-130.0675	-353.8832	-2.7275	-2.3561
0.6944	0.03	50	0.6892	0.0073	0.0040	0.4375	0.0033	-128.2573	-347.3910	-2.7124	-2.3589
0.7785	0.03	60	0.7361	-0.4130	-0.2866	0.375	-0.1264	-418.8091	-767.6629	-1.3320	-1.2310
0.7009	0.04	70	0.7892	-0.5637	-0.3765	0.3125	-0.1872	-508.7737	-918.3933	-1.1171	-1.0132
0.7886	0.04	80	0.7862	-0.5738	-0.3892	0.3125	-0.1845	-521.4880	-928.4485	-1.1127	-1.0064
0.7059	0.04	90	0.7127	-0.0370	-0.0108	0.4375	-0.0263	-143.0086	-391.7115	-2.6542	-2.3045
0.6793	0.05	100	0.6981	-0.0357	-0.0284	0.375	-0.0073	-160.6859	-390.4216	-2.5199	-2.2133
0.7085	0.06	110	0.7039	-0.0251	-0.0089	0.3125	-0.0162	-141.1216	-379.7617	-2.6806	-2.3312
0.6959	0.06	120	0.6974	-0.0162	-0.0077	0.375	-0.0085	-139.9174	-370.8595	-2.6925	-2.3406
0.6897	0.07	130	0.6948	-0.0122	-0.0069	0.3125	-0.0053	-139.1202	-366.9146	-2.6971	-2.3477
0.6897	0.07	140	0.6935	-0.0104	-0.0067	0.3125	-0.0038	-138.8917	-365.1371	-2.6948	-2.3576
0.7015	0.07	150	0.6864	0.0011	-0.0042	0.4375	0.0054	-136.4684	-353.5512	-2.6973	-2.3710
0.6497	0.08	160	0.6814	0.0099	-0.0023	0.4375	0.0122	-134.5819	-344.8182	-2.7048	-2.3806
0.6893	0.09	170	0.6787	0.0147	-0.0015	0.4375	0.0161	-133.7108	-340.0247	-2.7106	-2.3874
0.7002	0.09	180	0.6776	0.0168	-0.0010	0.4375	0.0178	-133.2137	-337.8709	-2.7120	-2.3888
0.6875	0.1	190	0.6755	0.0209	-0.0002	0.4375	0.0211	-132.4327	-333.8066	-2.7093	-2.3902
0.6781	0.1	200	0.6751	0.0215	-0.0002	0.4375	0.0217	-132.4150	-333.1984	-2.7074	-2.3899

Framework versions

Transformers 4.35.2
Pytorch 2.0.1+cu117
Datasets 2.15.0
Tokenizers 0.15.0

Model repository: LBusser/openhermes-mistral-dpo-gptq
