spin-v-diverse

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full. The training dataset is not specified (the card metadata records it as None). It achieves the following results on the evaluation set:

Loss: 0.0027
Rewards/real: -2.6757
Rewards/generated: -21.8763
Rewards/accuracies: 1.0
Rewards/margins: 19.2006
Logps/generated: -346.5988
Logps/real: -161.4224
Logits/generated: -2.5880
Logits/real: -2.4315
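
The reward metrics above are related to each other, and a quick check makes the relationship concrete. The sketch below assumes the usual convention of DPO-style preference trainers, where Rewards/margins is the mean gap between the reward of the real response and the reward of the generated response, and Rewards/accuracies is the fraction of evaluation pairs in which the real response scores higher. The card does not state this, so treat it as an assumption. The variable names are illustrative only.

```python
# Final evaluation values copied from the list above.
rewards_real = -2.6757
rewards_generated = -21.8763
reported_margin = 19.2006

# Assumption: margin = reward(real) - reward(generated), averaged over the eval set.
margin = rewards_real - rewards_generated

# The recomputed value should match the reported one up to rounding.
matches = abs(margin - reported_margin) < 1e-4
print(f"recomputed margin: {margin:.4f}, matches reported value: {matches}")
```

Under that reading, an accuracy of 1.0 with a margin of about 19.2 means that every evaluation pair ranked the real response above the generated one, and by a wide gap.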

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
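
Two of the values above follow from the others, and a short sketch shows how. The total batch size of 32 is consistent with 8 examples per device on 4 devices, assuming a gradient accumulation factor of 1 (the card does not list one). The learning-rate function mirrors the common linear schedule with warmup: the rate climbs from 0 to the peak over the first 10% of steps, then decays linearly to 0. The helper names and the `total_steps` value in the usage lines are hypothetical, since the card does not report the total number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int, peak_lr: float = 5e-07, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale the peak rate by the fraction of post-warmup steps remaining.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# train_batch_size=8 and num_devices=4, as listed above.
total_batch = effective_batch_size(per_device=8, num_devices=4)

# total_steps=1000 is a hypothetical value chosen only for illustration.
for step in (0, 50, 100, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```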

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.0257	0.06	100	0.0288	1.0058	-5.7769	0.9928	6.7828	-185.6055	-124.6072	-2.8843	-2.6520
0.0096	0.13	200	0.0126	-0.1554	-12.6258	0.9984	12.4704	-254.0941	-136.2193	-2.5945	-2.2413
0.024	0.19	300	0.0126	0.1173	-11.0946	0.9968	11.2119	-238.7820	-133.4925	-2.7227	-2.5040
0.0065	0.26	400	0.0082	-0.1964	-13.6305	0.9984	13.4341	-264.1411	-136.6298	-2.7028	-2.4738
0.0073	0.32	500	0.0081	0.0850	-13.4368	0.9984	13.5218	-262.2040	-133.8156	-2.6477	-2.4285
0.0035	0.38	600	0.0071	-2.8739	-18.4641	1.0	15.5902	-312.4772	-163.4043	-2.5956	-2.3811
0.0097	0.45	700	0.0077	-2.2908	-16.9898	0.9984	14.6989	-297.7338	-157.5739	-2.5210	-2.2045
0.0052	0.51	800	0.0065	-1.6983	-19.8323	0.9992	18.1340	-326.1593	-151.6484	-2.7183	-2.5409
0.0037	0.58	900	0.0067	-1.2826	-16.6590	0.9984	15.3763	-294.4258	-147.4920	-2.6881	-2.5334
0.0023	0.64	1000	0.0047	-1.9423	-19.2263	1.0	17.2840	-320.0990	-154.0886	-2.6404	-2.4694
0.0041	0.7	1100	0.0050	-2.4756	-19.3047	1.0	16.8290	-320.8827	-159.4218	-2.6368	-2.4329
0.0033	0.77	1200	0.0037	-2.8600	-20.2625	1.0	17.4025	-330.4614	-163.2654	-2.6240	-2.4681
0.0042	0.83	1300	0.0032	-2.6738	-20.7669	1.0	18.0931	-335.5057	-161.4039	-2.5974	-2.4463
0.0031	0.9	1400	0.0030	-2.1767	-20.6456	0.9992	18.4690	-334.2925	-156.4323	-2.6144	-2.4595
0.0015	0.96	1500	0.0027	-2.6757	-21.8763	1.0	19.2006	-346.5988	-161.4224	-2.5880	-2.4315

Framework versions

Transformers 4.37.0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2

AmberYifan/spin-v-diverse