OpenELM-1_1B-IPO

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Logits/chosen: -0.6367
Logits/rejected: 0.8008
Logps/chosen: -49.75
Logps/rejected: -62.75
Loss: 1943.3600
Rewards/accuracies: 0.6953
Rewards/chosen: -0.4863
Rewards/margins: 0.1309
Rewards/rejected: -0.6172
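
The reward figures above can be cross-checked against each other. The sketch below assumes the usual preference-optimization convention that the margin is the chosen reward minus the rejected reward. The card does not state this definition, and the variable names are illustrative only.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen = -0.4863
rewards_rejected = -0.6172
reported_margin = 0.1309

logps_chosen = -49.75
logps_rejected = -62.75

# Assumed definition: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# Gap between the log-probabilities of chosen and rejected responses.
logps_gap = logps_chosen - logps_rejected

print(f"computed margin: {margin:.4f} (reported: {reported_margin})")
print(f"log-prob gap (chosen - rejected): {logps_gap}")
```

Under that assumption the computed margin agrees with the reported Rewards/margins value to the precision shown.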

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
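
The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming gradient accumulation applies only to training and not to evaluation:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per device
eval_batch_size = 16             # per device
num_devices = 4
gradient_accumulation_steps = 2

# Effective number of training examples per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation has no accumulation, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```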

Training results

Training Loss	Epoch	Step	Logits/chosen	Logits/rejected	Logps/chosen	Logps/rejected	Validation Loss	Rewards/accuracies	Rewards/chosen	Rewards/margins	Rewards/rejected
2322.6	0.1047	100	-8.875	-8.375	-13.1875	-15.875	2317.6321	0.625	-0.1211	0.0258	-0.1465
2118.6	0.2093	200	-10.125	-9.75	-30.5	-37.25	2150.9761	0.6738	-0.2930	0.0664	-0.3594
2172.1	0.3140	300	-8.4375	-7.8438	-37.0	-44.0	2062.5920	0.6895	-0.3594	0.0674	-0.4277
2039.3	0.4186	400	-6.0938	-5.4375	-28.5	-37.0	1999.0400	0.6914	-0.2734	0.0850	-0.3594
1938.55	0.5233	500	-6.2812	-5.25	-40.0	-51.25	1975.6801	0.6953	-0.3906	0.1113	-0.5
1949.6	0.6279	600	-6.3438	-4.9062	-34.5	-44.0	1962.8800	0.7051	-0.3340	0.0942	-0.4277
1951.75	0.7326	700	-8.6875	-7.0625	-30.625	-41.25	1956.0959	0.7090	-0.2949	0.1055	-0.4004
1869.7	0.8373	800	-1.2031	0.3184	-37.0	-48.75	1889.7280	0.7207	-0.3594	0.1147	-0.4746
1905.45	0.9419	900	-6.0625	-4.2188	-42.5	-54.25	1903.8400	0.7070	-0.4141	0.1167	-0.5312
1301.1	1.0466	1000	-0.8906	0.2236	-40.0	-54.25	1946.8480	0.7109	-0.3887	0.1416	-0.5312
1193.05	1.1512	1100	-1.6094	-0.3926	-45.0	-59.25	1939.2321	0.7031	-0.4395	0.1406	-0.5781
1162.575	1.2559	1200	-2.0938	-0.7109	-45.5	-59.75	1908.4800	0.7070	-0.4434	0.1406	-0.5859
1153.3	1.3605	1300	-2.8281	-1.3594	-41.25	-54.75	1974.0800	0.6973	-0.4004	0.1357	-0.5352
1084.875	1.4652	1400	-1.5078	0.0021	-48.0	-61.5	1926.9440	0.7051	-0.4688	0.1338	-0.6016
1031.2313	1.5699	1500	-1.6641	-0.1064	-42.0	-56.75	1931.5840	0.7031	-0.4082	0.1465	-0.5547
1090.75	1.6745	1600	-1.375	0.0486	-44.25	-58.25	1936.1281	0.6973	-0.4316	0.1396	-0.5703
1097.5375	1.7792	1700	-2.2344	-0.6602	-47.5	-62.0	1975.2960	0.7070	-0.4648	0.1445	-0.6094
1031.15	1.8838	1800	-0.8125	0.4512	-48.0	-62.25	1964.5120	0.7090	-0.4668	0.1416	-0.6094
1012.0125	1.9885	1900	-0.7578	0.6133	-46.25	-60.25	1937.0240	0.7031	-0.4512	0.1406	-0.5898
262.0437	2.0931	2000	-0.875	0.5430	-47.75	-60.75	1950.9440	0.6895	-0.4668	0.1309	-0.5977
266.8375	2.1978	2100	-1.25	0.2207	-47.25	-60.25	1943.8719	0.7090	-0.4609	0.1279	-0.5898
284.8125	2.3025	2200	-0.5508	0.8164	-49.75	-62.75	1946.7520	0.6934	-0.4883	0.1289	-0.6172
303.8625	2.4071	2300	-0.4082	0.9297	-50.25	-63.0	1945.9840	0.6973	-0.4902	0.1279	-0.6172
266.5266	2.5118	2400	-0.6602	0.7578	-49.25	-62.25	1952.0640	0.6914	-0.4805	0.1289	-0.6094
220.4344	2.6164	2500	-0.5625	0.8672	-49.25	-62.25	1944.1281	0.6973	-0.4805	0.1309	-0.6094
253.4812	2.7211	2600	-0.5469	0.8789	-50.0	-63.0	1938.1121	0.6914	-0.4883	0.1299	-0.6172
271.3984	2.8257	2700	-0.6328	0.8047	-49.75	-63.0	1943.8719	0.6953	-0.4863	0.1299	-0.6172
292.8133	2.9304	2800	-0.6367	0.8008	-49.75	-62.75	1943.3600	0.6953	-0.4863	0.1309	-0.6172
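
To compare checkpoints, the table above can be scanned for the lowest validation loss and the highest reward accuracy. The sketch below copies the Step, Validation Loss, and Rewards/accuracies columns from the table; the tuple layout and variable names are illustrative and not part of the original training code.

```python
# (step, validation_loss, rewards_accuracy), copied from the table above.
rows = [
    (100, 2317.6321, 0.6250), (200, 2150.9761, 0.6738),
    (300, 2062.5920, 0.6895), (400, 1999.0400, 0.6914),
    (500, 1975.6801, 0.6953), (600, 1962.8800, 0.7051),
    (700, 1956.0959, 0.7090), (800, 1889.7280, 0.7207),
    (900, 1903.8400, 0.7070), (1000, 1946.8480, 0.7109),
    (1100, 1939.2321, 0.7031), (1200, 1908.4800, 0.7070),
    (1300, 1974.0800, 0.6973), (1400, 1926.9440, 0.7051),
    (1500, 1931.5840, 0.7031), (1600, 1936.1281, 0.6973),
    (1700, 1975.2960, 0.7070), (1800, 1964.5120, 0.7090),
    (1900, 1937.0240, 0.7031), (2000, 1950.9440, 0.6895),
    (2100, 1943.8719, 0.7090), (2200, 1946.7520, 0.6934),
    (2300, 1945.9840, 0.6973), (2400, 1952.0640, 0.6914),
    (2500, 1944.1281, 0.6973), (2600, 1938.1121, 0.6914),
    (2700, 1943.8719, 0.6953), (2800, 1943.3600, 0.6953),
]

# Checkpoint with the lowest validation loss.
best_loss_step, best_loss, _ = min(rows, key=lambda r: r[1])

# Checkpoint with the highest reward accuracy.
best_acc_step, _, best_acc = max(rows, key=lambda r: r[2])

print(f"lowest validation loss {best_loss} at step {best_loss_step}")
print(f"highest reward accuracy {best_acc} at step {best_acc_step}")
```

Both criteria point to step 800, late in the first epoch. In the table, training loss keeps falling through the second and third epochs while validation loss stays near 1940, which may indicate overfitting after the first epoch.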

Framework versions

Transformers 4.44.2
PyTorch 2.3.0
Datasets 3.0.0
Tokenizers 0.19.1
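
When reproducing the setup, it can help to compare a local environment against the versions listed above. A minimal sketch using the standard library; the distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed distribution name.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.3.0",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    # startswith tolerates local build suffixes such as "+cu121".
    matches = installed is not None and installed.startswith(wanted)
    print(f"{name}: expected {wanted}, found {installed}, match={matches}")
```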

CharlesLi/OpenELM-1_1B-IPO
