Saiga_timelist_task200steps

This model is a PEFT adapter fine-tuned from TheBloke/Llama-2-7B-fp16 on an unknown dataset. It achieves the following result on the evaluation set (a loading sketch follows the result):

  • Loss: 2.4521
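
Below is a minimal loading sketch for this adapter. The training script and intended prompt format are not published, so the prompt and generation settings here are illustrative assumptions; only the repository IDs come from this card.

```python
# Minimal sketch: load the PEFT adapter on top of the fp16 Llama-2-7B base.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "marcus2000/Saiga_timelist_task200steps"
base_id = "TheBloke/Llama-2-7B-fp16"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model,
# then attaches the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder prompt and generation settings; the card does not specify them.
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```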

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 10
  • total_train_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 200
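
A minimal sketch of how these values could map onto Hugging Face TrainingArguments; the actual training script is not published, so the output directory is a hypothetical placeholder. Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer defaults, so no optimizer arguments need to be set explicitly.

```python
from transformers import TrainingArguments

# Sketch only: argument names mirror the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="Saiga_timelist_task200steps",  # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=10,  # 2 x 10 = total_train_batch_size of 20
    lr_scheduler_type="linear",
    max_steps=200,                   # training_steps
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults
)
```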

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.2298 | 0.37 | 2 | 2.2020 |
| 2.0975 | 0.74 | 4 | 2.1478 |
| 2.0243 | 1.11 | 6 | 2.1123 |
| 1.988 | 1.48 | 8 | 2.0857 |
| 1.9585 | 1.85 | 10 | 2.0692 |
| 1.883 | 2.22 | 12 | 2.0570 |
| 1.9078 | 2.59 | 14 | 2.0477 |
| 1.9179 | 2.96 | 16 | 2.0408 |
| 1.8663 | 3.33 | 18 | 2.0366 |
| 1.8191 | 3.7 | 20 | 2.0325 |
| 1.8515 | 4.07 | 22 | 2.0280 |
| 1.8189 | 4.44 | 24 | 2.0246 |
| 1.8478 | 4.81 | 26 | 2.0215 |
| 1.7767 | 5.19 | 28 | 2.0198 |
| 1.7685 | 5.56 | 30 | 2.0190 |
| 1.7895 | 5.93 | 32 | 2.0189 |
| 1.7285 | 6.3 | 34 | 2.0191 |
| 1.7609 | 6.67 | 36 | 2.0174 |
| 1.7138 | 7.04 | 38 | 2.0156 |
| 1.7112 | 7.41 | 40 | 2.0187 |
| 1.7029 | 7.78 | 42 | 2.0216 |
| 1.6787 | 8.15 | 44 | 2.0203 |
| 1.646 | 8.52 | 46 | 2.0243 |
| 1.5996 | 8.89 | 48 | 2.0294 |
| 1.6838 | 9.26 | 50 | 2.0280 |
| 1.6057 | 9.63 | 52 | 2.0254 |
| 1.574 | 10.0 | 54 | 2.0310 |
| 1.51 | 10.37 | 56 | 2.0547 |
| 1.5951 | 10.74 | 58 | 2.0420 |
| 1.5455 | 11.11 | 60 | 2.0350 |
| 1.5424 | 11.48 | 62 | 2.0612 |
| 1.4933 | 11.85 | 64 | 2.0652 |
| 1.5766 | 12.22 | 66 | 2.0537 |
| 1.4453 | 12.59 | 68 | 2.0732 |
| 1.4683 | 12.96 | 70 | 2.0763 |
| 1.4734 | 13.33 | 72 | 2.0805 |
| 1.4314 | 13.7 | 74 | 2.0908 |
| 1.3921 | 14.07 | 76 | 2.0815 |
| 1.4099 | 14.44 | 78 | 2.1134 |
| 1.4389 | 14.81 | 80 | 2.0955 |
| 1.3114 | 15.19 | 82 | 2.1153 |
| 1.3093 | 15.56 | 84 | 2.1303 |
| 1.3984 | 15.93 | 86 | 2.1246 |
| 1.2831 | 16.3 | 88 | 2.1564 |
| 1.2971 | 16.67 | 90 | 2.1284 |
| 1.3052 | 17.04 | 92 | 2.1608 |
| 1.2421 | 17.41 | 94 | 2.1556 |
| 1.1835 | 17.78 | 96 | 2.1734 |
| 1.283 | 18.15 | 98 | 2.1773 |
| 1.2311 | 18.52 | 100 | 2.1992 |
| 1.2428 | 18.89 | 102 | 2.1954 |
| 1.1959 | 19.26 | 104 | 2.2065 |
| 1.2376 | 19.63 | 106 | 2.2124 |
| 1.0689 | 20.0 | 108 | 2.2266 |
| 1.1471 | 20.37 | 110 | 2.2266 |
| 1.0068 | 20.74 | 112 | 2.2451 |
| 1.161 | 21.11 | 114 | 2.2501 |
| 1.1252 | 21.48 | 116 | 2.2579 |
| 1.0683 | 21.85 | 118 | 2.2595 |
| 1.1279 | 22.22 | 120 | 2.2904 |
| 0.9923 | 22.59 | 122 | 2.2693 |
| 1.0139 | 22.96 | 124 | 2.3008 |
| 0.9924 | 23.33 | 126 | 2.3036 |
| 1.0418 | 23.7 | 128 | 2.3277 |
| 1.0463 | 24.07 | 130 | 2.3043 |
| 1.0556 | 24.44 | 132 | 2.3262 |
| 0.9991 | 24.81 | 134 | 2.3299 |
| 0.96 | 25.19 | 136 | 2.3481 |
| 0.9677 | 25.56 | 138 | 2.3458 |
| 0.9107 | 25.93 | 140 | 2.3607 |
| 0.8962 | 26.3 | 142 | 2.3644 |
| 0.916 | 26.67 | 144 | 2.3700 |
| 0.9284 | 27.04 | 146 | 2.3726 |
| 0.99 | 27.41 | 148 | 2.3860 |
| 0.8308 | 27.78 | 150 | 2.3918 |
| 0.9459 | 28.15 | 152 | 2.3971 |
| 0.9283 | 28.52 | 154 | 2.4030 |
| 0.863 | 28.89 | 156 | 2.4024 |
| 0.9068 | 29.26 | 158 | 2.4083 |
| 0.8623 | 29.63 | 160 | 2.4179 |
| 0.8359 | 30.0 | 162 | 2.4262 |
| 0.953 | 30.37 | 164 | 2.4281 |
| 0.7937 | 30.74 | 166 | 2.4381 |
| 0.8274 | 31.11 | 168 | 2.4255 |
| 0.8862 | 31.48 | 170 | 2.4330 |
| 0.7913 | 31.85 | 172 | 2.4511 |
| 0.8436 | 32.22 | 174 | 2.4522 |
| 0.8519 | 32.59 | 176 | 2.4413 |
| 0.8089 | 32.96 | 178 | 2.4371 |
| 0.8876 | 33.33 | 180 | 2.4434 |
| 0.7836 | 33.7 | 182 | 2.4532 |
| 0.8232 | 34.07 | 184 | 2.4566 |
| 0.8299 | 34.44 | 186 | 2.4582 |
| 0.7977 | 34.81 | 188 | 2.4553 |
| 0.8635 | 35.19 | 190 | 2.4522 |
| 0.883 | 35.56 | 192 | 2.4518 |
| 0.8158 | 35.93 | 194 | 2.4513 |
| 0.8732 | 36.3 | 196 | 2.4518 |
| 0.8112 | 36.67 | 198 | 2.4522 |
| 0.7869 | 37.04 | 200 | 2.4521 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2