phi-3-mini-LoRA-mergedatafilter3_split

This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.3387

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 12
eval_batch_size: 12
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 192
total_eval_batch_size: 48
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10

Training results

Training Loss	Epoch	Step	Validation Loss
0.4848	0.1462	800	0.4776
0.4265	0.2924	1600	0.4238
0.3995	0.4386	2400	0.3986
0.3848	0.5848	3200	0.3839
0.3742	0.7310	4000	0.3742
0.3679	0.8772	4800	0.3669
0.3616	1.0233	5600	0.3625
0.3574	1.1695	6400	0.3569
0.3537	1.3157	7200	0.3537
0.3523	1.4619	8000	0.3516
0.3491	1.6081	8800	0.3495
0.3478	1.7543	9600	0.3483
0.3467	1.9005	10400	0.3470
0.3455	2.0467	11200	0.3459
0.3455	2.1929	12000	0.3451
0.3442	2.3391	12800	0.3444
0.3424	2.4853	13600	0.3436
0.3431	2.6315	14400	0.3432
0.3427	2.7777	15200	0.3426
0.3424	2.9238	16000	0.3423
0.3419	3.0700	16800	0.3418
0.3413	3.2162	17600	0.3415
0.3417	3.3624	18400	0.3412
0.3406	3.5086	19200	0.3408
0.34	3.6548	20000	0.3407
0.341	3.8010	20800	0.3406
0.3395	3.9472	21600	0.3403
0.3415	4.0934	22400	0.3401
0.3398	4.2396	23200	0.3400
0.3395	4.3858	24000	0.3398
0.3405	4.5320	24800	0.3396
0.3385	4.6781	25600	0.3396
0.339	4.8243	26400	0.3395
0.3391	4.9705	27200	0.3395
0.3397	5.1167	28000	0.3393
0.337	5.2629	28800	0.3393
0.3383	5.4091	29600	0.3392
0.3384	5.5553	30400	0.3391
0.3383	5.7015	31200	0.3391
0.3386	5.8477	32000	0.3390
0.3391	5.9939	32800	0.3390
0.3384	6.1401	33600	0.3390
0.3391	6.2863	34400	0.3390
0.3385	6.4325	35200	0.3389
0.338	6.5786	36000	0.3389
0.3384	6.7248	36800	0.3389
0.3377	6.8710	37600	0.3388
0.338	7.0172	38400	0.3388
0.3385	7.1634	39200	0.3388
0.3393	7.3096	40000	0.3388
0.3377	7.4558	40800	0.3388
0.3382	7.6020	41600	0.3387
0.3387	7.7482	42400	0.3387
0.3391	7.8944	43200	0.3387
0.338	8.0406	44000	0.3387
0.3386	8.1868	44800	0.3387
0.3385	8.3330	45600	0.3387
0.3372	8.4791	46400	0.3387
0.338	8.6253	47200	0.3387
0.3387	8.7715	48000	0.3387
0.3391	8.9177	48800	0.3387
0.3379	9.0639	49600	0.3387
0.3386	9.2101	50400	0.3387
0.3385	9.3563	51200	0.3387
0.3385	9.5025	52000	0.3387
0.3385	9.6487	52800	0.3387
0.3386	9.7949	53600	0.3387
0.3373	9.9411	54400	0.3387

Framework versions

PEFT 0.11.1
Transformers 4.43.2
Pytorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1

minimini99
/

phi-3-mini-LoRA-mergedatafilter3_split

phi-3-mini-LoRA-mergedatafilter3_split

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for minimini99/phi-3-mini-LoRA-mergedatafilter3_split

Evaluation results