
bge_large_zh_llama3_70

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (a metric-computation sketch follows the list):

  • Loss: 0.4411
  • Precision: 0.4917
  • Recall: 0.3601
  • F1 Macro: 0.3861
  • Accuracy: 0.6012
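
The F1 score above is reported as a macro average. Assuming precision and recall are macro-averaged as well, the following is a minimal scikit-learn sketch of how such metrics are conventionally computed; the label arrays are placeholders, not this model's evaluation data.

```python
# Minimal sketch of macro-averaged metric computation with scikit-learn.
# y_true and y_pred are hypothetical placeholders, not this model's data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]  # hypothetical gold labels
y_pred = [0, 1, 1, 2, 0, 0]  # hypothetical predictions

precision, recall, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1_macro={f1_macro:.4f} accuracy={accuracy:.4f}")
```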

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 256
  • eval_batch_size: 128
  • seed: 0
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
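
A minimal sketch of how these settings could be expressed as transformers.TrainingArguments. The actual training script is not included in this card; output_dir is a placeholder, and the batch sizes are treated here as per-device values.

```python
# Sketch only: maps the hyperparameters listed above onto
# transformers.TrainingArguments. output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bge_large_zh_llama3_70",  # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=256,     # assumed per-device
    per_device_eval_batch_size=128,
    seed=0,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
)
```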

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Precision | Recall | F1 Macro | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:---------:|:------:|:--------:|:--------:|
| No log        | 0       | 0     | 2.4917          | 0.2058    | 0.1667 | 0.0633   | 0.2346   |
| 0.4645        | 0.4675  | 1000  | 0.4725          | 0.4242    | 0.3207 | 0.3278   | 0.5832   |
| 0.4538        | 0.9350  | 2000  | 0.4588          | 0.4952    | 0.3350 | 0.3487   | 0.5928   |
| 0.4609        | 1.4025  | 3000  | 0.4521          | 0.4340    | 0.3255 | 0.3377   | 0.5968   |
| 0.4483        | 1.8700  | 4000  | 0.4453          | 0.4377    | 0.3187 | 0.3359   | 0.6007   |
| 0.4487        | 2.3375  | 5000  | 0.4523          | 0.4829    | 0.3605 | 0.3733   | 0.5948   |
| 0.4358        | 2.8050  | 6000  | 0.4433          | 0.4733    | 0.3450 | 0.3664   | 0.5999   |
| 0.4326        | 3.2726  | 7000  | 0.4404          | 0.5050    | 0.3289 | 0.3474   | 0.5987   |
| 0.4328        | 3.7401  | 8000  | 0.4387          | 0.4298    | 0.3303 | 0.3498   | 0.6030   |
| 0.4245        | 4.2076  | 9000  | 0.4373          | 0.4849    | 0.3293 | 0.3494   | 0.6030   |
| 0.4313        | 4.6751  | 10000 | 0.4358          | 0.4929    | 0.3370 | 0.3561   | 0.6041   |
| 0.418         | 5.1426  | 11000 | 0.4358          | 0.5159    | 0.3347 | 0.3553   | 0.6031   |
| 0.414         | 5.6101  | 12000 | 0.4357          | 0.4830    | 0.3458 | 0.3654   | 0.6035   |
| 0.405         | 6.0776  | 13000 | 0.4348          | 0.4887    | 0.3558 | 0.3816   | 0.6026   |
| 0.4174        | 6.5451  | 14000 | 0.4351          | 0.5129    | 0.3538 | 0.3778   | 0.6044   |
| 0.4116        | 7.0126  | 15000 | 0.4374          | 0.5291    | 0.3695 | 0.3913   | 0.6032   |
| 0.4103        | 7.4801  | 16000 | 0.4383          | 0.4998    | 0.3698 | 0.3954   | 0.6046   |
| 0.4065        | 7.9476  | 17000 | 0.4388          | 0.4989    | 0.3691 | 0.3905   | 0.6025   |
| 0.3953        | 8.4151  | 18000 | 0.4393          | 0.4867    | 0.3674 | 0.3880   | 0.6024   |
| 0.4075        | 8.8827  | 19000 | 0.4360          | 0.4846    | 0.3661 | 0.3881   | 0.6033   |
| 0.3915        | 9.3502  | 20000 | 0.4351          | 0.5292    | 0.3510 | 0.3785   | 0.6021   |
| 0.3929        | 9.8177  | 21000 | 0.4376          | 0.5093    | 0.3291 | 0.3491   | 0.6026   |
| 0.3852        | 10.2852 | 22000 | 0.4371          | 0.5267    | 0.3504 | 0.3738   | 0.6023   |
| 0.3847        | 10.7527 | 23000 | 0.4364          | 0.4964    | 0.3428 | 0.3668   | 0.6023   |
| 0.3925        | 11.2202 | 24000 | 0.4397          | 0.4851    | 0.3588 | 0.3850   | 0.5989   |
| 0.3824        | 11.6877 | 25000 | 0.4370          | 0.5076    | 0.3594 | 0.3881   | 0.6029   |
| 0.3741        | 12.1552 | 26000 | 0.4383          | 0.4942    | 0.3581 | 0.3836   | 0.5997   |
| 0.3778        | 12.6227 | 27000 | 0.4412          | 0.5137    | 0.3669 | 0.3907   | 0.6007   |
| 0.3737        | 13.0902 | 28000 | 0.4386          | 0.4916    | 0.3511 | 0.3780   | 0.6007   |
| 0.3785        | 13.5577 | 29000 | 0.4387          | 0.4919    | 0.3532 | 0.3785   | 0.6025   |
| 0.374         | 14.0252 | 30000 | 0.4385          | 0.4850    | 0.3469 | 0.3700   | 0.6025   |
| 0.3759        | 14.4928 | 31000 | 0.4391          | 0.5011    | 0.3589 | 0.3857   | 0.6015   |
| 0.3711        | 14.9603 | 32000 | 0.4398          | 0.4812    | 0.3468 | 0.3704   | 0.6001   |
| 0.3662        | 15.4278 | 33000 | 0.4415          | 0.4911    | 0.3549 | 0.3769   | 0.6003   |
| 0.3672        | 15.8953 | 34000 | 0.4399          | 0.4855    | 0.3566 | 0.3819   | 0.6012   |
| 0.3622        | 16.3628 | 35000 | 0.4407          | 0.4824    | 0.3624 | 0.3874   | 0.6015   |
| 0.3628        | 16.8303 | 36000 | 0.4409          | 0.4931    | 0.3574 | 0.3818   | 0.6020   |
| 0.3589        | 17.2978 | 37000 | 0.4423          | 0.4916    | 0.3704 | 0.3954   | 0.6012   |
| 0.3589        | 17.7653 | 38000 | 0.4419          | 0.4931    | 0.3696 | 0.3961   | 0.6012   |
| 0.3646        | 18.2328 | 39000 | 0.4410          | 0.4852    | 0.3510 | 0.3736   | 0.6003   |
| 0.3611        | 18.7003 | 40000 | 0.4409          | 0.4878    | 0.3593 | 0.3851   | 0.6007   |
| 0.3555        | 19.1678 | 41000 | 0.4416          | 0.4874    | 0.3613 | 0.3882   | 0.5999   |
| 0.3559        | 19.6353 | 42000 | 0.4411          | 0.4917    | 0.3601 | 0.3861   | 0.6012   |
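
The reported precision, recall, F1, and accuracy indicate a classification checkpoint. As a purely hypothetical usage sketch (the repo id, example text, and label handling are assumptions, not documented by this card):

```python
# Hypothetical inference sketch. The repo id is a placeholder, and the
# checkpoint is assumed to carry a sequence-classification head, as the
# reported precision/recall/F1/accuracy metrics suggest.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "bge_large_zh_llama3_70"  # placeholder; substitute the actual hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

inputs = tokenizer("待分类的示例文本", return_tensors="pt")  # hypothetical input
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```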

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 326M parameters (F32, stored as Safetensors)