# bge_large_zh_llama3_70
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4411
- Precision: 0.4917
- Recall: 0.3601
- F1 Macro: 0.3861
- Accuracy: 0.6012
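The Precision, Recall, and F1 values above are macro-averaged. The card does not include the evaluation script, but metrics of this form can be reproduced with scikit-learn; the sketch below uses placeholder labels and predictions, since the actual evaluation data is not provided here.

```python
# Hypothetical reproduction of the reported metric types with scikit-learn.
# `y_true` and `y_pred` are placeholders for the evaluation-set labels and
# model predictions, which are not included in this card.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1]  # placeholder gold labels
y_pred = [0, 1, 1, 2, 0]  # placeholder predictions

# Macro averaging weights every class equally, matching the
# Precision / Recall / F1 Macro columns reported above.
precision, recall, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1_macro={f1_macro:.4f} accuracy={accuracy:.4f}")
```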
## Model description
More information needed
## Intended uses & limitations
More information needed
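Pending that information, here is a minimal usage sketch. It assumes the checkpoint is a sequence-classification fine-tune loadable through the `transformers` Auto classes; the checkpoint path and the meaning of the label ids are assumptions, since neither is documented in this card.

```python
# Minimal inference sketch under the assumptions stated above.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/bge_large_zh_llama3_70"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()

# The model name suggests Chinese text input (bge-large-zh lineage).
inputs = tokenizer("示例文本", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)  # label mapping is not documented in this card
```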
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 256
- eval_batch_size: 128
- seed: 0
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
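These settings map directly onto `transformers.TrainingArguments`; the sketch below is a reconstruction, not the original training script. The output directory is a placeholder, and any options not listed above (warmup, weight decay, mixed precision, etc.) are left at their defaults as an assumption.

```python
# Reconstruction of the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bge_large_zh_llama3_70",  # hypothetical output directory
    learning_rate=3e-4,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=128,
    seed=0,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    # "Adam with betas=(0.9, 0.999) and epsilon=1e-08" corresponds to the
    # default Adam-family settings:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```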
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 Macro | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:---------:|:------:|:--------:|:--------:|
| No log | 0 | 0 | 2.4917 | 0.2058 | 0.1667 | 0.0633 | 0.2346 |
| 0.4645 | 0.4675 | 1000 | 0.4725 | 0.4242 | 0.3207 | 0.3278 | 0.5832 |
| 0.4538 | 0.9350 | 2000 | 0.4588 | 0.4952 | 0.3350 | 0.3487 | 0.5928 |
| 0.4609 | 1.4025 | 3000 | 0.4521 | 0.4340 | 0.3255 | 0.3377 | 0.5968 |
| 0.4483 | 1.8700 | 4000 | 0.4453 | 0.4377 | 0.3187 | 0.3359 | 0.6007 |
| 0.4487 | 2.3375 | 5000 | 0.4523 | 0.4829 | 0.3605 | 0.3733 | 0.5948 |
| 0.4358 | 2.8050 | 6000 | 0.4433 | 0.4733 | 0.3450 | 0.3664 | 0.5999 |
| 0.4326 | 3.2726 | 7000 | 0.4404 | 0.5050 | 0.3289 | 0.3474 | 0.5987 |
| 0.4328 | 3.7401 | 8000 | 0.4387 | 0.4298 | 0.3303 | 0.3498 | 0.6030 |
| 0.4245 | 4.2076 | 9000 | 0.4373 | 0.4849 | 0.3293 | 0.3494 | 0.6030 |
| 0.4313 | 4.6751 | 10000 | 0.4358 | 0.4929 | 0.3370 | 0.3561 | 0.6041 |
| 0.418 | 5.1426 | 11000 | 0.4358 | 0.5159 | 0.3347 | 0.3553 | 0.6031 |
| 0.414 | 5.6101 | 12000 | 0.4357 | 0.4830 | 0.3458 | 0.3654 | 0.6035 |
| 0.405 | 6.0776 | 13000 | 0.4348 | 0.4887 | 0.3558 | 0.3816 | 0.6026 |
| 0.4174 | 6.5451 | 14000 | 0.4351 | 0.5129 | 0.3538 | 0.3778 | 0.6044 |
| 0.4116 | 7.0126 | 15000 | 0.4374 | 0.5291 | 0.3695 | 0.3913 | 0.6032 |
| 0.4103 | 7.4801 | 16000 | 0.4383 | 0.4998 | 0.3698 | 0.3954 | 0.6046 |
| 0.4065 | 7.9476 | 17000 | 0.4388 | 0.4989 | 0.3691 | 0.3905 | 0.6025 |
| 0.3953 | 8.4151 | 18000 | 0.4393 | 0.4867 | 0.3674 | 0.3880 | 0.6024 |
| 0.4075 | 8.8827 | 19000 | 0.4360 | 0.4846 | 0.3661 | 0.3881 | 0.6033 |
| 0.3915 | 9.3502 | 20000 | 0.4351 | 0.5292 | 0.3510 | 0.3785 | 0.6021 |
| 0.3929 | 9.8177 | 21000 | 0.4376 | 0.5093 | 0.3291 | 0.3491 | 0.6026 |
| 0.3852 | 10.2852 | 22000 | 0.4371 | 0.5267 | 0.3504 | 0.3738 | 0.6023 |
| 0.3847 | 10.7527 | 23000 | 0.4364 | 0.4964 | 0.3428 | 0.3668 | 0.6023 |
| 0.3925 | 11.2202 | 24000 | 0.4397 | 0.4851 | 0.3588 | 0.3850 | 0.5989 |
| 0.3824 | 11.6877 | 25000 | 0.4370 | 0.5076 | 0.3594 | 0.3881 | 0.6029 |
| 0.3741 | 12.1552 | 26000 | 0.4383 | 0.4942 | 0.3581 | 0.3836 | 0.5997 |
| 0.3778 | 12.6227 | 27000 | 0.4412 | 0.5137 | 0.3669 | 0.3907 | 0.6007 |
| 0.3737 | 13.0902 | 28000 | 0.4386 | 0.4916 | 0.3511 | 0.3780 | 0.6007 |
| 0.3785 | 13.5577 | 29000 | 0.4387 | 0.4919 | 0.3532 | 0.3785 | 0.6025 |
| 0.374 | 14.0252 | 30000 | 0.4385 | 0.4850 | 0.3469 | 0.3700 | 0.6025 |
| 0.3759 | 14.4928 | 31000 | 0.4391 | 0.5011 | 0.3589 | 0.3857 | 0.6015 |
| 0.3711 | 14.9603 | 32000 | 0.4398 | 0.4812 | 0.3468 | 0.3704 | 0.6001 |
| 0.3662 | 15.4278 | 33000 | 0.4415 | 0.4911 | 0.3549 | 0.3769 | 0.6003 |
| 0.3672 | 15.8953 | 34000 | 0.4399 | 0.4855 | 0.3566 | 0.3819 | 0.6012 |
| 0.3622 | 16.3628 | 35000 | 0.4407 | 0.4824 | 0.3624 | 0.3874 | 0.6015 |
| 0.3628 | 16.8303 | 36000 | 0.4409 | 0.4931 | 0.3574 | 0.3818 | 0.6020 |
| 0.3589 | 17.2978 | 37000 | 0.4423 | 0.4916 | 0.3704 | 0.3954 | 0.6012 |
| 0.3589 | 17.7653 | 38000 | 0.4419 | 0.4931 | 0.3696 | 0.3961 | 0.6012 |
| 0.3646 | 18.2328 | 39000 | 0.4410 | 0.4852 | 0.3510 | 0.3736 | 0.6003 |
| 0.3611 | 18.7003 | 40000 | 0.4409 | 0.4878 | 0.3593 | 0.3851 | 0.6007 |
| 0.3555 | 19.1678 | 41000 | 0.4416 | 0.4874 | 0.3613 | 0.3882 | 0.5999 |
| 0.3559 | 19.6353 | 42000 | 0.4411 | 0.4917 | 0.3601 | 0.3861 | 0.6012 |
### Framework versions
- Transformers 4.43.3
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1