
bge_large_zh_llama3_70

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (a metric-computation sketch follows the list):

  • Loss: 0.4411
  • Precision: 0.4917
  • Recall: 0.3601
  • F1 Macro: 0.3861
  • Accuracy: 0.6012
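
The F1 score above is reported as a macro average. Assuming precision and recall are macro-averaged as well, the following is a minimal scikit-learn sketch of how such metrics are conventionally computed; the label arrays are placeholders, not this model's evaluation data.

```python
# Minimal sketch of macro-averaged metric computation with scikit-learn.
# y_true and y_pred are hypothetical placeholders, not this model's data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0]  # hypothetical gold labels
y_pred = [0, 1, 1, 2, 0, 0]  # hypothetical predictions

precision, recall, f1_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1_macro={f1_macro:.4f} accuracy={accuracy:.4f}")
```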

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 256
  • eval_batch_size: 128
  • seed: 0
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
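
A minimal sketch of how these settings could be expressed as transformers.TrainingArguments. The actual training script is not included in this card; output_dir is a placeholder, and the batch sizes are treated here as per-device values.

```python
# Sketch only: maps the hyperparameters listed above onto
# transformers.TrainingArguments. output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bge_large_zh_llama3_70",  # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=256,     # assumed per-device
    per_device_eval_batch_size=128,
    seed=0,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
)
```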

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Precision | Recall | F1 Macro | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:---------:|:------:|:--------:|:--------:|
| No log        | 0       | 0     | 2.4917          | 0.2058    | 0.1667 | 0.0633   | 0.2346   |
| 0.4645        | 0.4675  | 1000  | 0.4725          | 0.4242    | 0.3207 | 0.3278   | 0.5832   |
| 0.4538        | 0.9350  | 2000  | 0.4588          | 0.4952    | 0.3350 | 0.3487   | 0.5928   |
| 0.4609        | 1.4025  | 3000  | 0.4521          | 0.4340    | 0.3255 | 0.3377   | 0.5968   |
| 0.4483        | 1.8700  | 4000  | 0.4453          | 0.4377    | 0.3187 | 0.3359   | 0.6007   |
| 0.4487        | 2.3375  | 5000  | 0.4523          | 0.4829    | 0.3605 | 0.3733   | 0.5948   |
| 0.4358        | 2.8050  | 6000  | 0.4433          | 0.4733    | 0.3450 | 0.3664   | 0.5999   |
| 0.4326        | 3.2726  | 7000  | 0.4404          | 0.5050    | 0.3289 | 0.3474   | 0.5987   |
| 0.4328        | 3.7401  | 8000  | 0.4387          | 0.4298    | 0.3303 | 0.3498   | 0.6030   |
| 0.4245        | 4.2076  | 9000  | 0.4373          | 0.4849    | 0.3293 | 0.3494   | 0.6030   |
| 0.4313        | 4.6751  | 10000 | 0.4358          | 0.4929    | 0.3370 | 0.3561   | 0.6041   |
| 0.418         | 5.1426  | 11000 | 0.4358          | 0.5159    | 0.3347 | 0.3553   | 0.6031   |
| 0.414         | 5.6101  | 12000 | 0.4357          | 0.4830    | 0.3458 | 0.3654   | 0.6035   |
| 0.405         | 6.0776  | 13000 | 0.4348          | 0.4887    | 0.3558 | 0.3816   | 0.6026   |
| 0.4174        | 6.5451  | 14000 | 0.4351          | 0.5129    | 0.3538 | 0.3778   | 0.6044   |
| 0.4116        | 7.0126  | 15000 | 0.4374          | 0.5291    | 0.3695 | 0.3913   | 0.6032   |
| 0.4103        | 7.4801  | 16000 | 0.4383          | 0.4998    | 0.3698 | 0.3954   | 0.6046   |
| 0.4065        | 7.9476  | 17000 | 0.4388          | 0.4989    | 0.3691 | 0.3905   | 0.6025   |
| 0.3953        | 8.4151  | 18000 | 0.4393          | 0.4867    | 0.3674 | 0.3880   | 0.6024   |
| 0.4075        | 8.8827  | 19000 | 0.4360          | 0.4846    | 0.3661 | 0.3881   | 0.6033   |
| 0.3915        | 9.3502  | 20000 | 0.4351          | 0.5292    | 0.3510 | 0.3785   | 0.6021   |
| 0.3929        | 9.8177  | 21000 | 0.4376          | 0.5093    | 0.3291 | 0.3491   | 0.6026   |
| 0.3852        | 10.2852 | 22000 | 0.4371          | 0.5267    | 0.3504 | 0.3738   | 0.6023   |
| 0.3847        | 10.7527 | 23000 | 0.4364          | 0.4964    | 0.3428 | 0.3668   | 0.6023   |
| 0.3925        | 11.2202 | 24000 | 0.4397          | 0.4851    | 0.3588 | 0.3850   | 0.5989   |
| 0.3824        | 11.6877 | 25000 | 0.4370          | 0.5076    | 0.3594 | 0.3881   | 0.6029   |
| 0.3741        | 12.1552 | 26000 | 0.4383          | 0.4942    | 0.3581 | 0.3836   | 0.5997   |
| 0.3778        | 12.6227 | 27000 | 0.4412          | 0.5137    | 0.3669 | 0.3907   | 0.6007   |
| 0.3737        | 13.0902 | 28000 | 0.4386          | 0.4916    | 0.3511 | 0.3780   | 0.6007   |
| 0.3785        | 13.5577 | 29000 | 0.4387          | 0.4919    | 0.3532 | 0.3785   | 0.6025   |
| 0.374         | 14.0252 | 30000 | 0.4385          | 0.4850    | 0.3469 | 0.3700   | 0.6025   |
| 0.3759        | 14.4928 | 31000 | 0.4391          | 0.5011    | 0.3589 | 0.3857   | 0.6015   |
| 0.3711        | 14.9603 | 32000 | 0.4398          | 0.4812    | 0.3468 | 0.3704   | 0.6001   |
| 0.3662        | 15.4278 | 33000 | 0.4415          | 0.4911    | 0.3549 | 0.3769   | 0.6003   |
| 0.3672        | 15.8953 | 34000 | 0.4399          | 0.4855    | 0.3566 | 0.3819   | 0.6012   |
| 0.3622        | 16.3628 | 35000 | 0.4407          | 0.4824    | 0.3624 | 0.3874   | 0.6015   |
| 0.3628        | 16.8303 | 36000 | 0.4409          | 0.4931    | 0.3574 | 0.3818   | 0.6020   |
| 0.3589        | 17.2978 | 37000 | 0.4423          | 0.4916    | 0.3704 | 0.3954   | 0.6012   |
| 0.3589        | 17.7653 | 38000 | 0.4419          | 0.4931    | 0.3696 | 0.3961   | 0.6012   |
| 0.3646        | 18.2328 | 39000 | 0.4410          | 0.4852    | 0.3510 | 0.3736   | 0.6003   |
| 0.3611        | 18.7003 | 40000 | 0.4409          | 0.4878    | 0.3593 | 0.3851   | 0.6007   |
| 0.3555        | 19.1678 | 41000 | 0.4416          | 0.4874    | 0.3613 | 0.3882   | 0.5999   |
| 0.3559        | 19.6353 | 42000 | 0.4411          | 0.4917    | 0.3601 | 0.3861   | 0.6012   |
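
The reported precision, recall, F1, and accuracy indicate a classification checkpoint. As a purely hypothetical usage sketch (the repo id, example text, and label handling are assumptions, not documented by this card):

```python
# Hypothetical inference sketch. The repo id is a placeholder, and the
# checkpoint is assumed to carry a sequence-classification head, as the
# reported precision/recall/F1/accuracy metrics suggest.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "bge_large_zh_llama3_70"  # placeholder; substitute the actual hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

inputs = tokenizer("待分类的示例文本", return_tensors="pt")  # hypothetical input
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```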

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 326M parameters (F32, stored as Safetensors)