---
base_model: PolyAgent/mistral-7b-v0.3-ua-tokenizer-v2-focus-base
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
datasets:
- PolyAgent/wiki_uk_en_parallel
model-index:
- name: mistral-v0.3-tokV2-gentle-train
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# mistral-v0.3-tokV2-gentle-train
This model is a fine-tuned version of [PolyAgent/mistral-7b-v0.3-ua-tokenizer-v2-focus-base](https://huggingface.co/PolyAgent/mistral-7b-v0.3-ua-tokenizer-v2-focus-base) on the PolyAgent/wiki_uk_en_parallel dataset.
It achieves the following result on the evaluation set at the end of training:
- Loss: 1.0765
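
A minimal loading sketch with the `transformers` library follows. The repo id is an assumption inferred from the model name and the base model's organization; this card does not state where the checkpoint is published. The card also does not document a prompt format, so any prompt passed in is illustrative only.

```python
# Assumed repo id (not confirmed by this card); replace with the real location.
REPO_ID = "PolyAgent/mistral-v0.3-tokV2-gentle-train"


def generate(prompt: str, repo_id: str = REPO_ID, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return the decoded continuation of `prompt`."""
    # Imports live inside the function so that defining it stays lightweight.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # bfloat16 is a choice made for this sketch to reduce memory use.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` with a prompt of your choice downloads the weights on first use, so it needs network access and enough memory for a 7B-parameter model.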
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
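
The batch-size and learning-rate settings above can be sketched in plain Python. This is an illustration of the schedule's shape (linear warmup over the first 10% of steps, then cosine decay to zero), not the Trainer's own code. It assumes no gradient accumulation, since none is listed, and `TOTAL_STEPS_ESTIMATE` is a rough figure extrapolated from the results table (step 101500 at epoch 2.9928), not a logged value.

```python
import math

# Values copied from the hyperparameter list.
PEAK_LR = 7.5e-06
PER_DEVICE_BATCH = 4
NUM_DEVICES = 8
WARMUP_RATIO = 0.1

# Assumption: no gradient accumulation, so the total batch size is
# per-device batch size times device count.
total_batch = PER_DEVICE_BATCH * NUM_DEVICES  # 4 * 8 = 32

# Rough estimate of the total number of optimizer steps (hypothetical).
TOTAL_STEPS_ESTIMATE = 101_700


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(TOTAL_STEPS_ESTIMATE // 2, TOTAL_STEPS_ESTIMATE)` gives the approximate learning rate halfway through the run under these assumptions.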
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 2.0964 | 0.0147 | 500 | 2.0850 |
| 1.8225 | 0.0295 | 1000 | 1.6206 |
| 1.705 | 0.0442 | 1500 | 1.4585 |
| 1.6304 | 0.0590 | 2000 | 1.3826 |
| 1.5948 | 0.0737 | 2500 | 1.3408 |
| 1.6226 | 0.0885 | 3000 | 1.3155 |
| 1.5106 | 0.1032 | 3500 | 1.2971 |
| 1.5651 | 0.1179 | 4000 | 1.2854 |
| 1.4974 | 0.1327 | 4500 | 1.2763 |
| 1.5642 | 0.1474 | 5000 | 1.2688 |
| 1.519 | 0.1622 | 5500 | 1.2617 |
| 1.4893 | 0.1769 | 6000 | 1.2575 |
| 1.5136 | 0.1917 | 6500 | 1.2555 |
| 1.5036 | 0.2064 | 7000 | 1.2513 |
| 1.5536 | 0.2211 | 7500 | 1.2497 |
| 1.5096 | 0.2359 | 8000 | 1.2488 |
| 1.5529 | 0.2506 | 8500 | 1.2481 |
| 1.5534 | 0.2654 | 9000 | 1.2464 |
| 1.5335 | 0.2801 | 9500 | 1.2467 |
| 1.5538 | 0.2949 | 10000 | 1.2465 |
| 1.5036 | 0.3096 | 10500 | 1.2422 |
| 1.5709 | 0.3243 | 11000 | 1.2401 |
| 1.625 | 0.3391 | 11500 | 1.2361 |
| 1.5534 | 0.3538 | 12000 | 1.2318 |
| 1.467 | 0.3686 | 12500 | 1.2277 |
| 1.4511 | 0.3833 | 13000 | 1.2229 |
| 1.5157 | 0.3981 | 13500 | 1.2189 |
| 1.4941 | 0.4128 | 14000 | 1.2161 |
| 1.5154 | 0.4275 | 14500 | 1.2133 |
| 1.5121 | 0.4423 | 15000 | 1.2090 |
| 1.4698 | 0.4570 | 15500 | 1.2060 |
| 1.5629 | 0.4718 | 16000 | 1.2029 |
| 1.5336 | 0.4865 | 16500 | 1.2004 |
| 1.5355 | 0.5013 | 17000 | 1.1981 |
| 1.4291 | 0.5160 | 17500 | 1.1945 |
| 1.5137 | 0.5307 | 18000 | 1.1933 |
| 1.5303 | 0.5455 | 18500 | 1.1906 |
| 1.5045 | 0.5602 | 19000 | 1.1881 |
| 1.4674 | 0.5750 | 19500 | 1.1854 |
| 1.518 | 0.5897 | 20000 | 1.1823 |
| 1.5104 | 0.6045 | 20500 | 1.1794 |
| 1.4874 | 0.6192 | 21000 | 1.1786 |
| 1.5025 | 0.6339 | 21500 | 1.1764 |
| 1.4493 | 0.6487 | 22000 | 1.1728 |
| 1.5114 | 0.6634 | 22500 | 1.1722 |
| 1.5394 | 0.6782 | 23000 | 1.1707 |
| 1.5466 | 0.6929 | 23500 | 1.1679 |
| 1.5046 | 0.7077 | 24000 | 1.1660 |
| 1.5397 | 0.7224 | 24500 | 1.1631 |
| 1.5111 | 0.7371 | 25000 | 1.1623 |
| 1.4707 | 0.7519 | 25500 | 1.1605 |
| 1.5201 | 0.7666 | 26000 | 1.1586 |
| 1.5511 | 0.7814 | 26500 | 1.1568 |
| 1.4773 | 0.7961 | 27000 | 1.1550 |
| 1.5146 | 0.8109 | 27500 | 1.1533 |
| 1.4789 | 0.8256 | 28000 | 1.1513 |
| 1.4949 | 0.8403 | 28500 | 1.1488 |
| 1.5116 | 0.8551 | 29000 | 1.1471 |
| 1.4338 | 0.8698 | 29500 | 1.1453 |
| 1.4656 | 0.8846 | 30000 | 1.1446 |
| 1.4542 | 0.8993 | 30500 | 1.1427 |
| 1.5095 | 0.9140 | 31000 | 1.1415 |
| 1.5156 | 0.9288 | 31500 | 1.1399 |
| 1.4379 | 0.9435 | 32000 | 1.1390 |
| 1.4185 | 0.9583 | 32500 | 1.1373 |
| 1.4765 | 0.9730 | 33000 | 1.1355 |
| 1.453 | 0.9878 | 33500 | 1.1345 |
| 1.2859 | 1.0025 | 34000 | 1.1370 |
| 1.3039 | 1.0172 | 34500 | 1.1342 |
| 1.2991 | 1.0320 | 35000 | 1.1314 |
| 1.3258 | 1.0467 | 35500 | 1.1301 |
| 1.3229 | 1.0615 | 36000 | 1.1295 |
| 1.2872 | 1.0762 | 36500 | 1.1290 |
| 1.346 | 1.0910 | 37000 | 1.1260 |
| 1.3494 | 1.1057 | 37500 | 1.1255 |
| 1.3234 | 1.1204 | 38000 | 1.1247 |
| 1.2964 | 1.1352 | 38500 | 1.1405 |
| 1.34 | 1.1499 | 39000 | 1.1226 |
| 1.316 | 1.1647 | 39500 | 1.1214 |
| 1.3232 | 1.1794 | 40000 | 1.1206 |
| 1.3175 | 1.1942 | 40500 | 1.1212 |
| 1.2516 | 1.2089 | 41000 | 1.1191 |
| 1.3323 | 1.2236 | 41500 | 1.1180 |
| 1.3046 | 1.2384 | 42000 | 1.1174 |
| 1.3659 | 1.2531 | 42500 | 1.1151 |
| 1.3582 | 1.2679 | 43000 | 1.1137 |
| 1.2981 | 1.2826 | 43500 | 1.1128 |
| 1.3262 | 1.2974 | 44000 | 1.1116 |
| 1.326 | 1.3121 | 44500 | 1.1101 |
| 1.3025 | 1.3268 | 45000 | 1.1106 |
| 1.271 | 1.3416 | 45500 | 1.1087 |
| 1.2566 | 1.3563 | 46000 | 1.1075 |
| 1.3671 | 1.3711 | 46500 | 1.1071 |
| 1.2847 | 1.3858 | 47000 | 1.1040 |
| 1.3066 | 1.4006 | 47500 | 1.1036 |
| 1.2868 | 1.4153 | 48000 | 1.1024 |
| 1.326 | 1.4300 | 48500 | 1.1016 |
| 1.35 | 1.4448 | 49000 | 1.1009 |
| 1.3054 | 1.4595 | 49500 | 1.0998 |
| 1.3156 | 1.4743 | 50000 | 1.0976 |
| 1.333 | 1.4890 | 50500 | 1.0963 |
| 1.3592 | 1.5038 | 51000 | 1.0959 |
| 1.2748 | 1.5185 | 51500 | 1.0946 |
| 1.369 | 1.5332 | 52000 | 1.0936 |
| 1.3058 | 1.5480 | 52500 | 1.0922 |
| 1.3611 | 1.5627 | 53000 | 1.0916 |
| 1.331 | 1.5775 | 53500 | 1.0906 |
| 1.2905 | 1.5922 | 54000 | 1.0888 |
| 1.294 | 1.6070 | 54500 | 1.0879 |
| 1.3102 | 1.6217 | 55000 | 1.0868 |
| 1.2641 | 1.6364 | 55500 | 1.0858 |
| 1.2797 | 1.6512 | 56000 | 1.0845 |
| 1.2672 | 1.6659 | 56500 | 1.0836 |
| 1.3044 | 1.6807 | 57000 | 1.0823 |
| 1.2694 | 1.6954 | 57500 | 1.0815 |
| 1.2786 | 1.7102 | 58000 | 1.0809 |
| 1.2908 | 1.7249 | 58500 | 1.0790 |
| 1.3049 | 1.7396 | 59000 | 1.0785 |
| 1.2632 | 1.7544 | 59500 | 1.0772 |
| 1.2836 | 1.7691 | 60000 | 1.0755 |
| 1.3261 | 1.7839 | 60500 | 1.0741 |
| 1.3267 | 1.7986 | 61000 | 1.0727 |
| 1.2277 | 1.8134 | 61500 | 1.0722 |
| 1.2635 | 1.8281 | 62000 | 1.0711 |
| 1.249 | 1.8428 | 62500 | 1.0709 |
| 1.2996 | 1.8576 | 63000 | 1.0699 |
| 1.2934 | 1.8723 | 63500 | 1.0687 |
| 1.3182 | 1.8871 | 64000 | 1.0675 |
| 1.3103 | 1.9018 | 64500 | 1.0659 |
| 1.2764 | 1.9166 | 65000 | 1.0651 |
| 1.2848 | 1.9313 | 65500 | 1.0638 |
| 1.2924 | 1.9460 | 66000 | 1.0627 |
| 1.2897 | 1.9608 | 66500 | 1.0617 |
| 1.2819 | 1.9755 | 67000 | 1.0606 |
| 1.2331 | 1.9903 | 67500 | 1.0603 |
| 1.0402 | 2.0050 | 68000 | 1.0837 |
| 1.0995 | 2.0198 | 68500 | 1.0853 |
| 1.063 | 2.0345 | 69000 | 1.0859 |
| 1.0377 | 2.0492 | 69500 | 1.0869 |
| 1.0493 | 2.0640 | 70000 | 1.0864 |
| 1.0835 | 2.0787 | 70500 | 1.0869 |
| 1.0013 | 2.0935 | 71000 | 1.0877 |
| 1.0327 | 2.1082 | 71500 | 1.0870 |
| 1.0615 | 2.1230 | 72000 | 1.0855 |
| 1.043 | 2.1377 | 72500 | 1.0864 |
| 1.0476 | 2.1524 | 73000 | 1.0853 |
| 1.0105 | 2.1672 | 73500 | 1.0860 |
| 1.0314 | 2.1819 | 74000 | 1.0860 |
| 1.0527 | 2.1967 | 74500 | 1.0856 |
| 1.0589 | 2.2114 | 75000 | 1.0861 |
| 1.1094 | 2.2262 | 75500 | 1.0856 |
| 1.0562 | 2.2409 | 76000 | 1.0846 |
| 1.0623 | 2.2556 | 76500 | 1.0846 |
| 1.0518 | 2.2704 | 77000 | 1.0847 |
| 1.0461 | 2.2851 | 77500 | 1.0842 |
| 1.0185 | 2.2999 | 78000 | 1.0835 |
| 1.0673 | 2.3146 | 78500 | 1.0838 |
| 1.0243 | 2.3294 | 79000 | 1.0838 |
| 1.0381 | 2.3441 | 79500 | 1.0831 |
| 1.0179 | 2.3588 | 80000 | 1.0822 |
| 1.0524 | 2.3736 | 80500 | 1.0821 |
| 1.0364 | 2.3883 | 81000 | 1.0819 |
| 1.0421 | 2.4031 | 81500 | 1.0810 |
| 1.0878 | 2.4178 | 82000 | 1.0809 |
| 1.061 | 2.4326 | 82500 | 1.0807 |
| 1.004 | 2.4473 | 83000 | 1.0811 |
| 1.0488 | 2.4620 | 83500 | 1.0803 |
| 1.0406 | 2.4768 | 84000 | 1.0802 |
| 1.0586 | 2.4915 | 84500 | 1.0799 |
| 1.0113 | 2.5063 | 85000 | 1.0797 |
| 1.0478 | 2.5210 | 85500 | 1.0796 |
| 1.0241 | 2.5358 | 86000 | 1.0794 |
| 1.0523 | 2.5505 | 86500 | 1.0790 |
| 1.0275 | 2.5652 | 87000 | 1.0787 |
| 1.0601 | 2.5800 | 87500 | 1.0787 |
| 1.0519 | 2.5947 | 88000 | 1.0785 |
| 1.0243 | 2.6095 | 88500 | 1.0783 |
| 1.0071 | 2.6242 | 89000 | 1.0782 |
| 1.0251 | 2.6390 | 89500 | 1.0779 |
| 1.0334 | 2.6537 | 90000 | 1.0776 |
| 1.0144 | 2.6684 | 90500 | 1.0777 |
| 1.0212 | 2.6832 | 91000 | 1.0775 |
| 1.0367 | 2.6979 | 91500 | 1.0774 |
| 1.0551 | 2.7127 | 92000 | 1.0771 |
| 1.0576 | 2.7274 | 92500 | 1.0772 |
| 1.0058 | 2.7421 | 93000 | 1.0772 |
| 1.061 | 2.7569 | 93500 | 1.0768 |
| 1.0237 | 2.7716 | 94000 | 1.0767 |
| 1.0262 | 2.7864 | 94500 | 1.0767 |
| 1.0558 | 2.8011 | 95000 | 1.0767 |
| 1.0223 | 2.8159 | 95500 | 1.0768 |
| 1.0122 | 2.8306 | 96000 | 1.0769 |
| 1.0324 | 2.8453 | 96500 | 1.0765 |
| 1.0924 | 2.8601 | 97000 | 1.0766 |
| 1.0757 | 2.8748 | 97500 | 1.0765 |
| 1.0703 | 2.8896 | 98000 | 1.0766 |
| 1.0424 | 2.9043 | 98500 | 1.0766 |
| 1.055 | 2.9191 | 99000 | 1.0765 |
| 1.0556 | 2.9338 | 99500 | 1.0765 |
| 1.0383 | 2.9485 | 100000 | 1.0765 |
| 1.0245 | 2.9633 | 100500 | 1.0765 |
| 1.0212 | 2.9780 | 101000 | 1.0765 |
| 1.0432 | 2.9928 | 101500 | 1.0765 |
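
In the table above, the lowest logged validation loss is 1.0603 at step 67500 (epoch 1.9903), while the final logged value is 1.0765. The sketch below picks out that minimum from a few rows copied from the table and converts a loss to perplexity. It assumes the reported loss is a mean per-token cross-entropy in natural-log units, which this card does not state explicitly.

```python
import math

# (step, validation loss) pairs copied from the table; a subset for brevity.
eval_log = [
    (500, 2.0850),
    (33500, 1.1345),
    (67500, 1.0603),
    (68000, 1.0837),
    (101500, 1.0765),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss)


# Step with the lowest validation loss among the rows listed here.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
final_step, final_loss = eval_log[-1]

print(f"best:  step {best_step}, loss {best_loss}, ppl {perplexity(best_loss):.3f}")
print(f"final: step {final_step}, loss {final_loss}, ppl {perplexity(final_loss):.3f}")
```

Under that assumption, the final loss of 1.0765 corresponds to a perplexity of roughly 2.93.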
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1