# mistral-v0.3-tokV2-very-gentle-train
This model is a fine-tuned version of PolyAgent/mistral-7b-v0.3-ua-tokenizer-v2-focus-base on the PolyAgent/wiki_uk_en_parallel dataset. It achieves the following results on the evaluation set:
- Loss: 1.0794
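A minimal loading sketch for this checkpoint. The repo id below is taken from this card's "Model tree" entry and may differ from where the weights are actually hosted; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch -- the repo id is an assumption based on this
# card's "Model tree" entry; adjust it to the actual hosting location.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "antonpolishko/mistral-v0.3-tokV2-very-gentle-train"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Lazily load the model and generate a continuation for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Київ — це"))
```

Loading a 7B model this way needs roughly 15 GB of memory in fp16; quantized loading or a smaller `max_new_tokens` may be appropriate on constrained hardware.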
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
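The derived totals above follow directly from the per-device settings, and the learning-rate schedule can be sketched numerically. This is a plain reimplementation of the shape of a linear-warmup cosine scheduler, not the exact Hugging Face internals; the total step count is taken from the last logged row of the results table below and is approximate:

```python
import math

# Per-device settings from the hyperparameter list above.
train_batch_size = 4                                      # per device
num_devices = 8
total_train_batch_size = train_batch_size * num_devices   # 4 * 8 = 32

base_lr = 2e-6
warmup_ratio = 0.1

def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay toward zero
    (the general shape of a `cosine` scheduler with warmup)."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Approximate total: ~101,500 optimizer steps were logged over 3 epochs.
total = 101_500
print(total_train_batch_size)              # 32
print(cosine_lr(0, total))                 # 0.0 (start of warmup)
print(cosine_lr(int(warmup_ratio * total), total))  # peak of 2e-06
```

With a 0.1 warmup ratio, the peak learning rate of 2e-6 is reached about a third of the way through the first epoch, after which it decays smoothly for the remaining ~90% of training.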
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
2.4656 | 0.0147 | 500 | 2.6964 |
2.1663 | 0.0295 | 1000 | 2.0577 |
1.9718 | 0.0442 | 1500 | 1.7605 |
1.8011 | 0.0590 | 2000 | 1.5963 |
1.7187 | 0.0737 | 2500 | 1.4980 |
1.7068 | 0.0885 | 3000 | 1.4345 |
1.5762 | 0.1032 | 3500 | 1.3901 |
1.6034 | 0.1179 | 4000 | 1.3583 |
1.5307 | 0.1327 | 4500 | 1.3345 |
1.5797 | 0.1474 | 5000 | 1.3149 |
1.5406 | 0.1622 | 5500 | 1.2981 |
1.4967 | 0.1769 | 6000 | 1.2842 |
1.5091 | 0.1917 | 6500 | 1.2743 |
1.4983 | 0.2064 | 7000 | 1.2644 |
1.5383 | 0.2211 | 7500 | 1.2551 |
1.4808 | 0.2359 | 8000 | 1.2488 |
1.5209 | 0.2506 | 8500 | 1.2422 |
1.5084 | 0.2654 | 9000 | 1.2368 |
1.4966 | 0.2801 | 9500 | 1.2311 |
1.502 | 0.2949 | 10000 | 1.2265 |
1.4551 | 0.3096 | 10500 | 1.2209 |
1.5133 | 0.3243 | 11000 | 1.2160 |
1.5641 | 0.3391 | 11500 | 1.2110 |
1.4971 | 0.3538 | 12000 | 1.2062 |
1.4083 | 0.3686 | 12500 | 1.2011 |
1.4006 | 0.3833 | 13000 | 1.1972 |
1.4621 | 0.3981 | 13500 | 1.1933 |
1.4309 | 0.4128 | 14000 | 1.1899 |
1.4525 | 0.4275 | 14500 | 1.1866 |
1.4487 | 0.4423 | 15000 | 1.1822 |
1.4194 | 0.4570 | 15500 | 1.1797 |
1.5015 | 0.4718 | 16000 | 1.1755 |
1.4728 | 0.4865 | 16500 | 1.1733 |
1.4835 | 0.5013 | 17000 | 1.1709 |
1.3745 | 0.5160 | 17500 | 1.1676 |
1.4538 | 0.5307 | 18000 | 1.1660 |
1.4727 | 0.5455 | 18500 | 1.1634 |
1.4433 | 0.5602 | 19000 | 1.1610 |
1.4041 | 0.5750 | 19500 | 1.1586 |
1.453 | 0.5897 | 20000 | 1.1558 |
1.4468 | 0.6045 | 20500 | 1.1540 |
1.4255 | 0.6192 | 21000 | 1.1527 |
1.4433 | 0.6339 | 21500 | 1.1499 |
1.3966 | 0.6487 | 22000 | 1.1479 |
1.444 | 0.6634 | 22500 | 1.1460 |
1.4755 | 0.6782 | 23000 | 1.1445 |
1.4847 | 0.6929 | 23500 | 1.1422 |
1.4396 | 0.7077 | 24000 | 1.1403 |
1.4788 | 0.7224 | 24500 | 1.1385 |
1.4549 | 0.7371 | 25000 | 1.1371 |
1.4158 | 0.7519 | 25500 | 1.1354 |
1.4644 | 0.7666 | 26000 | 1.1339 |
1.4829 | 0.7814 | 26500 | 1.1321 |
1.4192 | 0.7961 | 27000 | 1.1311 |
1.4543 | 0.8109 | 27500 | 1.1294 |
1.421 | 0.8256 | 28000 | 1.1273 |
1.4356 | 0.8403 | 28500 | 1.1257 |
1.4578 | 0.8551 | 29000 | 1.1244 |
1.3806 | 0.8698 | 29500 | 1.1230 |
1.4083 | 0.8846 | 30000 | 1.1216 |
1.4052 | 0.8993 | 30500 | 1.1201 |
1.4548 | 0.9140 | 31000 | 1.1192 |
1.4565 | 0.9288 | 31500 | 1.1181 |
1.379 | 0.9435 | 32000 | 1.1170 |
1.3712 | 0.9583 | 32500 | 1.1159 |
1.421 | 0.9730 | 33000 | 1.1142 |
1.4017 | 0.9878 | 33500 | 1.1132 |
1.2819 | 1.0025 | 34000 | 1.1160 |
1.31 | 1.0172 | 34500 | 1.1152 |
1.2922 | 1.0320 | 35000 | 1.1132 |
1.3231 | 1.0467 | 35500 | 1.1123 |
1.311 | 1.0615 | 36000 | 1.1114 |
1.2839 | 1.0762 | 36500 | 1.1114 |
1.3421 | 1.0910 | 37000 | 1.1093 |
1.3418 | 1.1057 | 37500 | 1.1096 |
1.3195 | 1.1204 | 38000 | 1.1085 |
1.2889 | 1.1352 | 38500 | 1.1077 |
1.3366 | 1.1499 | 39000 | 1.1073 |
1.3086 | 1.1647 | 39500 | 1.1061 |
1.3088 | 1.1794 | 40000 | 1.1050 |
1.3026 | 1.1942 | 40500 | 1.1047 |
1.2416 | 1.2089 | 41000 | 1.1034 |
1.3181 | 1.2236 | 41500 | 1.1029 |
1.2928 | 1.2384 | 42000 | 1.1027 |
1.3548 | 1.2531 | 42500 | 1.1018 |
1.35 | 1.2679 | 43000 | 1.1010 |
1.2819 | 1.2826 | 43500 | 1.1005 |
1.3093 | 1.2974 | 44000 | 1.0994 |
1.315 | 1.3121 | 44500 | 1.0984 |
1.2927 | 1.3268 | 45000 | 1.0984 |
1.26 | 1.3416 | 45500 | 1.0976 |
1.2492 | 1.3563 | 46000 | 1.0968 |
1.3627 | 1.3711 | 46500 | 1.0964 |
1.2716 | 1.3858 | 47000 | 1.0952 |
1.2866 | 1.4006 | 47500 | 1.0949 |
1.2783 | 1.4153 | 48000 | 1.0938 |
1.3093 | 1.4300 | 48500 | 1.0933 |
1.3374 | 1.4448 | 49000 | 1.0925 |
1.2955 | 1.4595 | 49500 | 1.0912 |
1.3025 | 1.4743 | 50000 | 1.0904 |
1.3209 | 1.4890 | 50500 | 1.0895 |
1.3444 | 1.5038 | 51000 | 1.0898 |
1.2695 | 1.5185 | 51500 | 1.0888 |
1.3587 | 1.5332 | 52000 | 1.0881 |
1.2888 | 1.5480 | 52500 | 1.0878 |
1.3551 | 1.5627 | 53000 | 1.0867 |
1.3188 | 1.5775 | 53500 | 1.0860 |
1.2849 | 1.5922 | 54000 | 1.0850 |
1.2896 | 1.6070 | 54500 | 1.0847 |
1.3035 | 1.6217 | 55000 | 1.0834 |
1.2531 | 1.6364 | 55500 | 1.0827 |
1.2748 | 1.6512 | 56000 | 1.0825 |
1.2596 | 1.6659 | 56500 | 1.0820 |
1.3008 | 1.6807 | 57000 | 1.0809 |
1.2632 | 1.6954 | 57500 | 1.0808 |
1.2664 | 1.7102 | 58000 | 1.0802 |
1.2913 | 1.7249 | 58500 | 1.0791 |
1.3078 | 1.7396 | 59000 | 1.0789 |
1.2613 | 1.7544 | 59500 | 1.0779 |
1.279 | 1.7691 | 60000 | 1.0770 |
1.3258 | 1.7839 | 60500 | 1.0769 |
1.3265 | 1.7986 | 61000 | 1.0760 |
1.2333 | 1.8134 | 61500 | 1.0756 |
1.2606 | 1.8281 | 62000 | 1.0749 |
1.2564 | 1.8428 | 62500 | 1.0746 |
1.2927 | 1.8576 | 63000 | 1.0744 |
1.2963 | 1.8723 | 63500 | 1.0736 |
1.3131 | 1.8871 | 64000 | 1.0737 |
1.3077 | 1.9018 | 64500 | 1.0722 |
1.2838 | 1.9166 | 65000 | 1.0718 |
1.2938 | 1.9313 | 65500 | 1.0709 |
1.2985 | 1.9460 | 66000 | 1.0705 |
1.294 | 1.9608 | 66500 | 1.0698 |
1.2834 | 1.9755 | 67000 | 1.0694 |
1.2378 | 1.9903 | 67500 | 1.0688 |
1.1446 | 2.0050 | 68000 | 1.0834 |
1.2019 | 2.0198 | 68500 | 1.0854 |
1.1706 | 2.0345 | 69000 | 1.0853 |
1.153 | 2.0492 | 69500 | 1.0852 |
1.1548 | 2.0640 | 70000 | 1.0855 |
1.1994 | 2.0787 | 70500 | 1.0852 |
1.1133 | 2.0935 | 71000 | 1.0855 |
1.1443 | 2.1082 | 71500 | 1.0850 |
1.1754 | 2.1230 | 72000 | 1.0851 |
1.1594 | 2.1377 | 72500 | 1.0851 |
1.1668 | 2.1524 | 73000 | 1.0844 |
1.1224 | 2.1672 | 73500 | 1.0847 |
1.1438 | 2.1819 | 74000 | 1.0842 |
1.1648 | 2.1967 | 74500 | 1.0843 |
1.1736 | 2.2114 | 75000 | 1.0843 |
1.2279 | 2.2262 | 75500 | 1.0842 |
1.1687 | 2.2409 | 76000 | 1.0837 |
1.1709 | 2.2556 | 76500 | 1.0840 |
1.1656 | 2.2704 | 77000 | 1.0839 |
1.1596 | 2.2851 | 77500 | 1.0841 |
1.1357 | 2.2999 | 78000 | 1.0834 |
1.1963 | 2.3146 | 78500 | 1.0833 |
1.1363 | 2.3294 | 79000 | 1.0834 |
1.1509 | 2.3441 | 79500 | 1.0830 |
1.1404 | 2.3588 | 80000 | 1.0828 |
1.1689 | 2.3736 | 80500 | 1.0826 |
1.1538 | 2.3883 | 81000 | 1.0824 |
1.1658 | 2.4031 | 81500 | 1.0823 |
1.2027 | 2.4178 | 82000 | 1.0819 |
1.1737 | 2.4326 | 82500 | 1.0812 |
1.1255 | 2.4473 | 83000 | 1.0816 |
1.1715 | 2.4620 | 83500 | 1.0816 |
1.1595 | 2.4768 | 84000 | 1.0814 |
1.1823 | 2.4915 | 84500 | 1.0811 |
1.1286 | 2.5063 | 85000 | 1.0808 |
1.1662 | 2.5210 | 85500 | 1.0810 |
1.1452 | 2.5358 | 86000 | 1.0808 |
1.1721 | 2.5505 | 86500 | 1.0804 |
1.1431 | 2.5652 | 87000 | 1.0808 |
1.1765 | 2.5800 | 87500 | 1.0804 |
1.174 | 2.5947 | 88000 | 1.0803 |
1.14 | 2.6095 | 88500 | 1.0803 |
1.1254 | 2.6242 | 89000 | 1.0802 |
1.1414 | 2.6390 | 89500 | 1.0803 |
1.1475 | 2.6537 | 90000 | 1.0802 |
1.1376 | 2.6684 | 90500 | 1.0799 |
1.1467 | 2.6832 | 91000 | 1.0799 |
1.1584 | 2.6979 | 91500 | 1.0799 |
1.1765 | 2.7127 | 92000 | 1.0795 |
1.1756 | 2.7274 | 92500 | 1.0797 |
1.1257 | 2.7421 | 93000 | 1.0797 |
1.1762 | 2.7569 | 93500 | 1.0795 |
1.149 | 2.7716 | 94000 | 1.0796 |
1.1385 | 2.7864 | 94500 | 1.0794 |
1.1809 | 2.8011 | 95000 | 1.0794 |
1.1464 | 2.8159 | 95500 | 1.0796 |
1.127 | 2.8306 | 96000 | 1.0796 |
1.1517 | 2.8453 | 96500 | 1.0792 |
1.2209 | 2.8601 | 97000 | 1.0794 |
1.1988 | 2.8748 | 97500 | 1.0793 |
1.2008 | 2.8896 | 98000 | 1.0794 |
1.1616 | 2.9043 | 98500 | 1.0793 |
1.1783 | 2.9191 | 99000 | 1.0793 |
1.1751 | 2.9338 | 99500 | 1.0794 |
1.1662 | 2.9485 | 100000 | 1.0794 |
1.1464 | 2.9633 | 100500 | 1.0794 |
1.1439 | 2.9780 | 101000 | 1.0793 |
1.1611 | 2.9928 | 101500 | 1.0793 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
Model tree for antonpolishko/mistral-v0.3-tokV2-very-gentle-train