---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1
results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1150
- Num Input Tokens Seen: 46709976
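To try the model, a minimal generation sketch with `transformers` is below. The repo id is an assumption taken from this card's title (no namespace is recorded here); prepend the owning organization if the model lives under one.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id from this card's title; adjust the namespace as needed.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```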
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
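The list above maps onto a `transformers.TrainingArguments` object roughly as follows. This is a hedged reconstruction, not the original training script: the `output_dir` is a placeholder, and the dataset plus any TRL `SFTTrainer` wiring are not recorded in this card.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above; output_dir is a
# placeholder and the dataset/trainer setup must be supplied separately.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the
    # transformers defaults, so no optimizer overrides are needed.
)
```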
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.6431 | 0.0058 | 5 | 1.3921 | 264984 |
1.6241 | 0.0116 | 10 | 1.3641 | 535648 |
1.4051 | 0.0175 | 15 | 1.2992 | 800240 |
1.5876 | 0.0233 | 20 | 1.2507 | 1073112 |
1.3807 | 0.0291 | 25 | 1.1996 | 1347016 |
1.2183 | 0.0349 | 30 | 1.1734 | 1615616 |
1.1851 | 0.0407 | 35 | 1.1704 | 1880656 |
1.0742 | 0.0466 | 40 | 1.1514 | 2153152 |
1.0297 | 0.0524 | 45 | 1.1704 | 2424296 |
0.9492 | 0.0582 | 50 | 1.1980 | 2695960 |
0.7815 | 0.0640 | 55 | 1.2180 | 2962216 |
0.7623 | 0.0698 | 60 | 1.2289 | 3227360 |
0.7636 | 0.0756 | 65 | 1.2421 | 3500744 |
0.6613 | 0.0815 | 70 | 1.2315 | 3779408 |
0.6356 | 0.0873 | 75 | 1.2200 | 4053928 |
0.6672 | 0.0931 | 80 | 1.2323 | 4331008 |
0.5271 | 0.0989 | 85 | 1.2129 | 4602472 |
0.5037 | 0.1047 | 90 | 1.2189 | 4873224 |
0.4633 | 0.1106 | 95 | 1.2236 | 5147608 |
0.4207 | 0.1164 | 100 | 1.2128 | 5417824 |
0.4687 | 0.1222 | 105 | 1.2353 | 5696952 |
0.4262 | 0.1280 | 110 | 1.2156 | 5969312 |
0.337 | 0.1338 | 115 | 1.2143 | 6237376 |
0.4211 | 0.1397 | 120 | 1.2174 | 6503136 |
0.4425 | 0.1455 | 125 | 1.2196 | 6778056 |
0.3559 | 0.1513 | 130 | 1.2159 | 7044456 |
0.3527 | 0.1571 | 135 | 1.2179 | 7319944 |
0.3773 | 0.1629 | 140 | 1.2227 | 7595752 |
0.3808 | 0.1688 | 145 | 1.2085 | 7861112 |
0.2374 | 0.1746 | 150 | 1.2166 | 8132616 |
0.3131 | 0.1804 | 155 | 1.2032 | 8401248 |
0.3532 | 0.1862 | 160 | 1.2054 | 8674528 |
0.3413 | 0.1920 | 165 | 1.2023 | 8944928 |
0.2576 | 0.1978 | 170 | 1.2011 | 9216296 |
0.3644 | 0.2037 | 175 | 1.1936 | 9485720 |
0.3706 | 0.2095 | 180 | 1.1979 | 9756120 |
0.3878 | 0.2153 | 185 | 1.1899 | 10024952 |
0.2939 | 0.2211 | 190 | 1.1965 | 10299896 |
0.2989 | 0.2269 | 195 | 1.1915 | 10572560 |
0.2456 | 0.2328 | 200 | 1.1873 | 10846784 |
0.1816 | 0.2386 | 205 | 1.1903 | 11108672 |
0.239 | 0.2444 | 210 | 1.1874 | 11383016 |
0.2948 | 0.2502 | 215 | 1.1812 | 11661568 |
0.3583 | 0.2560 | 220 | 1.1855 | 11932480 |
0.2711 | 0.2619 | 225 | 1.1820 | 12199352 |
0.2866 | 0.2677 | 230 | 1.1806 | 12473000 |
0.3271 | 0.2735 | 235 | 1.1784 | 12744776 |
0.2437 | 0.2793 | 240 | 1.1869 | 13016800 |
0.2826 | 0.2851 | 245 | 1.1758 | 13288768 |
0.3314 | 0.2910 | 250 | 1.1756 | 13553368 |
0.2358 | 0.2968 | 255 | 1.1795 | 13821840 |
0.3026 | 0.3026 | 260 | 1.1727 | 14098640 |
0.2495 | 0.3084 | 265 | 1.1736 | 14374192 |
0.2872 | 0.3142 | 270 | 1.1759 | 14652616 |
0.2568 | 0.3200 | 275 | 1.1668 | 14938544 |
0.328 | 0.3259 | 280 | 1.1726 | 15213896 |
0.2527 | 0.3317 | 285 | 1.1672 | 15489920 |
0.3146 | 0.3375 | 290 | 1.1697 | 15759584 |
0.1983 | 0.3433 | 295 | 1.1637 | 16022096 |
0.2168 | 0.3491 | 300 | 1.1648 | 16293072 |
0.221 | 0.3550 | 305 | 1.1634 | 16561968 |
0.2198 | 0.3608 | 310 | 1.1575 | 16832872 |
0.2786 | 0.3666 | 315 | 1.1648 | 17105808 |
0.2173 | 0.3724 | 320 | 1.1617 | 17374320 |
0.2877 | 0.3782 | 325 | 1.1557 | 17647448 |
0.2808 | 0.3841 | 330 | 1.1635 | 17917328 |
0.2804 | 0.3899 | 335 | 1.1539 | 18193776 |
0.2334 | 0.3957 | 340 | 1.1550 | 18461696 |
0.2657 | 0.4015 | 345 | 1.1561 | 18724640 |
0.1737 | 0.4073 | 350 | 1.1495 | 19003296 |
0.2516 | 0.4132 | 355 | 1.1541 | 19280768 |
0.3132 | 0.4190 | 360 | 1.1517 | 19550224 |
0.2641 | 0.4248 | 365 | 1.1507 | 19819248 |
0.2591 | 0.4306 | 370 | 1.1512 | 20094128 |
0.215 | 0.4364 | 375 | 1.1495 | 20367488 |
0.2797 | 0.4422 | 380 | 1.1492 | 20642144 |
0.2126 | 0.4481 | 385 | 1.1504 | 20917808 |
0.3087 | 0.4539 | 390 | 1.1475 | 21184408 |
0.1771 | 0.4597 | 395 | 1.1451 | 21452160 |
0.2377 | 0.4655 | 400 | 1.1483 | 21718312 |
0.2599 | 0.4713 | 405 | 1.1438 | 21994088 |
0.2298 | 0.4772 | 410 | 1.1431 | 22268376 |
0.2435 | 0.4830 | 415 | 1.1492 | 22538448 |
0.2934 | 0.4888 | 420 | 1.1446 | 22814416 |
0.2063 | 0.4946 | 425 | 1.1420 | 23086096 |
0.2981 | 0.5004 | 430 | 1.1409 | 23356416 |
0.3038 | 0.5063 | 435 | 1.1370 | 23619720 |
0.2098 | 0.5121 | 440 | 1.1403 | 23892528 |
0.2594 | 0.5179 | 445 | 1.1364 | 24168608 |
0.2006 | 0.5237 | 450 | 1.1346 | 24440016 |
0.2499 | 0.5295 | 455 | 1.1387 | 24709424 |
0.2211 | 0.5354 | 460 | 1.1360 | 24990024 |
0.1828 | 0.5412 | 465 | 1.1355 | 25265688 |
0.2653 | 0.5470 | 470 | 1.1351 | 25542568 |
0.2151 | 0.5528 | 475 | 1.1322 | 25813784 |
0.1926 | 0.5586 | 480 | 1.1341 | 26086664 |
0.157 | 0.5644 | 485 | 1.1341 | 26356544 |
0.1892 | 0.5703 | 490 | 1.1333 | 26635400 |
0.3106 | 0.5761 | 495 | 1.1305 | 26904360 |
0.1614 | 0.5819 | 500 | 1.1326 | 27184040 |
0.167 | 0.5877 | 505 | 1.1315 | 27455864 |
0.3046 | 0.5935 | 510 | 1.1295 | 27728928 |
0.2692 | 0.5994 | 515 | 1.1292 | 27996232 |
0.2264 | 0.6052 | 520 | 1.1299 | 28263184 |
0.1975 | 0.6110 | 525 | 1.1311 | 28537608 |
0.2387 | 0.6168 | 530 | 1.1294 | 28813056 |
0.2272 | 0.6226 | 535 | 1.1277 | 29084176 |
0.2716 | 0.6285 | 540 | 1.1277 | 29362624 |
0.2671 | 0.6343 | 545 | 1.1302 | 29628688 |
0.2361 | 0.6401 | 550 | 1.1273 | 29902968 |
0.1919 | 0.6459 | 555 | 1.1264 | 30172120 |
0.2404 | 0.6517 | 560 | 1.1290 | 30445752 |
0.2735 | 0.6576 | 565 | 1.1292 | 30716584 |
0.2106 | 0.6634 | 570 | 1.1254 | 30985528 |
0.1771 | 0.6692 | 575 | 1.1271 | 31253904 |
0.261 | 0.6750 | 580 | 1.1262 | 31526560 |
0.2022 | 0.6808 | 585 | 1.1257 | 31801248 |
0.239 | 0.6866 | 590 | 1.1262 | 32080248 |
0.1411 | 0.6925 | 595 | 1.1238 | 32355240 |
0.1716 | 0.6983 | 600 | 1.1274 | 32629448 |
0.2373 | 0.7041 | 605 | 1.1242 | 32897784 |
0.1741 | 0.7099 | 610 | 1.1234 | 33176000 |
0.259 | 0.7157 | 615 | 1.1246 | 33445920 |
0.2467 | 0.7216 | 620 | 1.1224 | 33721408 |
0.2188 | 0.7274 | 625 | 1.1226 | 33995480 |
0.1498 | 0.7332 | 630 | 1.1243 | 34263184 |
0.2158 | 0.7390 | 635 | 1.1216 | 34536136 |
0.2308 | 0.7448 | 640 | 1.1227 | 34806344 |
0.2639 | 0.7507 | 645 | 1.1227 | 35081560 |
0.2219 | 0.7565 | 650 | 1.1216 | 35351968 |
0.2636 | 0.7623 | 655 | 1.1224 | 35634136 |
0.1625 | 0.7681 | 660 | 1.1211 | 35901584 |
0.2168 | 0.7739 | 665 | 1.1206 | 36168800 |
0.2333 | 0.7797 | 670 | 1.1210 | 36440928 |
0.2245 | 0.7856 | 675 | 1.1210 | 36707552 |
0.2281 | 0.7914 | 680 | 1.1211 | 36972984 |
0.1866 | 0.7972 | 685 | 1.1199 | 37242600 |
0.1528 | 0.8030 | 690 | 1.1214 | 37508736 |
0.1361 | 0.8088 | 695 | 1.1217 | 37776072 |
0.156 | 0.8147 | 700 | 1.1211 | 38051040 |
0.2196 | 0.8205 | 705 | 1.1203 | 38327440 |
0.1363 | 0.8263 | 710 | 1.1184 | 38599216 |
0.2088 | 0.8321 | 715 | 1.1204 | 38873840 |
0.1602 | 0.8379 | 720 | 1.1188 | 39142536 |
0.2504 | 0.8438 | 725 | 1.1176 | 39419640 |
0.212 | 0.8496 | 730 | 1.1182 | 39686288 |
0.2011 | 0.8554 | 735 | 1.1174 | 39958408 |
0.2094 | 0.8612 | 740 | 1.1175 | 40237848 |
0.2247 | 0.8670 | 745 | 1.1185 | 40505448 |
0.2153 | 0.8729 | 750 | 1.1166 | 40764696 |
0.1854 | 0.8787 | 755 | 1.1185 | 41037240 |
0.1969 | 0.8845 | 760 | 1.1189 | 41307488 |
0.158 | 0.8903 | 765 | 1.1161 | 41580360 |
0.2089 | 0.8961 | 770 | 1.1168 | 41856912 |
0.2437 | 0.9019 | 775 | 1.1183 | 42129208 |
0.1061 | 0.9078 | 780 | 1.1141 | 42409088 |
0.2042 | 0.9136 | 785 | 1.1142 | 42679096 |
0.2322 | 0.9194 | 790 | 1.1196 | 42942296 |
0.1988 | 0.9252 | 795 | 1.1157 | 43218264 |
0.2296 | 0.9310 | 800 | 1.1132 | 43483408 |
0.2049 | 0.9369 | 805 | 1.1147 | 43751320 |
0.2268 | 0.9427 | 810 | 1.1136 | 44025576 |
0.1569 | 0.9485 | 815 | 1.1144 | 44300560 |
0.2396 | 0.9543 | 820 | 1.1137 | 44569304 |
0.1751 | 0.9601 | 825 | 1.1126 | 44852352 |
0.2868 | 0.9660 | 830 | 1.1159 | 45129736 |
0.2112 | 0.9718 | 835 | 1.1127 | 45402976 |
0.2482 | 0.9776 | 840 | 1.1118 | 45672168 |
0.1128 | 0.9834 | 845 | 1.1142 | 45945280 |
0.2071 | 0.9892 | 850 | 1.1129 | 46216976 |
0.1574 | 0.9951 | 855 | 1.1140 | 46496856 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
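To reproduce the run against the exact environment above, a quick version check can help (a minimal sketch; pinning the same versions with pip works just as well):

```python
import datasets
import tokenizers
import torch
import transformers

# Compare the local environment against the versions reported in this card.
expected = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (got, want) in expected.items():
    status = "OK" if got == want else f"MISMATCH (card used {want})"
    print(f"{name} {got}: {status}")
```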