INFO: 2024-10-16 01:02:49,290: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-16 01:02:49,293: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:49,293: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:49,570: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-16 01:02:49,571: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:49,571: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:51,977: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-16 01:02:51,979: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:51,979: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:53,943: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-16 01:02:53,943: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:53,943: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:55,483: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-16 01:02:55,483: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:55,483: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:57,009: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-16 01:02:57,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:57,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:58,829: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-16 01:02:58,830: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:58,830: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:03:03,414: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.58s
INFO: 2024-10-16 01:03:10,009: llmtf.base.daru/treewayextractive: Loading Dataset: 13.00s
INFO: 2024-10-16 01:03:12,441: llmtf.base.darumeru/MultiQ: Loading Dataset: 23.15s
INFO: 2024-10-16 01:03:14,674: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.19s
INFO: 2024-10-16 01:04:20,981: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.41s
INFO: 2024-10-16 01:06:08,702: llmtf.base.daru/treewayextractive: Processing Dataset: 178.69s
INFO: 2024-10-16 01:06:08,705: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-16 01:06:08,931: llmtf.base.daru/treewayextractive: {'r-prec': 0.392920202020202}
INFO: 2024-10-16 01:06:08,968: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:06:08,972: llmtf.base.evaluator:
mean daru/treewayextractive
0.393 0.393
INFO: 2024-10-16 01:06:21,426: llmtf.base.darumeru/MultiQ: Processing Dataset: 188.98s
INFO: 2024-10-16 01:06:21,427: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-16 01:06:21,432: llmtf.base.darumeru/MultiQ: {'f1': 0.49389044741748994, 'em': 0.3738049713193117}
INFO: 2024-10-16 01:06:21,442: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:21,442: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:06:24,486: llmtf.base.darumeru/PARus: Loading Dataset: 3.04s
INFO: 2024-10-16 01:06:30,785: llmtf.base.darumeru/PARus: Processing Dataset: 6.30s
INFO: 2024-10-16 01:06:30,787: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-16 01:06:30,815: llmtf.base.darumeru/PARus: {'acc': 0.42}
INFO: 2024-10-16 01:06:30,816: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:30,816: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:06:34,602: llmtf.base.darumeru/RCB: Loading Dataset: 3.79s
INFO: 2024-10-16 01:06:35,622: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 221.68s
INFO: 2024-10-16 01:06:45,442: llmtf.base.darumeru/RCB: Processing Dataset: 10.84s
INFO: 2024-10-16 01:06:45,444: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-16 01:06:45,451: llmtf.base.darumeru/RCB: {'acc': 0.5, 'f1_macro': 0.4788148636316176}
INFO: 2024-10-16 01:06:45,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:45,460: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:07:00,710: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.25s
INFO: 2024-10-16 01:07:31,366: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 279.39s
INFO: 2024-10-16 01:08:12,346: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.63s
INFO: 2024-10-16 01:08:12,347: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-16 01:08:12,361: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.5451030927835051, 'f1_macro': 0.5357593897587198}
INFO: 2024-10-16 01:08:12,377: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:12,378: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:15,157: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.78s
INFO: 2024-10-16 01:08:19,267: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.11s
INFO: 2024-10-16 01:08:19,268: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-16 01:08:19,273: llmtf.base.darumeru/ruWorldTree: {'acc': 0.7238095238095238, 'f1_macro': 0.719567254381039}
INFO: 2024-10-16 01:08:19,274: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:19,274: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:23,048: llmtf.base.darumeru/RWSD: Loading Dataset: 3.77s
INFO: 2024-10-16 01:08:32,809: llmtf.base.darumeru/RWSD: Processing Dataset: 9.76s
INFO: 2024-10-16 01:08:32,811: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-16 01:08:32,815: llmtf.base.darumeru/RWSD: {'acc': 0.43137254901960786}
INFO: 2024-10-16 01:08:32,817: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:32,817: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:48,749: llmtf.base.darumeru/USE: Loading Dataset: 15.93s
INFO: 2024-10-16 01:09:43,090: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 399.68s
INFO: 2024-10-16 01:09:43,093: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-16 01:09:43,097: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.900423523440511, 'len': 0.9417878040943446, 'lcs': 0.6981519507186859}
INFO: 2024-10-16 01:09:43,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:09:43,101: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:09:46,551: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.45s
INFO: 2024-10-16 01:12:13,957: llmtf.base.darumeru/USE: Processing Dataset: 205.20s
INFO: 2024-10-16 01:12:13,963: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-16 01:12:13,968: llmtf.base.darumeru/USE: {'grade_norm': 0.049999999999999996}
INFO: 2024-10-16 01:12:13,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:12:13,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:12:35,082: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.11s
INFO: 2024-10-16 01:14:03,862: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 88.78s
INFO: 2024-10-16 01:14:03,866: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-16 01:14:03,877: llmtf.base.russiannlp/rucola_custom: {'acc': 0.4628632938643703, 'mcc': 0.14354674065192544}
INFO: 2024-10-16 01:14:03,888: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:14:03,934: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.472 0.393 0.434 0.420 0.489 0.431 0.050 0.942 0.540 0.722 0.303
INFO: 2024-10-16 01:14:37,979: llmtf.base.darumeru/ruMMLU: Processing Dataset: 616.99s
INFO: 2024-10-16 01:14:37,996: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-16 01:14:38,020: llmtf.base.darumeru/ruMMLU: {'acc': 0.37793075925371644}
INFO: 2024-10-16 01:14:38,092: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:14:38,102: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.464 0.393 0.434 0.420 0.489 0.431 0.050 0.942 0.378 0.540 0.722 0.303
INFO: 2024-10-16 01:16:29,537: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.97s
INFO: 2024-10-16 01:16:29,540: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-16 01:16:29,544: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420697143481598, 'len': 0.9613784781803254, 'lcs': 1.0}
INFO: 2024-10-16 01:16:29,547: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:16:29,547: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:16:32,809: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.26s
INFO: 2024-10-16 01:17:34,153: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 658.53s
INFO: 2024-10-16 01:17:34,155: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-16 01:17:34,201: llmtf.base.nlpcoreteam/enMMLU:
subject metric
abstract_algebra 0.340000
anatomy 0.592593
astronomy 0.684211
business_ethics 0.690000
clinical_knowledge 0.720755
college_biology 0.722222
college_chemistry 0.420000
college_computer_science 0.560000
college_mathematics 0.460000
college_medicine 0.682081
college_physics 0.519608
computer_security 0.800000
conceptual_physics 0.625532
econometrics 0.473684
electrical_engineering 0.682759
elementary_mathematics 0.595238
formal_logic 0.365079
global_facts 0.340000
high_school_biology 0.800000
high_school_chemistry 0.571429
high_school_computer_science 0.760000
high_school_european_history 0.775758
high_school_geography 0.797980
high_school_government_and_politics 0.860104
high_school_macroeconomics 0.702564
high_school_mathematics 0.481481
high_school_microeconomics 0.810924
high_school_physics 0.430464
high_school_psychology 0.840367
high_school_statistics 0.625000
high_school_us_history 0.799020
high_school_world_history 0.827004
human_aging 0.672646
human_sexuality 0.763359
international_law 0.760331
jurisprudence 0.814815
logical_fallacies 0.760736
machine_learning 0.482143
management 0.825243
marketing 0.897436
medical_genetics 0.760000
miscellaneous 0.777778
moral_disputes 0.661850
moral_scenarios 0.293855
nutrition 0.751634
philosophy 0.697749
prehistory 0.700617
professional_accounting 0.496454
professional_law 0.453064
professional_medicine 0.636029
professional_psychology 0.660131
public_relations 0.736364
security_studies 0.734694
sociology 0.810945
us_foreign_policy 0.870000
virology 0.530120
world_religions 0.801170
INFO: 2024-10-16 01:17:34,209: llmtf.base.nlpcoreteam/enMMLU:
subject metric
STEM 0.586671
humanities 0.670081
other (business, health, misc.) 0.669483
social sciences 0.755093
INFO: 2024-10-16 01:17:34,217: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6703320839095225}
INFO: 2024-10-16 01:17:34,284: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:17:34,298: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.518 0.393 0.434 0.420 0.489 0.431 0.050 0.961 0.942 0.378 0.540 0.722 0.670 0.303
INFO: 2024-10-16 01:20:37,498: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 786.13s
INFO: 2024-10-16 01:20:37,501: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-16 01:20:37,548: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
abstract_algebra 0.280000
anatomy 0.259259
astronomy 0.394737
business_ethics 0.520000
clinical_knowledge 0.392453
college_biology 0.375000
college_chemistry 0.350000
college_computer_science 0.460000
college_mathematics 0.390000
college_medicine 0.358382
college_physics 0.411765
computer_security 0.580000
conceptual_physics 0.446809
econometrics 0.307018
electrical_engineering 0.427586
elementary_mathematics 0.515873
formal_logic 0.357143
global_facts 0.260000
high_school_biology 0.490323
high_school_chemistry 0.325123
high_school_computer_science 0.620000
high_school_european_history 0.496970
high_school_geography 0.545455
high_school_government_and_politics 0.476684
high_school_macroeconomics 0.471795
high_school_mathematics 0.362963
high_school_microeconomics 0.487395
high_school_physics 0.344371
high_school_psychology 0.467890
high_school_statistics 0.453704
high_school_us_history 0.455882
high_school_world_history 0.556962
human_aging 0.443946
human_sexuality 0.541985
international_law 0.619835
jurisprudence 0.509259
logical_fallacies 0.447853
machine_learning 0.401786
management 0.466019
marketing 0.675214
medical_genetics 0.510000
miscellaneous 0.413793
moral_disputes 0.468208
moral_scenarios 0.237989
nutrition 0.539216
philosophy 0.501608
prehistory 0.425926
professional_accounting 0.333333
professional_law 0.344198
professional_medicine 0.345588
professional_psychology 0.444444
public_relations 0.454545
security_studies 0.534694
sociology 0.601990
us_foreign_policy 0.610000
virology 0.463855
world_religions 0.438596
INFO: 2024-10-16 01:20:37,555: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
STEM 0.423891
humanities 0.450802
other (business, health, misc.) 0.427218
social sciences 0.495325
INFO: 2024-10-16 01:20:37,563: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4493090597333154}
INFO: 2024-10-16 01:20:37,641: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:20:37,655: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.513 0.393 0.434 0.420 0.489 0.431 0.050 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303
INFO: 2024-10-16 01:30:09,544: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 816.73s
INFO: 2024-10-16 01:30:09,547: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-16 01:30:09,581: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.828156059562894, 'len': 0.9521096414122393, 'lcs': 0.37}
INFO: 2024-10-16 01:30:09,583: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:30:09,583: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:30:12,888: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.30s
INFO: 2024-10-16 01:43:00,225: llmtf.base.daru/treewayabstractive: Processing Dataset: 2385.55s
INFO: 2024-10-16 01:43:00,233: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-16 01:43:00,251: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2792817830105651, 'rouge2': 0.09909942829468928}
INFO: 2024-10-16 01:43:00,256: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:43:00,267: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.484 0.189 0.393 0.434 0.420 0.489 0.431 0.050 0.370 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303
INFO: 2024-10-16 01:44:44,037: llmtf.base.darumeru/cp_para_en: Processing Dataset: 871.15s
INFO: 2024-10-16 01:44:44,040: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-16 01:44:44,044: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.439886096406914, 'len': 0.9905967745509706, 'lcs': 1.0}
INFO: 2024-10-16 01:44:44,045: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:44:44,054: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.514 0.189 0.393 0.434 0.420 0.489 0.431 0.050 1.000 0.370 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303