INFO: 2024-10-15 15:44:31,612: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:32,407: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:34,013: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:36,362: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:38,148: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:40,069: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:42,353: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:46,581: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.23s
INFO: 2024-10-15 15:44:53,271: llmtf.base.daru/treewayextractive: Loading Dataset: 13.20s
INFO: 2024-10-15 15:44:55,800: llmtf.base.darumeru/MultiQ: Loading Dataset: 24.19s
INFO: 2024-10-15 15:44:57,156: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.01s
INFO: 2024-10-15 15:46:04,231: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.82s
INFO: 2024-10-15 15:47:41,431: llmtf.base.darumeru/MultiQ: Processing Dataset: 165.60s
INFO: 2024-10-15 15:47:41,432: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 15:47:41,437: llmtf.base.darumeru/MultiQ: {'f1': 0.4759256275303807, 'em': 0.3030592734225621}
INFO: 2024-10-15 15:47:41,447: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:47:41,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:47:44,367: llmtf.base.darumeru/PARus: Loading Dataset: 2.92s
INFO: 2024-10-15 15:47:50,707: llmtf.base.darumeru/PARus: Processing Dataset: 6.34s
INFO: 2024-10-15 15:47:50,709: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 15:47:50,721: llmtf.base.darumeru/PARus: {'acc': 0.39}
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:47:51,808: llmtf.base.daru/treewayextractive: Processing Dataset: 178.54s
INFO: 2024-10-15 15:47:51,811: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 15:47:52,026: llmtf.base.daru/treewayextractive: {'r-prec': 0.3766208513708514}
INFO: 2024-10-15 15:47:52,063: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:47:52,068: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus
0.385 0.377 0.389 0.390
INFO: 2024-10-15 15:47:54,545: llmtf.base.darumeru/RCB: Loading Dataset: 3.82s
INFO: 2024-10-15 15:48:04,785: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 208.42s
INFO: 2024-10-15 15:48:05,455: llmtf.base.darumeru/RCB: Processing Dataset: 10.91s
INFO: 2024-10-15 15:48:05,458: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 15:48:05,463: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.3966762193460725}
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:48:20,815: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.35s
INFO: 2024-10-15 15:48:55,582: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.57s
INFO: 2024-10-15 15:49:32,641: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.82s
INFO: 2024-10-15 15:49:32,643: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 15:49:32,656: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.5240549828178694, 'f1_macro': 0.5229146059329463}
INFO: 2024-10-15 15:49:32,672: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:32,673: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:49:35,205: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.53s
INFO: 2024-10-15 15:49:39,346: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.14s
INFO: 2024-10-15 15:49:39,347: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 15:49:39,352: llmtf.base.darumeru/ruWorldTree: {'acc': 0.6571428571428571, 'f1_macro': 0.6537041574777424}
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:49:43,268: llmtf.base.darumeru/RWSD: Loading Dataset: 3.91s
INFO: 2024-10-15 15:49:53,085: llmtf.base.darumeru/RWSD: Processing Dataset: 9.82s
INFO: 2024-10-15 15:49:53,087: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 15:49:53,091: llmtf.base.darumeru/RWSD: {'acc': 0.43137254901960786}
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:50:09,717: llmtf.base.darumeru/USE: Loading Dataset: 16.62s
INFO: 2024-10-15 15:51:24,347: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 397.77s
INFO: 2024-10-15 15:51:24,350: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-15 15:51:24,354: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.6600682410685916, 'len': 0.9213865185211005, 'lcs': 0.5523613963039015}
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:51:27,707: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.35s
INFO: 2024-10-15 15:53:18,077: llmtf.base.darumeru/USE: Processing Dataset: 188.36s
INFO: 2024-10-15 15:53:18,081: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-15 15:53:18,102: llmtf.base.darumeru/USE: {'grade_norm': 0.042156862745098035}
INFO: 2024-10-15 15:53:18,113: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:53:18,114: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:53:39,656: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.54s
INFO: 2024-10-15 15:55:08,788: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 89.13s
INFO: 2024-10-15 15:55:08,791: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-15 15:55:08,804: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5073555794761392, 'mcc': 0.1555547836871833}
INFO: 2024-10-15 15:55:08,815: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:55:08,825: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.447 0.377 0.389 0.390 0.405 0.431 0.042 0.921 0.523 0.655 0.331
INFO: 2024-10-15 15:56:27,985: llmtf.base.darumeru/ruMMLU: Processing Dataset: 623.75s
INFO: 2024-10-15 15:56:27,987: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-15 15:56:28,014: llmtf.base.darumeru/ruMMLU: {'acc': 0.400179586950015}
INFO: 2024-10-15 15:56:28,094: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:56:28,107: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.442 0.377 0.389 0.390 0.405 0.431 0.042 0.921 0.400 0.523 0.655 0.331
INFO: 2024-10-15 15:58:10,690: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.98s
INFO: 2024-10-15 15:58:10,694: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-15 15:58:10,698: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420772874407176, 'len': 0.9613467200502441, 'lcs': 1.0}
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:58:15,603: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.90s
INFO: 2024-10-15 15:59:03,310: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 658.52s
INFO: 2024-10-15 15:59:03,312: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 15:59:03,358: llmtf.base.nlpcoreteam/enMMLU:
subject metric
abstract_algebra 0.390000
anatomy 0.637037
astronomy 0.697368
business_ethics 0.690000
clinical_knowledge 0.732075
college_biology 0.743056
college_chemistry 0.460000
college_computer_science 0.530000
college_mathematics 0.460000
college_medicine 0.664740
college_physics 0.431373
computer_security 0.780000
conceptual_physics 0.617021
econometrics 0.517544
electrical_engineering 0.655172
elementary_mathematics 0.558201
formal_logic 0.373016
global_facts 0.370000
high_school_biology 0.803226
high_school_chemistry 0.566502
high_school_computer_science 0.730000
high_school_european_history 0.775758
high_school_geography 0.823232
high_school_government_and_politics 0.860104
high_school_macroeconomics 0.715385
high_school_mathematics 0.503704
high_school_microeconomics 0.777311
high_school_physics 0.417219
high_school_psychology 0.849541
high_school_statistics 0.625000
high_school_us_history 0.818627
high_school_world_history 0.827004
human_aging 0.717489
human_sexuality 0.763359
international_law 0.785124
jurisprudence 0.805556
logical_fallacies 0.742331
machine_learning 0.464286
management 0.805825
marketing 0.888889
medical_genetics 0.730000
miscellaneous 0.789272
moral_disputes 0.667630
moral_scenarios 0.273743
nutrition 0.774510
philosophy 0.707395
prehistory 0.737654
professional_accounting 0.507092
professional_law 0.476532
professional_medicine 0.621324
professional_psychology 0.696078
public_relations 0.745455
security_studies 0.722449
sociology 0.830846
us_foreign_policy 0.840000
virology 0.512048
world_religions 0.830409
INFO: 2024-10-15 15:59:03,366: llmtf.base.nlpcoreteam/enMMLU:
subject metric
STEM 0.579563
humanities 0.678522
other (business, health, misc.) 0.674307
social sciences 0.761775
INFO: 2024-10-15 15:59:03,374: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6735416670043434}
INFO: 2024-10-15 15:59:03,440: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:59:03,454: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.500 0.377 0.389 0.390 0.405 0.431 0.042 0.961 0.921 0.400 0.523 0.655 0.674 0.331
INFO: 2024-10-15 16:02:04,868: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 789.28s
INFO: 2024-10-15 16:02:04,872: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 16:02:04,918: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
abstract_algebra 0.330000
anatomy 0.325926
astronomy 0.414474
business_ethics 0.460000
clinical_knowledge 0.415094
college_biology 0.340278
college_chemistry 0.380000
college_computer_science 0.420000
college_mathematics 0.360000
college_medicine 0.387283
college_physics 0.392157
computer_security 0.610000
conceptual_physics 0.451064
econometrics 0.333333
electrical_engineering 0.489655
elementary_mathematics 0.523810
formal_logic 0.309524
global_facts 0.310000
high_school_biology 0.541935
high_school_chemistry 0.369458
high_school_computer_science 0.600000
high_school_european_history 0.515152
high_school_geography 0.540404
high_school_government_and_politics 0.419689
high_school_macroeconomics 0.446154
high_school_mathematics 0.411111
high_school_microeconomics 0.500000
high_school_physics 0.317881
high_school_psychology 0.453211
high_school_statistics 0.398148
high_school_us_history 0.431373
high_school_world_history 0.497890
human_aging 0.466368
human_sexuality 0.511450
international_law 0.661157
jurisprudence 0.500000
logical_fallacies 0.423313
machine_learning 0.437500
management 0.456311
marketing 0.670940
medical_genetics 0.530000
miscellaneous 0.406130
moral_disputes 0.476879
moral_scenarios 0.240223
nutrition 0.535948
philosophy 0.485531
prehistory 0.438272
professional_accounting 0.375887
professional_law 0.342243
professional_medicine 0.367647
professional_psychology 0.428105
public_relations 0.509091
security_studies 0.567347
sociology 0.601990
us_foreign_policy 0.630000
virology 0.415663
world_religions 0.426901
INFO: 2024-10-15 16:02:04,926: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
STEM 0.432637
humanities 0.442189
other (business, health, misc.) 0.437371
social sciences 0.495065
INFO: 2024-10-15 16:02:04,934: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.45181545178768195}
INFO: 2024-10-15 16:02:05,011: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:02:05,026: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.497 0.377 0.389 0.390 0.405 0.431 0.042 0.961 0.921 0.400 0.523 0.655 0.674 0.452 0.331
INFO: 2024-10-15 16:12:20,022: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 844.41s
INFO: 2024-10-15 16:12:20,029: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 16:12:20,081: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.49153636504356, 'len': 0.7957149210024954, 'lcs': 0.16}
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 16:12:23,374: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.29s
INFO: 2024-10-15 16:26:55,233: llmtf.base.darumeru/cp_para_en: Processing Dataset: 871.86s
INFO: 2024-10-15 16:26:55,235: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-15 16:26:55,239: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.43961628016479, 'len': 0.9907918582197385, 'lcs': 1.0}
INFO: 2024-10-15 16:26:55,239: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:26:55,249: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.507 0.377 0.389 0.390 0.405 0.431 0.042 1.000 0.160 0.961 0.921 0.400 0.523 0.655 0.674 0.452 0.331
INFO: 2024-10-15 16:29:43,016: llmtf.base.daru/treewayabstractive: Processing Dataset: 2685.86s
INFO: 2024-10-15 16:29:43,021: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-15 16:29:43,025: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2503995223034589, 'rouge2': 0.08558468876095006}
INFO: 2024-10-15 16:29:43,031: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:29:43,056: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.487 0.168 0.377 0.389 0.390 0.405 0.431 0.042 1.000 0.160 0.961 0.921 0.400 0.523 0.655 0.674 0.452 0.331