INFO: 2024-10-15 23:27:34,633: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-15 23:27:34,634: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:34,634: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:35,580: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-15 23:27:35,581: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:35,581: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:37,434: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 23:27:37,435: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:37,435: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:39,620: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 23:27:39,621: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:39,621: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:41,015: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 23:27:41,022: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:41,022: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:43,196: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 23:27:43,196: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:43,196: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:45,192: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-15 23:27:45,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:45,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:49,559: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.37s
INFO: 2024-10-15 23:27:56,068: llmtf.base.daru/treewayextractive: Loading Dataset: 12.87s
INFO: 2024-10-15 23:27:57,445: llmtf.base.darumeru/MultiQ: Loading Dataset: 22.81s
INFO: 2024-10-15 23:27:59,762: llmtf.base.daru/treewayabstractive: Loading Dataset: 18.74s
INFO: 2024-10-15 23:29:07,300: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.72s
INFO: 2024-10-15 23:30:55,098: llmtf.base.daru/treewayextractive: Processing Dataset: 179.02s
INFO: 2024-10-15 23:30:55,100: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 23:30:55,311: llmtf.base.daru/treewayextractive: {'r-prec': 0.35143051948051945}
INFO: 2024-10-15 23:30:55,347: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:30:55,351: llmtf.base.evaluator:
mean daru/treewayextractive
0.351 0.351
INFO: 2024-10-15 23:31:07,598: llmtf.base.darumeru/MultiQ: Processing Dataset: 190.15s
INFO: 2024-10-15 23:31:07,614: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 23:31:07,618: llmtf.base.darumeru/MultiQ: {'f1': 0.4149813636918578, 'em': 0.2762906309751434}
INFO: 2024-10-15 23:31:07,628: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:07,628: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:10,999: llmtf.base.darumeru/PARus: Loading Dataset: 3.37s
INFO: 2024-10-15 23:31:13,775: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 214.15s
INFO: 2024-10-15 23:31:17,369: llmtf.base.darumeru/PARus: Processing Dataset: 6.37s
INFO: 2024-10-15 23:31:17,372: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 23:31:17,384: llmtf.base.darumeru/PARus: {'acc': 0.32}
INFO: 2024-10-15 23:31:17,386: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:17,386: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:21,212: llmtf.base.darumeru/RCB: Loading Dataset: 3.82s
INFO: 2024-10-15 23:31:32,123: llmtf.base.darumeru/RCB: Processing Dataset: 10.91s
INFO: 2024-10-15 23:31:32,141: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 23:31:32,148: llmtf.base.darumeru/RCB: {'acc': 0.42727272727272725, 'f1_macro': 0.4213976946667694}
INFO: 2024-10-15 23:31:32,150: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:32,151: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:47,498: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.35s
INFO: 2024-10-15 23:32:07,223: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 269.79s
INFO: 2024-10-15 23:32:59,383: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.88s
INFO: 2024-10-15 23:32:59,384: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 23:32:59,398: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.49312714776632305, 'f1_macro': 0.48517498211245624}
INFO: 2024-10-15 23:32:59,414: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:32:59,415: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:02,154: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.74s
INFO: 2024-10-15 23:33:06,332: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.18s
INFO: 2024-10-15 23:33:06,333: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 23:33:06,338: llmtf.base.darumeru/ruWorldTree: {'acc': 0.5619047619047619, 'f1_macro': 0.5387530387530388}
INFO: 2024-10-15 23:33:06,340: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:33:06,340: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:10,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.81s
INFO: 2024-10-15 23:33:19,994: llmtf.base.darumeru/RWSD: Processing Dataset: 9.84s
INFO: 2024-10-15 23:33:19,996: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 23:33:20,000: llmtf.base.darumeru/RWSD: {'acc': 0.44607843137254904}
INFO: 2024-10-15 23:33:20,002: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:33:20,002: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:36,147: llmtf.base.darumeru/USE: Loading Dataset: 16.14s
INFO: 2024-10-15 23:34:28,409: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 398.85s
INFO: 2024-10-15 23:34:28,413: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-15 23:34:28,417: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.552729392451623, 'len': 0.912449166502956, 'lcs': 0.4414784394250513}
INFO: 2024-10-15 23:34:28,421: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:34:28,421: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:34:31,828: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.41s
INFO: 2024-10-15 23:36:47,199: llmtf.base.darumeru/USE: Processing Dataset: 191.05s
INFO: 2024-10-15 23:36:47,203: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-15 23:36:47,225: llmtf.base.darumeru/USE: {'grade_norm': 0.04607843137254901}
INFO: 2024-10-15 23:36:47,231: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:36:47,232: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:37:08,589: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.36s
INFO: 2024-10-15 23:38:37,411: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 88.82s
INFO: 2024-10-15 23:38:37,415: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-15 23:38:37,428: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5601004664513815, 'mcc': 0.09897569327848366}
INFO: 2024-10-15 23:38:37,439: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:38:37,451: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.422 0.351 0.346 0.320 0.424 0.446 0.046 0.912 0.489 0.550 0.330
INFO: 2024-10-15 23:39:22,591: llmtf.base.darumeru/ruMMLU: Processing Dataset: 615.29s
INFO: 2024-10-15 23:39:22,593: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-15 23:39:22,633: llmtf.base.darumeru/ruMMLU: {'acc': 0.37663374239249725}
INFO: 2024-10-15 23:39:22,704: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:39:22,714: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.417 0.351 0.346 0.320 0.424 0.446 0.046 0.912 0.377 0.489 0.550 0.330
INFO: 2024-10-15 23:41:14,360: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.53s
INFO: 2024-10-15 23:41:14,363: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-15 23:41:14,368: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420697143481598, 'len': 0.9613784781803254, 'lcs': 1.0}
INFO: 2024-10-15 23:41:14,371: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:41:14,371: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:41:17,993: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.62s
INFO: 2024-10-15 23:42:06,906: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 653.13s
INFO: 2024-10-15 23:42:06,908: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 23:42:06,954: llmtf.base.nlpcoreteam/enMMLU:
subject metric
abstract_algebra 0.400000
anatomy 0.651852
astronomy 0.677632
business_ethics 0.670000
clinical_knowledge 0.735849
college_biology 0.729167
college_chemistry 0.440000
college_computer_science 0.550000
college_mathematics 0.470000
college_medicine 0.670520
college_physics 0.441176
computer_security 0.790000
conceptual_physics 0.604255
econometrics 0.543860
electrical_engineering 0.648276
elementary_mathematics 0.542328
formal_logic 0.365079
global_facts 0.330000
high_school_biology 0.806452
high_school_chemistry 0.566502
high_school_computer_science 0.780000
high_school_european_history 0.787879
high_school_geography 0.797980
high_school_government_and_politics 0.849741
high_school_macroeconomics 0.720513
high_school_mathematics 0.488889
high_school_microeconomics 0.743697
high_school_physics 0.443709
high_school_psychology 0.851376
high_school_statistics 0.587963
high_school_us_history 0.808824
high_school_world_history 0.822785
human_aging 0.726457
human_sexuality 0.778626
international_law 0.752066
jurisprudence 0.824074
logical_fallacies 0.742331
machine_learning 0.508929
management 0.834951
marketing 0.888889
medical_genetics 0.740000
miscellaneous 0.802043
moral_disputes 0.676301
moral_scenarios 0.288268
nutrition 0.754902
philosophy 0.707395
prehistory 0.753086
professional_accounting 0.510638
professional_law 0.479791
professional_medicine 0.610294
professional_psychology 0.710784
public_relations 0.709091
security_studies 0.734694
sociology 0.830846
us_foreign_policy 0.840000
virology 0.518072
world_religions 0.812865
INFO: 2024-10-15 23:42:06,961: llmtf.base.nlpcoreteam/enMMLU:
subject metric
STEM 0.581960
humanities 0.678519
other (business, health, misc.) 0.674605
social sciences 0.759267
INFO: 2024-10-15 23:42:06,984: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6735877381763914}
INFO: 2024-10-15 23:42:07,051: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:42:07,099: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.479 0.351 0.346 0.320 0.424 0.446 0.046 0.961 0.912 0.377 0.489 0.550 0.674 0.330
INFO: 2024-10-15 23:45:18,092: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 790.87s
INFO: 2024-10-15 23:45:18,095: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 23:45:18,142: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
abstract_algebra 0.360000
anatomy 0.333333
astronomy 0.388158
business_ethics 0.420000
clinical_knowledge 0.411321
college_biology 0.298611
college_chemistry 0.350000
college_computer_science 0.400000
college_mathematics 0.350000
college_medicine 0.398844
college_physics 0.362745
computer_security 0.560000
conceptual_physics 0.425532
econometrics 0.350877
electrical_engineering 0.475862
elementary_mathematics 0.529101
formal_logic 0.333333
global_facts 0.250000
high_school_biology 0.435484
high_school_chemistry 0.349754
high_school_computer_science 0.600000
high_school_european_history 0.448485
high_school_geography 0.449495
high_school_government_and_politics 0.404145
high_school_macroeconomics 0.412821
high_school_mathematics 0.414815
high_school_microeconomics 0.449580
high_school_physics 0.337748
high_school_psychology 0.381651
high_school_statistics 0.398148
high_school_us_history 0.441176
high_school_world_history 0.468354
human_aging 0.426009
human_sexuality 0.442748
international_law 0.628099
jurisprudence 0.481481
logical_fallacies 0.411043
machine_learning 0.428571
management 0.398058
marketing 0.628205
medical_genetics 0.530000
miscellaneous 0.381865
moral_disputes 0.485549
moral_scenarios 0.244693
nutrition 0.529412
philosophy 0.466238
prehistory 0.410494
professional_accounting 0.343972
professional_law 0.344850
professional_medicine 0.312500
professional_psychology 0.406863
public_relations 0.418182
security_studies 0.502041
sociology 0.572139
us_foreign_policy 0.620000
virology 0.403614
world_religions 0.415205
INFO: 2024-10-15 23:45:18,149: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
STEM 0.414696
humanities 0.429154
other (business, health, misc.) 0.411938
social sciences 0.450878
INFO: 2024-10-15 23:45:18,174: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4266666289558874}
INFO: 2024-10-15 23:45:18,249: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:45:18,263: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.475 0.351 0.346 0.320 0.424 0.446 0.046 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330
INFO: 2024-10-15 23:55:25,025: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 847.01s
INFO: 2024-10-15 23:55:25,029: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 23:55:25,033: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.317211095060652, 'len': 0.7885274694294595, 'lcs': 0.11}
INFO: 2024-10-15 23:55:25,034: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:55:25,034: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:55:28,357: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.32s
INFO: 2024-10-16 00:10:03,216: llmtf.base.darumeru/cp_para_en: Processing Dataset: 874.86s
INFO: 2024-10-16 00:10:03,219: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-16 00:10:03,224: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.4363376335695355, 'len': 0.9918057024834991, 'lcs': 1.0}
INFO: 2024-10-16 00:10:03,225: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 00:10:03,237: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.485 0.351 0.346 0.320 0.424 0.446 0.046 1.000 0.110 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330
INFO: 2024-10-16 00:13:39,224: llmtf.base.daru/treewayabstractive: Processing Dataset: 2739.46s
INFO: 2024-10-16 00:13:39,228: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-16 00:13:39,233: llmtf.base.daru/treewayabstractive: {'rouge1': 0.24324312209419716, 'rouge2': 0.07780431416219476}
INFO: 2024-10-16 00:13:39,238: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 00:13:39,261: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.466 0.161 0.351 0.346 0.320 0.424 0.446 0.046 1.000 0.110 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330