|
INFO: 2024-10-15 15:44:31,612: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom'] |
|
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:32,407: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu'] |
|
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:34,013: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:36,362: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:38,148: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:40,069: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:42,353: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en'] |
|
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:44:46,581: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.23s |
|
INFO: 2024-10-15 15:44:53,271: llmtf.base.daru/treewayextractive: Loading Dataset: 13.20s |
|
INFO: 2024-10-15 15:44:55,800: llmtf.base.darumeru/MultiQ: Loading Dataset: 24.19s |
|
INFO: 2024-10-15 15:44:57,156: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.01s |
|
INFO: 2024-10-15 15:46:04,231: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.82s |
|
INFO: 2024-10-15 15:47:41,431: llmtf.base.darumeru/MultiQ: Processing Dataset: 165.60s |
|
INFO: 2024-10-15 15:47:41,432: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-10-15 15:47:41,437: llmtf.base.darumeru/MultiQ: {'f1': 0.4759256275303807, 'em': 0.3030592734225621} |
|
INFO: 2024-10-15 15:47:41,447: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:47:41,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:47:44,367: llmtf.base.darumeru/PARus: Loading Dataset: 2.92s |
|
INFO: 2024-10-15 15:47:50,707: llmtf.base.darumeru/PARus: Processing Dataset: 6.34s |
|
INFO: 2024-10-15 15:47:50,709: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-10-15 15:47:50,721: llmtf.base.darumeru/PARus: {'acc': 0.39} |
|
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:47:51,808: llmtf.base.daru/treewayextractive: Processing Dataset: 178.54s |
|
INFO: 2024-10-15 15:47:51,811: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-10-15 15:47:52,026: llmtf.base.daru/treewayextractive: {'r-prec': 0.3766208513708514} |
|
INFO: 2024-10-15 15:47:52,063: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 15:47:52,068: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus |
|
0.385  0.377                   0.389            0.390 |
|
INFO: 2024-10-15 15:47:54,545: llmtf.base.darumeru/RCB: Loading Dataset: 3.82s |
|
INFO: 2024-10-15 15:48:04,785: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 208.42s |
|
INFO: 2024-10-15 15:48:05,455: llmtf.base.darumeru/RCB: Processing Dataset: 10.91s |
|
INFO: 2024-10-15 15:48:05,458: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-10-15 15:48:05,463: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.3966762193460725} |
|
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:48:20,815: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.35s |
|
INFO: 2024-10-15 15:48:55,582: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.57s |
|
INFO: 2024-10-15 15:49:32,641: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.82s |
|
INFO: 2024-10-15 15:49:32,643: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-10-15 15:49:32,656: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.5240549828178694, 'f1_macro': 0.5229146059329463} |
|
INFO: 2024-10-15 15:49:32,672: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:49:32,673: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:49:35,205: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.53s |
|
INFO: 2024-10-15 15:49:39,346: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.14s |
|
INFO: 2024-10-15 15:49:39,347: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-10-15 15:49:39,352: llmtf.base.darumeru/ruWorldTree: {'acc': 0.6571428571428571, 'f1_macro': 0.6537041574777424} |
|
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:49:43,268: llmtf.base.darumeru/RWSD: Loading Dataset: 3.91s |
|
INFO: 2024-10-15 15:49:53,085: llmtf.base.darumeru/RWSD: Processing Dataset: 9.82s |
|
INFO: 2024-10-15 15:49:53,087: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-10-15 15:49:53,091: llmtf.base.darumeru/RWSD: {'acc': 0.43137254901960786} |
|
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:50:09,717: llmtf.base.darumeru/USE: Loading Dataset: 16.62s |
|
INFO: 2024-10-15 15:51:24,347: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 397.77s |
|
INFO: 2024-10-15 15:51:24,350: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru: |
|
INFO: 2024-10-15 15:51:24,354: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.6600682410685916, 'len': 0.9213865185211005, 'lcs': 0.5523613963039015} |
|
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:51:27,707: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.35s |
|
INFO: 2024-10-15 15:53:18,077: llmtf.base.darumeru/USE: Processing Dataset: 188.36s |
|
INFO: 2024-10-15 15:53:18,081: llmtf.base.darumeru/USE: Results for darumeru/USE: |
|
INFO: 2024-10-15 15:53:18,102: llmtf.base.darumeru/USE: {'grade_norm': 0.042156862745098035} |
|
INFO: 2024-10-15 15:53:18,113: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:53:18,114: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:53:39,656: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.54s |
|
INFO: 2024-10-15 15:55:08,788: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 89.13s |
|
INFO: 2024-10-15 15:55:08,791: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom: |
|
INFO: 2024-10-15 15:55:08,804: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5073555794761392, 'mcc': 0.1555547836871833} |
|
INFO: 2024-10-15 15:55:08,815: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 15:55:08,825: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom |
|
0.447  0.377                   0.389            0.390           0.405         0.431          0.042         0.921                0.523                  0.655                 0.331 |
|
INFO: 2024-10-15 15:56:27,985: llmtf.base.darumeru/ruMMLU: Processing Dataset: 623.75s |
|
INFO: 2024-10-15 15:56:27,987: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU: |
|
INFO: 2024-10-15 15:56:28,014: llmtf.base.darumeru/ruMMLU: {'acc': 0.400179586950015} |
|
INFO: 2024-10-15 15:56:28,094: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 15:56:28,107: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom |
|
0.442  0.377                   0.389            0.390           0.405         0.431          0.042         0.921                0.400            0.523                  0.655                 0.331 |
|
INFO: 2024-10-15 15:58:10,690: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.98s |
|
INFO: 2024-10-15 15:58:10,694: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en: |
|
INFO: 2024-10-15 15:58:10,698: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420772874407176, 'len': 0.9613467200502441, 'lcs': 1.0} |
|
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 15:58:15,603: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.90s |
|
INFO: 2024-10-15 15:59:03,310: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 658.52s |
|
INFO: 2024-10-15 15:59:03,312: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-10-15 15:59:03,358: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.390000 |
|
anatomy 0.637037 |
|
astronomy 0.697368 |
|
business_ethics 0.690000 |
|
clinical_knowledge 0.732075 |
|
college_biology 0.743056 |
|
college_chemistry 0.460000 |
|
college_computer_science 0.530000 |
|
college_mathematics 0.460000 |
|
college_medicine 0.664740 |
|
college_physics 0.431373 |
|
computer_security 0.780000 |
|
conceptual_physics 0.617021 |
|
econometrics 0.517544 |
|
electrical_engineering 0.655172 |
|
elementary_mathematics 0.558201 |
|
formal_logic 0.373016 |
|
global_facts 0.370000 |
|
high_school_biology 0.803226 |
|
high_school_chemistry 0.566502 |
|
high_school_computer_science 0.730000 |
|
high_school_european_history 0.775758 |
|
high_school_geography 0.823232 |
|
high_school_government_and_politics 0.860104 |
|
high_school_macroeconomics 0.715385 |
|
high_school_mathematics 0.503704 |
|
high_school_microeconomics 0.777311 |
|
high_school_physics 0.417219 |
|
high_school_psychology 0.849541 |
|
high_school_statistics 0.625000 |
|
high_school_us_history 0.818627 |
|
high_school_world_history 0.827004 |
|
human_aging 0.717489 |
|
human_sexuality 0.763359 |
|
international_law 0.785124 |
|
jurisprudence 0.805556 |
|
logical_fallacies 0.742331 |
|
machine_learning 0.464286 |
|
management 0.805825 |
|
marketing 0.888889 |
|
medical_genetics 0.730000 |
|
miscellaneous 0.789272 |
|
moral_disputes 0.667630 |
|
moral_scenarios 0.273743 |
|
nutrition 0.774510 |
|
philosophy 0.707395 |
|
prehistory 0.737654 |
|
professional_accounting 0.507092 |
|
professional_law 0.476532 |
|
professional_medicine 0.621324 |
|
professional_psychology 0.696078 |
|
public_relations 0.745455 |
|
security_studies 0.722449 |
|
sociology 0.830846 |
|
us_foreign_policy 0.840000 |
|
virology 0.512048 |
|
world_religions 0.830409 |
|
INFO: 2024-10-15 15:59:03,366: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.579563 |
|
humanities 0.678522 |
|
other (business, health, misc.) 0.674307 |
|
social sciences 0.761775 |
|
INFO: 2024-10-15 15:59:03,374: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6735416670043434} |
|
INFO: 2024-10-15 15:59:03,440: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 15:59:03,454: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  russiannlp/rucola_custom |
|
0.500  0.377                   0.389            0.390           0.405         0.431          0.042         0.961                0.921                0.400            0.523                  0.655                 0.674               0.331 |
|
INFO: 2024-10-15 16:02:04,868: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 789.28s |
|
INFO: 2024-10-15 16:02:04,872: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-10-15 16:02:04,918: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.330000 |
|
anatomy 0.325926 |
|
astronomy 0.414474 |
|
business_ethics 0.460000 |
|
clinical_knowledge 0.415094 |
|
college_biology 0.340278 |
|
college_chemistry 0.380000 |
|
college_computer_science 0.420000 |
|
college_mathematics 0.360000 |
|
college_medicine 0.387283 |
|
college_physics 0.392157 |
|
computer_security 0.610000 |
|
conceptual_physics 0.451064 |
|
econometrics 0.333333 |
|
electrical_engineering 0.489655 |
|
elementary_mathematics 0.523810 |
|
formal_logic 0.309524 |
|
global_facts 0.310000 |
|
high_school_biology 0.541935 |
|
high_school_chemistry 0.369458 |
|
high_school_computer_science 0.600000 |
|
high_school_european_history 0.515152 |
|
high_school_geography 0.540404 |
|
high_school_government_and_politics 0.419689 |
|
high_school_macroeconomics 0.446154 |
|
high_school_mathematics 0.411111 |
|
high_school_microeconomics 0.500000 |
|
high_school_physics 0.317881 |
|
high_school_psychology 0.453211 |
|
high_school_statistics 0.398148 |
|
high_school_us_history 0.431373 |
|
high_school_world_history 0.497890 |
|
human_aging 0.466368 |
|
human_sexuality 0.511450 |
|
international_law 0.661157 |
|
jurisprudence 0.500000 |
|
logical_fallacies 0.423313 |
|
machine_learning 0.437500 |
|
management 0.456311 |
|
marketing 0.670940 |
|
medical_genetics 0.530000 |
|
miscellaneous 0.406130 |
|
moral_disputes 0.476879 |
|
moral_scenarios 0.240223 |
|
nutrition 0.535948 |
|
philosophy 0.485531 |
|
prehistory 0.438272 |
|
professional_accounting 0.375887 |
|
professional_law 0.342243 |
|
professional_medicine 0.367647 |
|
professional_psychology 0.428105 |
|
public_relations 0.509091 |
|
security_studies 0.567347 |
|
sociology 0.601990 |
|
us_foreign_policy 0.630000 |
|
virology 0.415663 |
|
world_religions 0.426901 |
|
INFO: 2024-10-15 16:02:04,926: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.432637 |
|
humanities 0.442189 |
|
other (business, health, misc.) 0.437371 |
|
social sciences 0.495065 |
|
INFO: 2024-10-15 16:02:04,934: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.45181545178768195} |
|
INFO: 2024-10-15 16:02:05,011: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 16:02:05,026: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
|
0.497  0.377                   0.389            0.390           0.405         0.431          0.042         0.961                0.921                0.400            0.523                  0.655                 0.674               0.452               0.331 |
|
INFO: 2024-10-15 16:12:20,022: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 844.41s |
|
INFO: 2024-10-15 16:12:20,029: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-10-15 16:12:20,081: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.49153636504356, 'len': 0.7957149210024954, 'lcs': 0.16} |
|
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271] |
|
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n'] |
|
INFO: 2024-10-15 16:12:23,374: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.29s |
|
INFO: 2024-10-15 16:26:55,233: llmtf.base.darumeru/cp_para_en: Processing Dataset: 871.86s |
|
INFO: 2024-10-15 16:26:55,235: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en: |
|
INFO: 2024-10-15 16:26:55,239: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.43961628016479, 'len': 0.9907918582197385, 'lcs': 1.0} |
|
INFO: 2024-10-15 16:26:55,239: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 16:26:55,249: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
|
0.507  0.377                   0.389            0.390           0.405         0.431          0.042         1.000                0.160                0.961                0.921                0.400            0.523                  0.655                 0.674               0.452               0.331 |
|
INFO: 2024-10-15 16:29:43,016: llmtf.base.daru/treewayabstractive: Processing Dataset: 2685.86s |
|
INFO: 2024-10-15 16:29:43,021: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-10-15 16:29:43,025: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2503995223034589, 'rouge2': 0.08558468876095006} |
|
INFO: 2024-10-15 16:29:43,031: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 16:29:43,056: llmtf.base.evaluator: |
|
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom |
|
0.487  0.168                    0.377                   0.389            0.390           0.405         0.431          0.042         1.000                0.160                0.961                0.921                0.400            0.523                  0.655                 0.674               0.452               0.331 |
|