INFO: 2024-10-15 15:44:31,612: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:31,613: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:32,407: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:32,407: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:34,013: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:34,014: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:36,362: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:36,363: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:38,148: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:38,149: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:40,069: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:40,069: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:42,353: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:44:42,354: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:44:46,581: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.23s
INFO: 2024-10-15 15:44:53,271: llmtf.base.daru/treewayextractive: Loading Dataset: 13.20s
INFO: 2024-10-15 15:44:55,800: llmtf.base.darumeru/MultiQ: Loading Dataset: 24.19s
INFO: 2024-10-15 15:44:57,156: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.01s
INFO: 2024-10-15 15:46:04,231: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.82s
INFO: 2024-10-15 15:47:41,431: llmtf.base.darumeru/MultiQ: Processing Dataset: 165.60s
INFO: 2024-10-15 15:47:41,432: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 15:47:41,437: llmtf.base.darumeru/MultiQ: {'f1': 0.4759256275303807, 'em': 0.3030592734225621}
INFO: 2024-10-15 15:47:41,447: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:47:41,448: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:47:44,367: llmtf.base.darumeru/PARus: Loading Dataset: 2.92s
INFO: 2024-10-15 15:47:50,707: llmtf.base.darumeru/PARus: Processing Dataset: 6.34s
INFO: 2024-10-15 15:47:50,709: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 15:47:50,721: llmtf.base.darumeru/PARus: {'acc': 0.39}
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:47:50,723: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:47:51,808: llmtf.base.daru/treewayextractive: Processing Dataset: 178.54s
INFO: 2024-10-15 15:47:51,811: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 15:47:52,026: llmtf.base.daru/treewayextractive: {'r-prec': 0.3766208513708514}
INFO: 2024-10-15 15:47:52,063: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:47:52,068: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus
0.385                   0.377            0.389           0.390
INFO: 2024-10-15 15:47:54,545: llmtf.base.darumeru/RCB: Loading Dataset: 3.82s
INFO: 2024-10-15 15:48:04,785: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 208.42s
INFO: 2024-10-15 15:48:05,455: llmtf.base.darumeru/RCB: Processing Dataset: 10.91s
INFO: 2024-10-15 15:48:05,458: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 15:48:05,463: llmtf.base.darumeru/RCB: {'acc': 0.41363636363636364, 'f1_macro': 0.3966762193460725}
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:48:05,465: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:48:20,815: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.35s
INFO: 2024-10-15 15:48:55,582: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 261.57s
INFO: 2024-10-15 15:49:32,641: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.82s
INFO: 2024-10-15 15:49:32,643: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 15:49:32,656: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.5240549828178694, 'f1_macro': 0.5229146059329463}
INFO: 2024-10-15 15:49:32,672: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:32,673: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:49:35,205: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.53s
INFO: 2024-10-15 15:49:39,346: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.14s
INFO: 2024-10-15 15:49:39,347: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 15:49:39,352: llmtf.base.darumeru/ruWorldTree: {'acc': 0.6571428571428571, 'f1_macro': 0.6537041574777424}
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:39,353: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:49:43,268: llmtf.base.darumeru/RWSD: Loading Dataset: 3.91s
INFO: 2024-10-15 15:49:53,085: llmtf.base.darumeru/RWSD: Processing Dataset: 9.82s
INFO: 2024-10-15 15:49:53,087: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 15:49:53,091: llmtf.base.darumeru/RWSD: {'acc': 0.43137254901960786}
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:49:53,092: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:50:09,717: llmtf.base.darumeru/USE: Loading Dataset: 16.62s
INFO: 2024-10-15 15:51:24,347: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 397.77s
INFO: 2024-10-15 15:51:24,350: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-15 15:51:24,354: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.6600682410685916, 'len': 0.9213865185211005, 'lcs': 0.5523613963039015}
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:51:24,358: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:51:27,707: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.35s
INFO: 2024-10-15 15:53:18,077: llmtf.base.darumeru/USE: Processing Dataset: 188.36s
INFO: 2024-10-15 15:53:18,081: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-15 15:53:18,102: llmtf.base.darumeru/USE: {'grade_norm': 0.042156862745098035}
INFO: 2024-10-15 15:53:18,113: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:53:18,114: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:53:39,656: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.54s
INFO: 2024-10-15 15:55:08,788: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 89.13s
INFO: 2024-10-15 15:55:08,791: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-15 15:55:08,804: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5073555794761392, 'mcc': 0.1555547836871833}
INFO: 2024-10-15 15:55:08,815: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:55:08,825: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.447                   0.377            0.389           0.390         0.405          0.431         0.042                0.921                  0.523                 0.655                     0.331
INFO: 2024-10-15 15:56:27,985: llmtf.base.darumeru/ruMMLU: Processing Dataset: 623.75s
INFO: 2024-10-15 15:56:27,987: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-15 15:56:28,014: llmtf.base.darumeru/ruMMLU: {'acc': 0.400179586950015}
INFO: 2024-10-15 15:56:28,094: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:56:28,107: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  russiannlp/rucola_custom
0.442                   0.377            0.389           0.390         0.405          0.431         0.042                0.921            0.400                  0.523                 0.655                     0.331
INFO: 2024-10-15 15:58:10,690: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.98s
INFO: 2024-10-15 15:58:10,694: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-15 15:58:10,698: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420772874407176, 'len': 0.9613467200502441, 'lcs': 1.0}
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 15:58:10,701: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 15:58:15,603: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.90s
INFO: 2024-10-15 15:59:03,310: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 658.52s
INFO: 2024-10-15 15:59:03,312: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 15:59:03,358: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.390000
anatomy 0.637037
astronomy 0.697368
business_ethics 0.690000
clinical_knowledge 0.732075
college_biology 0.743056
college_chemistry 0.460000
college_computer_science 0.530000
college_mathematics 0.460000
college_medicine 0.664740
college_physics 0.431373
computer_security 0.780000
conceptual_physics 0.617021
econometrics 0.517544
electrical_engineering 0.655172
elementary_mathematics 0.558201
formal_logic 0.373016
global_facts 0.370000
high_school_biology 0.803226
high_school_chemistry 0.566502
high_school_computer_science 0.730000
high_school_european_history 0.775758
high_school_geography 0.823232
high_school_government_and_politics 0.860104
high_school_macroeconomics 0.715385
high_school_mathematics 0.503704
high_school_microeconomics 0.777311
high_school_physics 0.417219
high_school_psychology 0.849541
high_school_statistics 0.625000
high_school_us_history 0.818627
high_school_world_history 0.827004
human_aging 0.717489
human_sexuality 0.763359
international_law 0.785124
jurisprudence 0.805556
logical_fallacies 0.742331
machine_learning 0.464286
management 0.805825
marketing 0.888889
medical_genetics 0.730000
miscellaneous 0.789272
moral_disputes 0.667630
moral_scenarios 0.273743
nutrition 0.774510
philosophy 0.707395
prehistory 0.737654
professional_accounting 0.507092
professional_law 0.476532
professional_medicine 0.621324
professional_psychology 0.696078
public_relations 0.745455
security_studies 0.722449
sociology 0.830846
us_foreign_policy 0.840000
virology 0.512048
world_religions 0.830409
INFO: 2024-10-15 15:59:03,366: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.579563
humanities 0.678522
other (business, health, misc.) 0.674307
social sciences 0.761775
INFO: 2024-10-15 15:59:03,374: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6735416670043434}
INFO: 2024-10-15 15:59:03,440: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 15:59:03,454: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  russiannlp/rucola_custom
0.500                   0.377            0.389           0.390         0.405          0.431         0.042                0.961                0.921            0.400                  0.523                 0.655               0.674                     0.331
INFO: 2024-10-15 16:02:04,868: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 789.28s
INFO: 2024-10-15 16:02:04,872: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 16:02:04,918: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.330000
anatomy 0.325926
astronomy 0.414474
business_ethics 0.460000
clinical_knowledge 0.415094
college_biology 0.340278
college_chemistry 0.380000
college_computer_science 0.420000
college_mathematics 0.360000
college_medicine 0.387283
college_physics 0.392157
computer_security 0.610000
conceptual_physics 0.451064
econometrics 0.333333
electrical_engineering 0.489655
elementary_mathematics 0.523810
formal_logic 0.309524
global_facts 0.310000
high_school_biology 0.541935
high_school_chemistry 0.369458
high_school_computer_science 0.600000
high_school_european_history 0.515152
high_school_geography 0.540404
high_school_government_and_politics 0.419689
high_school_macroeconomics 0.446154
high_school_mathematics 0.411111
high_school_microeconomics 0.500000
high_school_physics 0.317881
high_school_psychology 0.453211
high_school_statistics 0.398148
high_school_us_history 0.431373
high_school_world_history 0.497890
human_aging 0.466368
human_sexuality 0.511450
international_law 0.661157
jurisprudence 0.500000
logical_fallacies 0.423313
machine_learning 0.437500
management 0.456311
marketing 0.670940
medical_genetics 0.530000
miscellaneous 0.406130
moral_disputes 0.476879
moral_scenarios 0.240223
nutrition 0.535948
philosophy 0.485531
prehistory 0.438272
professional_accounting 0.375887
professional_law 0.342243
professional_medicine 0.367647
professional_psychology 0.428105
public_relations 0.509091
security_studies 0.567347
sociology 0.601990
us_foreign_policy 0.630000
virology 0.415663
world_religions 0.426901
INFO: 2024-10-15 16:02:04,926: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.432637
humanities 0.442189
other (business, health, misc.) 0.437371
social sciences 0.495065
INFO: 2024-10-15 16:02:04,934: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.45181545178768195}
INFO: 2024-10-15 16:02:05,011: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:02:05,026: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.497                   0.377            0.389           0.390         0.405          0.431         0.042                0.961                0.921            0.400                  0.523                 0.655               0.674               0.452                     0.331
INFO: 2024-10-15 16:12:20,022: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 844.41s
INFO: 2024-10-15 16:12:20,029: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 16:12:20,081: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.49153636504356, 'len': 0.7957149210024954, 'lcs': 0.16}
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 16:12:20,087: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 16:12:23,374: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.29s
INFO: 2024-10-15 16:26:55,233: llmtf.base.darumeru/cp_para_en: Processing Dataset: 871.86s
INFO: 2024-10-15 16:26:55,235: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-15 16:26:55,239: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.43961628016479, 'len': 0.9907918582197385, 'lcs': 1.0}
INFO: 2024-10-15 16:26:55,239: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:26:55,249: llmtf.base.evaluator:
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.507                   0.377            0.389           0.390         0.405          0.431         0.042                1.000                0.160                0.961                0.921            0.400                  0.523                 0.655               0.674               0.452                     0.331
INFO: 2024-10-15 16:29:43,016: llmtf.base.daru/treewayabstractive: Processing Dataset: 2685.86s
INFO: 2024-10-15 16:29:43,021: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-15 16:29:43,025: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2503995223034589, 'rouge2': 0.08558468876095006}
INFO: 2024-10-15 16:29:43,031: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 16:29:43,056: llmtf.base.evaluator:
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/USE  darumeru/cp_para_en  darumeru/cp_para_ru  darumeru/cp_sent_en  darumeru/cp_sent_ru  darumeru/ruMMLU  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU  russiannlp/rucola_custom
0.487                    0.168                   0.377            0.389           0.390         0.405          0.431         0.042                1.000                0.160                0.961                0.921            0.400                  0.523                 0.655               0.674               0.452                     0.331