INFO: 2024-10-15 23:27:34,633: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-15 23:27:34,634: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:34,634: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:35,580: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-15 23:27:35,581: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:35,581: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:37,434: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 23:27:37,435: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:37,435: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:39,620: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 23:27:39,621: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:39,621: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:41,015: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 23:27:41,022: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:41,022: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:43,196: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 23:27:43,196: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:43,196: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:45,192: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-15 23:27:45,193: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:27:45,193: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:27:49,559: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.37s
INFO: 2024-10-15 23:27:56,068: llmtf.base.daru/treewayextractive: Loading Dataset: 12.87s
INFO: 2024-10-15 23:27:57,445: llmtf.base.darumeru/MultiQ: Loading Dataset: 22.81s
INFO: 2024-10-15 23:27:59,762: llmtf.base.daru/treewayabstractive: Loading Dataset: 18.74s
INFO: 2024-10-15 23:29:07,300: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.72s
INFO: 2024-10-15 23:30:55,098: llmtf.base.daru/treewayextractive: Processing Dataset: 179.02s
INFO: 2024-10-15 23:30:55,100: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 23:30:55,311: llmtf.base.daru/treewayextractive: {'r-prec': 0.35143051948051945}
INFO: 2024-10-15 23:30:55,347: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:30:55,351: llmtf.base.evaluator:
mean daru/treewayextractive
0.351 0.351
INFO: 2024-10-15 23:31:07,598: llmtf.base.darumeru/MultiQ: Processing Dataset: 190.15s
INFO: 2024-10-15 23:31:07,614: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 23:31:07,618: llmtf.base.darumeru/MultiQ: {'f1': 0.4149813636918578, 'em': 0.2762906309751434}
INFO: 2024-10-15 23:31:07,628: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:07,628: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:10,999: llmtf.base.darumeru/PARus: Loading Dataset: 3.37s
INFO: 2024-10-15 23:31:13,775: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 214.15s
INFO: 2024-10-15 23:31:17,369: llmtf.base.darumeru/PARus: Processing Dataset: 6.37s
INFO: 2024-10-15 23:31:17,372: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 23:31:17,384: llmtf.base.darumeru/PARus: {'acc': 0.32}
INFO: 2024-10-15 23:31:17,386: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:17,386: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:21,212: llmtf.base.darumeru/RCB: Loading Dataset: 3.82s
INFO: 2024-10-15 23:31:32,123: llmtf.base.darumeru/RCB: Processing Dataset: 10.91s
INFO: 2024-10-15 23:31:32,141: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 23:31:32,148: llmtf.base.darumeru/RCB: {'acc': 0.42727272727272725, 'f1_macro': 0.4213976946667694}
INFO: 2024-10-15 23:31:32,150: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:31:32,151: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:31:47,498: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.35s
INFO: 2024-10-15 23:32:07,223: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 269.79s
INFO: 2024-10-15 23:32:59,383: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.88s
INFO: 2024-10-15 23:32:59,384: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 23:32:59,398: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.49312714776632305, 'f1_macro': 0.48517498211245624}
INFO: 2024-10-15 23:32:59,414: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:32:59,415: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:02,154: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.74s
INFO: 2024-10-15 23:33:06,332: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.18s
INFO: 2024-10-15 23:33:06,333: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 23:33:06,338: llmtf.base.darumeru/ruWorldTree: {'acc': 0.5619047619047619, 'f1_macro': 0.5387530387530388}
INFO: 2024-10-15 23:33:06,340: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:33:06,340: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:10,149: llmtf.base.darumeru/RWSD: Loading Dataset: 3.81s
INFO: 2024-10-15 23:33:19,994: llmtf.base.darumeru/RWSD: Processing Dataset: 9.84s
INFO: 2024-10-15 23:33:19,996: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 23:33:20,000: llmtf.base.darumeru/RWSD: {'acc': 0.44607843137254904}
INFO: 2024-10-15 23:33:20,002: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:33:20,002: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:33:36,147: llmtf.base.darumeru/USE: Loading Dataset: 16.14s
INFO: 2024-10-15 23:34:28,409: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 398.85s
INFO: 2024-10-15 23:34:28,413: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-15 23:34:28,417: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.552729392451623, 'len': 0.912449166502956, 'lcs': 0.4414784394250513}
INFO: 2024-10-15 23:34:28,421: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:34:28,421: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:34:31,828: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.41s
INFO: 2024-10-15 23:36:47,199: llmtf.base.darumeru/USE: Processing Dataset: 191.05s
INFO: 2024-10-15 23:36:47,203: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-15 23:36:47,225: llmtf.base.darumeru/USE: {'grade_norm': 0.04607843137254901}
INFO: 2024-10-15 23:36:47,231: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:36:47,232: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:37:08,589: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.36s
INFO: 2024-10-15 23:38:37,411: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 88.82s
INFO: 2024-10-15 23:38:37,415: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-15 23:38:37,428: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5601004664513815, 'mcc': 0.09897569327848366}
INFO: 2024-10-15 23:38:37,439: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:38:37,451: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.422 0.351 0.346 0.320 0.424 0.446 0.046 0.912 0.489 0.550 0.330
INFO: 2024-10-15 23:39:22,591: llmtf.base.darumeru/ruMMLU: Processing Dataset: 615.29s
INFO: 2024-10-15 23:39:22,593: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-15 23:39:22,633: llmtf.base.darumeru/ruMMLU: {'acc': 0.37663374239249725}
INFO: 2024-10-15 23:39:22,704: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:39:22,714: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.417 0.351 0.346 0.320 0.424 0.446 0.046 0.912 0.377 0.489 0.550 0.330
INFO: 2024-10-15 23:41:14,360: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.53s
INFO: 2024-10-15 23:41:14,363: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-15 23:41:14,368: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420697143481598, 'len': 0.9613784781803254, 'lcs': 1.0}
INFO: 2024-10-15 23:41:14,371: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:41:14,371: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:41:17,993: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.62s
INFO: 2024-10-15 23:42:06,906: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 653.13s
INFO: 2024-10-15 23:42:06,908: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 23:42:06,954: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.400000
anatomy 0.651852
astronomy 0.677632
business_ethics 0.670000
clinical_knowledge 0.735849
college_biology 0.729167
college_chemistry 0.440000
college_computer_science 0.550000
college_mathematics 0.470000
college_medicine 0.670520
college_physics 0.441176
computer_security 0.790000
conceptual_physics 0.604255
econometrics 0.543860
electrical_engineering 0.648276
elementary_mathematics 0.542328
formal_logic 0.365079
global_facts 0.330000
high_school_biology 0.806452
high_school_chemistry 0.566502
high_school_computer_science 0.780000
high_school_european_history 0.787879
high_school_geography 0.797980
high_school_government_and_politics 0.849741
high_school_macroeconomics 0.720513
high_school_mathematics 0.488889
high_school_microeconomics 0.743697
high_school_physics 0.443709
high_school_psychology 0.851376
high_school_statistics 0.587963
high_school_us_history 0.808824
high_school_world_history 0.822785
human_aging 0.726457
human_sexuality 0.778626
international_law 0.752066
jurisprudence 0.824074
logical_fallacies 0.742331
machine_learning 0.508929
management 0.834951
marketing 0.888889
medical_genetics 0.740000
miscellaneous 0.802043
moral_disputes 0.676301
moral_scenarios 0.288268
nutrition 0.754902
philosophy 0.707395
prehistory 0.753086
professional_accounting 0.510638
professional_law 0.479791
professional_medicine 0.610294
professional_psychology 0.710784
public_relations 0.709091
security_studies 0.734694
sociology 0.830846
us_foreign_policy 0.840000
virology 0.518072
world_religions 0.812865
INFO: 2024-10-15 23:42:06,961: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.581960
humanities 0.678519
other (business, health, misc.) 0.674605
social sciences 0.759267
INFO: 2024-10-15 23:42:06,984: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6735877381763914}
INFO: 2024-10-15 23:42:07,051: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:42:07,099: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.479 0.351 0.346 0.320 0.424 0.446 0.046 0.961 0.912 0.377 0.489 0.550 0.674 0.330
INFO: 2024-10-15 23:45:18,092: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 790.87s
INFO: 2024-10-15 23:45:18,095: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 23:45:18,142: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.360000
anatomy 0.333333
astronomy 0.388158
business_ethics 0.420000
clinical_knowledge 0.411321
college_biology 0.298611
college_chemistry 0.350000
college_computer_science 0.400000
college_mathematics 0.350000
college_medicine 0.398844
college_physics 0.362745
computer_security 0.560000
conceptual_physics 0.425532
econometrics 0.350877
electrical_engineering 0.475862
elementary_mathematics 0.529101
formal_logic 0.333333
global_facts 0.250000
high_school_biology 0.435484
high_school_chemistry 0.349754
high_school_computer_science 0.600000
high_school_european_history 0.448485
high_school_geography 0.449495
high_school_government_and_politics 0.404145
high_school_macroeconomics 0.412821
high_school_mathematics 0.414815
high_school_microeconomics 0.449580
high_school_physics 0.337748
high_school_psychology 0.381651
high_school_statistics 0.398148
high_school_us_history 0.441176
high_school_world_history 0.468354
human_aging 0.426009
human_sexuality 0.442748
international_law 0.628099
jurisprudence 0.481481
logical_fallacies 0.411043
machine_learning 0.428571
management 0.398058
marketing 0.628205
medical_genetics 0.530000
miscellaneous 0.381865
moral_disputes 0.485549
moral_scenarios 0.244693
nutrition 0.529412
philosophy 0.466238
prehistory 0.410494
professional_accounting 0.343972
professional_law 0.344850
professional_medicine 0.312500
professional_psychology 0.406863
public_relations 0.418182
security_studies 0.502041
sociology 0.572139
us_foreign_policy 0.620000
virology 0.403614
world_religions 0.415205
INFO: 2024-10-15 23:45:18,149: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.414696
humanities 0.429154
other (business, health, misc.) 0.411938
social sciences 0.450878
INFO: 2024-10-15 23:45:18,174: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4266666289558874}
INFO: 2024-10-15 23:45:18,249: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 23:45:18,263: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.475 0.351 0.346 0.320 0.424 0.446 0.046 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330
INFO: 2024-10-15 23:55:25,025: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 847.01s
INFO: 2024-10-15 23:55:25,029: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 23:55:25,033: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.317211095060652, 'len': 0.7885274694294595, 'lcs': 0.11}
INFO: 2024-10-15 23:55:25,034: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-15 23:55:25,034: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-15 23:55:28,357: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.32s
INFO: 2024-10-16 00:10:03,216: llmtf.base.darumeru/cp_para_en: Processing Dataset: 874.86s
INFO: 2024-10-16 00:10:03,219: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-16 00:10:03,224: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.4363376335695355, 'len': 0.9918057024834991, 'lcs': 1.0}
INFO: 2024-10-16 00:10:03,225: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 00:10:03,237: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.485 0.351 0.346 0.320 0.424 0.446 0.046 1.000 0.110 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330
INFO: 2024-10-16 00:13:39,224: llmtf.base.daru/treewayabstractive: Processing Dataset: 2739.46s
INFO: 2024-10-16 00:13:39,228: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-16 00:13:39,233: llmtf.base.daru/treewayabstractive: {'rouge1': 0.24324312209419716, 'rouge2': 0.07780431416219476}
INFO: 2024-10-16 00:13:39,238: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 00:13:39,261: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.466 0.161 0.351 0.346 0.320 0.424 0.446 0.046 1.000 0.110 0.961 0.912 0.377 0.489 0.550 0.674 0.427 0.330