INFO: 2024-10-16 01:02:49,290: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-16 01:02:49,293: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:49,293: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:49,570: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-16 01:02:49,571: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:49,571: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:51,977: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-16 01:02:51,979: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:51,979: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:53,943: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-16 01:02:53,943: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:53,943: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:55,483: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-16 01:02:55,483: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:55,483: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:57,009: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-16 01:02:57,010: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:57,010: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:02:58,829: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-16 01:02:58,830: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:02:58,830: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:03:03,414: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.58s
INFO: 2024-10-16 01:03:10,009: llmtf.base.daru/treewayextractive: Loading Dataset: 13.00s
INFO: 2024-10-16 01:03:12,441: llmtf.base.darumeru/MultiQ: Loading Dataset: 23.15s
INFO: 2024-10-16 01:03:14,674: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.19s
INFO: 2024-10-16 01:04:20,981: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.41s
INFO: 2024-10-16 01:06:08,702: llmtf.base.daru/treewayextractive: Processing Dataset: 178.69s
INFO: 2024-10-16 01:06:08,705: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-16 01:06:08,931: llmtf.base.daru/treewayextractive: {'r-prec': 0.392920202020202}
INFO: 2024-10-16 01:06:08,968: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:06:08,972: llmtf.base.evaluator:
mean daru/treewayextractive
0.393 0.393
INFO: 2024-10-16 01:06:21,426: llmtf.base.darumeru/MultiQ: Processing Dataset: 188.98s
INFO: 2024-10-16 01:06:21,427: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-16 01:06:21,432: llmtf.base.darumeru/MultiQ: {'f1': 0.49389044741748994, 'em': 0.3738049713193117}
INFO: 2024-10-16 01:06:21,442: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:21,442: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:06:24,486: llmtf.base.darumeru/PARus: Loading Dataset: 3.04s
INFO: 2024-10-16 01:06:30,785: llmtf.base.darumeru/PARus: Processing Dataset: 6.30s
INFO: 2024-10-16 01:06:30,787: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-16 01:06:30,815: llmtf.base.darumeru/PARus: {'acc': 0.42}
INFO: 2024-10-16 01:06:30,816: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:30,816: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:06:34,602: llmtf.base.darumeru/RCB: Loading Dataset: 3.79s
INFO: 2024-10-16 01:06:35,622: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 221.68s
INFO: 2024-10-16 01:06:45,442: llmtf.base.darumeru/RCB: Processing Dataset: 10.84s
INFO: 2024-10-16 01:06:45,444: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-16 01:06:45,451: llmtf.base.darumeru/RCB: {'acc': 0.5, 'f1_macro': 0.4788148636316176}
INFO: 2024-10-16 01:06:45,459: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:06:45,460: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:07:00,710: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 15.25s
INFO: 2024-10-16 01:07:31,366: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 279.39s
INFO: 2024-10-16 01:08:12,346: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 71.63s
INFO: 2024-10-16 01:08:12,347: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-16 01:08:12,361: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.5451030927835051, 'f1_macro': 0.5357593897587198}
INFO: 2024-10-16 01:08:12,377: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:12,378: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:15,157: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.78s
INFO: 2024-10-16 01:08:19,267: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.11s
INFO: 2024-10-16 01:08:19,268: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-16 01:08:19,273: llmtf.base.darumeru/ruWorldTree: {'acc': 0.7238095238095238, 'f1_macro': 0.719567254381039}
INFO: 2024-10-16 01:08:19,274: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:19,274: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:23,048: llmtf.base.darumeru/RWSD: Loading Dataset: 3.77s
INFO: 2024-10-16 01:08:32,809: llmtf.base.darumeru/RWSD: Processing Dataset: 9.76s
INFO: 2024-10-16 01:08:32,811: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-16 01:08:32,815: llmtf.base.darumeru/RWSD: {'acc': 0.43137254901960786}
INFO: 2024-10-16 01:08:32,817: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:08:32,817: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:08:48,749: llmtf.base.darumeru/USE: Loading Dataset: 15.93s
INFO: 2024-10-16 01:09:43,090: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 399.68s
INFO: 2024-10-16 01:09:43,093: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-16 01:09:43,097: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.900423523440511, 'len': 0.9417878040943446, 'lcs': 0.6981519507186859}
INFO: 2024-10-16 01:09:43,101: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:09:43,101: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:09:46,551: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.45s
INFO: 2024-10-16 01:12:13,957: llmtf.base.darumeru/USE: Processing Dataset: 205.20s
INFO: 2024-10-16 01:12:13,963: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-16 01:12:13,968: llmtf.base.darumeru/USE: {'grade_norm': 0.049999999999999996}
INFO: 2024-10-16 01:12:13,975: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:12:13,975: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:12:35,082: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 21.11s
INFO: 2024-10-16 01:14:03,862: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 88.78s
INFO: 2024-10-16 01:14:03,866: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-16 01:14:03,877: llmtf.base.russiannlp/rucola_custom: {'acc': 0.4628632938643703, 'mcc': 0.14354674065192544}
INFO: 2024-10-16 01:14:03,888: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:14:03,934: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.472 0.393 0.434 0.420 0.489 0.431 0.050 0.942 0.540 0.722 0.303
INFO: 2024-10-16 01:14:37,979: llmtf.base.darumeru/ruMMLU: Processing Dataset: 616.99s
INFO: 2024-10-16 01:14:37,996: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-16 01:14:38,020: llmtf.base.darumeru/ruMMLU: {'acc': 0.37793075925371644}
INFO: 2024-10-16 01:14:38,092: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:14:38,102: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree russiannlp/rucola_custom
0.464 0.393 0.434 0.420 0.489 0.431 0.050 0.942 0.378 0.540 0.722 0.303
INFO: 2024-10-16 01:16:29,537: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 402.97s
INFO: 2024-10-16 01:16:29,540: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-16 01:16:29,544: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.420697143481598, 'len': 0.9613784781803254, 'lcs': 1.0}
INFO: 2024-10-16 01:16:29,547: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:16:29,547: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:16:32,809: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.26s
INFO: 2024-10-16 01:17:34,153: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 658.53s
INFO: 2024-10-16 01:17:34,155: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-16 01:17:34,201: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.592593
astronomy 0.684211
business_ethics 0.690000
clinical_knowledge 0.720755
college_biology 0.722222
college_chemistry 0.420000
college_computer_science 0.560000
college_mathematics 0.460000
college_medicine 0.682081
college_physics 0.519608
computer_security 0.800000
conceptual_physics 0.625532
econometrics 0.473684
electrical_engineering 0.682759
elementary_mathematics 0.595238
formal_logic 0.365079
global_facts 0.340000
high_school_biology 0.800000
high_school_chemistry 0.571429
high_school_computer_science 0.760000
high_school_european_history 0.775758
high_school_geography 0.797980
high_school_government_and_politics 0.860104
high_school_macroeconomics 0.702564
high_school_mathematics 0.481481
high_school_microeconomics 0.810924
high_school_physics 0.430464
high_school_psychology 0.840367
high_school_statistics 0.625000
high_school_us_history 0.799020
high_school_world_history 0.827004
human_aging 0.672646
human_sexuality 0.763359
international_law 0.760331
jurisprudence 0.814815
logical_fallacies 0.760736
machine_learning 0.482143
management 0.825243
marketing 0.897436
medical_genetics 0.760000
miscellaneous 0.777778
moral_disputes 0.661850
moral_scenarios 0.293855
nutrition 0.751634
philosophy 0.697749
prehistory 0.700617
professional_accounting 0.496454
professional_law 0.453064
professional_medicine 0.636029
professional_psychology 0.660131
public_relations 0.736364
security_studies 0.734694
sociology 0.810945
us_foreign_policy 0.870000
virology 0.530120
world_religions 0.801170
INFO: 2024-10-16 01:17:34,209: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.586671
humanities 0.670081
other (business, health, misc.) 0.669483
social sciences 0.755093
INFO: 2024-10-16 01:17:34,217: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6703320839095225}
INFO: 2024-10-16 01:17:34,284: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:17:34,298: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU russiannlp/rucola_custom
0.518 0.393 0.434 0.420 0.489 0.431 0.050 0.961 0.942 0.378 0.540 0.722 0.670 0.303
INFO: 2024-10-16 01:20:37,498: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 786.13s
INFO: 2024-10-16 01:20:37,501: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-16 01:20:37,548: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.280000
anatomy 0.259259
astronomy 0.394737
business_ethics 0.520000
clinical_knowledge 0.392453
college_biology 0.375000
college_chemistry 0.350000
college_computer_science 0.460000
college_mathematics 0.390000
college_medicine 0.358382
college_physics 0.411765
computer_security 0.580000
conceptual_physics 0.446809
econometrics 0.307018
electrical_engineering 0.427586
elementary_mathematics 0.515873
formal_logic 0.357143
global_facts 0.260000
high_school_biology 0.490323
high_school_chemistry 0.325123
high_school_computer_science 0.620000
high_school_european_history 0.496970
high_school_geography 0.545455
high_school_government_and_politics 0.476684
high_school_macroeconomics 0.471795
high_school_mathematics 0.362963
high_school_microeconomics 0.487395
high_school_physics 0.344371
high_school_psychology 0.467890
high_school_statistics 0.453704
high_school_us_history 0.455882
high_school_world_history 0.556962
human_aging 0.443946
human_sexuality 0.541985
international_law 0.619835
jurisprudence 0.509259
logical_fallacies 0.447853
machine_learning 0.401786
management 0.466019
marketing 0.675214
medical_genetics 0.510000
miscellaneous 0.413793
moral_disputes 0.468208
moral_scenarios 0.237989
nutrition 0.539216
philosophy 0.501608
prehistory 0.425926
professional_accounting 0.333333
professional_law 0.344198
professional_medicine 0.345588
professional_psychology 0.444444
public_relations 0.454545
security_studies 0.534694
sociology 0.601990
us_foreign_policy 0.610000
virology 0.463855
world_religions 0.438596
INFO: 2024-10-16 01:20:37,555: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.423891
humanities 0.450802
other (business, health, misc.) 0.427218
social sciences 0.495325
INFO: 2024-10-16 01:20:37,563: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.4493090597333154}
INFO: 2024-10-16 01:20:37,641: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:20:37,655: llmtf.base.evaluator:
mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.513 0.393 0.434 0.420 0.489 0.431 0.050 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303
INFO: 2024-10-16 01:30:09,544: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 816.73s
INFO: 2024-10-16 01:30:09,547: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-16 01:30:09,581: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.828156059562894, 'len': 0.9521096414122393, 'lcs': 0.37}
INFO: 2024-10-16 01:30:09,583: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-16 01:30:09,583: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-16 01:30:12,888: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.30s
INFO: 2024-10-16 01:43:00,225: llmtf.base.daru/treewayabstractive: Processing Dataset: 2385.55s
INFO: 2024-10-16 01:43:00,233: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-16 01:43:00,251: llmtf.base.daru/treewayabstractive: {'rouge1': 0.2792817830105651, 'rouge2': 0.09909942829468928}
INFO: 2024-10-16 01:43:00,256: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:43:00,267: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.484 0.189 0.393 0.434 0.420 0.489 0.431 0.050 0.370 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303
INFO: 2024-10-16 01:44:44,037: llmtf.base.darumeru/cp_para_en: Processing Dataset: 871.15s
INFO: 2024-10-16 01:44:44,040: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-16 01:44:44,044: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.439886096406914, 'len': 0.9905967745509706, 'lcs': 1.0}
INFO: 2024-10-16 01:44:44,045: llmtf.base.evaluator: Ended eval
INFO: 2024-10-16 01:44:44,054: llmtf.base.evaluator:
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_en darumeru/cp_para_ru darumeru/cp_sent_en darumeru/cp_sent_ru darumeru/ruMMLU darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU russiannlp/rucola_custom
0.514 0.189 0.393 0.434 0.420 0.489 0.431 0.050 1.000 0.370 0.961 0.942 0.378 0.540 0.722 0.670 0.449 0.303