|
INFO: 2024-10-15 08:03:25,784: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] |
|
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:03:29,508: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.72s |
|
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Processing Dataset: 309.26s |
|
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-10-15 08:08:38,772: llmtf.base.darumeru/MultiQ: {'f1': 0.3543006348150236, 'em': 0.23996175908221798} |
|
INFO: 2024-10-15 08:08:38,777: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:08:38,778: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ |
|
0.297  0.297 |
|
INFO: 2024-10-15 08:08:47,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] |
|
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:08:49,664: llmtf.base.darumeru/PARus: Loading Dataset: 2.30s |
|
INFO: 2024-10-15 08:08:54,092: llmtf.base.darumeru/PARus: Processing Dataset: 4.43s |
|
INFO: 2024-10-15 08:08:54,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-10-15 08:08:54,104: llmtf.base.darumeru/PARus: {'acc': 0.69} |
|
INFO: 2024-10-15 08:08:54,105: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:08:54,106: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ  darumeru/PARus |
|
0.494  0.297            0.690 |
|
INFO: 2024-10-15 08:09:02,805: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] |
|
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:09:05,232: llmtf.base.darumeru/RCB: Loading Dataset: 2.43s |
|
INFO: 2024-10-15 08:09:10,833: llmtf.base.darumeru/RCB: Processing Dataset: 5.60s |
|
INFO: 2024-10-15 08:09:10,834: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-10-15 08:09:10,837: llmtf.base.darumeru/RCB: {'acc': 0.5409090909090909, 'f1_macro': 0.4899858481029719} |
|
INFO: 2024-10-15 08:09:10,838: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:09:10,839: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB |
|
0.501  0.297            0.690           0.515 |
|
INFO: 2024-10-15 08:09:19,476: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] |
|
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:09:22,959: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.48s |
|
INFO: 2024-10-15 08:10:13,472: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 50.51s |
|
INFO: 2024-10-15 08:10:13,473: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-10-15 08:10:13,483: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7152061855670103, 'f1_macro': 0.7151629824958838} |
|
INFO: 2024-10-15 08:10:13,491: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:10:13,492: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA |
|
0.554  0.297            0.690           0.515         0.715 |
|
INFO: 2024-10-15 08:10:22,100: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] |
|
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:10:24,588: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.49s |
|
INFO: 2024-10-15 08:10:27,304: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.72s |
|
INFO: 2024-10-15 08:10:27,305: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-10-15 08:10:27,309: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8751507751507751} |
|
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/ruOpenBookQA  darumeru/ruWorldTree |
|
0.619  0.297            0.690           0.515         0.715                  0.876 |
|
INFO: 2024-10-15 08:10:36,302: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] |
|
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:10:39,307: llmtf.base.darumeru/RWSD: Loading Dataset: 3.01s |
|
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Processing Dataset: 5.42s |
|
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-10-15 08:10:44,725: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098} |
|
INFO: 2024-10-15 08:10:44,726: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:10:44,727: llmtf.base.evaluator: |
|
mean   darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree |
|
0.605  0.297            0.690           0.515         0.539          0.715                  0.876 |
|
INFO: 2024-10-15 08:10:53,270: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:11:06,662: llmtf.base.daru/treewayextractive: Loading Dataset: 13.39s |
|
INFO: 2024-10-15 08:13:53,187: llmtf.base.daru/treewayextractive: Processing Dataset: 166.53s |
|
INFO: 2024-10-15 08:13:53,188: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-10-15 08:13:53,422: llmtf.base.daru/treewayextractive: {'r-prec': 0.38688455988455983} |
|
INFO: 2024-10-15 08:13:53,464: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:13:53,465: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree |
|
0.574  0.387                   0.297            0.690           0.515         0.539          0.715                  0.876 |
|
INFO: 2024-10-15 08:14:02,066: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:16:12,217: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 130.15s |
|
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.91s |
|
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-10-15 08:22:19,191: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.320000 |
|
anatomy 0.444444 |
|
astronomy 0.631579 |
|
business_ethics 0.570000 |
|
clinical_knowledge 0.584906 |
|
college_biology 0.500000 |
|
college_chemistry 0.340000 |
|
college_computer_science 0.490000 |
|
college_mathematics 0.360000 |
|
college_medicine 0.537572 |
|
college_physics 0.421569 |
|
computer_security 0.580000 |
|
conceptual_physics 0.527660 |
|
econometrics 0.368421 |
|
electrical_engineering 0.524138 |
|
elementary_mathematics 0.507937 |
|
formal_logic 0.341270 |
|
global_facts 0.360000 |
|
high_school_biology 0.670968 |
|
high_school_chemistry 0.477833 |
|
high_school_computer_science 0.640000 |
|
high_school_european_history 0.727273 |
|
high_school_geography 0.707071 |
|
high_school_government_and_politics 0.595855 |
|
high_school_macroeconomics 0.525641 |
|
high_school_mathematics 0.425926 |
|
high_school_microeconomics 0.525210 |
|
high_school_physics 0.463576 |
|
high_school_psychology 0.704587 |
|
high_school_statistics 0.546296 |
|
high_school_us_history 0.651961 |
|
high_school_world_history 0.717300 |
|
human_aging 0.565022 |
|
human_sexuality 0.625954 |
|
international_law 0.719008 |
|
jurisprudence 0.638889 |
|
logical_fallacies 0.527607 |
|
machine_learning 0.392857 |
|
management 0.660194 |
|
marketing 0.722222 |
|
medical_genetics 0.560000 |
|
miscellaneous 0.625798 |
|
moral_disputes 0.575145 |
|
moral_scenarios 0.262570 |
|
nutrition 0.617647 |
|
philosophy 0.633441 |
|
prehistory 0.543210 |
|
professional_accounting 0.372340 |
|
professional_law 0.370926 |
|
professional_medicine 0.492647 |
|
professional_psychology 0.506536 |
|
public_relations 0.509091 |
|
security_studies 0.653061 |
|
sociology 0.681592 |
|
us_foreign_policy 0.710000 |
|
virology 0.433735 |
|
world_religions 0.672515 |
|
INFO: 2024-10-15 08:22:19,199: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.490019 |
|
humanities 0.567778 |
|
other (business, health, misc.) 0.539038 |
|
social sciences 0.592752 |
|
INFO: 2024-10-15 08:22:19,204: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5473965020204639} |
|
INFO: 2024-10-15 08:22:19,243: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:22:19,245: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/ruMMLU |
|
0.571  0.387                   0.297            0.690           0.515         0.539          0.715                  0.876                 0.547 |
|
INFO: 2024-10-15 08:22:28,449: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:24:37,142: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 128.69s |
|
INFO: 2024-10-15 08:30:16,279: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 339.14s |
|
INFO: 2024-10-15 08:30:16,280: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-10-15 08:30:16,347: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.370000 |
|
anatomy 0.600000 |
|
astronomy 0.703947 |
|
business_ethics 0.700000 |
|
clinical_knowledge 0.724528 |
|
college_biology 0.708333 |
|
college_chemistry 0.430000 |
|
college_computer_science 0.600000 |
|
college_mathematics 0.380000 |
|
college_medicine 0.676301 |
|
college_physics 0.480392 |
|
computer_security 0.710000 |
|
conceptual_physics 0.638298 |
|
econometrics 0.500000 |
|
electrical_engineering 0.586207 |
|
elementary_mathematics 0.544974 |
|
formal_logic 0.357143 |
|
global_facts 0.350000 |
|
high_school_biology 0.796774 |
|
high_school_chemistry 0.576355 |
|
high_school_computer_science 0.680000 |
|
high_school_european_history 0.763636 |
|
high_school_geography 0.772727 |
|
high_school_government_and_politics 0.844560 |
|
high_school_macroeconomics 0.684615 |
|
high_school_mathematics 0.466667 |
|
high_school_microeconomics 0.756303 |
|
high_school_physics 0.450331 |
|
high_school_psychology 0.847706 |
|
high_school_statistics 0.643519 |
|
high_school_us_history 0.813725 |
|
high_school_world_history 0.835443 |
|
human_aging 0.686099 |
|
human_sexuality 0.763359 |
|
international_law 0.768595 |
|
jurisprudence 0.777778 |
|
logical_fallacies 0.766871 |
|
machine_learning 0.464286 |
|
management 0.805825 |
|
marketing 0.893162 |
|
medical_genetics 0.740000 |
|
miscellaneous 0.777778 |
|
moral_disputes 0.656069 |
|
moral_scenarios 0.282682 |
|
nutrition 0.728758 |
|
philosophy 0.713826 |
|
prehistory 0.740741 |
|
professional_accounting 0.510638 |
|
professional_law 0.462842 |
|
professional_medicine 0.672794 |
|
professional_psychology 0.673203 |
|
public_relations 0.700000 |
|
security_studies 0.714286 |
|
sociology 0.805970 |
|
us_foreign_policy 0.770000 |
|
virology 0.475904 |
|
world_religions 0.807018 |
|
INFO: 2024-10-15 08:30:16,355: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.568338 |
|
humanities 0.672798 |
|
other (business, health, misc.) 0.667271 |
|
social sciences 0.736061 |
|
INFO: 2024-10-15 08:30:16,361: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6611166912740201} |
|
INFO: 2024-10-15 08:30:16,417: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:30:16,419: llmtf.base.evaluator: |
|
mean   daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU |
|
0.581  0.387                   0.297            0.690           0.515         0.539          0.715                  0.876                 0.661               0.547 |
|
INFO: 2024-10-15 08:30:25,792: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:30:29,807: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.01s |
|
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Processing Dataset: 228.83s |
|
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-10-15 08:34:18,638: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33097672264173833, 'rouge2': 0.12022011135293731} |
|
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: |
|
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU |
|
0.545  0.226                    0.387                   0.297            0.690           0.515         0.539          0.715                  0.876                 0.661               0.547 |
|
INFO: 2024-10-15 08:34:27,535: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] |
|
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508] |
|
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-10-15 08:34:30,099: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.56s |
|
INFO: 2024-10-15 08:37:05,943: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 155.84s |
|
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7695317959683377, 'len': 0.9951596967747576, 'lcs': 0.9} |
|
INFO: 2024-10-15 08:37:05,945: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-10-15 08:37:05,946: llmtf.base.evaluator: |
|
mean   daru/treewayabstractive  daru/treewayextractive  darumeru/MultiQ  darumeru/PARus  darumeru/RCB  darumeru/RWSD  darumeru/cp_para_ru  darumeru/ruOpenBookQA  darumeru/ruWorldTree  nlpcoreteam/enMMLU  nlpcoreteam/ruMMLU |
|
0.578  0.226                    0.387                   0.297            0.690           0.515         0.539          0.900                0.715                  0.876                 0.661               0.547 |
|
|