INFO: 2024-10-15 08:03:25,784: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:03:29,508: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.72s
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Processing Dataset: 309.26s
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 08:08:38,772: llmtf.base.darumeru/MultiQ: {'f1': 0.3543006348150236, 'em': 0.23996175908221798}
INFO: 2024-10-15 08:08:38,777: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:08:38,778: llmtf.base.evaluator: mean darumeru/MultiQ 0.297 0.297
INFO: 2024-10-15 08:08:47,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:08:49,664: llmtf.base.darumeru/PARus: Loading Dataset: 2.30s
INFO: 2024-10-15 08:08:54,092: llmtf.base.darumeru/PARus: Processing Dataset: 4.43s
INFO: 2024-10-15 08:08:54,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 08:08:54,104: llmtf.base.darumeru/PARus: {'acc': 0.69}
INFO: 2024-10-15 08:08:54,105: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:08:54,106: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus 0.494 0.297 0.690
INFO: 2024-10-15 08:09:02,805: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:09:05,232: llmtf.base.darumeru/RCB: Loading Dataset: 2.43s
INFO: 2024-10-15 08:09:10,833: llmtf.base.darumeru/RCB: Processing Dataset: 5.60s
INFO: 2024-10-15 08:09:10,834: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 08:09:10,837: llmtf.base.darumeru/RCB: {'acc': 0.5409090909090909, 'f1_macro': 0.4899858481029719}
INFO: 2024-10-15 08:09:10,838: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:09:10,839: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB 0.501 0.297 0.690 0.515
INFO: 2024-10-15 08:09:19,476: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:09:22,959: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.48s
INFO: 2024-10-15 08:10:13,472: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 50.51s
INFO: 2024-10-15 08:10:13,473: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 08:10:13,483: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7152061855670103, 'f1_macro': 0.7151629824958838}
INFO: 2024-10-15 08:10:13,491: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:13,492: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA 0.554 0.297 0.690 0.515 0.715
INFO: 2024-10-15 08:10:22,100: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:10:24,588: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.49s
INFO: 2024-10-15 08:10:27,304: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.72s
INFO: 2024-10-15 08:10:27,305: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 08:10:27,309: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8751507751507751}
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/ruOpenBookQA darumeru/ruWorldTree 0.619 0.297 0.690 0.515 0.715 0.876
INFO: 2024-10-15 08:10:36,302: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:10:39,307: llmtf.base.darumeru/RWSD: Loading Dataset: 3.01s
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Processing Dataset: 5.42s
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 08:10:44,725: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-10-15 08:10:44,726: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:44,727: llmtf.base.evaluator: mean darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree 0.605 0.297 0.690 0.515 0.539 0.715 0.876
INFO: 2024-10-15 08:10:53,270: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:11:06,662: llmtf.base.daru/treewayextractive: Loading Dataset: 13.39s
INFO: 2024-10-15 08:13:53,187: llmtf.base.daru/treewayextractive: Processing Dataset: 166.53s
INFO: 2024-10-15 08:13:53,188: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 08:13:53,422: llmtf.base.daru/treewayextractive: {'r-prec': 0.38688455988455983}
INFO: 2024-10-15 08:13:53,464: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:13:53,465: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree 0.574 0.387 0.297 0.690 0.515 0.539 0.715 0.876
INFO: 2024-10-15 08:14:02,066: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:16:12,217: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 130.15s
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.91s
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 08:22:19,191: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra                     0.320000
anatomy                              0.444444
astronomy                            0.631579
business_ethics                      0.570000
clinical_knowledge                   0.584906
college_biology                      0.500000
college_chemistry                    0.340000
college_computer_science             0.490000
college_mathematics                  0.360000
college_medicine                     0.537572
college_physics                      0.421569
computer_security                    0.580000
conceptual_physics                   0.527660
econometrics                         0.368421
electrical_engineering               0.524138
elementary_mathematics               0.507937
formal_logic                         0.341270
global_facts                         0.360000
high_school_biology                  0.670968
high_school_chemistry                0.477833
high_school_computer_science         0.640000
high_school_european_history         0.727273
high_school_geography                0.707071
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.525641
high_school_mathematics              0.425926
high_school_microeconomics           0.525210
high_school_physics                  0.463576
high_school_psychology               0.704587
high_school_statistics               0.546296
high_school_us_history               0.651961
high_school_world_history            0.717300
human_aging                          0.565022
human_sexuality                      0.625954
international_law                    0.719008
jurisprudence                        0.638889
logical_fallacies                    0.527607
machine_learning                     0.392857
management                           0.660194
marketing                            0.722222
medical_genetics                     0.560000
miscellaneous                        0.625798
moral_disputes                       0.575145
moral_scenarios                      0.262570
nutrition                            0.617647
philosophy                           0.633441
prehistory                           0.543210
professional_accounting              0.372340
professional_law                     0.370926
professional_medicine                0.492647
professional_psychology              0.506536
public_relations                     0.509091
security_studies                     0.653061
sociology                            0.681592
us_foreign_policy                    0.710000
virology                             0.433735
world_religions                      0.672515
INFO: 2024-10-15 08:22:19,199: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM                             0.490019
humanities                       0.567778
other (business, health, misc.)  0.539038
social sciences                  0.592752
INFO: 2024-10-15 08:22:19,204: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5473965020204639}
INFO: 2024-10-15 08:22:19,243: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:22:19,245: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU 0.571 0.387 0.297 0.690 0.515 0.539 0.715 0.876 0.547
INFO: 2024-10-15 08:22:28,449: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:24:37,142: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 128.69s
INFO: 2024-10-15 08:30:16,279: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 339.14s
INFO: 2024-10-15 08:30:16,280: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 08:30:16,347: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra                     0.370000
anatomy                              0.600000
astronomy                            0.703947
business_ethics                      0.700000
clinical_knowledge                   0.724528
college_biology                      0.708333
college_chemistry                    0.430000
college_computer_science             0.600000
college_mathematics                  0.380000
college_medicine                     0.676301
college_physics                      0.480392
computer_security                    0.710000
conceptual_physics                   0.638298
econometrics                         0.500000
electrical_engineering               0.586207
elementary_mathematics               0.544974
formal_logic                         0.357143
global_facts                         0.350000
high_school_biology                  0.796774
high_school_chemistry                0.576355
high_school_computer_science         0.680000
high_school_european_history         0.763636
high_school_geography                0.772727
high_school_government_and_politics  0.844560
high_school_macroeconomics           0.684615
high_school_mathematics              0.466667
high_school_microeconomics           0.756303
high_school_physics                  0.450331
high_school_psychology               0.847706
high_school_statistics               0.643519
high_school_us_history               0.813725
high_school_world_history            0.835443
human_aging                          0.686099
human_sexuality                      0.763359
international_law                    0.768595
jurisprudence                        0.777778
logical_fallacies                    0.766871
machine_learning                     0.464286
management                           0.805825
marketing                            0.893162
medical_genetics                     0.740000
miscellaneous                        0.777778
moral_disputes                       0.656069
moral_scenarios                      0.282682
nutrition                            0.728758
philosophy                           0.713826
prehistory                           0.740741
professional_accounting              0.510638
professional_law                     0.462842
professional_medicine                0.672794
professional_psychology              0.673203
public_relations                     0.700000
security_studies                     0.714286
sociology                            0.805970
us_foreign_policy                    0.770000
virology                             0.475904
world_religions                      0.807018
INFO: 2024-10-15 08:30:16,355: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM                             0.568338
humanities                       0.672798
other (business, health, misc.)  0.667271
social sciences                  0.736061
INFO: 2024-10-15 08:30:16,361: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6611166912740201}
INFO: 2024-10-15 08:30:16,417: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:30:16,419: llmtf.base.evaluator: mean daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 0.581 0.387 0.297 0.690 0.515 0.539 0.715 0.876 0.661 0.547
INFO: 2024-10-15 08:30:25,792: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:30:29,807: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.01s
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Processing Dataset: 228.83s
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-15 08:34:18,638: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33097672264173833, 'rouge2': 0.12022011135293731}
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 0.545 0.226 0.387 0.297 0.690 0.515 0.539 0.715 0.876 0.661 0.547
INFO: 2024-10-15 08:34:27,535: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:34:30,099: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.56s
INFO: 2024-10-15 08:37:05,943: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 155.84s
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7695317959683377, 'len': 0.9951596967747576, 'lcs': 0.9}
INFO: 2024-10-15 08:37:05,945: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:37:05,946: llmtf.base.evaluator: mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU 0.578 0.226 0.387 0.297 0.690 0.515 0.539 0.900 0.715 0.876 0.661 0.547
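
The final `mean` line appears to be reproducible from the per-task results logged above: each task's score looks like the plain mean of its logged metrics (for example MultiQ: (0.3543 + 0.2400) / 2 ≈ 0.297, treewayabstractive: (0.3310 + 0.1202) / 2 ≈ 0.226), and the first number is the unweighted mean over tasks. darumeru/cp_para_ru is the exception: its logged 0.900 matches `lcs` alone, not the mean of all three metrics. The Python sketch below assumes exactly this rule, inferred from the numbers rather than taken from llmtf's source; `TASK_METRICS`, `task_scores`, `overall_mean` and `mean_row` are hypothetical names for illustration only.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical table: metric values copied from the "Results for ..." lines of
# this log. ASSUMPTION: a task's score is the plain mean of the values listed.
# For darumeru/cp_para_ru only 'lcs' is listed, because the logged 0.900
# matches lcs rather than the mean of symbol_per_token, len and lcs.
TASK_METRICS: dict[str, list[float]] = {
    "daru/treewayabstractive": [0.33097672264173833, 0.12022011135293731],
    "daru/treewayextractive": [0.38688455988455983],
    "darumeru/MultiQ": [0.3543006348150236, 0.23996175908221798],
    "darumeru/PARus": [0.69],
    "darumeru/RCB": [0.5409090909090909, 0.4899858481029719],
    "darumeru/RWSD": [0.5392156862745098],
    "darumeru/cp_para_ru": [0.9],
    "darumeru/ruOpenBookQA": [0.7152061855670103, 0.7151629824958838],
    "darumeru/ruWorldTree": [0.8761904761904762, 0.8751507751507751],
    "nlpcoreteam/enMMLU": [0.6611166912740201],
    "nlpcoreteam/ruMMLU": [0.5473965020204639],
}


def task_scores(metrics: dict[str, list[float]]) -> dict[str, float]:
    """Average each task's metric values into a single score."""
    return {task: mean(values) for task, values in metrics.items()}


def overall_mean(metrics: dict[str, list[float]]) -> float:
    """Unweighted mean of the per-task scores."""
    return mean(list(task_scores(metrics).values()))


def mean_row(metrics: dict[str, list[float]]) -> str:
    """Format a row shaped like the evaluator's 'mean ...' log line.

    Task names are sorted, which matches the order seen in this log.
    """
    scores = task_scores(metrics)
    tasks = sorted(scores)
    values = [overall_mean(metrics)] + [scores[t] for t in tasks]
    return " ".join(["mean", *tasks, *(f"{v:.3f}" for v in values)])


if __name__ == "__main__":
    print(mean_row(TASK_METRICS))
```

Under these assumptions the printed row matches the last `mean` line of the log, including the overall 0.578. Averaging per task first and then across tasks gives every benchmark equal weight, regardless of how many metrics it reports.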