INFO: 2024-10-15 08:03:25,784: llmtf.base.evaluator: Starting eval on ['darumeru/multiq']
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:03:25,784: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:03:29,508: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.72s
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Processing Dataset: 309.26s
INFO: 2024-10-15 08:08:38,771: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-15 08:08:38,772: llmtf.base.darumeru/MultiQ: {'f1': 0.3543006348150236, 'em': 0.23996175908221798}
INFO: 2024-10-15 08:08:38,777: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:08:38,778: llmtf.base.evaluator: 
mean	darumeru/MultiQ
0.297	0.297
INFO: 2024-10-15 08:08:47,368: llmtf.base.evaluator: Starting eval on ['darumeru/parus']
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:08:47,368: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:08:49,664: llmtf.base.darumeru/PARus: Loading Dataset: 2.30s
INFO: 2024-10-15 08:08:54,092: llmtf.base.darumeru/PARus: Processing Dataset: 4.43s
INFO: 2024-10-15 08:08:54,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-15 08:08:54,104: llmtf.base.darumeru/PARus: {'acc': 0.69}
INFO: 2024-10-15 08:08:54,105: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:08:54,106: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus
0.494	0.297	0.690
INFO: 2024-10-15 08:09:02,805: llmtf.base.evaluator: Starting eval on ['darumeru/rcb']
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:09:02,805: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:09:05,232: llmtf.base.darumeru/RCB: Loading Dataset: 2.43s
INFO: 2024-10-15 08:09:10,833: llmtf.base.darumeru/RCB: Processing Dataset: 5.60s
INFO: 2024-10-15 08:09:10,834: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-15 08:09:10,837: llmtf.base.darumeru/RCB: {'acc': 0.5409090909090909, 'f1_macro': 0.4899858481029719}
INFO: 2024-10-15 08:09:10,838: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:09:10,839: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB
0.501	0.297	0.690	0.515
INFO: 2024-10-15 08:09:19,476: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa']
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:09:19,476: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:09:22,959: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.48s
INFO: 2024-10-15 08:10:13,472: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 50.51s
INFO: 2024-10-15 08:10:13,473: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-15 08:10:13,483: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7152061855670103, 'f1_macro': 0.7151629824958838}
INFO: 2024-10-15 08:10:13,491: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:13,492: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/ruOpenBookQA
0.554	0.297	0.690	0.515	0.715
INFO: 2024-10-15 08:10:22,100: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree']
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:22,100: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:10:24,588: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.49s
INFO: 2024-10-15 08:10:27,304: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.72s
INFO: 2024-10-15 08:10:27,305: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-15 08:10:27,309: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8761904761904762, 'f1_macro': 0.8751507751507751}
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:27,310: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/ruOpenBookQA	darumeru/ruWorldTree
0.619	0.297	0.690	0.515	0.715	0.876
INFO: 2024-10-15 08:10:36,302: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd']
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:36,302: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:10:39,307: llmtf.base.darumeru/RWSD: Loading Dataset: 3.01s
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Processing Dataset: 5.42s
INFO: 2024-10-15 08:10:44,723: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-15 08:10:44,725: llmtf.base.darumeru/RWSD: {'acc': 0.5392156862745098}
INFO: 2024-10-15 08:10:44,726: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:10:44,727: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree
0.605	0.297	0.690	0.515	0.539	0.715	0.876
INFO: 2024-10-15 08:10:53,270: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:10:53,270: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:11:06,662: llmtf.base.daru/treewayextractive: Loading Dataset: 13.39s
INFO: 2024-10-15 08:13:53,187: llmtf.base.daru/treewayextractive: Processing Dataset: 166.53s
INFO: 2024-10-15 08:13:53,188: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-15 08:13:53,422: llmtf.base.daru/treewayextractive: {'r-prec': 0.38688455988455983}
INFO: 2024-10-15 08:13:53,464: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:13:53,465: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree
0.574	0.387	0.297	0.690	0.515	0.539	0.715	0.876
INFO: 2024-10-15 08:14:02,066: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:14:02,067: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:16:12,217: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 130.15s
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 366.91s
INFO: 2024-10-15 08:22:19,125: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-15 08:22:19,191: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.320000
anatomy                              0.444444
astronomy                            0.631579
business_ethics                      0.570000
clinical_knowledge                   0.584906
college_biology                      0.500000
college_chemistry                    0.340000
college_computer_science             0.490000
college_mathematics                  0.360000
college_medicine                     0.537572
college_physics                      0.421569
computer_security                    0.580000
conceptual_physics                   0.527660
econometrics                         0.368421
electrical_engineering               0.524138
elementary_mathematics               0.507937
formal_logic                         0.341270
global_facts                         0.360000
high_school_biology                  0.670968
high_school_chemistry                0.477833
high_school_computer_science         0.640000
high_school_european_history         0.727273
high_school_geography                0.707071
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.525641
high_school_mathematics              0.425926
high_school_microeconomics           0.525210
high_school_physics                  0.463576
high_school_psychology               0.704587
high_school_statistics               0.546296
high_school_us_history               0.651961
high_school_world_history            0.717300
human_aging                          0.565022
human_sexuality                      0.625954
international_law                    0.719008
jurisprudence                        0.638889
logical_fallacies                    0.527607
machine_learning                     0.392857
management                           0.660194
marketing                            0.722222
medical_genetics                     0.560000
miscellaneous                        0.625798
moral_disputes                       0.575145
moral_scenarios                      0.262570
nutrition                            0.617647
philosophy                           0.633441
prehistory                           0.543210
professional_accounting              0.372340
professional_law                     0.370926
professional_medicine                0.492647
professional_psychology              0.506536
public_relations                     0.509091
security_studies                     0.653061
sociology                            0.681592
us_foreign_policy                    0.710000
virology                             0.433735
world_religions                      0.672515
INFO: 2024-10-15 08:22:19,199: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.490019
humanities                       0.567778
other (business, health, misc.)  0.539038
social sciences                  0.592752
INFO: 2024-10-15 08:22:19,204: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5473965020204639}
INFO: 2024-10-15 08:22:19,243: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:22:19,245: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/ruMMLU
0.571	0.387	0.297	0.690	0.515	0.539	0.715	0.876	0.547
INFO: 2024-10-15 08:22:28,449: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:22:28,449: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:24:37,142: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 128.69s
INFO: 2024-10-15 08:30:16,279: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 339.14s
INFO: 2024-10-15 08:30:16,280: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-15 08:30:16,347: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.370000
anatomy                              0.600000
astronomy                            0.703947
business_ethics                      0.700000
clinical_knowledge                   0.724528
college_biology                      0.708333
college_chemistry                    0.430000
college_computer_science             0.600000
college_mathematics                  0.380000
college_medicine                     0.676301
college_physics                      0.480392
computer_security                    0.710000
conceptual_physics                   0.638298
econometrics                         0.500000
electrical_engineering               0.586207
elementary_mathematics               0.544974
formal_logic                         0.357143
global_facts                         0.350000
high_school_biology                  0.796774
high_school_chemistry                0.576355
high_school_computer_science         0.680000
high_school_european_history         0.763636
high_school_geography                0.772727
high_school_government_and_politics  0.844560
high_school_macroeconomics           0.684615
high_school_mathematics              0.466667
high_school_microeconomics           0.756303
high_school_physics                  0.450331
high_school_psychology               0.847706
high_school_statistics               0.643519
high_school_us_history               0.813725
high_school_world_history            0.835443
human_aging                          0.686099
human_sexuality                      0.763359
international_law                    0.768595
jurisprudence                        0.777778
logical_fallacies                    0.766871
machine_learning                     0.464286
management                           0.805825
marketing                            0.893162
medical_genetics                     0.740000
miscellaneous                        0.777778
moral_disputes                       0.656069
moral_scenarios                      0.282682
nutrition                            0.728758
philosophy                           0.713826
prehistory                           0.740741
professional_accounting              0.510638
professional_law                     0.462842
professional_medicine                0.672794
professional_psychology              0.673203
public_relations                     0.700000
security_studies                     0.714286
sociology                            0.805970
us_foreign_policy                    0.770000
virology                             0.475904
world_religions                      0.807018
INFO: 2024-10-15 08:30:16,355: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.568338
humanities                       0.672798
other (business, health, misc.)  0.667271
social sciences                  0.736061
INFO: 2024-10-15 08:30:16,361: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6611166912740201}
INFO: 2024-10-15 08:30:16,417: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:30:16,419: llmtf.base.evaluator: 
mean	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.581	0.387	0.297	0.690	0.515	0.539	0.715	0.876	0.661	0.547
INFO: 2024-10-15 08:30:25,792: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:30:25,792: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:30:29,807: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.01s
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Processing Dataset: 228.83s
INFO: 2024-10-15 08:34:18,637: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-15 08:34:18,638: llmtf.base.daru/treewayabstractive: {'rouge1': 0.33097672264173833, 'rouge2': 0.12022011135293731}
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:34:18,640: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.545	0.226	0.387	0.297	0.690	0.515	0.539	0.715	0.876	0.661	0.547
INFO: 2024-10-15 08:34:27,535: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [131508]
INFO: 2024-10-15 08:34:27,535: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-10-15 08:34:30,099: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.56s
INFO: 2024-10-15 08:37:05,943: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 155.84s
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-15 08:37:05,944: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.7695317959683377, 'len': 0.9951596967747576, 'lcs': 0.9}
INFO: 2024-10-15 08:37:05,945: llmtf.base.evaluator: Ended eval
INFO: 2024-10-15 08:37:05,946: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/cp_para_ru	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU
0.578	0.226	0.387	0.297	0.690	0.515	0.539	0.900	0.715	0.876	0.661	0.547
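
Note on the summary rows: the per-task values in the tab-separated tables appear consistent with averaging each task's reported metrics, and the `mean` column with averaging those per-task values. The sketch below checks that reading against the numbers logged above. It is an assumption inferred from the figures, not taken from the llmtf source; in particular, using only `lcs` for `darumeru/cp_para_ru` is a guess based on the 0.900 shown in the table, and the names `raw`, `task_scores`, and `overall` are hypothetical.

```python
# Per-task metric dicts copied from the "Results for ..." log lines above.
raw = {
    "daru/treewayabstractive": {"rouge1": 0.33097672264173833,
                                "rouge2": 0.12022011135293731},
    "daru/treewayextractive": {"r-prec": 0.38688455988455983},
    "darumeru/MultiQ": {"f1": 0.3543006348150236, "em": 0.23996175908221798},
    "darumeru/PARus": {"acc": 0.69},
    "darumeru/RCB": {"acc": 0.5409090909090909, "f1_macro": 0.4899858481029719},
    "darumeru/RWSD": {"acc": 0.5392156862745098},
    # Assumption: only 'lcs' feeds the summary; 'symbol_per_token' and 'len'
    # are left out because the table shows 0.900 for this task.
    "darumeru/cp_para_ru": {"lcs": 0.9},
    "darumeru/ruOpenBookQA": {"acc": 0.7152061855670103,
                              "f1_macro": 0.7151629824958838},
    "darumeru/ruWorldTree": {"acc": 0.8761904761904762,
                             "f1_macro": 0.8751507751507751},
    "nlpcoreteam/enMMLU": {"acc": 0.6611166912740201},
    "nlpcoreteam/ruMMLU": {"acc": 0.5473965020204639},
}


def average(values):
    """Plain arithmetic mean of an iterable of floats."""
    values = list(values)
    return sum(values) / len(values)


# One score per task: the unweighted mean of that task's metrics.
task_scores = {name: average(metrics.values()) for name, metrics in raw.items()}

# Overall score: the unweighted mean over tasks.
overall = average(task_scores.values())

# Print in the same tab-separated layout as the log's summary table.
print("mean\t" + "\t".join(task_scores))
print(f"{overall:.3f}\t" + "\t".join(f"{s:.3f}" for s in task_scores.values()))
```

Under these assumptions the printed row matches the final summary row of the log to three decimals.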