INFO: 2024-10-14 17:12:15,109: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:15,157: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:16,297: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:18,005: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:19,753: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:21,769: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:23,453: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:29,425: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 5.97s
INFO: 2024-10-14 17:12:39,535: llmtf.base.daru/treewayextractive: Loading Dataset: 17.77s
INFO: 2024-10-14 17:12:39,537: llmtf.base.daru/treewayextractive: Processing Dataset: 0.00s
ERROR: 2024-10-14 17:12:39,537: llmtf.base.evaluator: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
ERROR: 2024-10-14 17:12:39,542: llmtf.base.evaluator: Traceback (most recent call last):
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 42, in evaluate
    self.evaluate_dataset(task, model, output_dir, prompt_max_len, few_shot_count, generation_config, batch_size, max_sample_per_dataset)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 65, in evaluate_dataset
    prompts, y_preds, infos = getattr(model, task.method + '_batch')(**messages_batch)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/model.py", line 417, in calculate_logsoftmax_batch
    data = self.tokenizer(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2829, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2915, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3106, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 788, in _batch_encode_plus
    raise NotImplementedError(
NotImplementedError: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
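
Note on the traceback above: the tokenizer call in calculate_logsoftmax_batch requests offset mappings, which Hugging Face only provides on tokenizers deriving from PreTrainedTokenizerFast. The following is a minimal sketch of an early check, not the llmtf implementation. The helper name require_fast_tokenizer and the two stub classes are hypothetical and exist only for illustration; is_fast is the attribute Hugging Face tokenizers expose, and use_fast is an existing AutoTokenizer.from_pretrained argument. Whether a fast tokenizer exists for this model is not something the log shows.

```python
def require_fast_tokenizer(tokenizer):
    """Fail early, with a clear message, if offset mappings will be unavailable."""
    # Hugging Face tokenizers expose `is_fast`; an object without the attribute
    # is treated here as a slow (pure-Python) tokenizer. This is an assumption
    # of the sketch, not behaviour taken from llmtf.
    if not getattr(tokenizer, "is_fast", False):
        raise TypeError(
            f"{type(tokenizer).__name__} is not a fast tokenizer; "
            "return_offset_mapping=True would raise NotImplementedError. "
            "Consider loading it via AutoTokenizer.from_pretrained with use_fast=True."
        )
    return tokenizer


# Hypothetical stand-ins so the sketch runs without downloading a model.
class SlowStub:
    is_fast = False


class FastStub:
    is_fast = True


checked = require_fast_tokenizer(FastStub())  # passes through unchanged

try:
    require_fast_tokenizer(SlowStub())
    slow_rejected = False
except TypeError:
    slow_rejected = True
```

Running such a check once at model load time would surface the problem before any dataset is loaded, rather than 17.77s into daru/treewayextractive as happened here.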

INFO: 2024-10-14 17:12:48,032: llmtf.base.daru/treewayabstractive: Loading Dataset: 28.28s
INFO: 2024-10-14 17:12:51,035: llmtf.base.darumeru/MultiQ: Loading Dataset: 35.93s
INFO: 2024-10-14 17:14:25,981: llmtf.base.darumeru/ruMMLU: Loading Dataset: 130.82s
INFO: 2024-10-14 17:16:23,865: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 245.86s
INFO: 2024-10-14 17:16:52,771: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 276.47s
INFO: 2024-10-14 17:17:20,199: llmtf.base.darumeru/MultiQ: Processing Dataset: 269.16s
INFO: 2024-10-14 17:17:20,200: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-14 17:17:20,204: llmtf.base.darumeru/MultiQ: {'f1': 0.48670297858620026, 'em': 0.372848948374761}
INFO: 2024-10-14 17:17:20,215: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:20,216: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:23,562: llmtf.base.darumeru/PARus: Loading Dataset: 3.35s
INFO: 2024-10-14 17:17:30,091: llmtf.base.darumeru/PARus: Processing Dataset: 6.53s
INFO: 2024-10-14 17:17:30,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-14 17:17:30,106: llmtf.base.darumeru/PARus: {'acc': 0.72}
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:34,240: llmtf.base.darumeru/RCB: Loading Dataset: 4.13s
INFO: 2024-10-14 17:17:45,351: llmtf.base.darumeru/RCB: Processing Dataset: 11.11s
INFO: 2024-10-14 17:17:45,353: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-14 17:17:45,362: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4643770525787481}
INFO: 2024-10-14 17:17:45,365: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:45,366: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:18:02,200: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 16.83s
INFO: 2024-10-14 17:19:12,133: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 402.71s
INFO: 2024-10-14 17:19:12,136: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-14 17:19:12,140: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.958846363652826, 'len': 0.9810206403044489, 'lcs': 0.9815195071868583}
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,530: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 74.33s
INFO: 2024-10-14 17:19:16,531: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-14 17:19:16,545: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7371134020618557, 'f1_macro': 0.7358424630320096}
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,859: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.71s
INFO: 2024-10-14 17:19:19,074: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.51s
INFO: 2024-10-14 17:19:23,376: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.30s
INFO: 2024-10-14 17:19:23,378: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-14 17:19:23,383: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8654128672745693}
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:27,363: llmtf.base.darumeru/RWSD: Loading Dataset: 3.98s
INFO: 2024-10-14 17:19:37,387: llmtf.base.darumeru/RWSD: Processing Dataset: 10.02s
INFO: 2024-10-14 17:19:37,405: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-14 17:19:37,409: llmtf.base.darumeru/RWSD: {'acc': 0.5735294117647058}
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:54,737: llmtf.base.darumeru/USE: Loading Dataset: 17.32s
INFO: 2024-10-14 17:25:32,317: llmtf.base.darumeru/USE: Processing Dataset: 337.58s
INFO: 2024-10-14 17:25:32,321: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-14 17:25:32,326: llmtf.base.darumeru/USE: {'grade_norm': 0.06176470588235293}
INFO: 2024-10-14 17:25:32,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:25:32,334: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:25:48,469: llmtf.base.darumeru/ruMMLU: Processing Dataset: 682.49s
INFO: 2024-10-14 17:25:48,470: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-14 17:25:48,496: llmtf.base.darumeru/ruMMLU: {'acc': 0.5159133991818817}
INFO: 2024-10-14 17:25:48,576: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:25:48,608: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree
0.599	0.430	0.720	0.507	0.574	0.062	0.981	0.516	0.736	0.866
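
Note on the summary rows: the leading "mean" column is consistent with a plain arithmetic mean of the per-task scores in the same row. The sketch below recomputes it for the row above; the variable names are illustrative, and the claim about how llmtf aggregates is an observation from the logged numbers, not something the log states. Likewise, the MultiQ entry (0.430) matches the average of its logged f1 and em values, but that pattern is only checked here for MultiQ.

```python
from statistics import fmean

# Per-task scores copied from the summary row logged at 17:25:48.
task_scores = {
    "darumeru/MultiQ": 0.430,
    "darumeru/PARus": 0.720,
    "darumeru/RCB": 0.507,
    "darumeru/RWSD": 0.574,
    "darumeru/USE": 0.062,
    "darumeru/cp_sent_ru": 0.981,
    "darumeru/ruMMLU": 0.516,
    "darumeru/ruOpenBookQA": 0.736,
    "darumeru/ruWorldTree": 0.866,
}
reported_mean = 0.599

# Assumption: the "mean" column is the unweighted average over tasks.
recomputed_mean = fmean(task_scores.values())

# Assumption: a task with several metrics is scored by averaging them.
multiq_metrics = {"f1": 0.48670297858620026, "em": 0.372848948374761}
multiq_score = fmean(multiq_metrics.values())

print(f"recomputed mean: {recomputed_mean:.3f} (reported {reported_mean:.3f})")
print(f"MultiQ score:    {multiq_score:.3f} (table shows 0.430)")
```

Because the mean is recomputed over whatever tasks have finished, it shifts between summary rows as new tasks are added, which is why later rows report different means for the same model.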
INFO: 2024-10-14 17:25:55,560: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 23.23s
INFO: 2024-10-14 17:26:02,923: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 406.06s
INFO: 2024-10-14 17:26:02,926: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-14 17:26:02,930: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.416472661458526, 'len': 0.9634384484255067, 'lcs': 1.0}
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:26:07,243: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.31s
INFO: 2024-10-14 17:27:27,303: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 91.74s
INFO: 2024-10-14 17:27:27,307: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-14 17:27:27,320: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5999282382490133, 'mcc': 0.26826918795125926}
INFO: 2024-10-14 17:27:27,332: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:27:27,344: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.617	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.434
INFO: 2024-10-14 17:28:31,573: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 727.71s
INFO: 2024-10-14 17:28:31,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-14 17:28:31,622: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.380000
anatomy                              0.592593
astronomy                            0.736842
business_ethics                      0.730000
clinical_knowledge                   0.716981
college_biology                      0.770833
college_chemistry                    0.500000
college_computer_science             0.590000
college_mathematics                  0.420000
college_medicine                     0.676301
college_physics                      0.509804
computer_security                    0.810000
conceptual_physics                   0.642553
econometrics                         0.552632
electrical_engineering               0.675862
elementary_mathematics               0.603175
formal_logic                         0.349206
global_facts                         0.310000
high_school_biology                  0.816129
high_school_chemistry                0.610837
high_school_computer_science         0.750000
high_school_european_history         0.787879
high_school_geography                0.808081
high_school_government_and_politics  0.870466
high_school_macroeconomics           0.712821
high_school_mathematics              0.518519
high_school_microeconomics           0.768908
high_school_physics                  0.443709
high_school_psychology               0.860550
high_school_statistics               0.666667
high_school_us_history               0.852941
high_school_world_history            0.843882
human_aging                          0.721973
human_sexuality                      0.793893
international_law                    0.785124
jurisprudence                        0.787037
logical_fallacies                    0.766871
machine_learning                     0.473214
management                           0.834951
marketing                            0.888889
medical_genetics                     0.790000
miscellaneous                        0.793103
moral_disputes                       0.687861
moral_scenarios                      0.288268
nutrition                            0.751634
philosophy                           0.717042
prehistory                           0.731481
professional_accounting              0.524823
professional_law                     0.477184
professional_medicine                0.661765
professional_psychology              0.696078
public_relations                     0.745455
security_studies                     0.759184
sociology                            0.830846
us_foreign_policy                    0.850000
virology                             0.493976
world_religions                      0.807018
INFO: 2024-10-14 17:28:31,630: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.606564
humanities                       0.683215
other (business, health, misc.)  0.677642
social sciences                  0.770743
INFO: 2024-10-14 17:28:31,637: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6845408165512243}
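
Note on the enMMLU result: the final 'acc' value agrees with the unweighted mean of the four category scores logged just above it, rather than with a mean over the 57 subjects or over individual questions. The sketch below checks this from the logged numbers; it is an observation about this output, assuming macro-averaging over categories, and is not taken from the llmtf source.

```python
from statistics import fmean

# Category-level scores copied from the enMMLU table logged at 17:28:31.
category_scores = {
    "STEM": 0.606564,
    "humanities": 0.683215,
    "other (business, health, misc.)": 0.677642,
    "social sciences": 0.770743,
}
reported_acc = 0.6845408165512243

# Assumption: the headline accuracy is a macro-average over the four categories.
macro_acc = fmean(category_scores.values())

print(f"macro-average over categories: {macro_acc:.6f}")
print(f"reported acc:                  {reported_acc:.6f}")
```

The small residual difference comes from the category scores being printed to six decimal places.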
INFO: 2024-10-14 17:28:31,706: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:28:31,716: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.623	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.434
INFO: 2024-10-14 17:30:17,754: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 804.98s
INFO: 2024-10-14 17:30:17,758: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-14 17:30:17,804: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.340000
anatomy                              0.385185
astronomy                            0.651316
business_ethics                      0.630000
clinical_knowledge                   0.558491
college_biology                      0.527778
college_chemistry                    0.420000
college_computer_science             0.560000
college_mathematics                  0.360000
college_medicine                     0.549133
college_physics                      0.450980
computer_security                    0.670000
conceptual_physics                   0.582979
econometrics                         0.447368
electrical_engineering               0.572414
elementary_mathematics               0.529101
formal_logic                         0.357143
global_facts                         0.320000
high_school_biology                  0.696774
high_school_chemistry                0.522167
high_school_computer_science         0.700000
high_school_european_history         0.715152
high_school_geography                0.757576
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.574359
high_school_mathematics              0.444444
high_school_microeconomics           0.563025
high_school_physics                  0.384106
high_school_psychology               0.693578
high_school_statistics               0.518519
high_school_us_history               0.642157
high_school_world_history            0.687764
human_aging                          0.582960
human_sexuality                      0.625954
international_law                    0.735537
jurisprudence                        0.611111
logical_fallacies                    0.588957
machine_learning                     0.401786
management                           0.669903
marketing                            0.773504
medical_genetics                     0.560000
miscellaneous                        0.604087
moral_disputes                       0.630058
moral_scenarios                      0.226816
nutrition                            0.663399
philosophy                           0.639871
prehistory                           0.577160
professional_accounting              0.361702
professional_law                     0.371578
professional_medicine                0.470588
professional_psychology              0.539216
public_relations                     0.545455
security_studies                     0.657143
sociology                            0.726368
us_foreign_policy                    0.770000
virology                             0.463855
world_religions                      0.678363
INFO: 2024-10-14 17:30:17,812: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.518465
humanities                       0.573974
other (business, health, misc.)  0.542343
social sciences                  0.624658
INFO: 2024-10-14 17:30:17,835: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5648600813341976}
INFO: 2024-10-14 17:30:17,918: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:30:17,931: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.618	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 17:35:49,735: llmtf.base.daru/treewayabstractive: Processing Dataset: 1381.70s
INFO: 2024-10-14 17:35:49,744: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-14 17:35:49,749: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3333904808228802, 'rouge2': 0.127931429649173}
INFO: 2024-10-14 17:35:49,754: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:35:49,768: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.591	0.231	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 17:36:46,624: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 639.38s
INFO: 2024-10-14 17:36:46,627: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-14 17:36:46,630: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9840096010204995, 'len': 0.9812951313228986, 'lcs': 0.98}
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:36:50,656: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.02s
INFO: 2024-10-14 17:52:40,643: llmtf.base.darumeru/cp_para_en: Processing Dataset: 949.99s
INFO: 2024-10-14 17:52:40,649: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-14 17:52:40,654: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.4342745144269164, 'len': 0.9926602787834059, 'lcs': 1.0}
INFO: 2024-10-14 17:52:40,654: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:52:40,665: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.641	0.231	0.430	0.720	0.507	0.574	0.062	1.000	0.980	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:14:22,049: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-14 18:14:22,051: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:22,051: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:22,798: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-14 18:14:22,799: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:22,799: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:24,660: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-14 18:14:24,661: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:24,661: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:26,344: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-14 18:14:26,345: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:26,345: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:28,729: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-14 18:14:28,729: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:28,729: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:30,599: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-14 18:14:30,599: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:30,599: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:32,435: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-14 18:14:32,436: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:14:32,436: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:14:36,755: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 4.32s
INFO: 2024-10-14 18:14:44,237: llmtf.base.daru/treewayextractive: Loading Dataset: 13.64s
INFO: 2024-10-14 18:14:45,157: llmtf.base.darumeru/MultiQ: Loading Dataset: 23.11s
INFO: 2024-10-14 18:14:47,866: llmtf.base.daru/treewayabstractive: Loading Dataset: 19.14s
INFO: 2024-10-14 18:15:54,592: llmtf.base.darumeru/ruMMLU: Loading Dataset: 91.79s
INFO: 2024-10-14 18:17:41,911: llmtf.base.daru/treewayextractive: Processing Dataset: 177.67s
INFO: 2024-10-14 18:17:41,914: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive:
INFO: 2024-10-14 18:17:42,127: llmtf.base.daru/treewayextractive: {'r-prec': 0.41475829725829727}
INFO: 2024-10-14 18:17:42,164: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:17:42,199: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.627	0.231	0.415	0.430	0.720	0.507	0.574	0.062	1.000	0.980	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:17:53,326: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 206.98s
INFO: 2024-10-14 18:18:33,858: llmtf.base.darumeru/MultiQ: Processing Dataset: 228.70s
INFO: 2024-10-14 18:18:33,859: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-14 18:18:33,865: llmtf.base.darumeru/MultiQ: {'f1': 0.48323651229633574, 'em': 0.367112810707457}
INFO: 2024-10-14 18:18:33,875: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:18:33,876: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:18:37,304: llmtf.base.darumeru/PARus: Loading Dataset: 3.43s
INFO: 2024-10-14 18:18:43,545: llmtf.base.darumeru/PARus: Processing Dataset: 6.24s
INFO: 2024-10-14 18:18:43,547: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-14 18:18:43,560: llmtf.base.darumeru/PARus: {'acc': 0.72}
INFO: 2024-10-14 18:18:43,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:18:43,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:18:46,981: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 262.32s
INFO: 2024-10-14 18:18:47,349: llmtf.base.darumeru/RCB: Loading Dataset: 3.79s
INFO: 2024-10-14 18:18:58,026: llmtf.base.darumeru/RCB: Processing Dataset: 10.67s
INFO: 2024-10-14 18:18:58,028: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-14 18:18:58,037: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4643770525787481}
INFO: 2024-10-14 18:18:58,039: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:18:58,039: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:19:12,716: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 14.68s
INFO: 2024-10-14 18:20:23,567: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 70.85s
INFO: 2024-10-14 18:20:23,569: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-14 18:20:23,597: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7371134020618557, 'f1_macro': 0.7358424630320096}
INFO: 2024-10-14 18:20:23,612: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:20:23,613: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:20:26,102: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.49s
INFO: 2024-10-14 18:20:30,186: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.08s
INFO: 2024-10-14 18:20:30,187: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-14 18:20:30,194: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8654128672745693}
INFO: 2024-10-14 18:20:30,195: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:20:30,196: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:20:34,435: llmtf.base.darumeru/RWSD: Loading Dataset: 4.24s
INFO: 2024-10-14 18:20:44,138: llmtf.base.darumeru/RWSD: Processing Dataset: 9.70s
INFO: 2024-10-14 18:20:44,140: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-14 18:20:44,175: llmtf.base.darumeru/RWSD: {'acc': 0.5686274509803921}
INFO: 2024-10-14 18:20:44,177: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:20:44,177: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:20:59,838: llmtf.base.darumeru/USE: Loading Dataset: 15.66s
INFO: 2024-10-14 18:21:17,184: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 400.43s
INFO: 2024-10-14 18:21:17,186: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-14 18:21:17,209: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.959579067863063, 'len': 0.980424354124537, 'lcs': 0.9815195071868583}
INFO: 2024-10-14 18:21:17,212: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:21:17,212: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:21:20,775: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 3.56s
INFO: 2024-10-14 18:26:01,388: llmtf.base.darumeru/ruMMLU: Processing Dataset: 606.76s
INFO: 2024-10-14 18:26:01,391: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-14 18:26:01,402: llmtf.base.darumeru/ruMMLU: {'acc': 0.5146163823206624}
INFO: 2024-10-14 18:26:01,478: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:26:01,500: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.627	0.231	0.415	0.425	0.720	0.507	0.569	0.062	1.000	0.980	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:26:07,030: llmtf.base.darumeru/USE: Processing Dataset: 307.19s
INFO: 2024-10-14 18:26:07,033: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-14 18:26:07,038: llmtf.base.darumeru/USE: {'grade_norm': 0.054901960784313725}
INFO: 2024-10-14 18:26:07,045: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:26:07,045: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:26:27,701: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 20.66s
INFO: 2024-10-14 18:27:55,196: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 87.49s
INFO: 2024-10-14 18:27:55,199: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-14 18:27:55,212: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5999282382490133, 'mcc': 0.26826918795125926}
INFO: 2024-10-14 18:27:55,223: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:27:55,232: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.626	0.231	0.415	0.425	0.720	0.507	0.569	0.055	1.000	0.980	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:28:09,184: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 408.41s
INFO: 2024-10-14 18:28:09,186: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-14 18:28:09,192: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.41667825834814, 'len': 0.9633845897145296, 'lcs': 1.0}
INFO: 2024-10-14 18:28:09,194: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:28:09,195: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:28:12,669: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.47s
INFO: 2024-10-14 18:28:38,665: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 645.34s
INFO: 2024-10-14 18:28:38,667: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-14 18:28:38,712: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.380000
anatomy                              0.592593
astronomy                            0.736842
business_ethics                      0.720000
clinical_knowledge                   0.716981
college_biology                      0.770833
college_chemistry                    0.500000
college_computer_science             0.610000
college_mathematics                  0.410000
college_medicine                     0.676301
college_physics                      0.509804
computer_security                    0.810000
conceptual_physics                   0.638298
econometrics                         0.578947
electrical_engineering               0.675862
elementary_mathematics               0.611111
formal_logic                         0.365079
global_facts                         0.310000
high_school_biology                  0.816129
high_school_chemistry                0.605911
high_school_computer_science         0.750000
high_school_european_history         0.787879
high_school_geography                0.808081
high_school_government_and_politics  0.870466
high_school_macroeconomics           0.712821
high_school_mathematics              0.525926
high_school_microeconomics           0.768908
high_school_physics                  0.443709
high_school_psychology               0.860550
high_school_statistics               0.666667
high_school_us_history               0.852941
high_school_world_history            0.835443
human_aging                          0.721973
human_sexuality                      0.793893
international_law                    0.776860
jurisprudence                        0.787037
logical_fallacies                    0.766871
machine_learning                     0.473214
management                           0.834951
marketing                            0.888889
medical_genetics                     0.790000
miscellaneous                        0.793103
moral_disputes                       0.687861
moral_scenarios                      0.289385
nutrition                            0.748366
philosophy                           0.717042
prehistory                           0.734568
professional_accounting              0.524823
professional_law                     0.475880
professional_medicine                0.661765
professional_psychology              0.689542
public_relations                     0.745455
security_studies                     0.767347
sociology                            0.830846
us_foreign_policy                    0.850000
virology                             0.493976
world_religions                      0.807018
INFO: 2024-10-14 18:28:38,720: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.607461
humanities                       0.683374
other (business, health, misc.)  0.676694
social sciences                  0.773071
INFO: 2024-10-14 18:28:38,732: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.685150317271941}
INFO: 2024-10-14 18:28:38,801: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:28:38,813: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.626	0.231	0.415	0.425	0.720	0.507	0.569	0.055	1.000	0.980	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:31:34,379: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 767.40s
INFO: 2024-10-14 18:31:34,382: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-14 18:31:34,428: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.340000
anatomy                              0.385185
astronomy                            0.651316
business_ethics                      0.630000
clinical_knowledge                   0.558491
college_biology                      0.534722
college_chemistry                    0.440000
college_computer_science             0.530000
college_mathematics                  0.360000
college_medicine                     0.560694
college_physics                      0.450980
computer_security                    0.670000
conceptual_physics                   0.582979
econometrics                         0.429825
electrical_engineering               0.572414
elementary_mathematics               0.539683
formal_logic                         0.341270
global_facts                         0.320000
high_school_biology                  0.696774
high_school_chemistry                0.512315
high_school_computer_science         0.690000
high_school_european_history         0.715152
high_school_geography                0.757576
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.574359
high_school_mathematics              0.440741
high_school_microeconomics           0.563025
high_school_physics                  0.390728
high_school_psychology               0.695413
high_school_statistics               0.527778
high_school_us_history               0.637255
high_school_world_history            0.687764
human_aging                          0.582960
human_sexuality                      0.625954
international_law                    0.752066
jurisprudence                        0.620370
logical_fallacies                    0.588957
machine_learning                     0.392857
management                           0.669903
marketing                            0.773504
medical_genetics                     0.560000
miscellaneous                        0.604087
moral_disputes                       0.630058
moral_scenarios                      0.220112
nutrition                            0.679739
philosophy                           0.639871
prehistory                           0.577160
professional_accounting              0.361702
professional_law                     0.371578
professional_medicine                0.470588
professional_psychology              0.542484
public_relations                     0.545455
security_studies                     0.661224
sociology                            0.726368
us_foreign_policy                    0.770000
virology                             0.463855
world_religions                      0.678363
INFO: 2024-10-14 18:31:34,436: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.517960
humanities                       0.573844
other (business, health, misc.)  0.544336
social sciences                  0.623961
INFO: 2024-10-14 18:31:34,444: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5650255790093989}
INFO: 2024-10-14 18:31:34,525: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:31:34,574: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.626	0.231	0.415	0.425	0.720	0.507	0.569	0.055	1.000	0.980	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:37:06,192: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 533.52s
INFO: 2024-10-14 18:37:06,211: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-14 18:37:06,232: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9799773492348676, 'len': 0.9741181435431592, 'lcs': 0.96}
INFO: 2024-10-14 18:37:06,233: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 18:37:06,233: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 18:37:08,580: llmtf.base.daru/treewayabstractive: Processing Dataset: 1340.71s
INFO: 2024-10-14 18:37:08,584: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-14 18:37:08,590: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3325254442550525, 'rouge2': 0.12953796507174228}
INFO: 2024-10-14 18:37:08,596: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:37:08,607: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.625	0.231	0.415	0.425	0.720	0.507	0.569	0.055	1.000	0.960	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 18:37:09,536: llmtf.base.darumeru/cp_para_en: Loading Dataset: 3.30s
INFO: 2024-10-14 18:51:43,471: llmtf.base.darumeru/cp_para_en: Processing Dataset: 873.93s
INFO: 2024-10-14 18:51:43,473: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-14 18:51:43,478: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.435138601612895, 'len': 0.9922018370708586, 'lcs': 1.0}
INFO: 2024-10-14 18:51:43,478: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 18:51:43,491: llmtf.base.evaluator: 
mean	daru/treewayabstractive	daru/treewayextractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.625	0.231	0.415	0.425	0.720	0.507	0.569	0.055	1.000	0.960	0.963	0.980	0.515	0.736	0.866	0.685	0.565	0.434