INFO: 2024-10-14 17:12:15,109: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:15,157: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:16,297: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:18,005: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:19,753: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:21,769: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:23,453: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:29,425: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 5.97s
INFO: 2024-10-14 17:12:39,535: llmtf.base.daru/treewayextractive: Loading Dataset: 17.77s
INFO: 2024-10-14 17:12:39,537: llmtf.base.daru/treewayextractive: Processing Dataset: 0.00s
ERROR: 2024-10-14 17:12:39,537: llmtf.base.evaluator: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
ERROR: 2024-10-14 17:12:39,542: llmtf.base.evaluator: Traceback (most recent call last):
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 42, in evaluate
    self.evaluate_dataset(task, model, output_dir, prompt_max_len, few_shot_count, generation_config, batch_size, max_sample_per_dataset)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 65, in evaluate_dataset
    prompts, y_preds, infos = getattr(model, task.method + '_batch')(**messages_batch)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/model.py", line 417, in calculate_logsoftmax_batch
    data = self.tokenizer(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2829, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2915, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3106, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 788, in _batch_encode_plus
    raise NotImplementedError(
NotImplementedError: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.

INFO: 2024-10-14 17:12:48,032: llmtf.base.daru/treewayabstractive: Loading Dataset: 28.28s
INFO: 2024-10-14 17:12:51,035: llmtf.base.darumeru/MultiQ: Loading Dataset: 35.93s
INFO: 2024-10-14 17:14:25,981: llmtf.base.darumeru/ruMMLU: Loading Dataset: 130.82s
INFO: 2024-10-14 17:16:23,865: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 245.86s
INFO: 2024-10-14 17:16:52,771: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 276.47s
INFO: 2024-10-14 17:17:20,199: llmtf.base.darumeru/MultiQ: Processing Dataset: 269.16s
INFO: 2024-10-14 17:17:20,200: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-14 17:17:20,204: llmtf.base.darumeru/MultiQ: {'f1': 0.48670297858620026, 'em': 0.372848948374761}
INFO: 2024-10-14 17:17:20,215: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:20,216: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:23,562: llmtf.base.darumeru/PARus: Loading Dataset: 3.35s
INFO: 2024-10-14 17:17:30,091: llmtf.base.darumeru/PARus: Processing Dataset: 6.53s
INFO: 2024-10-14 17:17:30,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-14 17:17:30,106: llmtf.base.darumeru/PARus: {'acc': 0.72}
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:34,240: llmtf.base.darumeru/RCB: Loading Dataset: 4.13s
INFO: 2024-10-14 17:17:45,351: llmtf.base.darumeru/RCB: Processing Dataset: 11.11s
INFO: 2024-10-14 17:17:45,353: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-14 17:17:45,362: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4643770525787481}
INFO: 2024-10-14 17:17:45,365: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:45,366: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:18:02,200: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 16.83s
INFO: 2024-10-14 17:19:12,133: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 402.71s
INFO: 2024-10-14 17:19:12,136: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-14 17:19:12,140: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.958846363652826, 'len': 0.9810206403044489, 'lcs': 0.9815195071868583}
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,530: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 74.33s
INFO: 2024-10-14 17:19:16,531: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-14 17:19:16,545: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7371134020618557, 'f1_macro': 0.7358424630320096}
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,859: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.71s
INFO: 2024-10-14 17:19:19,074: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.51s
INFO: 2024-10-14 17:19:23,376: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.30s
INFO: 2024-10-14 17:19:23,378: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-14 17:19:23,383: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8654128672745693}
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:27,363: llmtf.base.darumeru/RWSD: Loading Dataset: 3.98s
INFO: 2024-10-14 17:19:37,387: llmtf.base.darumeru/RWSD: Processing Dataset: 10.02s
INFO: 2024-10-14 17:19:37,405: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-14 17:19:37,409: llmtf.base.darumeru/RWSD: {'acc': 0.5735294117647058}
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:54,737: llmtf.base.darumeru/USE: Loading Dataset: 17.32s
INFO: 2024-10-14 17:25:32,317: llmtf.base.darumeru/USE: Processing Dataset: 337.58s
INFO: 2024-10-14 17:25:32,321: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-14 17:25:32,326: llmtf.base.darumeru/USE: {'grade_norm': 0.06176470588235293}
INFO: 2024-10-14 17:25:32,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:25:32,334: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:25:48,469: llmtf.base.darumeru/ruMMLU: Processing Dataset: 682.49s
INFO: 2024-10-14 17:25:48,470: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-14 17:25:48,496: llmtf.base.darumeru/ruMMLU: {'acc': 0.5159133991818817}
INFO: 2024-10-14 17:25:48,576: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:25:48,608: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree
0.599	0.430	0.720	0.507	0.574	0.062	0.981	0.516	0.736	0.866
INFO: 2024-10-14 17:25:55,560: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 23.23s
INFO: 2024-10-14 17:26:02,923: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 406.06s
INFO: 2024-10-14 17:26:02,926: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-14 17:26:02,930: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.416472661458526, 'len': 0.9634384484255067, 'lcs': 1.0}
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:26:07,243: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.31s
INFO: 2024-10-14 17:27:27,303: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 91.74s
INFO: 2024-10-14 17:27:27,307: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-14 17:27:27,320: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5999282382490133, 'mcc': 0.26826918795125926}
INFO: 2024-10-14 17:27:27,332: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:27:27,344: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	russiannlp/rucola_custom
0.617	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.434
INFO: 2024-10-14 17:28:31,573: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 727.71s
INFO: 2024-10-14 17:28:31,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-14 17:28:31,622: llmtf.base.nlpcoreteam/enMMLU:                                        metric
subject                                      
abstract_algebra                     0.380000
anatomy                              0.592593
astronomy                            0.736842
business_ethics                      0.730000
clinical_knowledge                   0.716981
college_biology                      0.770833
college_chemistry                    0.500000
college_computer_science             0.590000
college_mathematics                  0.420000
college_medicine                     0.676301
college_physics                      0.509804
computer_security                    0.810000
conceptual_physics                   0.642553
econometrics                         0.552632
electrical_engineering               0.675862
elementary_mathematics               0.603175
formal_logic                         0.349206
global_facts                         0.310000
high_school_biology                  0.816129
high_school_chemistry                0.610837
high_school_computer_science         0.750000
high_school_european_history         0.787879
high_school_geography                0.808081
high_school_government_and_politics  0.870466
high_school_macroeconomics           0.712821
high_school_mathematics              0.518519
high_school_microeconomics           0.768908
high_school_physics                  0.443709
high_school_psychology               0.860550
high_school_statistics               0.666667
high_school_us_history               0.852941
high_school_world_history            0.843882
human_aging                          0.721973
human_sexuality                      0.793893
international_law                    0.785124
jurisprudence                        0.787037
logical_fallacies                    0.766871
machine_learning                     0.473214
management                           0.834951
marketing                            0.888889
medical_genetics                     0.790000
miscellaneous                        0.793103
moral_disputes                       0.687861
moral_scenarios                      0.288268
nutrition                            0.751634
philosophy                           0.717042
prehistory                           0.731481
professional_accounting              0.524823
professional_law                     0.477184
professional_medicine                0.661765
professional_psychology              0.696078
public_relations                     0.745455
security_studies                     0.759184
sociology                            0.830846
us_foreign_policy                    0.850000
virology                             0.493976
world_religions                      0.807018
INFO: 2024-10-14 17:28:31,630: llmtf.base.nlpcoreteam/enMMLU:                                    metric
subject                                  
STEM                             0.606564
humanities                       0.683215
other (business, health, misc.)  0.677642
social sciences                  0.770743
INFO: 2024-10-14 17:28:31,637: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6845408165512243}
INFO: 2024-10-14 17:28:31,706: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:28:31,716: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	russiannlp/rucola_custom
0.623	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.434
INFO: 2024-10-14 17:30:17,754: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 804.98s
INFO: 2024-10-14 17:30:17,758: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-14 17:30:17,804: llmtf.base.nlpcoreteam/ruMMLU:                                        metric
subject                                      
abstract_algebra                     0.340000
anatomy                              0.385185
astronomy                            0.651316
business_ethics                      0.630000
clinical_knowledge                   0.558491
college_biology                      0.527778
college_chemistry                    0.420000
college_computer_science             0.560000
college_mathematics                  0.360000
college_medicine                     0.549133
college_physics                      0.450980
computer_security                    0.670000
conceptual_physics                   0.582979
econometrics                         0.447368
electrical_engineering               0.572414
elementary_mathematics               0.529101
formal_logic                         0.357143
global_facts                         0.320000
high_school_biology                  0.696774
high_school_chemistry                0.522167
high_school_computer_science         0.700000
high_school_european_history         0.715152
high_school_geography                0.757576
high_school_government_and_politics  0.595855
high_school_macroeconomics           0.574359
high_school_mathematics              0.444444
high_school_microeconomics           0.563025
high_school_physics                  0.384106
high_school_psychology               0.693578
high_school_statistics               0.518519
high_school_us_history               0.642157
high_school_world_history            0.687764
human_aging                          0.582960
human_sexuality                      0.625954
international_law                    0.735537
jurisprudence                        0.611111
logical_fallacies                    0.588957
machine_learning                     0.401786
management                           0.669903
marketing                            0.773504
medical_genetics                     0.560000
miscellaneous                        0.604087
moral_disputes                       0.630058
moral_scenarios                      0.226816
nutrition                            0.663399
philosophy                           0.639871
prehistory                           0.577160
professional_accounting              0.361702
professional_law                     0.371578
professional_medicine                0.470588
professional_psychology              0.539216
public_relations                     0.545455
security_studies                     0.657143
sociology                            0.726368
us_foreign_policy                    0.770000
virology                             0.463855
world_religions                      0.678363
INFO: 2024-10-14 17:30:17,812: llmtf.base.nlpcoreteam/ruMMLU:                                    metric
subject                                  
STEM                             0.518465
humanities                       0.573974
other (business, health, misc.)  0.542343
social sciences                  0.624658
INFO: 2024-10-14 17:30:17,835: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5648600813341976}
INFO: 2024-10-14 17:30:17,918: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:30:17,931: llmtf.base.evaluator: 
mean	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.618	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 17:35:49,735: llmtf.base.daru/treewayabstractive: Processing Dataset: 1381.70s
INFO: 2024-10-14 17:35:49,744: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-14 17:35:49,749: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3333904808228802, 'rouge2': 0.127931429649173}
INFO: 2024-10-14 17:35:49,754: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:35:49,768: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.591	0.231	0.430	0.720	0.507	0.574	0.062	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
INFO: 2024-10-14 17:36:46,624: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 639.38s
INFO: 2024-10-14 17:36:46,627: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-14 17:36:46,630: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9840096010204995, 'len': 0.9812951313228986, 'lcs': 0.98}
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:36:50,656: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.02s
INFO: 2024-10-14 17:52:40,643: llmtf.base.darumeru/cp_para_en: Processing Dataset: 949.99s
INFO: 2024-10-14 17:52:40,649: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-14 17:52:40,654: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.4342745144269164, 'len': 0.9926602787834059, 'lcs': 1.0}
INFO: 2024-10-14 17:52:40,654: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:52:40,665: llmtf.base.evaluator: 
mean	daru/treewayabstractive	darumeru/MultiQ	darumeru/PARus	darumeru/RCB	darumeru/RWSD	darumeru/USE	darumeru/cp_para_en	darumeru/cp_para_ru	darumeru/cp_sent_en	darumeru/cp_sent_ru	darumeru/ruMMLU	darumeru/ruOpenBookQA	darumeru/ruWorldTree	nlpcoreteam/enMMLU	nlpcoreteam/ruMMLU	russiannlp/rucola_custom
0.641	0.231	0.430	0.720	0.507	0.574	0.062	1.000	0.980	0.963	0.981	0.516	0.736	0.866	0.685	0.565	0.434
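
The "mean" column of the final summary can be cross-checked against the per-task scores. The sketch below is not part of llmtf; it assumes the mean is an unweighted average over the listed tasks, and it works from the three-decimal values printed in the log, so agreement is only expected up to rounding. The variable names are illustrative. daru/treewayextractive, which failed with the NotImplementedError above, has no column in the table and is therefore not part of the average.

```python
# Final summary table copied from the log (tab-separated header and value rows).
header = (
    "mean\tdaru/treewayabstractive\tdarumeru/MultiQ\tdarumeru/PARus\tdarumeru/RCB\t"
    "darumeru/RWSD\tdarumeru/USE\tdarumeru/cp_para_en\tdarumeru/cp_para_ru\t"
    "darumeru/cp_sent_en\tdarumeru/cp_sent_ru\tdarumeru/ruMMLU\tdarumeru/ruOpenBookQA\t"
    "darumeru/ruWorldTree\tnlpcoreteam/enMMLU\tnlpcoreteam/ruMMLU\trussiannlp/rucola_custom"
)
row = (
    "0.641\t0.231\t0.430\t0.720\t0.507\t0.574\t0.062\t1.000\t0.980\t"
    "0.963\t0.981\t0.516\t0.736\t0.866\t0.685\t0.565\t0.434"
)

names = header.split("\t")
values = [float(v) for v in row.split("\t")]

# The first column is the reported mean; the remaining columns are per-task scores.
reported_mean = values[0]
scores = dict(zip(names[1:], values[1:]))

# Assumption: the reported mean is a plain unweighted average of the task scores.
recomputed_mean = sum(scores.values()) / len(scores)

print(f"tasks: {len(scores)}")
print(f"reported mean:   {reported_mean:.3f}")
print(f"recomputed mean: {recomputed_mean:.4f}")
```

Under that assumption the recomputed value agrees with the reported 0.641 to within the rounding of the printed scores.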