INFO: 2024-10-14 17:12:15,109: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/ruopenbookqa', 'darumeru/ruworldtree', 'darumeru/rwsd', 'darumeru/use', 'russiannlp/rucola_custom']
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,109: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:15,157: llmtf.base.evaluator: Starting eval on ['darumeru/rummlu']
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:15,157: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:16,297: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:16,297: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:18,005: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:18,006: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:19,753: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:19,753: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:21,769: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive']
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:21,769: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:23,453: llmtf.base.evaluator: Starting eval on ['darumeru/cp_sent_ru', 'darumeru/cp_sent_en', 'darumeru/cp_para_ru', 'darumeru/cp_para_en']
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:12:23,453: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:12:29,425: llmtf.base.darumeru/cp_sent_ru: Loading Dataset: 5.97s
INFO: 2024-10-14 17:12:39,535: llmtf.base.daru/treewayextractive: Loading Dataset: 17.77s
INFO: 2024-10-14 17:12:39,537: llmtf.base.daru/treewayextractive: Processing Dataset: 0.00s
ERROR: 2024-10-14 17:12:39,537: llmtf.base.evaluator: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
ERROR: 2024-10-14 17:12:39,542: llmtf.base.evaluator: Traceback (most recent call last):
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 42, in evaluate
    self.evaluate_dataset(task, model, output_dir, prompt_max_len, few_shot_count, generation_config, batch_size, max_sample_per_dataset)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/evaluator.py", line 65, in evaluate_dataset
    prompts, y_preds, infos = getattr(model, task.method + '_batch')(**messages_batch)
  File "/scratch/tikhomirov/workdir/projects/llmtf_open/llmtf/model.py", line 417, in calculate_logsoftmax_batch
    data = self.tokenizer(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2829, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2915, in _call_one
    return self.batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3106, in batch_encode_plus
    return self._batch_encode_plus(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils.py", line 788, in _batch_encode_plus
    raise NotImplementedError(
NotImplementedError: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.
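The daru/treewayextractive failure above is raised by transformers when `return_offsets_mapping=True` is requested from a slow (pure-Python) tokenizer: only Rust-backed tokenizers deriving from `PreTrainedTokenizerFast` implement offset mappings, and they advertise this via the real `is_fast` attribute. A minimal guard sketch (the `supports_offset_mapping` and `safe_encode` helpers are hypothetical, not part of llmtf):

```python
def supports_offset_mapping(tokenizer) -> bool:
    # Only "fast" (Rust-backed) tokenizers, i.e. subclasses of
    # transformers.PreTrainedTokenizerFast, implement return_offsets_mapping.
    # They expose is_fast == True; slow Python tokenizers report False.
    return bool(getattr(tokenizer, "is_fast", False))


def safe_encode(tokenizer, texts):
    # Request offset mappings only when the tokenizer can honor them,
    # instead of letting the call raise NotImplementedError as above.
    kwargs = {"return_offsets_mapping": True} if supports_offset_mapping(tokenizer) else {}
    return tokenizer(texts, **kwargs)
```

The direct fix the error message suggests is reloading the tokenizer with `AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)`, provided a fast tokenizer exists for the model.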
INFO: 2024-10-14 17:12:48,032: llmtf.base.daru/treewayabstractive: Loading Dataset: 28.28s
INFO: 2024-10-14 17:12:51,035: llmtf.base.darumeru/MultiQ: Loading Dataset: 35.93s
INFO: 2024-10-14 17:14:25,981: llmtf.base.darumeru/ruMMLU: Loading Dataset: 130.82s
INFO: 2024-10-14 17:16:23,865: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 245.86s
INFO: 2024-10-14 17:16:52,771: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 276.47s
INFO: 2024-10-14 17:17:20,199: llmtf.base.darumeru/MultiQ: Processing Dataset: 269.16s
INFO: 2024-10-14 17:17:20,200: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-10-14 17:17:20,204: llmtf.base.darumeru/MultiQ: {'f1': 0.48670297858620026, 'em': 0.372848948374761}
INFO: 2024-10-14 17:17:20,215: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:20,216: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:23,562: llmtf.base.darumeru/PARus: Loading Dataset: 3.35s
INFO: 2024-10-14 17:17:30,091: llmtf.base.darumeru/PARus: Processing Dataset: 6.53s
INFO: 2024-10-14 17:17:30,093: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-10-14 17:17:30,106: llmtf.base.darumeru/PARus: {'acc': 0.72}
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:30,108: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:17:34,240: llmtf.base.darumeru/RCB: Loading Dataset: 4.13s
INFO: 2024-10-14 17:17:45,351: llmtf.base.darumeru/RCB: Processing Dataset: 11.11s
INFO: 2024-10-14 17:17:45,353: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-10-14 17:17:45,362: llmtf.base.darumeru/RCB: {'acc': 0.55, 'f1_macro': 0.4643770525787481}
INFO: 2024-10-14 17:17:45,365: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:17:45,366: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:18:02,200: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 16.83s
INFO: 2024-10-14 17:19:12,133: llmtf.base.darumeru/cp_sent_ru: Processing Dataset: 402.71s
INFO: 2024-10-14 17:19:12,136: llmtf.base.darumeru/cp_sent_ru: Results for darumeru/cp_sent_ru:
INFO: 2024-10-14 17:19:12,140: llmtf.base.darumeru/cp_sent_ru: {'symbol_per_token': 3.958846363652826, 'len': 0.9810206403044489, 'lcs': 0.9815195071868583}
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:12,144: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,530: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 74.33s
INFO: 2024-10-14 17:19:16,531: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA:
INFO: 2024-10-14 17:19:16,545: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.7371134020618557, 'f1_macro': 0.7358424630320096}
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:16,562: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:16,859: llmtf.base.darumeru/cp_sent_en: Loading Dataset: 4.71s
INFO: 2024-10-14 17:19:19,074: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.51s
INFO: 2024-10-14 17:19:23,376: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 4.30s
INFO: 2024-10-14 17:19:23,378: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree:
INFO: 2024-10-14 17:19:23,383: llmtf.base.darumeru/ruWorldTree: {'acc': 0.8666666666666667, 'f1_macro': 0.8654128672745693}
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:23,384: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:27,363: llmtf.base.darumeru/RWSD: Loading Dataset: 3.98s
INFO: 2024-10-14 17:19:37,387: llmtf.base.darumeru/RWSD: Processing Dataset: 10.02s
INFO: 2024-10-14 17:19:37,405: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-10-14 17:19:37,409: llmtf.base.darumeru/RWSD: {'acc': 0.5735294117647058}
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:19:37,411: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:19:54,737: llmtf.base.darumeru/USE: Loading Dataset: 17.32s
INFO: 2024-10-14 17:25:32,317: llmtf.base.darumeru/USE: Processing Dataset: 337.58s
INFO: 2024-10-14 17:25:32,321: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-10-14 17:25:32,326: llmtf.base.darumeru/USE: {'grade_norm': 0.06176470588235293}
INFO: 2024-10-14 17:25:32,333: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:25:32,334: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:25:48,469: llmtf.base.darumeru/ruMMLU: Processing Dataset: 682.49s
INFO: 2024-10-14 17:25:48,470: llmtf.base.darumeru/ruMMLU: Results for darumeru/ruMMLU:
INFO: 2024-10-14 17:25:48,496: llmtf.base.darumeru/ruMMLU: {'acc': 0.5159133991818817}
INFO: 2024-10-14 17:25:48,576: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:25:48,608: llmtf.base.evaluator:
mean                          0.599
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
INFO: 2024-10-14 17:25:55,560: llmtf.base.russiannlp/rucola_custom: Loading Dataset: 23.23s
INFO: 2024-10-14 17:26:02,923: llmtf.base.darumeru/cp_sent_en: Processing Dataset: 406.06s
INFO: 2024-10-14 17:26:02,926: llmtf.base.darumeru/cp_sent_en: Results for darumeru/cp_sent_en:
INFO: 2024-10-14 17:26:02,930: llmtf.base.darumeru/cp_sent_en: {'symbol_per_token': 4.416472661458526, 'len': 0.9634384484255067, 'lcs': 1.0}
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:26:02,932: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:26:07,243: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 4.31s
INFO: 2024-10-14 17:27:27,303: llmtf.base.russiannlp/rucola_custom: Processing Dataset: 91.74s
INFO: 2024-10-14 17:27:27,307: llmtf.base.russiannlp/rucola_custom: Results for russiannlp/rucola_custom:
INFO: 2024-10-14 17:27:27,320: llmtf.base.russiannlp/rucola_custom: {'acc': 0.5999282382490133, 'mcc': 0.26826918795125926}
INFO: 2024-10-14 17:27:27,332: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:27:27,344: llmtf.base.evaluator:
mean                          0.617
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_sent_en           0.963
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
russiannlp/rucola_custom      0.434
INFO: 2024-10-14 17:28:31,573: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 727.71s
INFO: 2024-10-14 17:28:31,575: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-10-14 17:28:31,622: llmtf.base.nlpcoreteam/enMMLU: metric
subject
abstract_algebra 0.380000
anatomy 0.592593
astronomy 0.736842
business_ethics 0.730000
clinical_knowledge 0.716981
college_biology 0.770833
college_chemistry 0.500000
college_computer_science 0.590000
college_mathematics 0.420000
college_medicine 0.676301
college_physics 0.509804
computer_security 0.810000
conceptual_physics 0.642553
econometrics 0.552632
electrical_engineering 0.675862
elementary_mathematics 0.603175
formal_logic 0.349206
global_facts 0.310000
high_school_biology 0.816129
high_school_chemistry 0.610837
high_school_computer_science 0.750000
high_school_european_history 0.787879
high_school_geography 0.808081
high_school_government_and_politics 0.870466
high_school_macroeconomics 0.712821
high_school_mathematics 0.518519
high_school_microeconomics 0.768908
high_school_physics 0.443709
high_school_psychology 0.860550
high_school_statistics 0.666667
high_school_us_history 0.852941
high_school_world_history 0.843882
human_aging 0.721973
human_sexuality 0.793893
international_law 0.785124
jurisprudence 0.787037
logical_fallacies 0.766871
machine_learning 0.473214
management 0.834951
marketing 0.888889
medical_genetics 0.790000
miscellaneous 0.793103
moral_disputes 0.687861
moral_scenarios 0.288268
nutrition 0.751634
philosophy 0.717042
prehistory 0.731481
professional_accounting 0.524823
professional_law 0.477184
professional_medicine 0.661765
professional_psychology 0.696078
public_relations 0.745455
security_studies 0.759184
sociology 0.830846
us_foreign_policy 0.850000
virology 0.493976
world_religions 0.807018
INFO: 2024-10-14 17:28:31,630: llmtf.base.nlpcoreteam/enMMLU: metric
subject
STEM 0.606564
humanities 0.683215
other (business, health, misc.) 0.677642
social sciences 0.770743
INFO: 2024-10-14 17:28:31,637: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6845408165512243}
INFO: 2024-10-14 17:28:31,706: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:28:31,716: llmtf.base.evaluator:
mean                          0.623
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_sent_en           0.963
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
nlpcoreteam/enMMLU            0.685
russiannlp/rucola_custom      0.434
INFO: 2024-10-14 17:30:17,754: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 804.98s
INFO: 2024-10-14 17:30:17,758: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-10-14 17:30:17,804: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
abstract_algebra 0.340000
anatomy 0.385185
astronomy 0.651316
business_ethics 0.630000
clinical_knowledge 0.558491
college_biology 0.527778
college_chemistry 0.420000
college_computer_science 0.560000
college_mathematics 0.360000
college_medicine 0.549133
college_physics 0.450980
computer_security 0.670000
conceptual_physics 0.582979
econometrics 0.447368
electrical_engineering 0.572414
elementary_mathematics 0.529101
formal_logic 0.357143
global_facts 0.320000
high_school_biology 0.696774
high_school_chemistry 0.522167
high_school_computer_science 0.700000
high_school_european_history 0.715152
high_school_geography 0.757576
high_school_government_and_politics 0.595855
high_school_macroeconomics 0.574359
high_school_mathematics 0.444444
high_school_microeconomics 0.563025
high_school_physics 0.384106
high_school_psychology 0.693578
high_school_statistics 0.518519
high_school_us_history 0.642157
high_school_world_history 0.687764
human_aging 0.582960
human_sexuality 0.625954
international_law 0.735537
jurisprudence 0.611111
logical_fallacies 0.588957
machine_learning 0.401786
management 0.669903
marketing 0.773504
medical_genetics 0.560000
miscellaneous 0.604087
moral_disputes 0.630058
moral_scenarios 0.226816
nutrition 0.663399
philosophy 0.639871
prehistory 0.577160
professional_accounting 0.361702
professional_law 0.371578
professional_medicine 0.470588
professional_psychology 0.539216
public_relations 0.545455
security_studies 0.657143
sociology 0.726368
us_foreign_policy 0.770000
virology 0.463855
world_religions 0.678363
INFO: 2024-10-14 17:30:17,812: llmtf.base.nlpcoreteam/ruMMLU: metric
subject
STEM 0.518465
humanities 0.573974
other (business, health, misc.) 0.542343
social sciences 0.624658
INFO: 2024-10-14 17:30:17,835: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.5648600813341976}
INFO: 2024-10-14 17:30:17,918: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:30:17,931: llmtf.base.evaluator:
mean                          0.618
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_sent_en           0.963
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
nlpcoreteam/enMMLU            0.685
nlpcoreteam/ruMMLU            0.565
russiannlp/rucola_custom      0.434
INFO: 2024-10-14 17:35:49,735: llmtf.base.daru/treewayabstractive: Processing Dataset: 1381.70s
INFO: 2024-10-14 17:35:49,744: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-10-14 17:35:49,749: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3333904808228802, 'rouge2': 0.127931429649173}
INFO: 2024-10-14 17:35:49,754: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:35:49,768: llmtf.base.evaluator:
mean                          0.591
daru/treewayabstractive       0.231
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_sent_en           0.963
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
nlpcoreteam/enMMLU            0.685
nlpcoreteam/ruMMLU            0.565
russiannlp/rucola_custom      0.434
INFO: 2024-10-14 17:36:46,624: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 639.38s
INFO: 2024-10-14 17:36:46,627: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-10-14 17:36:46,630: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.9840096010204995, 'len': 0.9812951313228986, 'lcs': 0.98}
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [147075, 198, 271]
INFO: 2024-10-14 17:36:46,632: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['\n', '\n\n']
INFO: 2024-10-14 17:36:50,656: llmtf.base.darumeru/cp_para_en: Loading Dataset: 4.02s
INFO: 2024-10-14 17:52:40,643: llmtf.base.darumeru/cp_para_en: Processing Dataset: 949.99s
INFO: 2024-10-14 17:52:40,649: llmtf.base.darumeru/cp_para_en: Results for darumeru/cp_para_en:
INFO: 2024-10-14 17:52:40,654: llmtf.base.darumeru/cp_para_en: {'symbol_per_token': 4.4342745144269164, 'len': 0.9926602787834059, 'lcs': 1.0}
INFO: 2024-10-14 17:52:40,654: llmtf.base.evaluator: Ended eval
INFO: 2024-10-14 17:52:40,665: llmtf.base.evaluator:
mean                          0.641
daru/treewayabstractive       0.231
darumeru/MultiQ               0.430
darumeru/PARus                0.720
darumeru/RCB                  0.507
darumeru/RWSD                 0.574
darumeru/USE                  0.062
darumeru/cp_para_en           1.000
darumeru/cp_para_ru           0.980
darumeru/cp_sent_en           0.963
darumeru/cp_sent_ru           0.981
darumeru/ruMMLU               0.516
darumeru/ruOpenBookQA         0.736
darumeru/ruWorldTree          0.866
nlpcoreteam/enMMLU            0.685
nlpcoreteam/ruMMLU            0.565
russiannlp/rucola_custom      0.434
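As a sanity check on the final summary, the headline mean is reproduced by averaging the sixteen per-task scores; for at least some tasks (e.g. darumeru/MultiQ) the task score is in turn the mean of that task's logged metrics. A small verification sketch using only values from this run:

```python
from statistics import mean

# Per-task scores from the final evaluator summary of this run.
scores = {
    "daru/treewayabstractive": 0.231,
    "darumeru/MultiQ": 0.430,
    "darumeru/PARus": 0.720,
    "darumeru/RCB": 0.507,
    "darumeru/RWSD": 0.574,
    "darumeru/USE": 0.062,
    "darumeru/cp_para_en": 1.000,
    "darumeru/cp_para_ru": 0.980,
    "darumeru/cp_sent_en": 0.963,
    "darumeru/cp_sent_ru": 0.981,
    "darumeru/ruMMLU": 0.516,
    "darumeru/ruOpenBookQA": 0.736,
    "darumeru/ruWorldTree": 0.866,
    "nlpcoreteam/enMMLU": 0.685,
    "nlpcoreteam/ruMMLU": 0.565,
    "russiannlp/rucola_custom": 0.434,
}

# Averaging the rounded per-task scores reproduces the reported mean of 0.641.
overall = round(mean(scores.values()), 3)

# Example of a per-task score: darumeru/MultiQ averages its f1 and em metrics.
multiq = round((0.48670297858620026 + 0.372848948374761) / 2, 3)
```

The evaluator itself presumably averages unrounded metrics, so small last-digit differences from this reconstruction are possible.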