|
--- |
|
license: other |
|
language: |
|
- zh |
|
- en |
|
datasets: |
|
- thomas-yanxin/MT-SFT-ShareGPT |
|
--- |
|
|
|
|
|
The main purpose of this model is to validate the usability of [thomas-yanxin/MT-SFT-ShareGPT](https://huggingface.co/datasets/thomas-yanxin/MT-SFT-ShareGPT), i.e., the quality of the data is all you need. We found that when we meticulously extract the data through a better data governance approach, the corresponding model results can be vastly improved, even if only through SFT. |
|
|
|
Here are the results from our OpenCompass evaluation: |
|
|
|
| Classification | Benchmarks | Models | |
|
| :------------: | :--------: | :--------: | |
|
| | 名称 | XinYuan-Qwen2-7B | |
|
| English | MMLU | 73.72 | |
|
| | MMLU-Pro | / | |
|
| | Theorem QA | / | |
|
| | GPQA | 33.04 | |
|
| | BBH | 67.55 | |
|
| | IFEval (Prompt Strict-Acc.) | 40.48 | |
|
| | ARC-C | 91.19 | |
|
| Math | GSM8K | 82.94 | |
|
| | MATH | 41.06 | |
|
| Chinese | C-EVAL | 81.02 | |
|
| | CMMLU | 80.06 | |
|
| Code | MBPP | 50.6 | |
|
| | HumanEval | 83.99 | |