---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
---
# Mengzi3-13B-Base

🤗 Hugging Face | 🤖 ModelScope | Wisemodel | 💬 WeChat | API | 孟子GPT

## 模型介绍/Introduction
本次开源Mengzi3 13B系列模型,模型的地址如下:

This release open-sources the Mengzi3-13B series of models; the model addresses are as follows:
Mengzi3-13B模型基于Llama架构,语料精选自网页、百科、社交、媒体、新闻,以及高质量的开源数据集。通过在万亿tokens上进行多语言语料的继续训练,模型的中文能力突出并且兼顾多语言能力。
Mengzi3-13B is based on the Llama architecture. Its corpus is drawn from web pages, encyclopedias, social media, news, and high-quality open-source datasets. Through continued pre-training on trillions of tokens of multilingual corpus, the model achieves outstanding Chinese capability while retaining strong multilingual ability.
## 快速开始/Quickstart
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)

inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt')
# Move the input tensors onto the GPU when one is available.
if torch.cuda.is_available():
    inputs = inputs.to('cuda')

pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))
```
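For a 13B model, loading the weights in half precision roughly halves GPU memory use, and a `TextStreamer` prints tokens as they are generated instead of waiting for the full output. A minimal sketch, assuming a single CUDA GPU with enough memory for fp16 weights; the dtype choice is our assumption, not part of the official quickstart:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
# Assumption: fp16 halves memory versus fp32; a 13B model still needs roughly 26 GB of GPU memory.
model = AutoModelForCausalLM.from_pretrained(
    "Langboat/Mengzi3-13B-Base",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer('指令:回答以下问题。输入:介绍一下孟子。输出:', return_tensors='pt').to(model.device)
# Stream decoded text to stdout as it is generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01,
               eos_token_id=tokenizer.eos_token_id, streamer=streamer)
```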
详细的模型推理和微调代码见Github
See GitHub for detailed model inference and fine-tuning code.
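The GitHub repository holds the authoritative recipes. Purely as an illustration of one common approach, a parameter-efficient LoRA setup on this Llama-architecture model could look like the sketch below; the `target_modules` and hyperparameters are assumptions, not values taken from the official fine-tuning code.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True
)

# Hypothetical LoRA config: the attention projections below are typical targets
# for Llama-style models, but the official GitHub recipe may differ.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```

Training then proceeds as usual (for example with a standard `Trainer` loop), updating only the adapter weights.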
## 性能评测/Evaluation
Mengzi3-13B-Base在各项基准测试中与同等参数量大语言模型相比,语言能力成绩领先,数学和编程能力位于前列。
Compared with large language models of similar parameter scale, Mengzi3-13B-Base leads in language ability across benchmarks and ranks near the top in math and coding.
| 模型/Model | MMLU | CMMLU | OCNLI | GSM8K | HumanEval |
|---|---|---|---|---|---|
| Baichuan2-13B-Base | 0.530 | 0.489 | 0.433 | 0.528 | 0.171 |
| Qwen-14B | 0.589 | 0.539 | 0.550 | 0.613 | 0.323 |
| ChatGLM3-6B-base | 0.551 | 0.495 | 0.754 | 0.723 | - |
| InternLM2-20B | 0.610 | 0.538 | 0.650 | 0.761 | 0.488 |
| Skywork-13B-base | 0.557 | 0.524 | 0.426 | 0.558 | - |
| LingoWhale-8B | 0.541 | 0.495 | 0.352 | 0.550 | 0.329 |
| DeepSeek-7B | 0.436 | 0.424 | 0.356 | 0.174 | 0.262 |
| DeepSeek-MoE-16B-base | 0.423 | 0.388 | 0.342 | 0.188 | 0.268 |
| MindSource-7B | 0.498 | 0.425 | 0.528 | - | - |
| **Mengzi3-13B-Base** | **0.651 (+6.7%)** | **0.588 (+9.1%)** | **0.776 (+2.9%)** | 0.631 | 0.287 |
以上结果基于5-shot,MMLU/CMMLU/OCNLI结果来自FlagEval
The above results are based on 5-shot evaluation; the MMLU/CMMLU/OCNLI results are from FlagEval.
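For context, "5-shot" means each test question is preceded by five solved examples in the prompt, and the model completes the sixth answer. A minimal sketch of how such a prompt is assembled; the example pairs below are placeholders, and FlagEval's actual templates differ:

```python
# Placeholder demonstration pairs; FlagEval's real 5-shot templates differ.
few_shot_examples = [
    ("问题一……", "答案一……"),
    ("问题二……", "答案二……"),
    ("问题三……", "答案三……"),
    ("问题四……", "答案四……"),
    ("问题五……", "答案五……"),
]

def build_prompt(question: str) -> str:
    """Prepend the five solved examples, then pose the unanswered test question."""
    shots = "".join(f"问题:{q}\n答案:{a}\n\n" for q, a in few_shot_examples)
    return shots + f"问题:{question}\n答案:"

print(build_prompt("介绍一下孟子。"))
```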
## 协议/License Agreement
Mengzi3-13B-Base依照Apache 2.0协议开源,对学术研究完全开放,同时支持免费商用。如需申请商业许可证,请联系我们,其他商务合作请联系[email protected]。
Mengzi3-13B-Base is open-sourced under the Apache 2.0 license. It is fully open for academic research and also supports free commercial use. To apply for a commercial license, please contact us; for other business cooperation, please reach out to [email protected].