Mengzi3-13B-Base

模型介绍/Introduction

本次开源Mengzi3 13B系列模型，模型的地址如下:

	Mengzi3-13B-Base	Mengzi3-13B-Chat
13B	🤗 / 🤖 / MindSpore / Wisemodel	敬请期待

Mengzi3-13B模型基于Llama架构，语料精选自网页、百科、社交、媒体、新闻，以及高质量的开源数据集。通过在万亿tokens上进行多语言语料的继续训练，模型的中文能力突出并且兼顾多语言能力。

Mengzi3-13B is based on the Llama architecture, and the corpus is selected from web pages, encyclopedias, social networking, media, news, and high-quality open source data sets. By continuing to train multilingual corpus on trillions of tokens, the model has outstanding Chinese capabilities and takes into account multilingual capabilities.

快速开始/Quickstart

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Langboat/Mengzi3-13B-Base", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Langboat/Mengzi3-13B-Base", device_map="auto", trust_remote_code=True)
inputs = tokenizer('指令：回答以下问题。输入：介绍一下孟子。输出：', return_tensors='pt')
if torch.cuda.is_available():
    inputs = inputs.to('cuda')
pred = model.generate(**inputs, max_new_tokens=512, repetition_penalty=1.01, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(pred[0], skip_special_tokens=True))

详细的模型推理和微调代码见Github

Detailed code of model reasoning and finetune see Github

性能评测/Evaluation

Mengzi3-13B-Base在各项基准测试中与同等参数量大语言模型相比，语言能力成绩领先，数学和编程能力位于前列。

Mengzi3-13B-Base leads in language proficiency and is at the forefront in math and programming proficiency compared to the equivalent large language model in various benchmark tests.

	MMLU	CMMLU	OCNLI	GSM8K	HumanEval
Baichuan2-13B-Base	0.530	0.489	0.433	0.528	0.171
Qwen-14B	0.589	0.539	0.550	0.613	0.323
ChatGLM3-6B-base	0.551	0.495	0.754	0.723	-
InternLM2-20B	0.610	0.538	0.650	0.761	0.488
Skywork-13B-base	0.557	0.524	0.426	0.558	-
LingoWhale-8B	0.541	0.495	0.352	0.550	0.329
DeepSeek-7B	0.436	0.424	0.356	0.174	0.262
DeepSeek-MoE-16B-base	0.423	0.388	0.342	0.188	0.268
MindSource-7B	0.498	0.425	0.528	-	-
Mengzi3-13B-Base	0.651 (+6.7%)	0.588 (+9.1%)	0.776 (+2.9%)	0.631	0.287

以上结果基于5-shot，MMLU/CMMLU/OCNLI结果来自FlagEval

The above results are based on 5-shot，MMLU/CMMLU/OCNLI results from FlagEval

协议/License Agreement

Mengzi3-13B-Base依照Apache 2.0协议开源，对学术研究完全开放，同时支持免费商用。如需申请商业许可证，请联系我们，其他商务合作请联系[email protected]。

Mengzi3-13B-Base is open source under the Apache 2.0 protocol, fully open for academic research, and free for commercial use. If you need to apply for business license, please contact us, other business cooperation, please contact [email protected].