Anhforth committed
Commit 977be54
1 Parent(s): f8a2e45

Update README.md

Files changed (1)
  1. README.md +15 -12
README.md CHANGED
@@ -9,13 +9,14 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点

 The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of underlying operator implementations with more efficient ones and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English bilingual knowledge, permits commercial licensing, and complies with domestic data regulations.

- ## 模型细节/Model details
- | Model | License | Commercial use? | GPU |
- | :---------------- | :------- | :-- | :-- |
- | Aquila-7B | Apache 2.0 | | Nvidia-A100 |
- | AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 |
- | AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 |
- | AquilaChat-7B | Apache 2.0 | | Nvidia-A100 |
+ | 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
+ | :---------------- | :------- | :-- | :-- |
+ | Aquila-7B | 已发布/Released | ✅ | Nvidia-A100 |
+ | AquilaChat-7B | 已发布/Released | | Nvidia-A100 |
+ | AquilaCode-7B-NV | 已发布/Released | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-TS | 已发布/Released | ✅ | Tianshu-BI-V100 |
+ | Aquila-33B | **敬请期待/Coming soon** | | Nvidia-A100 |
+ | AquilaChat-33B | **敬请期待/Coming soon** | ✅ | Nvidia-A100 |

 We used a series of more efficient low-level operators to assist model training, including methods based on [flash-attention](https://github.com/HazyResearch/flash-attention) with some intermediate computations replaced, and we also adopted RMSNorm. On top of this, we upgraded [BMTrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training; it uses data parallelism, ZeRO (zero redundancy optimizer), optimizer offloading, checkpointing with operator fusion, and communication-computation overlap to optimize the model training process.
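RMSNorm, referenced above, rescales each activation vector by its root mean square with a learned gain, rather than subtracting a mean and dividing by a standard deviation as LayerNorm does. A minimal PyTorch-style sketch, for illustration only (not the Aquila source code):

```python
# Illustrative RMSNorm sketch; assumes a PyTorch-style module, not Aquila's actual implementation.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned per-channel gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the hidden dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms
```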
@@ -33,16 +34,18 @@ We used different tokenizers to extract ten thousand data samples from English,

 | 模型/Model | 词表大小/Vocab size | 说明/Note | 英文平均tokens量/Avg tokens (English) | 中文平均tokens量/Avg tokens (Chinese) | 代码平均tokens量/Avg tokens (code) |
 | ----- | ---- | ----- | ---- | ----- | ---- |
- | gpt2 | 50527 | bpe | 1717 | 1764 | 2323 |
- | llama | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
- | gpt2_new_100k | 100000 | bpe | 1575 | 477 | 1679 |
+ | GPT2 | 50527 | bpe | 1717 | 1764 | 2323 |
+ | LLaMA | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
+ | Aquila | 100000 | bpe | 1575 | 477 | 1679 |
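The averages above come from tokenizing the ten thousand sampled texts per language with each tokenizer and taking the mean token count per sample. A rough sketch of how such a comparison could be reproduced with Hugging Face `AutoTokenizer` (the tokenizer identifiers and corpus sampling below are placeholders, not the exact evaluation setup):

```python
# Rough sketch of the average-token comparison; tokenizer names and corpora are placeholders.
from transformers import AutoTokenizer

def avg_tokens(tokenizer_name: str, texts: list[str]) -> float:
    """Mean number of tokens produced by the given tokenizer over a list of texts."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    return sum(len(tok.encode(t)) for t in texts) / len(texts)

# english_texts / chinese_texts / code_texts would each hold ~10,000 sampled documents.
# for name in ("gpt2", "path/to/llama-tokenizer", "path/to/aquila-tokenizer"):
#     print(name, avg_tokens(name, english_texts))
```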

 ## 训练数据集/Training data
- Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等
+ Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等, 详情可见下图。

- The Aquila-7B model was pretrained on the Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc.
+ The Aquila-7B model was pretrained on the Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc. Details are given in the figure below.
+ ![Screenshot](./img/data_dist.png)

 ## 使用方式/How to use

 ### 1. 预训练/Pre-training