Anhforth committed
Commit 977be54
1 Parent(s): f8a2e45

Update README.md

Files changed (1)
  1. README.md +15 -12
README.md CHANGED
@@ -9,13 +9,14 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点

 The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of underlying operator implementations with more efficient ones and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese-English bilingual knowledge, permits commercial licensing, and complies with domestic data regulations.

- ## 模型细节/Model details
- | Model | License | Commercial use? | GPU |
- | :---------------- | :------- | :-- | :-- |
- | Aquila-7B | Apache 2.0 | | Nvidia-A100 |
- | AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 |
- | AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 |
- | AquilaChat-7B | Apache 2.0 | | Nvidia-A100 |
+ | 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
+ | :---------------- | :------- | :-- | :-- |
+ | Aquila-7B | 已发布/Released | ✅ | Nvidia-A100 |
+ | AquilaChat-7B | 已发布/Released | | Nvidia-A100 |
+ | AquilaCode-7B-NV | 已发布/Released | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-TS | 已发布/Released | ✅ | Tianshu-BI-V100 |
+ | Aquila-33B | **敬请期待/Coming soon** | | Nvidia-A100 |
+ | AquilaChat-33B | **敬请期待/Coming soon** | ✅ | Nvidia-A100 |

 We used a series of more efficient low-level operators to assist model training, including methods based on [flash-attention](https://github.com/HazyResearch/flash-attention) with some intermediate computations replaced, and we also adopted RMSNorm. On top of this, we upgraded [BMTrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training; it uses data parallelism, ZeRO (zero redundancy optimizer), optimizer offloading, checkpointing with operator fusion, and communication-computation overlap to optimize the model training process.
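RMSNorm, referenced above, rescales each activation vector by its root mean square with a learned gain, rather than subtracting a mean and dividing by a standard deviation as LayerNorm does. A minimal PyTorch-style sketch, for illustration only (not the Aquila source code):

```python
# Illustrative RMSNorm sketch; assumes a PyTorch-style module, not Aquila's actual implementation.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned per-channel gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the hidden dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms
```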
@@ -33,16 +34,18 @@ We used different tokenizers to extract ten thousand data samples from English,

 | 模型/Model | 词表大小/Vocab size | 说明/Note | 英文平均tokens量/Avg tokens (English) | 中文平均tokens量/Avg tokens (Chinese) | 代码平均tokens量/Avg tokens (code) |
 | ----- | ---- | ----- | ---- | ----- | ---- |
- | gpt2 | 50527 | bpe | 1717 | 1764 | 2323 |
- | llama | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
- | gpt2_new_100k | 100000 | bpe | 1575 | 477 | 1679 |
+ | GPT2 | 50527 | bpe | 1717 | 1764 | 2323 |
+ | LLaMA | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
+ | Aquila | 100000 | bpe | 1575 | 477 | 1679 |
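The averages above come from tokenizing the ten thousand sampled texts per language with each tokenizer and taking the mean token count per sample. A rough sketch of how such a comparison could be reproduced with Hugging Face `AutoTokenizer` (the tokenizer identifiers and corpus sampling below are placeholders, not the exact evaluation setup):

```python
# Rough sketch of the average-token comparison; tokenizer names and corpora are placeholders.
from transformers import AutoTokenizer

def avg_tokens(tokenizer_name: str, texts: list[str]) -> float:
    """Mean number of tokens produced by the given tokenizer over a list of texts."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    return sum(len(tok.encode(t)) for t in texts) / len(texts)

# english_texts / chinese_texts / code_texts would each hold ~10,000 sampled documents.
# for name in ("gpt2", "path/to/llama-tokenizer", "path/to/aquila-tokenizer"):
#     print(name, avg_tokens(name, english_texts))
```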

 ## 训练数据集/Training data
- Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等
+ Aquila预训练使用了Pile,[RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), 悟道中文数据集、电子书、专利、百科、论坛, github数据等, 详情可见下图。

- The Aquila-7B model was pretrained on the Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc.
+ The Aquila-7B model was pretrained on the Pile, [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), [Wikipedia](https://huggingface.co/datasets/wikipedia), [C4](https://huggingface.co/datasets/c4), the Wudao Corpus, e-books, patents, encyclopedias, forums, GitHub data, etc. Details are given in the figure below.
+ ![Screenshot](./img/data_dist.png)

 ## 使用方式/How to use

 ### 1. 预训练/Pre-training