pratham-simpli committed on
Commit
ea3321d
1 Parent(s): d73c673

config changed

LICENSE ADDED
@@ -0,0 +1,75 @@
+ The CogVLM License
+
+ 1. Definitions
+
+ “Licensor” means the CogVLM Model Team that distributes its Software.
+
+ “Software” means the CogVLM model parameters made available under this license.
+
+ 2. License Grant
+
+ Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
+ This license permits you to use all open-source models in this repository for academic research free. Users who wish to use the models for commercial purposes must register [here](https://open.bigmodel.cn/mla/form).
+ Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license.
+ The license notice shall be included in all copies or substantial portions of the Software.
+
+ 3. Restriction
+
+ You will not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military, or illegal purposes.
+
+ You will not use the Software for any act that may undermine China's national security and national unity, harm the public interest of society, or infringe upon the rights and interests of human beings.
+
+ 4. Disclaimer
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ 5. Limitation of Liability
+
+ EXCEPT TO THE EXTENT PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER BASED IN TORT, NEGLIGENCE, CONTRACT, LIABILITY, OR OTHERWISE WILL ANY LICENSOR BE LIABLE TO YOU FOR ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES, OR ANY OTHER COMMERCIAL LOSSES, EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ 6. Dispute Resolution
+
+ This license shall be governed and construed in accordance with the laws of People’s Republic of China. Any dispute arising from or in connection with this License shall be submitted to Haidian District People's Court in Beijing.
+
+ Note that the license is subject to update to a more comprehensive version. For any questions related to the license and copyright, please contact us at [email protected].
+
+ 7. Llama3 and EVA-CLIP2 License
+
+ For the CogVLM2 open source model based on the LLama3 series model as the base model, the Llama3 license conditions (https://llama.meta.com/llama3/license/, a copy of this repository license conditions) and the EVA-CLIP2 license conditions (MIT , https://github.com/baaivision/EVA/blob/master/LICENSE) for model weights.
+
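+ 1. Definitions
+
+ “Licensor” means the CogVLM Model Team that distributes its Software.
+
+ “Software” means the CogVLM model parameters made available under this license.
+
+ 2. License Grant
+
+ Under the terms and conditions of this license, the Licensor hereby grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free copyright license.
+ This license permits you to use all open-source models in this repository free of charge for academic research. Users who wish to use the models for commercial purposes must register [here](https://open.bigmodel.cn/mla/form).
+ Registered users may use the models for commercial activities free of charge, but must comply with all terms and conditions of this license.
+ The above copyright notice and this license notice shall be included in all copies or substantial portions of the Software.
+
+ 3. Restriction
+
+ You may not use, copy, modify, merge, publish, distribute, reproduce, or create derivative works of the Software, in whole or in part, for any military or illegal purpose.
+
+ You may not use the Software to engage in any act that endangers national security and national unity, harms the public interest of society, or infringes upon personal rights and interests.
+
+ 4. Disclaimer
+
+ The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the Software or the use or other dealings in the Software.
+
+ 5. Limitation of Liability
+
+ Except to the extent prohibited by applicable law, in no event and under no legal theory, whether based in tort, negligence, contract, liability, or otherwise, will any Licensor be liable to you for any direct, indirect, special, incidental, exemplary, or consequential damages, or any other commercial losses, even if the Licensor has been advised of the possibility of such damages.
+
+ 6. Dispute Resolution
+
+ This license is governed by and construed in accordance with the laws of the People's Republic of China. Any dispute arising from or in connection with this license shall be submitted to the Haidian District People's Court in Beijing.
+
+ Please note that the license may be updated to a more comprehensive version. For any questions about the license and copyright, please contact us at [email protected].
+
+ 7. Llama3 and EVA-CLIP2 License
+
+ For CogVLM2 open-source models that use a LLama3 series model as the base model, the Llama3 license conditions (https://llama.meta.com/llama3/license/, a copy of which is included in this repository) and the EVA-CLIP2 license conditions (MIT, https://github.com/baaivision/EVA/blob/master/LICENSE) apply to the model weights.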
LLAMA3_LICENSE ADDED
@@ -0,0 +1,117 @@
+ META LLAMA 3 COMMUNITY LICENSE AGREEMENT
+ Meta Llama 3 Version Release Date: April 18, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+ Llama Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Meta Llama 3
+ distributed by Meta at https://llama.meta.com/get-started/.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
+ person or entity if you are entering in this Agreement on their behalf.
+
+ “Meta Llama 3” means the foundational large language models and software and algorithms, including
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+ https://llama.meta.com/llama-downloads.
+
+ “Llama Materials” means, collectively, Meta’s proprietary Meta Llama 3 and Documentation (and any
+ portion thereof) made available under this Agreement.
+
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+ outside of the EEA or Switzerland).
+
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+ you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+ Llama Materials.
+
+ b. Redistribution and Use.
+
+ i. If you distribute or make available the Llama Materials (or any derivative works
+ thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide
+ a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta
+ Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you
+ use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is
+ distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model
+ name.
+
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Meta Llama 3 is
+ licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights
+ Reserved.”
+
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+ Materials (available at https://llama.meta.com/llama3/use-policy), which is hereby incorporated by
+ reference into this Agreement.
+
+ v. You will not use the Llama Materials or any output or results of the Llama Materials to
+ improve any other large language model (excluding Meta Llama 3 or derivative works thereof).
+
+ 2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+ RESULTS.
+
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+ or any of its affiliates, except as required for reasonable and customary use in describing and
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+ use “Llama 3” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+ comply with Meta’s brand guidelines (currently accessible at
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+ of the Mark will inure to the benefit of Meta.
+
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Meta Llama 3 outputs or
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
+ distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+ Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+ exclusive jurisdiction of any dispute arising out of this Agreement.
README.md ADDED
@@ -0,0 +1,170 @@
+ ---
+ license: other
+ license_name: cogvlm2
+ license_link: https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B/blob/main/LICENSE
+
+ language:
+ - en
+ pipeline_tag: text-generation
+ tags:
+ - chat
+ - cogvlm2
+
+ inference: false
+ ---
+
+ # CogVLM2
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/THUDM/CogVLM2/53d5d5ea1aa8d535edffc0d15e31685bac40f878/resources/logo.svg" width="40%"/>
+ </div>
+ <p align="center">
+ 👋 <a href="resources/WECHAT.md" target="_blank">WeChat</a> · 💡<a href="http://36.103.203.44:7861/" target="_blank">Online Demo</a> · 🎈<a href="https://github.com/THUDM/CogVLM2" target="_blank">GitHub Page</a>
+ </p>
+ <p align="center">
+ 📍Experience the larger-scale CogVLM model on the <a href="https://open.bigmodel.cn/dev/api#glm-4v">ZhipuAI Open Platform</a>.
+ </p>
+
+
+ ## Model introduction
+
+ We are launching the new-generation **CogVLM2** series and open-sourcing two models built on [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct). Compared with the previous generation of open-source CogVLM models, the CogVLM2 series brings the following improvements:
+
+ 1. Significant improvements on many benchmarks, such as `TextVQA` and `DocVQA`.
+ 2. Support for **8K** context length.
+ 3. Support for image resolutions up to **1344 × 1344**.
+ 4. An open-source model version that supports both **Chinese and English**.
+
+ The table below lists the details of the CogVLM2 family of open-source models:
+
+ | Model name       | cogvlm2-llama3-chat-19B             | cogvlm2-llama3-chinese-chat-19B     |
+ |------------------|-------------------------------------|-------------------------------------|
+ | Base Model       | Meta-Llama-3-8B-Instruct            | Meta-Llama-3-8B-Instruct            |
+ | Language         | English                             | Chinese, English                    |
+ | Model size       | 19B                                 | 19B                                 |
+ | Task             | Image understanding, dialogue model | Image understanding, dialogue model |
+ | Text length      | 8K                                  | 8K                                  |
+ | Image resolution | 1344 * 1344                         | 1344 * 1344                         |
+
+ ## Benchmark
+
+ Our open-source models improve on the previous generation of open-source CogVLM models across many benchmarks, and their performance is competitive with some closed-source models, as shown in the table below:
+
+ | Model                          | Open Source | LLM Size | TextVQA  | DocVQA   | ChartQA  | OCRBench | MMMU     | MMVet    | MMBench  |
+ |--------------------------------|-------------|----------|----------|----------|----------|----------|----------|----------|----------|
+ | CogVLM1.1                      | ✅           | 7B       | 69.7     | -        | 68.3     | 590      | 37.3     | 52.0     | 65.8     |
+ | LLaVA-1.5                      | ✅           | 13B      | 61.3     | -        | -        | 337      | 37.0     | 35.4     | 67.7     |
+ | Mini-Gemini                    | ✅           | 34B      | 74.1     | -        | -        | -        | 48.0     | 59.3     | 80.6     |
+ | LLaVA-NeXT-LLaMA3              | ✅           | 8B       | -        | 78.2     | 69.5     | -        | 41.7     | -        | 72.1     |
+ | LLaVA-NeXT-110B                | ✅           | 110B     | -        | 85.7     | 79.7     | -        | 49.1     | -        | 80.5     |
+ | InternVL-1.5                   | ✅           | 20B      | 80.6     | 90.9     | **83.8** | 720      | 46.8     | 55.4     | **82.3** |
+ | QwenVL-Plus                    | ❌           | -        | 78.9     | 91.4     | 78.1     | 726      | 51.4     | 55.7     | 67.0     |
+ | Claude3-Opus                   | ❌           | -        | -        | 89.3     | 80.8     | 694      | **59.4** | 51.7     | 63.3     |
+ | Gemini Pro 1.5                 | ❌           | -        | 73.5     | 86.5     | 81.3     | -        | 58.5     | -        | -        |
+ | GPT-4V                         | ❌           | -        | 78.0     | 88.4     | 78.5     | 656      | 56.8     | **67.7** | 75.0     |
+ | CogVLM2-LLaMA3 (Ours)          | ✅           | 8B       | 84.2     | **92.3** | 81.0     | 756      | 44.3     | 60.4     | 80.5     |
+ | CogVLM2-LLaMA3-Chinese (Ours)  | ✅           | 8B       | **85.0** | 88.4     | 74.7     | **780**  | 42.8     | 60.5     | 78.9     |
+
+ All results were obtained without any external OCR tools ("pixel only").
+ ## Quick Start
+
+ Here is a simple example of how to chat with the CogVLM2 model. For more use cases, see our [GitHub](https://github.com/THUDM/CogVLM2) repository.
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
+ TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_PATH,
+     trust_remote_code=True
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_PATH,
+     torch_dtype=TORCH_TYPE,
+     trust_remote_code=True,
+ ).to(DEVICE).eval()
+
+ text_only_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"
+
+ while True:
+     image_path = input("image path >>>>> ")
+     if image_path == '':
+         print('You did not enter an image path; the following will be a plain-text conversation.')
+         image = None
+         text_only_first_query = True
+     else:
+         image = Image.open(image_path).convert('RGB')
+
+     history = []
+
+     while True:
+         query = input("Human:")
+         if query == "clear":  # "clear" ends this conversation and asks for a new image path
+             break
+
+         if image is None:
+             if text_only_first_query:
+                 query = text_only_template.format(query)
+                 text_only_first_query = False
+             else:
+                 old_prompt = ''
+                 for old_query, response in history:
+                     old_prompt += old_query + " " + response + "\n"
+                 query = old_prompt + "USER: {} ASSISTANT:".format(query)
+         if image is None:
+             input_by_model = model.build_conversation_input_ids(
+                 tokenizer,
+                 query=query,
+                 history=history,
+                 template_version='chat'
+             )
+         else:
+             input_by_model = model.build_conversation_input_ids(
+                 tokenizer,
+                 query=query,
+                 history=history,
+                 images=[image],
+                 template_version='chat'
+             )
+         inputs = {
+             'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
+             'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
+             'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
+             'images': [[input_by_model['images'][0].to(DEVICE).to(TORCH_TYPE)]] if image is not None else None,
+         }
+         gen_kwargs = {
+             "max_new_tokens": 2048,
+             "pad_token_id": 128002,
+         }
+         with torch.no_grad():
+             outputs = model.generate(**inputs, **gen_kwargs)
+             outputs = outputs[:, inputs['input_ids'].shape[1]:]  # keep only the newly generated tokens
+             response = tokenizer.decode(outputs[0])
+             response = response.split("<|end_of_text|>")[0]
+             print("\nCogVLM2:", response)
+         history.append((query, response))
+ ```
+
+
+ ## License
+
+ This model is released under the CogVLM2 [LICENSE](LICENSE). For models built with Meta Llama 3, please also adhere to the [LLAMA3_LICENSE](LLAMA3_LICENSE).
+
+ ## Citation
+
+ If you find our work helpful, please consider citing the following paper:
+
+ ```
+ @misc{wang2023cogvlm,
+       title={CogVLM: Visual Expert for Pretrained Language Models},
+       author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
+       year={2023},
+       eprint={2311.03079},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
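The text-only branch of the Quick Start loop builds its prompt by joining earlier turns in a `USER: ... ASSISTANT:` layout. A minimal sketch of that string assembly as a pure function follows. It assumes the history holds raw (user, assistant) pairs rather than already-formatted prompts, and `build_text_prompt` is a hypothetical helper for illustration, not part of the CogVLM2 code.

```python
# System preamble copied from text_only_template in the README example.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_text_prompt(history, query):
    """Assemble a text-only prompt from raw (user, assistant) turns.

    Hypothetical helper: it follows the README's "USER: ... ASSISTANT:"
    layout but keeps raw turns in `history`, so every earlier turn
    appears exactly once in the prompt.
    """
    parts = [SYSTEM]
    for user_turn, assistant_turn in history:
        parts.append(f"USER: {user_turn} ASSISTANT: {assistant_turn}")
    parts.append(f"USER: {query} ASSISTANT:")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_text_prompt([("Hi", "Hello!")], "What is CogVLM2?"))
```

Keeping the raw turns separate from the formatted prompt means the prompt can be rebuilt from scratch on every turn, whatever template is in use.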
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "CogVLMForCausalLM"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_cogvlm.CogVLMConfig",
+     "AutoModelForCausalLM": "modeling_cogvlm.CogVLMForCausalLM"
+   },
+   "vision_config": {
+     "dropout_prob": 0.0,
+     "hidden_act": "gelu",
+     "in_channels": 3,
+     "num_hidden_layers": 63,
+     "hidden_size": 1792,
+     "patch_size": 14,
+     "num_heads": 16,
+     "intermediate_size": 15360,
+     "layer_norm_eps": 1e-06,
+     "num_positions": 9217,
+     "image_size": 1344
+   },
+   "hidden_size": 4096,
+   "intermediate_size": 14336,
+   "num_attention_heads": 32,
+   "max_position_embeddings": 8192,
+   "rms_norm_eps": 1e-05,
+   "template_version": "chat",
+   "initializer_range": 0.02,
+   "bos_token_id": 128000,
+   "eos_token_id": [128001, 128009],
+   "pad_token_id": 128002,
+   "vocab_size": 128256,
+   "num_hidden_layers": 32,
+   "hidden_act": "silu",
+   "use_cache": true,
+   "transformers_version": "4.41.0"
+ }
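The `vision_config` values in config.json fit together arithmetically: a 1344-pixel image edge divided into 14-pixel patches gives a 96 × 96 grid, and `num_positions` is one more than the patch count. The check below is a small sketch of that arithmetic. Reading the extra position as a class-token slot is an assumption, since the vision encoder itself is not part of the diff shown here, and `patch_grid_side` is a hypothetical helper.

```python
# Values copied from the vision_config block of config.json.
VISION_CONFIG = {"image_size": 1344, "patch_size": 14, "num_positions": 9217}


def patch_grid_side(image_size: int, patch_size: int) -> int:
    """Number of patches along one image edge; the image must tile exactly."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    return image_size // patch_size


side = patch_grid_side(VISION_CONFIG["image_size"], VISION_CONFIG["patch_size"])
num_patches = side * side
# Assumption: the one extra position is a class-token slot.
extra_positions = VISION_CONFIG["num_positions"] - num_patches
print(side, num_patches, extra_positions)  # 96 9216 1
```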
configuration.json ADDED
@@ -0,0 +1 @@
+ {"framework":"Pytorch","task":"text-generation"}
configuration_cogvlm.py ADDED
@@ -0,0 +1,44 @@
+ from typing import Literal
+ from transformers import PretrainedConfig
+
+
+ class CogVLMConfig(PretrainedConfig):
+     _auto_class = "AutoConfig"
+
+     def __init__(
+             self,
+             vocab_size=128256,
+             hidden_size=4096,
+             intermediate_size=14336,
+             num_hidden_layers=32,
+             num_attention_heads=32,
+             num_multi_query_heads=8,
+             hidden_act='silu',
+             max_position_embeddings=8192,
+             initializer_range=0.02,
+             rms_norm_eps=1e-05,
+             template_version: Literal["base", "chat"] = "chat",
+             bos_token_id=128000,
+             eos_token_id=128001,
+             tie_word_embeddings=False,
+             use_cache=True,
+             **kwargs,
+     ):
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.num_attention_heads = num_attention_heads
+         self.num_multi_query_heads = num_multi_query_heads
+         self.max_position_embeddings = max_position_embeddings
+         self.rms_norm_eps = rms_norm_eps
+         self.initializer_range = initializer_range
+         self.vocab_size = vocab_size
+         self.num_hidden_layers = num_hidden_layers
+         self.hidden_act = hidden_act
+         self.template_version = template_version
+         self.use_cache = use_cache
+         super().__init__(
+             bos_token_id=bos_token_id,
+             eos_token_id=eos_token_id,
+             tie_word_embeddings=tie_word_embeddings,
+             **kwargs,
+         )
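The defaults in `CogVLMConfig` fix the attention geometry of the language model. The sketch below works out the per-head width and, assuming `num_multi_query_heads` is the number of key/value heads (the name suggests this, but the attention code that consumes it is only partly visible in this diff), how many query heads share each key/value head. `AttentionGeometry` is a hypothetical helper class, not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionGeometry:
    """Derived attention sizes; defaults copied from CogVLMConfig."""
    hidden_size: int = 4096
    num_attention_heads: int = 32
    num_multi_query_heads: int = 8  # assumed to be the key/value head count

    @property
    def head_dim(self) -> int:
        # Width of one attention head.
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Query heads sharing one key/value head under the assumption above.
        return self.num_attention_heads // self.num_multi_query_heads

    @property
    def kv_width(self) -> int:
        # Total width of the key (or value) projection output.
        return self.num_multi_query_heads * self.head_dim


geometry = AttentionGeometry()
print(geometry.head_dim, geometry.queries_per_kv_head, geometry.kv_width)  # 128 4 1024
```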
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 128000,
+   "eos_token_id": [128001, 128009],
+   "pad_token_id": 128002,
+   "do_sample": true,
+   "temperature": 0.6,
+   "max_length": 4096,
+   "top_p": 0.9,
+   "transformers_version": "4.40.2"
+ }
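generation_config.json turns on sampling with `temperature` 0.6 and `top_p` 0.9. As a rough illustration of what those two settings do to a next-token distribution, here is a pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the `transformers` implementation, and `temperature_top_p` is a hypothetical name.

```python
import math


def temperature_top_p(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperatures below 1 sharpen the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    print(temperature_top_p([2.0, 1.0, 0.0, -5.0]))
```

With the example logits, only the two most likely tokens survive the filter, and a sampler would then draw from that renormalised pair.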
model-00001-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49c4d09ecd71e4137eb15c1c335d69d7675edec0e1c6e2648a0e6489143a1b78
+ size 4943123192
model-00002-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5453120bd87f99611d18a0e5be0091edf89561e598fa0a8632c8988d81063626
+ size 4999793608
model-00003-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2cbba9a74961627784f6c31fabffe5c8a7eef4db765656430c514be56a200aa
+ size 4949432336
model-00004-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:560980479fe9977f7386c53604fb3bad76fcfc00de2864c3a9da7596fa266f1c
+ size 4999793680
model-00005-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e25bb2e14b9691b85b829765d6504b1c0664ab5401702879e18f7c0ac535397
+ size 4999793688
model-00006-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9becad0534abfafd1c0cf1e5a63640398a776f6672f51fc69fd3fe6a9b1e336
+ size 4953020048
model-00007-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79a3baa4118f2a9f5597fd202382cb68c7e84e70c351c2e8db9ab068873d98e6
+ size 4945866640
model-00008-of-00008.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c08dc31192041bf7a2b2d5fb5a885659a7aad447dc2c27dcea311dcf43ad4f5
+ size 4215550536
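Each `.safetensors` entry above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the byte size. The sketch below parses one pointer and totals the eight shard sizes listed in this commit. `parse_lfs_pointer` is a hypothetical helper, and the parameter estimate assumes 2 bytes per parameter (fp16 or bf16 storage) and ignores the small safetensors headers.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for the first shard, copied from the diff.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:49c4d09ecd71e4137eb15c1c335d69d7675edec0e1c6e2648a0e6489143a1b78
size 4943123192
"""

# Byte sizes of the eight shards, copied from the diff.
SHARD_SIZES = [
    4943123192, 4999793608, 4949432336, 4999793680,
    4999793688, 4953020048, 4945866640, 4215550536,
]

pointer = parse_lfs_pointer(POINTER)
total_bytes = sum(SHARD_SIZES)
# Assumption: 2 bytes per parameter (fp16/bf16), headers ignored.
print(f"{total_bytes / 1e9:.1f} GB, ~{total_bytes / 2 / 1e9:.1f}B parameters")
# 39.0 GB, ~19.5B parameters
```

Under that assumption the total is consistent with the 19B model size stated in the README.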
model.safetensors.index.json ADDED
The diff for this file is too large to render.
modeling_cogvlm.py ADDED
@@ -0,0 +1,837 @@
+ """Largely copied from LLaMA and adapted for CogVLM."""
+ import warnings
+ from typing import TYPE_CHECKING, Optional, Tuple, List, Union, Literal, Dict, Any
+
+ import math
+ import torch
+ from torch import nn
+ from torch.nn import CrossEntropyLoss
+ from torchvision import transforms
+ from einops import rearrange
+ from torch.utils.checkpoint import checkpoint
+
+ from transformers import PreTrainedModel, PreTrainedTokenizer
+ from transformers.utils.logging import get_logger
+ from transformers.activations import ACT2FN
+ from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
+
+ from .configuration_cogvlm import CogVLMConfig
+ from .util import FastRotaryEmbedding
+ from .visual import EVA2CLIPModel
+
+ if TYPE_CHECKING:
+     from transformers.utils import ModelOutput
+
+ logger = get_logger(__name__)
+
+ LANGUAGE_TOKEN_TYPE = 0  # token_type_ids value for text tokens
+ VISION_TOKEN_TYPE = 1  # token_type_ids value for image tokens
+
+
+ # Copied from transformers.models.bart.modeling_bart._make_causal_mask
+ def _make_causal_mask(
+         input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
+ ):
+     """
+     Make the causal mask used for uni-directional (causal) self-attention.
+     """
+     bsz, tgt_len = input_ids_shape
+     mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
+     mask_cond = torch.arange(mask.size(-1), device=device)
+     mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)  # zero on and below the diagonal
+     mask = mask.to(dtype)
+
+     if past_key_values_length > 0:
+         mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1)
+     return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
+
+
+ # Copied from transformers.models.bart.modeling_bart._expand_mask
+ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
+     """
+     Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
+     """
+     bsz, src_len = mask.size()
+     tgt_len = tgt_len if tgt_len is not None else src_len
+
+     expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
+
+     inverted_mask = 1.0 - expanded_mask
+
+     return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
+
+
+ class RMSNorm(nn.Module):
+     def __init__(self, hidden_size, eps=1e-5):
+         super().__init__()
+         self.weight = nn.Parameter(torch.ones(hidden_size))
+         self.variance_epsilon = eps
+
+     def forward(self, hidden_states):
+         input_dtype = hidden_states.dtype
+         hidden_states = hidden_states.to(torch.float32)  # normalize in float32, cast back on return
+         variance = hidden_states.pow(2).mean(-1, keepdim=True)
+         hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+         return (self.weight * hidden_states).to(input_dtype)
+
+
+ class MLP(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.hidden_size = config.hidden_size
+         self.intermediate_size = config.intermediate_size
+         self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+         self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
+         self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
+         self.act_fn = ACT2FN[config.hidden_act]
+
+     def forward(self, x):
+         down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
+         return down_proj
+
+
+ def get_expert_mask(token_type_ids: "torch.LongTensor(B, L)") -> "[torch.BoolTensor(B, L), torch.BoolTensor(B, L)]":
+     vision_token_mask = torch.zeros_like(token_type_ids, dtype=torch.bool)
+     vision_token_mask[:, :-1] = (token_type_ids[:, :-1] == VISION_TOKEN_TYPE) & (token_type_ids[:, 1:] == VISION_TOKEN_TYPE)  # vision only if the next token is vision too
+     language_token_mask = ~vision_token_mask
+     return vision_token_mask, language_token_mask
+
+
+ class VisionExpertMLP(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.language_mlp = MLP(config)
+         self.vision_mlp = MLP(config)
+
+     def forward(self, hidden_states: "torch.Tensor(B, L, D)", token_type_ids: "torch.LongTensor(B, L)"):
+         output = torch.empty(hidden_states.shape, dtype=hidden_states.dtype, device=hidden_states.device)
+         vision_token_mask, language_token_mask = get_expert_mask(token_type_ids)
+         output[vision_token_mask] = self.vision_mlp(hidden_states[vision_token_mask])
+         output[language_token_mask] = self.language_mlp(hidden_states[language_token_mask])
+         return output
+
+
114
+ def attention_fn(
115
+ query_layer: "torch.tensor(B, H, L, HD)",
116
+ key_layer: "torch.tensor(B, H, L, HD)",
117
+ value_layer: "torch.tensor(B, H, L, HD)",
118
+ attention_mask: "torch.tensor(B, 1, L, L_kv)",
119
+ *,
120
+ scaling_attention_score: bool = True,
121
+ attention_dropout: nn.Module = None
122
+ ):
123
+ attention_mask_bool = (attention_mask == 0)
124
+ is_low_triangle = (attention_mask_bool == torch.ones_like(attention_mask_bool, dtype=torch.float).tril()).all()
125
+ is_full = (attention_mask_bool > 0).all()
126
+ if not (int(torch.__version__.split('.')[0]) >= 2):
127
+ warnings.warn("PyTorch 2.0 or higher is recommended.")
128
+ if int(torch.__version__.split('.')[0]) >= 2 and scaling_attention_score and (is_full or is_low_triangle):
129
+ dropout_p = 0. if attention_dropout is None or not attention_dropout.training else attention_dropout.p
130
+ return torch.nn.functional.scaled_dot_product_attention(
131
+ query_layer, key_layer, value_layer,
132
+ attn_mask=None,
133
+ dropout_p=dropout_p,
134
+ is_causal=not is_full
135
+ )
136
+ else:
137
+ if scaling_attention_score:
138
+ query_layer = query_layer / math.sqrt(query_layer.shape[-1])
139
+ attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
140
+ attention_scores = attention_scores + attention_mask
141
+ attention_scores = nn.functional.softmax(attention_scores, dim=-1, dtype=torch.float32).to(query_layer.dtype)
142
+ if attention_dropout is not None:
143
+ attention_scores = attention_dropout(attention_scores)
144
+ context_layer = torch.matmul(attention_scores, value_layer)
145
+ return context_layer
146
+
147
+
148
+ class VisionExpertAttention(nn.Module):
149
+ def __init__(self, config):
150
+ super().__init__()
151
+ self.config = config
152
+ self.hidden_size = config.hidden_size
153
+ self.num_attention_heads = config.num_attention_heads
154
+ self.num_multi_query_heads = config.num_multi_query_heads
155
+ self.hidden_size_per_attention_head = self.hidden_size // self.num_attention_heads
156
+ self.stride = [self.num_attention_heads, self.num_multi_query_heads, self.num_multi_query_heads]
157
+ self.qkv_size = self.hidden_size + self.hidden_size_per_attention_head * self.num_multi_query_heads * 2
158
+ self.head_dim = self.hidden_size // self.num_attention_heads
159
+ self.max_position_embeddings = config.max_position_embeddings
160
+ self.rotary_emb = FastRotaryEmbedding(dim=self.head_dim, pos_idx_in_fp32=False, base=500000)
161
+ self.vision_expert_query_key_value = nn.Linear(self.hidden_size, self.qkv_size, bias=True)
162
+ self.vision_expert_dense = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
163
+ self.language_expert_query_key_value = nn.Linear(self.hidden_size, self.qkv_size, bias=False)
164
+ self.language_expert_dense = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
165
+
166
+ def _transpose_for_scores(self, tensor):
167
+ """Transpose a 3D tensor [B, L, H*HD] into a 4D tensor with size [B H L HD]."""
168
+ new_tensor_shape = tensor.size()[:-1] + \
169
+ (-1, # flexible for multi-query
170
+ self.hidden_size_per_attention_head)
171
+ tensor = tensor.view(*new_tensor_shape)
172
+ return tensor.permute(0, 2, 1, 3)
173
+
174
+ def forward(
175
+ self,
176
+ hidden_states: torch.Tensor,
177
+ token_type_ids: torch.LongTensor,
178
+ position_ids: torch.LongTensor,
179
+ attention_mask: Optional[torch.Tensor] = None,
180
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
181
+ output_attentions: bool = False,
182
+ use_cache: bool = False,
183
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
184
+ bsz, q_len, _ = hidden_states.size()
185
+ vision_token_mask, language_token_mask = get_expert_mask(token_type_ids)
186
+
187
+ shape = list(hidden_states.shape)
188
+ shape[-1] = self.qkv_size
189
+ mixed_raw_layer = torch.empty(shape, dtype=hidden_states.dtype, device=hidden_states.device)
190
+ mixed_raw_layer[vision_token_mask] = self.vision_expert_query_key_value(hidden_states[vision_token_mask])
191
+ mixed_raw_layer[language_token_mask] = self.language_expert_query_key_value(hidden_states[language_token_mask])
192
+
193
+ # query_states, key_states, value_states = torch.split(mixed_raw_layer, self.hidden_size, dim=-1)
194
+ factor = mixed_raw_layer.size()[-1] // sum(self.stride)
195
+ query_states, key_states, value_states = torch.split(mixed_raw_layer, [factor * x for x in self.stride], dim=-1)
196
+
197
+ query_states = self._transpose_for_scores(query_states) # B, H, L, HD
198
+ key_states = self._transpose_for_scores(key_states) # B, H, L, HD
199
+ value_states = self._transpose_for_scores(value_states) # B, H, L, HD
200
+
201
+ kv_seq_len = key_states.shape[-2]
202
+ if past_key_value is not None:
203
+ kv_seq_len += past_key_value[0].shape[-2]
204
+
205
+ query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
206
+
207
+ if past_key_value is not None:
208
+ key_states = torch.cat([past_key_value[0], key_states], dim=2)
209
+ value_states = torch.cat([past_key_value[1], value_states], dim=2)
210
+
211
+ past_key_value = (key_states, value_states) if use_cache else None
212
+
213
+ key_states = key_states.unsqueeze(2).expand(-1, -1, self.num_attention_heads // self.num_multi_query_heads, -1, -1).contiguous().view(
214
+ bsz, self.num_attention_heads, *key_states.shape[2:])
215
+ value_states = value_states.unsqueeze(2).expand(-1, -1, self.num_attention_heads // self.num_multi_query_heads, -1,
216
+ -1).contiguous().view(bsz, self.num_attention_heads, *value_states.shape[2:])
217
+
218
+ context_layer = attention_fn(
219
+ query_layer=query_states, key_layer=key_states, value_layer=value_states, attention_mask=attention_mask,
220
+ scaling_attention_score=True, attention_dropout=None)
221
+ if context_layer.size() != (bsz, self.num_attention_heads, q_len, self.head_dim):
222
+ raise ValueError(
223
+ f"`attn_output` should be of size {(bsz, self.num_attention_heads, q_len, self.head_dim)}, but is"
224
+ f" {context_layer.size()}"
225
+ )
226
+ context_layer = context_layer.transpose(1, 2).contiguous().reshape(bsz, q_len, self.hidden_size)
227
+
228
+ attn_output = torch.empty(context_layer.shape, dtype=hidden_states.dtype, device=hidden_states.device)
229
+ attn_output[vision_token_mask] = self.vision_expert_dense(context_layer[vision_token_mask])
230
+ attn_output[language_token_mask] = self.language_expert_dense(context_layer[language_token_mask])
231
+
232
+ if output_attentions:
233
+ warnings.warn("output_attentions is not implemented.")
234
+
235
+ return attn_output, None, past_key_value
236
+
237
+
238
+ class CogVLMDecoderLayer(nn.Module):
239
+ def __init__(self, config):
240
+ super().__init__()
241
+ self.hidden_size = config.hidden_size
242
+ self.self_attn = VisionExpertAttention(config=config)
243
+ self.mlp = VisionExpertMLP(config)
244
+ self.input_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
245
+ self.post_attention_layernorm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
246
+
247
+ def forward(
248
+ self,
249
+ hidden_states: torch.Tensor,
250
+ token_type_ids: torch.LongTensor,
251
+ position_ids: torch.LongTensor,
252
+ attention_mask: Optional[torch.Tensor] = None,
253
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
254
+ output_attentions: Optional[bool] = False,
255
+ use_cache: Optional[bool] = False,
256
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
257
+ residual = hidden_states
258
+
259
+ hidden_states = self.input_layernorm(hidden_states)
260
+
261
+ # Self Attention
262
+ hidden_states, self_attn_weights, present_key_value = self.self_attn(
263
+ hidden_states=hidden_states,
264
+ token_type_ids=token_type_ids,
265
+ position_ids=position_ids,
266
+ attention_mask=attention_mask,
267
+ past_key_value=past_key_value,
268
+ output_attentions=output_attentions,
269
+ use_cache=use_cache,
270
+ )
271
+ hidden_states = residual + hidden_states
272
+
273
+ # Fully Connected
274
+ residual = hidden_states
275
+ hidden_states = self.post_attention_layernorm(hidden_states)
276
+ hidden_states = self.mlp(hidden_states, token_type_ids=token_type_ids)
277
+ hidden_states = residual + hidden_states
278
+
279
+ outputs = (hidden_states,)
280
+
281
+ if output_attentions:
282
+ outputs += (self_attn_weights,)
283
+
284
+ if use_cache:
285
+ outputs += (present_key_value,)
286
+
287
+ return outputs # type: ignore
288
+
289
+
290
+ class CogVLMPreTrainedModel(PreTrainedModel):
291
+ config_class = CogVLMConfig
292
+ base_model_prefix = "model"
293
+ supports_gradient_checkpointing = False
294
+ _no_split_modules = ["CogVLMDecoderLayer"]
295
+ _skip_keys_device_placement = "past_key_values"
296
+
297
+ def _init_weights(self, module):
298
+ std = self.config.initializer_range
299
+ if isinstance(module, nn.Linear):
300
+ module.weight.data.normal_(mean=0.0, std=std)
301
+ if module.bias is not None:
302
+ module.bias.data.zero_()
303
+ elif isinstance(module, nn.Embedding):
304
+ module.weight.data.normal_(mean=0.0, std=std)
305
+ if module.padding_idx is not None:
306
+ module.weight.data[module.padding_idx].zero_()
307
+
308
+
309
+ def is_empty(images_list: Optional[List[List[torch.Tensor]]]):
310
+ if images_list is None or len(images_list) == 0:
311
+ return True
312
+ for image_list in images_list:
313
+ if len(image_list):
314
+ return False
315
+ return True
316
+
317
+
318
+ def build_position_ids(x: "torch.BoolTensor(B, L)", attention_mask: Optional["torch.BoolTensor(B, L)"] = None) -> "torch.LongTensor(B, L)":
319
+ if attention_mask is not None:
320
+ tmp = x.clone()
321
+ tmp[~(attention_mask.bool())] = -1
322
+ else:
323
+ tmp = x.clone()
324
+ # treat the image BOI/EOI tokens as LANGUAGE_TOKEN_TYPE
325
+ is_boi_eoi = torch.zeros_like(x, dtype=torch.bool)
326
+ is_boi_eoi[:, 1:] |= (tmp[:, 1:] == VISION_TOKEN_TYPE) & (tmp[:, :-1] == LANGUAGE_TOKEN_TYPE)
327
+ is_boi_eoi[:, 0] |= (tmp[:, 0] == VISION_TOKEN_TYPE)
328
+ is_boi_eoi[:, :-1] |= (tmp[:, :-1] == VISION_TOKEN_TYPE) & (tmp[:, 1:] == LANGUAGE_TOKEN_TYPE)
329
+ is_boi_eoi[:, -1] |= (tmp[:, -1] == VISION_TOKEN_TYPE)
330
+ tmp[is_boi_eoi] = LANGUAGE_TOKEN_TYPE
331
+ # final position ids
332
+ y = torch.zeros_like(x, dtype=torch.long)
333
+ y[:, 1:] = (tmp[:, 1:] == LANGUAGE_TOKEN_TYPE) | ((tmp[:, 1:] == VISION_TOKEN_TYPE) & (tmp[:, :-1] == LANGUAGE_TOKEN_TYPE))
334
+ y = y.cumsum(dim=-1)
335
+ return y
336
+
337
+
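To make the effect of `build_position_ids` concrete, the same logic can be traced on one sequence with plain lists. This is a sketch under assumptions: `VISION_TOKEN_TYPE = 1` and `LANGUAGE_TOKEN_TYPE = 0` are assumed values (the constants are defined outside this hunk), and `position_ids_1d` is a hypothetical single-sequence helper rather than the model's own code.

```python
# Assumed values; the real constants are defined elsewhere in the file.
LANGUAGE_TOKEN_TYPE = 0
VISION_TOKEN_TYPE = 1


def position_ids_1d(token_types, attention_mask=None):
    """List-based restatement of build_position_ids for one sequence."""
    tmp = list(token_types)
    if attention_mask is not None:
        # Masked-out positions get a type that is neither language nor vision.
        tmp = [t if m else -1 for t, m in zip(tmp, attention_mask)]
    n = len(tmp)

    def is_boundary(i):
        # A vision token at the start or end of an image span (BOI/EOI).
        if tmp[i] != VISION_TOKEN_TYPE:
            return False
        starts = i == 0 or tmp[i - 1] == LANGUAGE_TOKEN_TYPE
        ends = i == n - 1 or tmp[i + 1] == LANGUAGE_TOKEN_TYPE
        return starts or ends

    # Boundaries are computed on the original types, then relabelled as language.
    relabelled = [LANGUAGE_TOKEN_TYPE if is_boundary(i) else tmp[i] for i in range(n)]

    ids, current = [], 0
    for i in range(n):
        if i > 0:
            is_lang = relabelled[i] == LANGUAGE_TOKEN_TYPE
            first_inner = (relabelled[i] == VISION_TOKEN_TYPE
                           and relabelled[i - 1] == LANGUAGE_TOKEN_TYPE)
            current += int(is_lang or first_inner)
        ids.append(current)
    return ids


# language, BOI, two inner patches, EOI, language, language
print(position_ids_1d([0, 1, 1, 1, 1, 0, 0]))  # [0, 1, 2, 2, 3, 4, 5]
```

The inner image tokens share one position id, while the boundary tokens and the text tokens each advance the counter by one.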
338
+ class CogVLMModel(CogVLMPreTrainedModel):
339
+ def __init__(self, config):
340
+ super().__init__(config)
341
+ self.padding_idx = 128002
342
+ self.vocab_size = config.vocab_size
343
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
344
+ self.layers = nn.ModuleList([CogVLMDecoderLayer(config) for _ in range(config.num_hidden_layers)])
345
+ self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
346
+
347
+ self.vision = EVA2CLIPModel(config)
348
+
349
+ self.gradient_checkpointing = False
350
+ # Initialize weights and apply final processing
351
+ self.post_init()
352
+
353
+ def encode_images(self, images: List[List[torch.Tensor]]) -> torch.Tensor:
354
+ images_list, images = images, []
355
+
356
+ images = []
357
+ for image_list in images_list:
358
+ for image in image_list:
359
+ images.append(image)
360
+
361
+ images = torch.stack(images)
362
+ images_features = self.vision(images)
363
+ return images_features
364
+
365
+ def forward(
366
+ self,
367
+ input_ids: torch.LongTensor = None,
368
+ images: List[List[torch.Tensor]] = None,
369
+ token_type_ids: Optional[torch.LongTensor] = None,
370
+ attention_mask: Optional[torch.Tensor] = None,
371
+ position_ids: Optional[torch.LongTensor] = None,
372
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
373
+ inputs_embeds: Optional[torch.FloatTensor] = None,
374
+ use_cache: Optional[bool] = None,
375
+ output_attentions: Optional[bool] = None,
376
+ output_hidden_states: Optional[bool] = None,
377
+ return_dict: Optional[bool] = None,
378
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
379
+ """Handle image encoding, token_type_ids and position_ids; passing attention_mask=None is fine."""
380
+
381
+ if past_key_values is not None:
382
+ pass # generate mode with past_key_values. the image features are already mapped
383
+ else:
384
+ # inputs_embeds is not allowed here, because the image features must be merged into the token embeddings
385
+ assert input_ids is not None and inputs_embeds is None, f"{input_ids} {inputs_embeds}"
386
+ if not is_empty(images): # multi-modality
387
+ assert token_type_ids is not None, "multi-modality requires `token_type_ids`!"
388
+ assert len(input_ids) == len(images), f"{len(input_ids)} {len(images)}"
389
+ inputs_embeds = self.embed_tokens(input_ids)
390
+ images_features = self.encode_images(images)
391
+ images_features = rearrange(images_features, 'b n d -> (b n) d')
392
+ images_features = images_features.to(dtype=inputs_embeds.dtype, device=inputs_embeds.device)
393
+ inputs_embeds = inputs_embeds.index_put([token_type_ids == VISION_TOKEN_TYPE], images_features)
394
+ else: # single-modality
395
+ if token_type_ids is None:
396
+ token_type_ids = torch.ones_like(input_ids, dtype=torch.long, device=input_ids.device) * LANGUAGE_TOKEN_TYPE
397
+ assert not (token_type_ids == VISION_TOKEN_TYPE).any(), f"{(token_type_ids == VISION_TOKEN_TYPE).sum()}"
398
+ inputs_embeds = self.embed_tokens(input_ids)
399
+
400
+ if position_ids is None:
401
+ position_ids = build_position_ids(token_type_ids, attention_mask)
402
+ input_ids = None
403
+ return self.llm_forward(
404
+ input_ids=input_ids,
405
+ token_type_ids=token_type_ids,
406
+ attention_mask=attention_mask,
407
+ position_ids=position_ids,
408
+ past_key_values=past_key_values,
409
+ inputs_embeds=inputs_embeds,
410
+ use_cache=use_cache,
411
+ output_attentions=output_attentions,
412
+ output_hidden_states=output_hidden_states,
413
+ return_dict=return_dict,
414
+ )
415
+
416
+ def llm_forward(
417
+ self,
418
+ input_ids: torch.LongTensor = None,
419
+ token_type_ids: torch.LongTensor = None,
420
+ attention_mask: Optional[torch.Tensor] = None,
421
+ position_ids: Optional[torch.LongTensor] = None,
422
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
423
+ inputs_embeds: Optional[torch.FloatTensor] = None,
424
+ use_cache: Optional[bool] = None,
425
+ output_attentions: Optional[bool] = None,
426
+ output_hidden_states: Optional[bool] = None,
427
+ return_dict: Optional[bool] = None,
428
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
429
+ """Largely copied from the LLaMA forward pass and adapted for CogVLM with `token_type_ids`."""
430
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
431
+ output_hidden_states = (
432
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
433
+ )
434
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
435
+
436
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
437
+
438
+ # retrieve input_ids and inputs_embeds
439
+ if input_ids is not None and inputs_embeds is not None:
440
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
441
+ elif input_ids is not None:
442
+ batch_size, seq_length = input_ids.shape
443
+ elif inputs_embeds is not None:
444
+ batch_size, seq_length, _ = inputs_embeds.shape
445
+ else:
446
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
447
+
448
+ seq_length_with_past = seq_length
449
+ past_key_values_length = 0
450
+
451
+ if past_key_values is not None:
452
+ past_key_values_length = past_key_values[0][0].shape[2]
453
+ seq_length_with_past = seq_length_with_past + past_key_values_length
454
+
455
+ if position_ids is None:
456
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
457
+ position_ids = torch.arange(
458
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
459
+ )
460
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
461
+ else:
462
+ position_ids = position_ids.view(-1, seq_length).long()
463
+
464
+ if inputs_embeds is None:
465
+ inputs_embeds = self.embed_tokens(input_ids)
466
+ # embed positions
467
+ if attention_mask is None:
468
+ attention_mask = torch.ones(
469
+ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
470
+ )
471
+ attention_mask = self._prepare_decoder_attention_mask(
472
+ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
473
+ )
474
+
475
+ hidden_states = inputs_embeds
476
+
477
+ # decoder layers
478
+ all_hidden_states = () if output_hidden_states else None
479
+ all_self_attns = () if output_attentions else None
480
+ next_decoder_cache = () if use_cache else None
481
+
482
+ for idx, decoder_layer in enumerate(self.layers):
483
+ if output_hidden_states:
484
+ all_hidden_states += (hidden_states,)
485
+
486
+ past_key_value = past_key_values[idx] if past_key_values is not None else None
487
+
488
+ def custom(index):
489
+ def custom_forward(
490
+ hidden_states,
491
+ token_type_ids=token_type_ids,
492
+ attention_mask=attention_mask,
493
+ position_ids=position_ids,
494
+ past_key_value=past_key_value,
495
+ output_attentions=output_attentions,
496
+ use_cache=use_cache,
497
+ ):
498
+ layer = self.layers[index]
499
+ outputs = layer(
500
+ hidden_states,
501
+ token_type_ids=token_type_ids,
502
+ attention_mask=attention_mask,
503
+ position_ids=position_ids,
504
+ past_key_value=past_key_value,
505
+ output_attentions=output_attentions,
506
+ use_cache=use_cache,
507
+ )
508
+ return outputs
509
+
510
+ return custom_forward
511
+ # layer_outputs = decoder_layer(
512
+ # hidden_states,
513
+ # token_type_ids=token_type_ids,
514
+ # attention_mask=attention_mask,
515
+ # position_ids=position_ids,
516
+ # past_key_value=past_key_value,
517
+ # output_attentions=output_attentions,
518
+ # use_cache=use_cache,
519
+ # )
520
+ layer_outputs = checkpoint(custom(idx),
521
+ hidden_states,
522
+ use_reentrant=False
523
+ )
524
+ hidden_states = layer_outputs[0]
525
+
526
+ if use_cache:
527
+ next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)
528
+
529
+ if output_attentions:
530
+ all_self_attns += (layer_outputs[1],)
531
+
532
+ hidden_states = self.norm(hidden_states)
533
+
534
+ # add hidden states from the last decoder layer
535
+ if output_hidden_states:
536
+ all_hidden_states += (hidden_states,)
537
+
538
+ next_cache = next_decoder_cache if use_cache else None
539
+ if not return_dict:
540
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
541
+ return BaseModelOutputWithPast(
542
+ last_hidden_state=hidden_states,
543
+ past_key_values=next_cache,
544
+ hidden_states=all_hidden_states,
545
+ attentions=all_self_attns,
546
+ )
547
+
548
+ def get_input_embeddings(self):
549
+ return self.embed_tokens
550
+
551
+ def set_input_embeddings(self, value):
552
+ self.embed_tokens = value
553
+
554
+ # noinspection PyMethodMayBeStatic
555
+ # Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask
556
+ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
557
+ # create causal mask
558
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
559
+ combined_attention_mask = None
560
+ if input_shape[-1] > 1:
561
+ combined_attention_mask = _make_causal_mask(
562
+ input_shape,
563
+ inputs_embeds.dtype,
564
+ device=inputs_embeds.device,
565
+ past_key_values_length=past_key_values_length,
566
+ )
567
+
568
+ if attention_mask is not None:
569
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
570
+ expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
571
+ inputs_embeds.device
572
+ )
573
+ combined_attention_mask = (
574
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
575
+ )
576
+
577
+ return combined_attention_mask
578
+
579
+
580
+ def _history_to_prompt(signal_type, history, query):
581
+ if signal_type == 'base':
582
+ return query
583
+ elif signal_type == 'vqa':
584
+ answer_format = 'Short answer:'
585
+ elif signal_type == 'chat':
586
+ answer_format = 'Answer:'
587
+ else:
588
+ raise ValueError(f"Unknown signal type {signal_type}")
589
+
590
+ prompt = ''
591
+ for i, (old_query, response) in enumerate(history):
592
+ prompt += 'Question: ' + old_query + " {} ".format(answer_format) + response + "\n"
593
+ prompt += 'Question: {} {}'.format(query, answer_format)
594
+ return prompt
595
+
596
+
597
+ class CogVLMForCausalLM(CogVLMPreTrainedModel):
598
+ _auto_class = "AutoModelForCausalLM"
599
+
600
+ def __init__(self, config):
601
+ super().__init__(config)
602
+ self.model = CogVLMModel(config)
603
+ self.vocab_size = config.vocab_size
604
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
605
+
606
+ # Initialize weights and apply final processing
607
+ self.post_init()
608
+
609
+ def get_input_embeddings(self):
610
+ return self.model.embed_tokens
611
+
612
+ def set_input_embeddings(self, value):
613
+ self.model.embed_tokens = value
614
+
615
+ def get_output_embeddings(self):
616
+ return self.lm_head
617
+
618
+ def set_output_embeddings(self, new_embeddings):
619
+ self.lm_head = new_embeddings
620
+
621
+ def set_decoder(self, decoder):
622
+ self.model = decoder
623
+
624
+ def get_decoder(self):
625
+ return self.model
626
+
627
+ def forward(
628
+ self,
629
+ input_ids: torch.LongTensor = None,
630
+ images: List[List[torch.Tensor]] = None,
631
+ token_type_ids: Optional[torch.LongTensor] = None,
632
+ attention_mask: Optional[torch.Tensor] = None,
633
+ position_ids: Optional[torch.LongTensor] = None,
634
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
635
+ inputs_embeds: Optional[torch.FloatTensor] = None,
636
+ use_cache: Optional[bool] = None,
637
+ output_attentions: Optional[bool] = None,
638
+ output_hidden_states: Optional[bool] = None,
639
+ return_dict: Optional[bool] = None,
640
+ labels: Optional[torch.LongTensor] = None,
641
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
642
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
643
+ output_hidden_states = (
644
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
645
+ )
646
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
647
+
648
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
649
+ outputs = self.model(
650
+ input_ids=input_ids,
651
+ images=images,
652
+ token_type_ids=token_type_ids,
653
+ attention_mask=attention_mask,
654
+ position_ids=position_ids,
655
+ past_key_values=past_key_values,
656
+ inputs_embeds=inputs_embeds,
657
+ use_cache=use_cache,
658
+ output_attentions=output_attentions,
659
+ output_hidden_states=output_hidden_states,
660
+ return_dict=return_dict,
661
+ )
662
+
663
+ hidden_states = outputs[0]
664
+ logits = self.lm_head(hidden_states)
665
+ logits = logits.float()
666
+
667
+ loss = None
668
+ if labels is not None:
669
+ # Shift so that tokens < n predict n
670
+ shift_logits = logits[..., :-1, :].contiguous()
671
+ shift_labels = labels[..., 1:].contiguous()
672
+ # Flatten the tokens
673
+ loss_fct = CrossEntropyLoss()
674
+ shift_logits = shift_logits.view(-1, self.config.vocab_size)
675
+ shift_labels = shift_labels.view(-1)
676
+ # Enable model parallelism
677
+ shift_labels = shift_labels.to(shift_logits.device)
678
+ loss = loss_fct(shift_logits, shift_labels)
679
+
680
+ if not return_dict:
681
+ output = (logits,) + outputs[1:]
682
+ return (loss,) + output if loss is not None else output
683
+
684
+ return CausalLMOutputWithPast(
685
+ loss=loss,
686
+ logits=logits,
687
+ past_key_values=outputs.past_key_values,
688
+ hidden_states=outputs.hidden_states,
689
+ attentions=outputs.attentions,
690
+ )
691
+
692
+ def _prepare_attention_mask_for_generation(
693
+ self,
694
+ inputs: torch.Tensor,
695
+ pad_token_id: Optional[int],
696
+ eos_token_id: Optional[Union[int, List[int]]],
697
+ ) -> torch.LongTensor:
698
+ return torch.ones(inputs.shape[:2], dtype=torch.long, device=inputs.device) # type: ignore
699
+
700
+ def prepare_inputs_for_generation(
701
+ self, input_ids, token_type_ids, images=None, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
702
+ ):
703
+ # build position_ids if needed
704
+ position_ids = kwargs.get("position_ids", None)
705
+ if position_ids is None:
706
+ position_ids = build_position_ids(token_type_ids, attention_mask)
707
+
708
+ if past_key_values:
709
+ input_ids = input_ids[:, -1:]
710
+ token_type_ids = token_type_ids[:, -1:]
711
+ position_ids = position_ids[:, -1:]
712
+
713
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
714
+ if inputs_embeds is not None and past_key_values is None:
715
+ model_inputs = {"inputs_embeds": inputs_embeds}
716
+ else:
717
+ model_inputs = {"input_ids": input_ids}
718
+
719
+ model_inputs.update(
720
+ {
721
+ "token_type_ids": token_type_ids,
722
+ "images": images,
723
+ "position_ids": position_ids,
724
+ "past_key_values": past_key_values,
725
+ "use_cache": kwargs.get("use_cache"),
726
+ "attention_mask": attention_mask,
727
+ }
728
+ )
729
+ return model_inputs
730
+
731
+ def _update_model_kwargs_for_generation(
732
+ self,
733
+ outputs: "ModelOutput",
734
+ model_kwargs: Dict[str, Any],
735
+ is_encoder_decoder: bool = False,
736
+ standardize_cache_format: bool = False,
737
+ ) -> Dict[str, Any]:
738
+ # update past_key_values
739
+ model_kwargs["past_key_values"] = self._extract_past_from_model_output(
740
+ outputs, standardize_cache_format=standardize_cache_format
741
+ )
742
+ if getattr(outputs, "state", None) is not None:
743
+ model_kwargs["state"] = outputs.state
744
+
745
+ # update token_type_ids with last value
746
+ if "token_type_ids" in model_kwargs:
747
+ token_type_ids = model_kwargs["token_type_ids"]
748
+ new_token_type_ids = torch.ones(size=(token_type_ids.shape[0], 1), dtype=token_type_ids.dtype, device=token_type_ids.device) * LANGUAGE_TOKEN_TYPE
749
+ model_kwargs["token_type_ids"] = torch.cat([token_type_ids, new_token_type_ids], dim=-1)
750
+
751
+ if not is_encoder_decoder:
752
+ # update attention mask
753
+ if "attention_mask" in model_kwargs:
754
+ attention_mask = model_kwargs["attention_mask"]
755
+ model_kwargs["attention_mask"] = torch.cat(
756
+ [attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1
757
+ )
758
+ else:
759
+ # update decoder attention mask
760
+ if "decoder_attention_mask" in model_kwargs:
761
+ decoder_attention_mask = model_kwargs["decoder_attention_mask"]
762
+ model_kwargs["decoder_attention_mask"] = torch.cat(
763
+ [decoder_attention_mask, decoder_attention_mask.new_ones((decoder_attention_mask.shape[0], 1))],
764
+ dim=-1,
765
+ )
766
+
767
+ return model_kwargs
768
+
769
+ def _reorder_cache(self, past_key_values, beam_idx):
770
+ reordered_past = ()
771
+ for layer_past in past_key_values:
772
+ reordered_past += (
773
+ tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
774
+ )
775
+ return reordered_past
776
+
777
+ def build_conversation_input_ids(
778
+ self,
779
+ tokenizer: "PreTrainedTokenizer",
780
+ *,
781
+ query: str,
782
+ history: Optional[List[Tuple[str, str]]] = None,
783
+ images: Optional[List["PIL.Image"]] = None,
784
+ template_version: Optional[Literal["base", "chat", "vqa"]] = None,
785
+ answer: str = None,
786
+ ):
787
+ image_size: int = self.config.vision_config['image_size']
788
+ patch_size: int = self.config.vision_config['patch_size']
789
+ template_version = template_version or self.config.template_version
790
+ assert images is None or len(images) <= 1, "multiple images are not supported yet."
791
+ history = history or []
792
+ text = _history_to_prompt(template_version, history, query)
793
+ input_ids = [tokenizer.bos_token_id]
794
+ token_type_ids = [LANGUAGE_TOKEN_TYPE]
795
+ if images is not None and len(images) == 1:
796
+ # vision
797
+ transform = transforms.Compose(
798
+ [
799
+ transforms.Resize(
800
+ (image_size, image_size), interpolation=transforms.InterpolationMode.BICUBIC
801
+ ),
802
+ transforms.ToTensor(),
803
+ transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
804
+ ]
805
+ )
806
+ images = [transform(images[0])]
807
+ # language
808
+ vision_token_num = (image_size // patch_size // 2) * (image_size // patch_size // 2) + 2
809
+
810
+ tokenizer.pad_token_id = 128002 # llama3 adapt for cogvlm
811
+
812
+ input_ids += [tokenizer.pad_token_id] * vision_token_num
813
+ token_type_ids += [VISION_TOKEN_TYPE] * vision_token_num
814
+ text_ids = tokenizer.encode(text, add_special_tokens=False)
815
+
816
+ if answer is not None:
817
+ answer_ids = tokenizer.encode(answer, add_special_tokens=False)
818
+ answer_ids += [tokenizer.eos_token_id]
819
+ text_ids += answer_ids
820
+
821
+
822
+ input_ids += text_ids
823
+ token_type_ids += [LANGUAGE_TOKEN_TYPE] * len(text_ids)
824
+ attention_mask = [1] * len(input_ids)
825
+ if answer is not None:
826
+ labels = [-100 for _ in range(len(input_ids) - len(answer_ids))] + answer_ids
827
+ labels = torch.tensor(labels, dtype=torch.long)
828
+ else:
829
+ labels = None
830
+
831
+ return {
832
+ 'input_ids': torch.tensor(input_ids, dtype=torch.long),
833
+ 'token_type_ids': torch.tensor(token_type_ids, dtype=torch.long),
834
+ 'attention_mask': torch.tensor(attention_mask, dtype=torch.long),
835
+ 'images': images,
836
+ 'labels': labels,
837
+ }
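The arithmetic in `build_conversation_input_ids` is easy to check by hand. The sketch below restates the vision-token count and the label layout in plain Python. The image and patch sizes are illustrative values, not read from this repository's config, and `vision_token_count` and `build_labels` are hypothetical helper names. The `+ 2` presumably accounts for the begin-of-image and end-of-image positions, and the `// 2` for a 2x downsampling of the patch grid; neither is stated in this hunk.

```python
def vision_token_count(image_size, patch_size):
    """Number of placeholder positions reserved for one image."""
    side = image_size // patch_size // 2
    return side * side + 2


def build_labels(input_len, answer_ids, ignore_index=-100):
    """Mask everything except the answer tokens, as in the method above."""
    return [ignore_index] * (input_len - len(answer_ids)) + list(answer_ids)


# Illustrative sizes only.
print(vision_token_count(1344, 14))  # 2306
print(vision_token_count(224, 14))   # 66

# Ten input tokens, the last three of which are the answer (including EOS).
print(build_labels(10, [11, 12, 2]))
# [-100, -100, -100, -100, -100, -100, -100, 11, 12, 2]
```

`-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so only the answer tokens contribute to the loss computed in `forward`.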
special_tokens_map.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "bos_token": "<|begin_of_text|>",
3
+ "eos_token": "<|end_of_text|>"
4
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,2062 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|reserved_special_token_0|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|reserved_special_token_1|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|reserved_special_token_2|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_3|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|reserved_special_token_4|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|reserved_special_token_5|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_6|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_7|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_8|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_9|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_10|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_11|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_12|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_13|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_14|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_15|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_16|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_17|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_18|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_19|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_20|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_21|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_22|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_23|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_24|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_25|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_26|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_27|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_28|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_29|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_30|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_31|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_32|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_33|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_34|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_35|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_36|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_37|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_38|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_39|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_40|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_41|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_42|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_43|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_44|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_45|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_46|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_47|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_48|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_49|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_50|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_51|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_52|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_53|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_54|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_55|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_56|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_57|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_58|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_59|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_60|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_61|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_62|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_63|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_64|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_65|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_66|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_67|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_68|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_69|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_70|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_71|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_72|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_73|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_74|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_75|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_76|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_77|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_78|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_79|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_80|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_81|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_82|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_83|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_84|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_85|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_86|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_87|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_88|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_89|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_90|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_91|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_92|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_93|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_94|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_95|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_96|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_97|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_98|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_99|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_100|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_101|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_102|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_103|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_104|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_105|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_106|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_107|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_108|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_109|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_110|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_111|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_112|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_113|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_114|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_115|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_116|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_117|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_118|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_119|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_120|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_121|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_122|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_123|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_124|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_125|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_126|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_127|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_128|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_129|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_130|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_131|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_132|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_133|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_134|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_135|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_136|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_137|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_138|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_139|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_140|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_141|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_142|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_143|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_144|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_145|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_146|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_147|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_148|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_149|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_150|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_151|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_152|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_153|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_154|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_155|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_156|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_157|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_158|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_159|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_160|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_161|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_162|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_163|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_164|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_165|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_166|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_167|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_168|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_169|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_170|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_171|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_172|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_173|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_174|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_175|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_176|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_177|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_178|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_179|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_180|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_181|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_182|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_183|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_184|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_185|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_186|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_187|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_188|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_189|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_190|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_191|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_192|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_193|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_194|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_195|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_196|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_197|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_198|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_199|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_200|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_201|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_202|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_203|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_204|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_205|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_206|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_207|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_208|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_209|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_210|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_211|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_212|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_213|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_214|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_215|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_216|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_217|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_218|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_219|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_220|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_221|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_222|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_223|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_224|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_225|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_226|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_227|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_228|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_229|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128235": {
1884
+ "content": "<|reserved_special_token_230|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128236": {
1892
+ "content": "<|reserved_special_token_231|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128237": {
1900
+ "content": "<|reserved_special_token_232|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128238": {
1908
+ "content": "<|reserved_special_token_233|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128239": {
1916
+ "content": "<|reserved_special_token_234|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128240": {
1924
+ "content": "<|reserved_special_token_235|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128241": {
1932
+ "content": "<|reserved_special_token_236|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128242": {
1940
+ "content": "<|reserved_special_token_237|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128243": {
1948
+ "content": "<|reserved_special_token_238|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128244": {
1956
+ "content": "<|reserved_special_token_239|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128245": {
1964
+ "content": "<|reserved_special_token_240|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128246": {
1972
+ "content": "<|reserved_special_token_241|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128247": {
1980
+ "content": "<|reserved_special_token_242|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128248": {
1988
+ "content": "<|reserved_special_token_243|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128249": {
1996
+ "content": "<|reserved_special_token_244|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128250": {
2004
+ "content": "<|reserved_special_token_245|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_246|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_247|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_248|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_249|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_250|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "bos_token": "<|begin_of_text|>",
2053
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% else %}{{ eos_token }}{% endif %}",
2054
+ "clean_up_tokenization_spaces": true,
2055
+ "eos_token": "<|end_of_text|>",
2056
+ "model_input_names": [
2057
+ "input_ids",
2058
+ "attention_mask"
2059
+ ],
2060
+ "model_max_length": 1000000000000000019884624838656,
2061
+ "tokenizer_class": "PreTrainedTokenizerFast"
2062
+ }
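
The `chat_template` value above is a Jinja template. As a rough illustration of what it renders, the plain-Python sketch below mirrors its logic step by step. The function name `render_chat` is hypothetical and not part of this commit; the default token strings are copied from `bos_token` and `eos_token` in the config, and `str.strip()` is assumed to be an adequate stand-in for Jinja's `trim` filter.

```python
def render_chat(messages, add_generation_prompt=False,
                bos_token="<|begin_of_text|>", eos_token="<|end_of_text|>"):
    """Hypothetical plain-Python mirror of the Jinja chat_template above."""
    parts = []
    for index, message in enumerate(messages):
        # Header with the role, a blank line, the trimmed content, then <|eot_id|>.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # Only the first message is prefixed with the BOS token.
        if index == 0:
            content = bos_token + content
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    else:
        # Otherwise the template closes the conversation with the EOS token.
        parts.append(eos_token)
    return "".join(parts)


messages = [{"role": "user", "content": "  Hi there \n"}]
print(repr(render_chat(messages, add_generation_prompt=True)))
```

Note that the template appends `eos_token` whenever `add_generation_prompt` is false, so a fully rendered conversation ends with `<|eot_id|><|end_of_text|>`.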
util.py ADDED
@@ -0,0 +1,483 @@
+ from typing import Optional, Tuple, Union
+
+ import torch
+ from einops import rearrange, repeat
+ import torch.nn.functional as F
+
+ import triton
+ import triton.language as tl
+
+
+ # @triton.autotune(
+ #     configs=[
+ #         triton.Config({"BLOCK_M": 2}),
+ #         triton.Config({"BLOCK_M": 4}),
+ #         triton.Config({"BLOCK_M": 8}),
+ #         triton.Config({"BLOCK_M": 16}),
+ #     ],
+ #     key=["CACHE_KEY_SEQLEN", "BLOCK_K", "INTERLEAVED"],
+ # )
+ @triton.jit
+ def rotary_kernel(
+     OUT,  # Pointers to matrices
+     X,
+     COS,
+     SIN,
+     CU_SEQLENS,
+     SEQLEN_OFFSETS,  # this could be int or a pointer
+     # Matrix dimensions
+     seqlen,
+     nheads,
+     rotary_dim,
+     seqlen_ro,
+     CACHE_KEY_SEQLEN,
+     # strides
+     stride_out_batch,
+     stride_out_nheads,
+     stride_out_seqlen,
+     stride_out_headdim,
+     stride_x_batch,
+     stride_x_nheads,
+     stride_x_seqlen,
+     stride_x_headdim,
+     # Meta-parameters
+     BLOCK_K: tl.constexpr,
+     IS_SEQLEN_OFFSETS_TENSOR: tl.constexpr,
+     IS_VARLEN: tl.constexpr,
+     INTERLEAVED: tl.constexpr,
+     CONJUGATE: tl.constexpr,
+     BLOCK_M: tl.constexpr,
+ ):
+     pid_m = tl.program_id(axis=0)
+     pid_batch = tl.program_id(axis=1)
+     pid_head = tl.program_id(axis=2)
+     rotary_dim_half = rotary_dim // 2
+
+     if not IS_VARLEN:
+         X = X + pid_batch * stride_x_batch + pid_head * stride_x_nheads
+         OUT = OUT + pid_batch * stride_out_batch + pid_head * stride_out_nheads
+         COS = COS + pid_batch * seqlen_ro * rotary_dim_half
+         SIN = SIN + pid_batch * seqlen_ro * rotary_dim_half
+     else:
+         start_idx = tl.load(CU_SEQLENS + pid_batch)
+         seqlen = tl.load(CU_SEQLENS + pid_batch + 1) - start_idx
+         X = X + start_idx * stride_x_seqlen + pid_head * stride_x_nheads
+         OUT = OUT + start_idx * stride_out_seqlen + pid_head * stride_out_nheads
+
+     if pid_m * BLOCK_M >= seqlen:
+         return
+     rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
+     if not IS_SEQLEN_OFFSETS_TENSOR:
+         rm_cs = rm + SEQLEN_OFFSETS
+     else:
+         rm_cs = rm + tl.load(SEQLEN_OFFSETS + pid_batch)
+     rk = tl.arange(0, BLOCK_K)
+     rk_half = tl.arange(0, BLOCK_K // 2)
+
+     if not INTERLEAVED:
+         # Load the 1st and 2nd halves of X, do calculation, then store to 1st and 2nd halves of OUT
+         X = X + (rm[:, None] * stride_x_seqlen + rk_half[None, :] * stride_x_headdim)
+         COS = COS + (rm_cs[:, None] * rotary_dim_half + rk_half[None, :])
+         SIN = SIN + (rm_cs[:, None] * rotary_dim_half + rk_half[None, :])
+         cos = tl.load(
+             COS, mask=(rm_cs[:, None] < seqlen_ro) & (rk_half[None, :] < rotary_dim_half), other=1.0
+         )
+         sin = tl.load(
+             SIN, mask=(rm_cs[:, None] < seqlen_ro) & (rk_half[None, :] < rotary_dim_half), other=0.0
+         )
+         x0 = tl.load(
+             X, mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half), other=0.0
+         )
+         x1 = tl.load(
+             X + rotary_dim_half * stride_x_headdim,
+             mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half),
+             other=0.0,
+         )
+         if CONJUGATE:
+             sin = -sin
+         o0 = x0 * cos - x1 * sin
+         o1 = x0 * sin + x1 * cos
+         # write back result
+         OUT = OUT + (rm[:, None] * stride_out_seqlen + rk_half[None, :] * stride_out_headdim)
+         tl.store(OUT, o0, mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half))
+         tl.store(
+             OUT + rotary_dim_half * stride_out_headdim,
+             o1,
+             mask=(rm[:, None] < seqlen) & (rk_half[None, :] < rotary_dim_half),
+         )
+     else:
+         # We don't want to load X[0, 2, 4, ...] and X[1, 3, 5, ...] separately since both are slow.
+         # Instead, we load x0 = X[0, 1, 2, 3, ...] and x1 = X[1, 0, 3, 2, ...].
+         # Loading x0 will be fast but x1 will be slow.
+         # Then we load cos = COS[0, 0, 1, 1, ...] and sin = SIN[0, 0, 1, 1, ...].
+         # Then we do the calculation and use tl.where to pick out the right outputs for the even
+         # and for the odd indices.
+         rk_swap = rk + ((rk + 1) % 2) * 2 - 1  # 1, 0, 3, 2, 5, 4, ...
+         rk_repeat = tl.arange(0, BLOCK_K) // 2
+         X0 = X + (rm[:, None] * stride_x_seqlen + rk[None, :] * stride_x_headdim)
+         X1 = X + (rm[:, None] * stride_x_seqlen + rk_swap[None, :] * stride_x_headdim)
+         COS = COS + (rm_cs[:, None] * rotary_dim_half + rk_repeat[None, :])
+         SIN = SIN + (rm_cs[:, None] * rotary_dim_half + rk_repeat[None, :])
+         cos = tl.load(
+             COS,
+             mask=(rm_cs[:, None] < seqlen_ro) & (rk_repeat[None, :] < rotary_dim_half),
+             other=1.0,
+         ).to(tl.float32)
+         sin = tl.load(
+             SIN,
+             mask=(rm_cs[:, None] < seqlen_ro) & (rk_repeat[None, :] < rotary_dim_half),
+             other=0.0,
+         ).to(tl.float32)
+         x0 = tl.load(X0, mask=(rm[:, None] < seqlen) & (rk[None, :] < rotary_dim), other=0.0).to(
+             tl.float32
+         )
+         x1 = tl.load(
+             X1, mask=(rm[:, None] < seqlen) & (rk_swap[None, :] < rotary_dim), other=0.0
+         ).to(tl.float32)
+         if CONJUGATE:
+             sin = -sin
+         x0_cos = x0 * cos
+         x1_sin = x1 * sin
+         out = tl.where(rk[None, :] % 2 == 0, x0_cos - x1_sin, x0_cos + x1_sin)
+         OUT = OUT + (rm[:, None] * stride_out_seqlen + rk[None, :] * stride_out_headdim)
+         tl.store(OUT, out, mask=(rm[:, None] < seqlen) & (rk[None, :] < rotary_dim))
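
To make the arithmetic in `rotary_kernel` easier to check by hand, here is a minimal pure-Python sketch of what the kernel computes for a single (position, head) row, in both the non-interleaved and the interleaved layout. The helper name `rotate_row` is hypothetical and not part of `util.py`. The sketch assumes the whole row is rotated (`rotary_dim == headdim`) and ignores the masking, strides, and blocking that the Triton kernel handles.

```python
import math


def rotate_row(x, cos, sin, interleaved=False, conjugate=False):
    """Rotate one row x of length rotary_dim; cos and sin have length rotary_dim // 2."""
    half = len(cos)
    assert len(x) == 2 * half and len(sin) == half
    if conjugate:
        # Negating sin applies the inverse rotation, as in the kernel's CONJUGATE branch.
        sin = [-s for s in sin]
    out = [0.0] * len(x)
    if not interleaved:
        # Pairs are (x[i], x[i + half]): first half against second half.
        for i in range(half):
            x0, x1 = x[i], x[i + half]
            out[i] = x0 * cos[i] - x1 * sin[i]
            out[i + half] = x0 * sin[i] + x1 * cos[i]
    else:
        # Pairs are (x[2i], x[2i + 1]); the index tricks mirror rk_swap and rk_repeat.
        for k in range(len(x)):
            k_swap = k + ((k + 1) % 2) * 2 - 1  # 1, 0, 3, 2, ...
            k_repeat = k // 2                   # 0, 0, 1, 1, ...
            x0_cos = x[k] * cos[k_repeat]
            x1_sin = x[k_swap] * sin[k_repeat]
            out[k] = x0_cos - x1_sin if k % 2 == 0 else x0_cos + x1_sin
    return out


angles = [0.3, 1.1]
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]
x = [1.0, 2.0, 3.0, 4.0]
y = rotate_row(x, cos, sin)
x_back = rotate_row(y, cos, sin, conjugate=True)  # undoes the rotation up to rounding
```

Because each pair is rotated by a plain 2D rotation, the row's norm is preserved, and applying the conjugate rotation recovers the input.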
+
+
+ def apply_rotary(
+     x: torch.Tensor,
+     cos: torch.Tensor,
+     sin: torch.Tensor,
+     seqlen_offsets: Union[int, torch.Tensor] = 0,
+     cu_seqlens: Optional[torch.Tensor] = None,
+     max_seqlen: Optional[int] = None,
+     interleaved=False,
+     inplace=False,
+     conjugate=False,
+ ) -> torch.Tensor:
+     """
+     Arguments:
+         x: (batch, nheads, seqlen, headdim)
+         cos: (batch, seqlen_ro, rotary_dim / 2)
+         sin: (batch, seqlen_ro, rotary_dim / 2)
+         seqlen_offsets: integer or integer tensor of size (batch,)
+         cu_seqlens: (batch + 1,) or None. Unused here: the kernel is always launched with IS_VARLEN=False.
+         max_seqlen: int or None. Unused here.
+     Returns:
+         y: (batch, nheads, seqlen, headdim)
+     """
+
170
+ batch, nheads, seqlen, headdim = x.shape
171
+
172
+ batch_ro, seqlen_ro, rotary_dim = cos.shape
173
+
174
+ assert batch == batch_ro
175
+ assert sin.shape == cos.shape
176
+ rotary_dim *= 2
177
+ assert rotary_dim <= headdim, "rotary_dim must be <= headdim"
178
+ assert headdim <= 256, "Only support headdim <= 256"
179
+
180
+ assert seqlen_ro >= seqlen, "seqlen_ro must be >= seqlen"
181
+
182
+ assert (
183
+ cos.dtype == sin.dtype
184
+ ), f"cos and sin must have the same dtype, got {cos.dtype} and {sin.dtype}"
185
+ assert (
186
+ x.dtype == cos.dtype
187
+ ), f"Input and cos/sin must have the same dtype, got {x.dtype} and {cos.dtype}"
188
+
189
+ cos, sin = cos.contiguous(), sin.contiguous()
190
+ if isinstance(seqlen_offsets, torch.Tensor):
191
+ assert seqlen_offsets.shape == (batch,)
192
+ assert seqlen_offsets.dtype in [torch.int32, torch.int64]
193
+ seqlen_offsets = seqlen_offsets.contiguous()
194
+ else:
195
+ assert seqlen_offsets + seqlen <= seqlen_ro
196
+
197
+ output = torch.empty_like(x) if not inplace else x
198
+ if rotary_dim < headdim and not inplace:
199
+ output[..., rotary_dim:].copy_(x[..., rotary_dim:])
200
+
+     BLOCK_K = (
+         32
+         if rotary_dim <= 32
+         else (64 if rotary_dim <= 64 else (128 if rotary_dim <= 128 else 256))
+     )
+     grid = lambda META: (triton.cdiv(seqlen, META["BLOCK_M"]), batch, nheads)  # noqa
+     BLOCK_M = 4 if interleaved else (8 if rotary_dim <= 64 else 4)
+
+     # Need this, otherwise Triton tries to launch from cuda:0 and we get
+     # ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
+     with torch.cuda.device(x.device.index):
+         rotary_kernel[grid](
+             output,  # data ptrs
+             x,
+             cos,
+             sin,
+             cu_seqlens,
+             seqlen_offsets,
+             seqlen,  # shapes
+             nheads,
+             rotary_dim,
+             seqlen_ro,
+             seqlen // 128,  # key for triton cache (limit number of compilations)
+             output.stride(0),  # batch_strides
+             output.stride(-3),  # nheads_stride
+             output.stride(-2),  # seqlen_stride
+             output.stride(-1),  # headdim_stride
+             x.stride(0),  # batch_strides
+             x.stride(-3),  # nheads stride
+             x.stride(-2),  # seqlen stride
+             x.stride(-1),  # headdim stride
+             BLOCK_K,
+             isinstance(seqlen_offsets, torch.Tensor),
+             False,
+             interleaved,
+             conjugate,
+             BLOCK_M,
+         )
+     return output
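
The launch configuration chosen in `apply_rotary` can be checked without a GPU. The following is a minimal plain-Python sketch of the same selection rules; the helper names `pick_block_k`, `pick_block_m`, and `launch_grid` are hypothetical and do not appear in the file.

```python
def pick_block_k(rotary_dim: int) -> int:
    """Smallest of 32/64/128 that covers rotary_dim, otherwise 256."""
    for block_k in (32, 64, 128):
        if rotary_dim <= block_k:
            return block_k
    return 256


def pick_block_m(rotary_dim: int, interleaved: bool) -> int:
    """Rows of the sequence handled per program, mirroring the BLOCK_M expression."""
    return 4 if interleaved else (8 if rotary_dim <= 64 else 4)


def launch_grid(seqlen: int, batch: int, nheads: int, block_m: int) -> tuple:
    """Grid shape: ceil(seqlen / block_m) programs per (batch, head) pair."""
    return (-(-seqlen // block_m), batch, nheads)


block_m = pick_block_m(rotary_dim=64, interleaved=False)
print(pick_block_k(64), block_m, launch_grid(100, 2, 32, block_m))
```

The ceiling division in `launch_grid` computes the same quantity as `triton.cdiv`, so a sequence whose length is not a multiple of `BLOCK_M` still gets a final, partially filled block.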
240
+
241
+
242
+ class ApplyRotaryEmb(torch.autograd.Function):
243
+ @staticmethod
244
+ def forward(
245
+ ctx,
246
+ x,
247
+ cos,
248
+ sin,
249
+ interleaved=False,
250
+ inplace=False,
251
+ seqlen_offsets: Union[int, torch.Tensor] = 0,
252
+ cu_seqlens: Optional[torch.Tensor] = None,
253
+ max_seqlen: Optional[int] = None,
254
+ ):
255
+ out = apply_rotary(
256
+ x,
257
+ cos,
258
+ sin,
259
+ seqlen_offsets=seqlen_offsets,
260
+ cu_seqlens=cu_seqlens,
261
+ max_seqlen=max_seqlen,
262
+ interleaved=interleaved,
263
+ inplace=inplace,
264
+ )
265
+ if isinstance(seqlen_offsets, int):
266
+ ctx.save_for_backward(cos, sin, cu_seqlens) # Can't save int with save_for_backward
267
+ ctx.seqlen_offsets = seqlen_offsets
268
+ else:
269
+ ctx.save_for_backward(cos, sin, cu_seqlens, seqlen_offsets)
270
+ ctx.seqlen_offsets = None
271
+ ctx.interleaved = interleaved
272
+ ctx.inplace = inplace
273
+ ctx.max_seqlen = max_seqlen
274
+ return out if not inplace else x
275
+
276
+ @staticmethod
277
+ def backward(ctx, do):
278
+ seqlen_offsets = ctx.seqlen_offsets
279
+ if seqlen_offsets is None:
280
+ cos, sin, cu_seqlens, seqlen_offsets = ctx.saved_tensors
281
+ else:
282
+ cos, sin, cu_seqlens = ctx.saved_tensors
283
+ # TD [2023-09-02]: For some reason Triton (2.0.0.post1) errors with
284
+ # "[CUDA]: invalid device context", and cloning makes it work. Idk why. Triton 2.1.0 works.
285
+ if not ctx.interleaved and not ctx.inplace:
286
+ do = do.clone()
287
+ dx = apply_rotary(
288
+ do,
289
+ cos,
290
+ sin,
291
+ seqlen_offsets=seqlen_offsets,
292
+ cu_seqlens=cu_seqlens,
293
+ max_seqlen=ctx.max_seqlen,
294
+ interleaved=ctx.interleaved,
295
+ inplace=ctx.inplace,
296
+ conjugate=True,
297
+ )
298
+ return dx, None, None, None, None, None, None, None
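
`backward` reuses `apply_rotary` with `conjugate=True` instead of a separate gradient kernel. A rotation matrix is orthogonal, so its transpose (which maps the output gradient to the input gradient) is the rotation by the opposite angle. The sketch below illustrates this on a single pair of values in plain Python. It assumes the `conjugate` flag negates the sine term, because the kernel itself is not shown in this section; `rotate` and `loss` are hypothetical helpers.

```python
import math


def rotate(x0, x1, theta, conjugate=False):
    """Rotate the pair (x0, x1) by theta, or by -theta when conjugate is True."""
    c, s = math.cos(theta), math.sin(theta)
    if conjugate:
        s = -s  # assumption: the conjugate flag flips the sign of sin
    return x0 * c - x1 * s, x0 * s + x1 * c


def loss(x0, x1, theta, g0, g1):
    """Scalar loss whose gradient with respect to the rotated output is (g0, g1)."""
    y0, y1 = rotate(x0, x1, theta)
    return g0 * y0 + g1 * y1


theta, x0, x1, g0, g1 = 0.7, 1.5, -2.0, 0.3, 0.9

# Analytic input gradient: apply the conjugate rotation to the output gradient.
dx0, dx1 = rotate(g0, g1, theta, conjugate=True)

# Numerical check with central differences.
eps = 1e-6
num_dx0 = (loss(x0 + eps, x1, theta, g0, g1) - loss(x0 - eps, x1, theta, g0, g1)) / (2 * eps)
num_dx1 = (loss(x0, x1 + eps, theta, g0, g1) - loss(x0, x1 - eps, theta, g0, g1)) / (2 * eps)
print(abs(dx0 - num_dx0) < 1e-6, abs(dx1 - num_dx1) < 1e-6)
```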
+
+
+ def apply_rotary_emb(
+     x,
+     cos,
+     sin,
+     interleaved=False,
+     inplace=False,
+     seqlen_offsets: Union[int, torch.Tensor] = 0,
+     cu_seqlens: Optional[torch.Tensor] = None,
+     max_seqlen: Optional[int] = None,
+ ):
+     """
+     Arguments:
+         x: (batch_size, nheads, seqlen, headdim)
+         cos, sin: (batch_size, seqlen_rotary, rotary_dim / 2)
+         interleaved: if True, rotate pairs of even and odd dimensions (GPT-J style) instead
+             of 1st half and 2nd half (GPT-NeoX style).
+         inplace: if True, apply rotary embedding in-place.
+         seqlen_offsets: (batch_size,) or int. Each sequence in x is shifted by this amount.
+             Most commonly used in inference when we have KV cache.
+         cu_seqlens: (batch + 1,) or None
+         max_seqlen: int
+     Return:
+         out: (batch_size, nheads, seqlen, headdim)
+     rotary_dim must be <= headdim
+     Apply rotary embedding to the first rotary_dim of x.
+     """
+     return ApplyRotaryEmb.apply(
+         x, cos, sin, interleaved, inplace, seqlen_offsets, cu_seqlens, max_seqlen
+     )
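
The `interleaved` flag only changes which entries of a head vector are paired before rotation. The sketch below shows both pairings for one position in plain Python, using lists instead of tensors; `rotate_half_split` and `rotate_interleaved` are hypothetical names, and entries beyond `rotary_dim = 2 * len(cos)` are passed through unchanged, as the docstring describes.

```python
def rotate_half_split(x, cos, sin):
    """GPT-NeoX style: pair entry i with entry i + rotary_dim / 2."""
    half = len(cos)
    out = list(x)  # entries past rotary_dim are copied unchanged
    for i in range(half):
        a, b = x[i], x[i + half]
        out[i] = a * cos[i] - b * sin[i]
        out[i + half] = a * sin[i] + b * cos[i]
    return out


def rotate_interleaved(x, cos, sin):
    """GPT-J style: pair entry 2i with entry 2i + 1."""
    out = list(x)
    for i in range(len(cos)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos[i] - b * sin[i]
        out[2 * i + 1] = a * sin[i] + b * cos[i]
    return out


# A quarter turn (cos = 0, sin = 1) makes the pairing easy to read off.
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # headdim = 5, rotary_dim = 4
cos, sin = [0.0, 0.0], [1.0, 1.0]
print(rotate_half_split(x, cos, sin))
print(rotate_interleaved(x, cos, sin))
```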
+
+
+ # For backward compatibility
+ apply_rotary_emb_func = apply_rotary_emb
+
+
+ class FastRotaryEmbedding(torch.nn.Module):
+     """
+     The rotary position embeddings from RoFormer_ (Su et al.).
+     A crucial insight from the method is that the query and keys are
+     transformed by rotation matrices which depend on the relative positions.
+
+     Other implementations are available in the Rotary Transformer repo_ and in
+     GPT-NeoX_; GPT-NeoX was an inspiration.
+
+     .. _RoFormer: https://arxiv.org/abs/2104.09864
+     .. _repo: https://github.com/ZhuiyiTechnology/roformer
+     .. _GPT-NeoX: https://github.com/EleutherAI/gpt-neox
+
+     If scale_base is not None, this implements XPos (Sun et al., https://arxiv.org/abs/2212.10554).
+     A recommended value for scale_base is 512: https://github.com/HazyResearch/flash-attention/issues/96
+     Reference: https://github.com/sunyt32/torchscale/blob/main/torchscale/component/xpos_relative_position.py
+     """
+
+     def __init__(
+         self,
+         dim: int,
+         base=10000,
+         interleaved=False,
+         scale_base=None,
+         pos_idx_in_fp32=True,
+         device=None,
+     ):
+         """
+         interleaved: if True, rotate pairs of even and odd dimensions (GPT-J style) instead
+             of 1st half and 2nd half (GPT-NeoX style).
+         pos_idx_in_fp32: if True, the position indices [0.0, ..., seqlen - 1] are in fp32,
+             otherwise they might be in lower precision.
+             This option was added because previously (before 2023-07-02), when we construct
+             the position indices, we use the dtype of self.inv_freq. In most cases this would
+             be fp32, but if the model is trained in pure bf16 (not mixed precision), then
+             self.inv_freq would be bf16, and the position indices are also in bf16.
+             Because of the limited precision of bf16 (e.g. 1995.0 is rounded to 1992.0), the
+             embeddings for some positions will coincide.
+             To maintain compatibility with models previously trained in pure bf16,
+             we add this option.
+         """
+         super().__init__()
+         self.dim = dim
+         self.base = base
+         self.pos_idx_in_fp32 = pos_idx_in_fp32
+         # Generate and save the inverse frequency buffer (non trainable)
+         inv_freq = self._compute_inv_freq(device)
+         self.register_buffer("inv_freq", inv_freq)
+         self.interleaved = interleaved
+         self.scale_base = scale_base
+         scale = (
+             (torch.arange(0, dim, 2, device=device, dtype=torch.float32) + 0.4 * dim) / (1.4 * dim)
+             if scale_base is not None
+             else None
+         )
+         self.register_buffer("scale", scale, persistent=False)
+
+         self._seq_len_cached = 0
+         self._cos_cached = None
+         self._sin_cached = None
+         self._cos_k_cached = None
+         self._sin_k_cached = None
+         self.cos = None
+         self.sin = None
+
+     def _compute_inv_freq(self, device=None):
+         return 1.0 / (
+             self.base
+             ** (torch.arange(0, self.dim, 2, device=device) / self.dim)
+             # ** (torch.arange(0, self.dim, 2, device=device).float() / self.dim)
+         )
+
+     def _update_cos_sin_cache(self, seqlen, position_id, device=None, dtype=None):
+
+         if (
+             seqlen > self._seq_len_cached
+         ):
+             self._seq_len_cached = seqlen
+             # We want fp32 here, not self.inv_freq.dtype, since the model could be loaded in bf16
+             # And the output of arange can be quite large, so bf16 would lose a lot of precision.
+             # However, for compatibility reason, we add an option to use the dtype of self.inv_freq.
+             if self.pos_idx_in_fp32:
+                 t = torch.arange(seqlen, device=device, dtype=torch.float32)
+                 # We want fp32 here as well since inv_freq will be multiplied with t, and the output
+                 # will be large. Having it in bf16 will lose a lot of precision and cause the
+                 # cos & sin output to change significantly.
+                 # We want to recompute self.inv_freq if it was not loaded in fp32
+                 if self.inv_freq.dtype != torch.float32:
+                     inv_freq = self._compute_inv_freq(device=device)
+                 else:
+                     inv_freq = self.inv_freq
+             else:
+                 t = torch.arange(seqlen, device=device, dtype=self.inv_freq.dtype)
+                 inv_freq = self.inv_freq
+             freqs = torch.einsum("i,j->ij", t, inv_freq)
+             if self.scale is None:
+                 self._cos_cached = torch.cos(freqs).to(dtype)
+                 self._sin_cached = torch.sin(freqs).to(dtype)
+
+             else:
+                 power = (
+                     torch.arange(seqlen, dtype=self.scale.dtype, device=self.scale.device)
+                     - seqlen // 2
+                 ) / self.scale_base
+                 scale = self.scale.to(device=power.device) ** rearrange(power, "s -> s 1")
+                 # We want the multiplication by scale to happen in fp32
+                 self._cos_cached = (torch.cos(freqs) * scale).to(dtype)
+                 self._sin_cached = (torch.sin(freqs) * scale).to(dtype)
+                 self._cos_k_cached = (torch.cos(freqs) / scale).to(dtype)
+                 self._sin_k_cached = (torch.sin(freqs) / scale).to(dtype)
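
For the `scale is None` branch, the cache is the outer product of positions and inverse frequencies, passed through `cos` and `sin`. A minimal plain-Python sketch of that table follows, with nested lists in place of tensors. `inv_freq`, `cos_sin_table`, and `gather_rows` are hypothetical helpers; `gather_rows` stands in for the `F.embedding(position_ids, ...)` lookup used in `forward`.

```python
import math


def inv_freq(dim: int, base: float = 10000.0) -> list:
    """One inverse frequency per rotated pair: base ** (-i / dim) for i = 0, 2, 4, ..."""
    return [1.0 / base ** (i / dim) for i in range(0, dim, 2)]


def cos_sin_table(seqlen: int, dim: int, base: float = 10000.0):
    """Tables of shape (seqlen, dim / 2): entry [t][j] is cos or sin of t * inv_freq[j]."""
    freqs = inv_freq(dim, base)
    cos = [[math.cos(t * f) for f in freqs] for t in range(seqlen)]
    sin = [[math.sin(t * f) for f in freqs] for t in range(seqlen)]
    return cos, sin


def gather_rows(table, position_ids):
    """Pick one table row per position id, for a single sequence."""
    return [table[p] for p in position_ids]


cos, sin = cos_sin_table(seqlen=8, dim=4)
print(gather_rows(cos, [0, 3, 3, 7]))
```

Because rows are looked up by position id, repeated or non-contiguous ids are fine as long as every id is smaller than the cached sequence length.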
+
+     def forward(
+         self,
+         q: torch.Tensor,
+         k: torch.Tensor,
+         position_ids: torch.Tensor,
+         max_seqlen,
+     ) -> Tuple[torch.Tensor, torch.Tensor]:
+         """
+         q: (batch, nheads, seqlen, headdim)
+         k: (batch, nheads, seqlen, headdim)
+         position_ids: (batch, seqlen)
+         max_seqlen: int
+         Update the cos/sin cache if max_seqlen exceeds the cached length, then
+         apply rotary embedding *inplace* to q and k.
+         """
+
+         self._update_cos_sin_cache(max_seqlen, position_ids, device=q.device, dtype=q.dtype)
+         cos, sin = F.embedding(position_ids, self._cos_cached), F.embedding(position_ids, self._sin_cached)
+
+         q = apply_rotary_emb_func(
+             q,
+             cos,
+             sin,
+             interleaved=self.interleaved,
+             inplace=True
+         )
+         k = apply_rotary_emb_func(
+             k,
+             cos,
+             sin,
+             interleaved=self.interleaved,
+             inplace=True
+         )
+         return q, k
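
The `pos_idx_in_fp32` docstring warns that position indices stored in bf16 collide. This can be reproduced without PyTorch by rounding a float32 bit pattern to its upper 16 bits. The sketch below assumes round-to-nearest-even, the usual float32-to-bf16 conversion, and handles only finite inputs; `round_to_bf16` is a hypothetical helper.

```python
import struct


def round_to_bf16(value: float) -> float:
    """Round a finite float to the nearest bf16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # bf16 keeps the top 16 bits of float32; round on the 16 bits that are dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Between 1024 and 2048 bf16 can only represent multiples of 8,
# so several neighbouring positions share one index.
for position in (1989, 1992, 1995, 1996):
    print(position, round_to_bf16(float(position)))
```

Any two positions that round to the same bf16 index produce identical cos/sin rows, which is why the fp32 path is the default.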
visual.py ADDED
@@ -0,0 +1,143 @@
+ import torch
+ from torch import nn
+ from argparse import Namespace
+ import xformers.ops as xops
+ from transformers.activations import ACT2FN
+
+
+ class PatchEmbedding(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.proj = nn.Conv2d(config.in_channels, config.hidden_size, kernel_size=config.patch_size, stride=config.patch_size)
+         self.cls_embedding = nn.Parameter(torch.zeros(1, config.hidden_size))
+         self.position_embedding = nn.Embedding(config.num_positions, config.hidden_size)
+
+     def forward(self, images: "tensor(B, C, H, W)") -> "tensor(B, L, D)":
+         x = self.proj(images)
+         # (B, D, H', W') -> (B, H' * W', D)
+         x = x.flatten(2).transpose(1, 2)
+         cls_token = self.cls_embedding.expand(x.shape[0], -1, -1)
+         x = torch.cat((cls_token, x), dim=1)
+         # The whole position table is added, so num_positions must equal H' * W' + 1.
+         x += self.position_embedding.weight.unsqueeze(0)
+         return x
+
+
+ class Attention(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.num_heads = config.num_heads
+         head_dim = config.hidden_size // config.num_heads
+         self.scale = head_dim ** -0.5
+         self.query_key_value = nn.Linear(config.hidden_size, config.hidden_size * 3)
+         self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+         self.output_dropout = torch.nn.Dropout(config.dropout_prob)
+
+     def forward(self, x: "tensor(B, L, D)") -> "tensor(B, L, D)":
+         B, L, _ = x.shape
+         qkv = self.query_key_value(x)
+         qkv = qkv.reshape(B, L, 3, self.num_heads, -1).permute(2, 0, 1, 3, 4)  # 3, B, L, H, D
+         q, k, v = qkv[0], qkv[1], qkv[2]
+
+         out = xops.memory_efficient_attention(
+             q, k, v, scale=self.scale,
+         )
+         output = self.dense(out.view(B, L, -1))
+         output = self.output_dropout(output)
+         return output
+
+     # Plain PyTorch reference implementation; forward() uses xformers instead.
+     def attention(self, q, k, v):
+         attn_weights = torch.matmul(q * self.scale, k.transpose(-2, -1))
+         attn_weights = attn_weights.softmax(dim=-1)
+         output = torch.matmul(attn_weights, v)
+         return output
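
The unfused `attention` method computes softmax(q kᵀ · scale) v. A minimal plain-Python sketch of the same computation for a single head follows, using nested lists and a numerically stable softmax. `single_head_attention` is a hypothetical name, and the scale is taken as `head_dim ** -0.5`, as in `__init__`.

```python
import math


def single_head_attention(q, k, v):
    """q: (Lq, D), k: (Lk, D), v: (Lk, Dv) as nested lists; returns (Lq, Dv)."""
    scale = len(q[0]) ** -0.5
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        top = max(scores)  # subtract the max before exp for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v_row[d] for w, v_row in zip(weights, v)) for d in range(len(v[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of the value rows.
q = [[1.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[0.0, 2.0], [4.0, 6.0]]
print(single_head_attention(q, k, v))
```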
+
+
+ class MLP(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.config = config
+         self.activation_fn = ACT2FN[config.hidden_act]
+         self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
+         self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         x = self.fc1(x)
+         x = self.activation_fn(x)
+         x = self.fc2(x)
+         return x
+
+
+ class TransformerLayer(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.input_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+         self.attention = Attention(config)
+         self.mlp = MLP(config)
+         self.post_attention_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+     def forward(self, hidden_states):
+         # Each LayerNorm is applied to the sublayer output, before the residual add.
+         attention_input = hidden_states
+         attention_output = self.input_layernorm(self.attention(attention_input))
+         hidden_states = attention_input + attention_output
+         mlp_input = hidden_states
+         mlp_output = self.post_attention_layernorm(self.mlp(mlp_input))
+         output = mlp_input + mlp_output
+         return output
+
+
+ class Transformer(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.layers = nn.ModuleList([TransformerLayer(config) for _ in range(config.num_hidden_layers)])
+
+     def forward(self, hidden_states):
+         for layer_module in self.layers:
+             hidden_states = layer_module(hidden_states)
+         return hidden_states
+
+
+ class GLU(nn.Module):
+     def __init__(self, config, in_features):
+         super().__init__()
+         self.linear_proj = nn.Linear(in_features, config.hidden_size, bias=False)
+         self.norm1 = nn.LayerNorm(config.hidden_size)
+         self.act1 = nn.GELU()
+         self.act2 = nn.functional.silu
+         self.dense_h_to_4h = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
+         self.gate_proj = nn.Linear(config.hidden_size, config.intermediate_size, bias=False)
+         self.dense_4h_to_h = nn.Linear(config.intermediate_size, config.hidden_size, bias=False)
+
+     def forward(self, x):
+         x = self.linear_proj(x)
+         x = self.act1(self.norm1(x))
+         x = self.act2(self.gate_proj(x)) * self.dense_h_to_4h(x)
+         x = self.dense_4h_to_h(x)
+         return x
+
+
+ class EVA2CLIPModel(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         vision_config = Namespace(**config.vision_config)
+         self.patch_embedding = PatchEmbedding(vision_config)
+         self.transformer = Transformer(vision_config)
+         self.linear_proj = GLU(config, in_features=vision_config.hidden_size)
+         self.conv = nn.Conv2d(in_channels=vision_config.hidden_size, out_channels=vision_config.hidden_size, kernel_size=2, stride=2)
+         self.boi = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
+         self.eoi = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
+
+     def forward(self, images: "tensor(B, C, H, W)") -> "tensor(B, L, D)":
+         x = self.patch_embedding(images)
+         x = self.transformer(x)
+         # Drop the CLS token; the remaining patch tokens must form a square grid.
+         x = x[:, 1:]
+
+         b, s, h = x.shape
+         grid_size = int(s**0.5)
+         # (B, L, D) -> (B, D, grid, grid), then downsample 2x in each direction.
+         x = x.view(b, grid_size, grid_size, h).permute(0, 3, 1, 2)
+         x = self.conv(x)
+
+         x = x.flatten(2).transpose(1, 2)
+         x = self.linear_proj(x)
+         # Wrap the image tokens in learned begin-of-image / end-of-image embeddings.
+         boi = self.boi.expand(x.shape[0], -1, -1)
+         eoi = self.eoi.expand(x.shape[0], -1, -1)
+         x = torch.cat((boi, x, eoi), dim=1)
+         return x
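
The sequence length that `EVA2CLIPModel.forward` returns follows directly from the image and patch sizes. The sketch below walks through the arithmetic in plain Python. `vision_token_count` is a hypothetical helper, and the 224-pixel image with 14-pixel patches is an illustrative assumption, not a value read from this repository's config.

```python
def vision_token_count(image_size: int, patch_size: int):
    """Return (patch tokens inside the transformer, tokens returned by the model)."""
    grid = image_size // patch_size  # patches per side (conv with stride = patch_size)
    patches = grid * grid            # the CLS token is dropped before the reshape
    pooled = (grid - 2) // 2 + 1     # Conv2d(kernel_size=2, stride=2), no padding
    return patches, pooled * pooled + 2  # + boi and eoi


patches, returned = vision_token_count(image_size=224, patch_size=14)
print(patches, returned)
```

Since `PatchEmbedding` adds the full position table to the CLS token plus the patch tokens, `num_positions` has to equal `patches + 1` for the shapes to line up.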