zR committed
Commit • d16a569
Parent(s): 35871ed

change qingying page

Files changed:
- README.md (+1, -3)
- README_zh.md (+58, -2)
README.md
CHANGED
```diff
@@ -109,7 +109,7 @@ inference: false
 
 ## Model Introduction
 
-CogVideoX is an open-source version of the video generation model originating from [QingYing](https://chatglm.cn/video?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.
+CogVideoX is an open-source version of the video generation model originating from [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.
 
 <table style="border-collapse: collapse; width: 100%;">
 <tr>
@@ -194,8 +194,6 @@ CogVideoX is an open-source version of the video generation model originating fr
 + Using [SAT](https://github.com/THUDM/SwissArmyTransformer) for inference and fine-tuning of SAT version
 models. Feel free to visit our GitHub for more information.
 
-
-
 ## Quick Start 🤗
 
 This model supports deployment using the huggingface diffusers library. You can deploy it by following these steps.
```
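For context, the "Quick Start" these lines point to is the plain diffusers path. Below is a minimal sketch assembled from the example that appears later in this commit (same checkpoint and sampling settings); the short prompt is a placeholder, not the model card's full example prompt.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the 5B checkpoint in bfloat16; CPU offload and VAE tiling keep
# peak VRAM low, matching how the model card's example runs it.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A panda strums a tiny acoustic guitar in a sunlit bamboo forest."

# Sampling settings mirror the example in this commit's README_zh.md diff.
video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```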
README_zh.md
CHANGED
```diff
@@ -116,8 +116,8 @@ CogVideoX is an open-source version of the video generation model originating from [QingYing](https://chatglm.cn/video?fr=osm_cogvideo)
 </tr>
 <tr>
 <td style="text-align: center;">Single-GPU VRAM consumption<br></td>
-<td style="text-align: center;">FP16: 18GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>12.5GB* using diffusers</b><br><b>INT8: 7.8GB* using diffusers</b></td>
-<td style="text-align: center;">BF16: 26GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>20.7GB* using diffusers</b><br><b>INT8: 11.4GB* using diffusers</b></td>
+<td style="text-align: center;">FP16: 18GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>12.5GB* using diffusers</b><br><b>INT8: 7.8GB* using diffusers with torchao</b></td>
+<td style="text-align: center;">BF16: 26GB using <a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> / <b>20.7GB* using diffusers</b><br><b>INT8: 11.4GB* using diffusers with torchao</b></td>
 </tr>
 <tr>
 <td style="text-align: center;">Multi-GPU inference VRAM consumption</td>
```
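The starred diffusers figures in this table are peak-VRAM measurements. As an illustration only (the model card does not show its measurement code, so whether this matches its methodology is an assumption), peak usage can be read back with PyTorch's built-in memory statistics:

```python
import torch

# Reset the peak counter before the run being measured...
torch.cuda.reset_peak_memory_stats()

# ...run a generation here, e.g. pipe(prompt=..., num_frames=49)...

# ...then report the high-water mark in GB.
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gb:.1f} GB")
```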
````diff
@@ -226,6 +226,62 @@ video = pipe(
 
 export_to_video(video, "output.mp4", fps=8)
 ```
+## Quantized Inference
+
+[PytorchAO](https://github.com/pytorch/ao) and [Optimum-quanto](https://github.com/huggingface/optimum-quanto/)
+can be used to quantize the text encoder, Transformer, and VAE modules to lower CogVideoX's memory requirements. This makes it possible
+to run the model on a free T4 Colab or on GPUs with less VRAM! Notably, TorchAO quantization is fully compatible with `torch.compile`, which can significantly speed up inference.
+
+```diff
+# To get started, PytorchAO needs to be installed from the GitHub source and PyTorch Nightly.
+# Source and nightly installation is only required until the next release.
+
+import torch
+from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXPipeline
+from diffusers.utils import export_to_video
++ from transformers import T5EncoderModel
++ from torchao.quantization import quantize_, int8_weight_only, int8_dynamic_activation_int8_weight
+
++ quantization = int8_weight_only
+
++ text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
++ quantize_(text_encoder, quantization())
+
++ transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
++ quantize_(transformer, quantization())
+
++ vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b", subfolder="vae", torch_dtype=torch.bfloat16)
++ quantize_(vae, quantization())
+
+# Create pipeline and run inference
+pipe = CogVideoXPipeline.from_pretrained(
+    "THUDM/CogVideoX-5b",
++ text_encoder=text_encoder,
++ transformer=transformer,
++ vae=vae,
+    torch_dtype=torch.bfloat16,
+)
+pipe.enable_model_cpu_offload()
+pipe.vae.enable_tiling()
+
+prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
+
+video = pipe(
+    prompt=prompt,
+    num_videos_per_prompt=1,
+    num_inference_steps=50,
+    num_frames=49,
+    guidance_scale=6,
+    generator=torch.Generator(device="cuda").manual_seed(42),
+).frames[0]
+
+export_to_video(video, "output.mp4", fps=8)
+```
+
+Additionally, these models can be serialized with PytorchAO and stored in quantized data types to save disk space. Examples and benchmarks are available at the links below.
+
+- [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
+- [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
 
 ## Deep Dive
 
````
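Two claims in the added section invite a quick illustration: `torch.compile` compatibility and on-disk serialization of quantized weights. The sketch below continues from the quantized example above (it reuses its `pipe` and `transformer`); the compile flags and file name are illustrative choices, and the linked gists remain the authoritative examples.

```python
import torch

# TorchAO's quantized tensors compose with torch.compile; compiling the
# transformer, where most of the compute lives, is the usual speedup lever.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

# Serialization idea from the closing paragraph: torch.save writes the
# int8-quantized state dict directly, so the stored file stays small.
torch.save(transformer.state_dict(), "cogvideox_transformer_int8.pt")  # illustrative path
```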