---
license: apache-2.0
---
# ExVideo

ExVideo is a post-tuning technique for enhancing the capabilities of video generation models. Using it, we extended CogVideoX-5B to generate videos of up to 129 frames.

This is our second publicly released model. It incorporates LoRA into the structure of CogVideoX-5B.


* [Project Page](https://ecnu-cilab.github.io/ExVideoProjectPage/)
* [Source Code](https://github.com/modelscope/DiffSynth-Studio)
* [Technical report](https://arxiv.org/abs/2406.14130)

## Usage

```python
import torch
from diffsynth import ModelManager, CogVideoPipeline, save_video, download_models

# Download the base model and the ExVideo LoRA weights.
download_models(["CogVideoX-5B", "ExVideo-CogVideoX-LoRA-129f-v1"])

# Load the text encoder, transformer, and VAE of CogVideoX-5B in bfloat16.
model_manager = ModelManager(torch_dtype=torch.bfloat16)
model_manager.load_models([
    "models/CogVideo/CogVideoX-5b/text_encoder",
    "models/CogVideo/CogVideoX-5b/transformer",
    "models/CogVideo/CogVideoX-5b/vae/diffusion_pytorch_model.safetensors",
])

# Load the ExVideo LoRA, then build the pipeline from the loaded models.
model_manager.load_lora("models/lora/ExVideo-CogVideoX-LoRA-129f-v1.safetensors")
pipe = CogVideoPipeline.from_model_manager(model_manager)

# Fix the random seed so the result is reproducible.
torch.manual_seed(6)
video = pipe(
    prompt="an astronaut riding a horse on Mars.",
    height=480, width=720, num_frames=129,
    cfg_scale=7.0, num_inference_steps=100,
)

# Write the 129 generated frames to an MP4 file at 8 frames per second.
save_video(video, "video_with_lora.mp4", fps=8, quality=5)
```

Please refer to [DiffSynth](https://github.com/modelscope/DiffSynth-Studio) for more information.
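
The example above requests `num_frames=129` and saves the result with `fps=8`. As a rough sanity check on what to expect from the output file, the playback length follows directly from those two values. The helper below is a minimal sketch; `clip_duration_seconds` is a hypothetical name introduced here for illustration and is not part of DiffSynth.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length, in seconds, of a clip with the given frame count and frame rate.

    Hypothetical helper for illustration only; it is not a DiffSynth function.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Values taken from the usage example: num_frames=129, fps=8.
duration = clip_duration_seconds(num_frames=129, fps=8)
print(f"{duration:.3f} s")  # 16.125 s
```

Saving the same frames at a different `fps` changes only the playback speed and length, not the number of generated frames.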

## Examples

Prompt: an astronaut riding a horse on Mars.

<video src="https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e" controls="controls"></video>

Prompt: Static camera, two men shake hands happily, the background is in a modern office.

<video src="https://github.com/user-attachments/assets/32cc1ef8-5b59-4817-9095-14032365479b" controls="controls"></video>

Prompt: The camera captures the northern lights dancing across an Arctic sky, with stars twinkling above a snow-covered landscape, creating a serene and magical atmosphere.

<video src="https://github.com/user-attachments/assets/92504158-50f0-481f-8188-b1cfca953faf" controls="controls"></video>

Prompt: FPV aerial shot, the sunshine shines on the snow capped mountains, a quiet atmosphere.

<video src="https://github.com/user-attachments/assets/750698c9-c59d-451a-8599-32e70c2c566f" controls="controls"></video>

Prompt: A Chinese mother, draped in a soft, pastel-colored robe, gently rocks back and forth in a cozy rocking chair positioned in the tranquil setting of a nursery. The dimly lit bedroom is adorned with whimsical mobiles dangling from the ceiling, casting shadows that dance on the walls. Her baby, swaddled in a delicate, patterned blanket, rests against her chest, the child's earlier cries now replaced by contented coos as the mother's soothing voice lulls the little one to sleep. The scent of lavender fills the air, adding to the serene atmosphere, while a warm, orange glow from a nearby nightlight illuminates the scene with a gentle hue, capturing a moment of tender love and comfort.

<video src="https://github.com/user-attachments/assets/0ef36ab9-e636-46bd-bc98-450a112c3c77" controls="controls"></video>

We compared the model with and without the ExVideo extension module. The original model exhibits noticeable detail loss when generating long videos, whereas the ExVideo extension module significantly enhances the detail of the videos.

<div style="display: flex; flex-wrap: wrap; justify-content: space-around;">
    <div style="width: 45%; margin-bottom: 20px; transition: transform 0.3s;">
        <video width="100%" controls>
            <source src="https://github.com/user-attachments/assets/86b16c00-92ce-4acb-a172-b3e6cefbc29d" type="video/mp4">
        </video>
        <div style="text-align: center; margin-top: 10px; font-size: 11px;">Without ExVideo extension module</div>
    </div>
    <div style="width: 45%; margin-bottom: 20px; transition: transform 0.3s;">
        <video width="100%" controls>
            <source src="https://github.com/user-attachments/assets/321ee04b-8c17-479e-8a95-8cbcf21f8d7e" type="video/mp4">
        </video>
        <div style="text-align: center; margin-top: 10px; font-size: 11px;">With ExVideo extension module</div>
    </div>
</div>