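# MuseV [English](README.md) [中文](README-zh.md)

MuseV: Infinite-length and High-Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising.

Zhiqiang Xia \*, Zhaokang Chen\*, Bin Wu†, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou

(\*co-first author, †Corresponding Author, benbinwu@tencent.com)

**[github](https://github.com/TMElyralab/MuseV)** **[huggingface](https://huggingface.co/TMElyralab/MuseV)** **[HuggingfaceSpace](https://huggingface.co/spaces/AnchorFake/MuseVDemo)** **project (coming soon)** **Technical report (coming soon)**

Since March 2023 we have believed that diffusion models can simulate the world, and we began developing a visual world simulator based on diffusion models. `MuseV` was a milestone reached around July 2023. Inspired by the progress of Sora, we decided to open-source MuseV. MuseV grew on the shoulders of open source, and we hope it can give back to the community in turn. Next, we will move on to the promising diffusion+transformer scheme.

We have released MuseTalk. `MuseTalk` is a real-time, high-quality lip-sync model that can be used together with `MuseV` to build a complete `virtual human generation solution`. Stay tuned!

# Overview

`MuseV` is a diffusion-based virtual human video generation framework with the following features:

1. Supports infinite-length generation using a novel visual conditioned parallel denoising scheme, which avoids error accumulation and is especially suited to fixed-camera scenes.
1. Provides pretrained virtual human video generation models trained on a human-centric dataset.
1. Supports image-to-video, text-to-image-to-video, and video-to-video generation.
1. Compatible with the `Stable Diffusion` text-to-image ecosystem, including `base_model`, `lora`, `controlnet`, etc.
1. Supports multi-reference-image techniques, including `IPAdapter`, `ReferenceOnly`, `ReferenceNet`, and `IPAdapterFaceID`.
1. Training code will be released later.

# Important update

1. `musev_referencenet_pose`: the model names for `unet` and `ip_adapter` were specified incorrectly. Please use `musev_referencenet_pose` rather than `musev_referencenet`, and use the latest main branch.

# News

- [03/27/2024] Released the `MuseV` project and the trained models `musev`, `muse_referencenet`, and `muse_referencenet_pose`.
- [03/30/2024] Added a [gui](https://huggingface.co/spaces/AnchorFake/MuseVDemo) on Hugging Face Space for generating videos interactively.

## Model

### Model structure overview

![model_structure](./data/models/musev_structure.png)

### Parallel denoising overview

![parallel_denoise](./data//models/parallel_denoise.png)

## Cases

All frames of the results are generated directly by `MuseV`, without any post-processing such as temporal or spatial super-resolution.

All test cases below are maintained in `configs/tasks/example.yaml` and can be run directly to reproduce the results.

### Video generation from text and image input

#### Human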

| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1),(1man, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1man, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |

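
Several prompts above contain an `{eye_blinks_factor}` placeholder in place of a fixed weight. As a minimal sketch of how such a template could be filled, the snippet below uses plain Python string formatting. The helper name `fill_prompt` and the value `1.8` are assumptions made for illustration only; they are not taken from the MuseV codebase, and the project's own loader may resolve the placeholder differently.

```python
# Hypothetical helper for illustration; not part of the MuseV codebase.
PROMPT_TEMPLATE = (
    "(masterpiece, best quality, highres:1),(1girl, solo:1),"
    "(beautiful face, soft skin, costume:1),"
    "(eye blinks:{eye_blinks_factor}),(head wave:1.3)"
)


def fill_prompt(template: str, eye_blinks_factor: float) -> str:
    """Substitute the eye-blink weight into a prompt template."""
    return template.format(eye_blinks_factor=eye_blinks_factor)


# 1.8 is an assumed example value, not a recommended setting.
prompt = fill_prompt(PROMPT_TEMPLATE, 1.8)
print(prompt)
```

Keeping the weight as a named placeholder lets one prompt template be reused across cases with different blink strengths, instead of duplicating the whole prompt for each value.
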

| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1), peaceful beautiful waterfall, an endless waterfall |
| | | (masterpiece, best quality, highres:1), peaceful beautiful river |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |


| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1) |
| | | (masterpiece, best quality, highres:1) |


| name | video |
| --- | --- |
| talk | |
| talk | |
| sing | |