VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Abstract
Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, only a limited number of open-source models are available to researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of 1024 × 576, outperforming other open-source T2V models in terms of quality. The I2V model is designed to produce videos that strictly adhere to the provided reference image, preserving its content, structure, and style. This model is the first open-source I2V foundation model capable of transforming a given image into a video clip while satisfying content-preservation constraints. We believe that these open-source video generation models will contribute significantly to technological advancement within the community.
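To make the distinction between the two entry points concrete, the following is a minimal sketch of the input/output contract described above. It is not the official VideoCrafter API; the `model.sample` and `model.decode` method names and their arguments are hypothetical stand-ins for a latent diffusion sampler and its decoder, assumed only for illustration.

```python
import torch


def text_to_video(model, prompt: str, num_frames: int = 16,
                  height: int = 576, width: int = 1024) -> torch.Tensor:
    """T2V: sample a video conditioned on a text prompt only.

    Returns a tensor of shape (num_frames, 3, height, width); the default
    1024x576 resolution matches the T2V model described in the abstract.
    """
    with torch.no_grad():
        # Assumed sampler call: denoise video latents conditioned on the prompt.
        latents = model.sample(prompt=prompt, num_frames=num_frames,
                               height=height, width=width)
        # Assumed decoder call: map latents back to RGB frames.
        return model.decode(latents)


def image_to_video(model, prompt: str, reference_image: torch.Tensor,
                   num_frames: int = 16) -> torch.Tensor:
    """I2V: same as T2V, but additionally conditioned on a reference image
    (3, H, W) whose content, structure, and style the output should preserve."""
    with torch.no_grad():
        # Assumed image-conditioned sampler call.
        latents = model.sample(prompt=prompt, image=reference_image,
                               num_frames=num_frames)
        return model.decode(latents)
```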
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models (2023)
- Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models (2023)
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction (2023)
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation (2023)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation (2023)