Abstract
We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
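The Space-Time U-Net described in the abstract downsamples the video along both the spatial and the temporal axes, processes the whole clip at a compact space-time representation, and upsamples back, so the full duration is handled in a single pass rather than by generating keyframes and temporally super-resolving them. Below is a minimal, illustrative sketch of that idea in PyTorch. It is not the authors' implementation (no code accompanies this page); all names (`SpaceTimeDown`, `SpaceTimeUp`, `ToySTUNet`) and the channel/resolution choices are hypothetical, and text conditioning, diffusion timesteps, and the pre-trained text-to-image backbone are omitted.

```python
# Illustrative sketch only -- NOT the released Lumiere code. Assumes a PyTorch-style
# 3D U-Net in which both the spatial (H, W) and temporal (T) axes are down- and
# up-sampled, so an entire clip is processed in one forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpaceTimeDown(nn.Module):
    """Halve T, H, and W with a strided 3D convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):  # x: (B, C, T, H, W)
        return F.silu(self.conv(x))


class SpaceTimeUp(nn.Module):
    """Double T, H, and W with trilinear upsampling followed by a 3D convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
        return F.silu(self.conv(x))


class ToySTUNet(nn.Module):
    """Minimal encoder-decoder over space-time. The real model additionally
    conditions on text embeddings and diffusion timesteps, omitted here."""
    def __init__(self, channels=3, base=32):
        super().__init__()
        self.stem = nn.Conv3d(channels, base, kernel_size=3, padding=1)
        self.down1 = SpaceTimeDown(base, base * 2)
        self.down2 = SpaceTimeDown(base * 2, base * 4)
        self.up1 = SpaceTimeUp(base * 4, base * 2)
        self.up2 = SpaceTimeUp(base * 2, base)
        self.head = nn.Conv3d(base, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h0 = F.silu(self.stem(x))
        h1 = self.down1(h0)        # coarser in T, H, and W
        h2 = self.down2(h1)        # coarsest space-time scale
        u1 = self.up1(h2) + h1     # skip connections across scales
        u2 = self.up2(u1) + h0
        return self.head(u2)       # same shape as the input clip


if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 64, 64)  # (batch, channels, frames, height, width)
    out = ToySTUNet()(clip)
    print(out.shape)                      # torch.Size([1, 3, 16, 64, 64])
```

Running the script prints the output shape, confirming that a 16-frame clip enters and a 16-frame clip leaves in one forward pass, with the temporal axis compressed and re-expanded inside the network rather than handled by a separate keyframe-plus-super-resolution cascade.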
Community
No model release, just another showcase? Google, you are falling way behind in the AI race.
The videos are amazing, but where is the action?! You could take the top spot in the AI race, but why don't you seem to want it? What would it take to push you to do this?!
I would like to run tests on the model, very thorough and demanding ones.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution (2024)
- Photorealistic Video Generation with Diffusion Models (2023)
- Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer (2023)
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models (2023)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (2023)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
Cool!
Lumiere's Breakthrough: Space-Time Diffusion for Stunning Video Generation
Links:
- Subscribe: https://www.youtube.com/@Arxflix
- Twitter: https://x.com/arxflix
- LMNT (Partner): https://lmnt.com/