WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18 • 14
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects Paper • 2401.09962 • Published Jan 18 • 7
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Paper • 2401.10404 • Published Jan 18 • 10
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 20
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20 • 21
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
Sora Generates Videos with Stunning Geometrical Consistency Paper • 2402.17403 • Published Feb 27 • 16
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 34
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Paper • 2403.02827 • Published Mar 5 • 6
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14 • 13
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Paper • 2108.01073 • Published Aug 2, 2021 • 7
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation Paper • 2404.15789 • Published Apr 24 • 10
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2 • 16
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Paper • 2405.14598 • Published May 23 • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published May 24 • 12
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published May 26 • 15
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published May 24 • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published May 27 • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published May 27 • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 20
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29 • 20
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published May 29 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published May 30 • 10
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark Paper • 2405.19707 • Published May 30 • 4
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published Jun 3 • 17
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation Paper • 2406.00908 • Published Jun 3 • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 71
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6 • 22
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published Jun 10 • 50
Hierarchical Patch Diffusion Models for High-Resolution Video Generation Paper • 2406.07792 • Published Jun 12 • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation Paper • 2406.07686 • Published Jun 11 • 14
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation Paper • 2406.08656 • Published Jun 12 • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability,Reproducibility, and Practicality Paper • 2406.08845 • Published Jun 13 • 8
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published Jun 20 • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published Jun 21 • 14
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published Jul 1 • 22
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published Jun 29 • 9
VIMI: Grounding Video Generation through Multi-modal Instruction Paper • 2407.06304 • Published Jul 8 • 9
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10 • 12
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11 • 12
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation Paper • 2407.06188 • Published Jul 8 • 1
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Paper • 2407.09012 • Published Jul 12 • 8
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models Paper • 2407.10285 • Published Jul 14 • 4
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Paper • 2407.12781 • Published Jul 17 • 12
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion Paper • 2407.13759 • Published Jul 18 • 17
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published Jul 22 • 10
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 27
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Paper • 2407.14505 • Published Jul 19 • 23
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention Paper • 2407.19918 • Published Jul 29 • 47
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Paper • 2408.00458 • Published Aug 1 • 10
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Paper • 2408.00762 • Published Aug 1 • 9
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Paper • 2408.02629 • Published Aug 5 • 13
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Paper • 2408.03284 • Published Aug 6 • 9
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Paper • 2408.04631 • Published Aug 8 • 8
Kalman-Inspired Feature Propagation for Video Face Super-Resolution Paper • 2408.05205 • Published Aug 9 • 8
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15 • 14
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19 • 15
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published Aug 21 • 54
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published Aug 21 • 16
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22 • 14
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Paper • 2408.13239 • Published Aug 23 • 10
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24 • 20
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published Aug 24 • 13
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27 • 27
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2 • 12
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation Paper • 2409.01055 • Published Sep 2 • 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 89
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published Sep 17 • 13
Towards Diverse and Efficient Audio Captioning via Diffusion Models Paper • 2409.09401 • Published Sep 14 • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19 • 22
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published Sep 19 • 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24 • 32
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27 • 25
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published Oct 6 • 26
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4 • 3
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 37
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published Oct 8 • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3 • 36
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published 27 days ago • 50
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published 26 days ago • 23
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published 26 days ago • 19
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published 23 days ago • 23
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published 18 days ago • 24
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published 16 days ago • 20
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published 14 days ago • 21
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published 10 days ago • 7
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published 5 days ago • 17