video - a zzfive Collection

video

updated 4 days ago

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 14
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18 • 7
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19 • 13
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning

Paper • 2402.00769 • Published Feb 1 • 20
VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20 • 21
Video ReCap: Recursive Captioning of Hour-Long Videos

Paper • 2402.13250 • Published Feb 20 • 24
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22 • 19
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88
Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 16
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper • 2402.17723 • Published Feb 27 • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5 • 34
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation

Paper • 2403.02827 • Published Mar 5 • 6
Video Editing via Factorized Diffusion Distillation

Paper • 2403.09334 • Published Mar 14 • 21
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14 • 13
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations

Paper • 2108.01073 • Published Aug 2, 2021 • 7
AnimateDiff-Lightning: Cross-Model Diffusion Distillation

Paper • 2403.12706 • Published Mar 19 • 17
Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1 • 11
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation

Paper • 2404.15789 • Published Apr 24 • 10
LLM-AD: Large Language Model based Audio Description System

Paper • 2405.00983 • Published May 2 • 16
FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53
ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published May 22 • 22
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23 • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Paper • 2405.15216 • Published May 24 • 12
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

Paper • 2405.16537 • Published May 26 • 15
Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Paper • 2405.15757 • Published May 24 • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

Paper • 2405.17405 • Published May 27 • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Paper • 2405.17414 • Published May 27 • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Paper • 2405.18386 • Published May 28 • 20
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Paper • 2405.18750 • Published May 29 • 20
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published May 29 • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Paper • 2405.20222 • Published May 30 • 10
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

Paper • 2405.19707 • Published May 30 • 4
Learning Temporally Consistent Video Depth from Video Diffusion Priors

Paper • 2406.01493 • Published Jun 3 • 17
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Paper • 2406.00908 • Published Jun 3 • 11
Searching Priors Makes Text-to-Video Synthesis Better

Paper • 2406.03215 • Published Jun 5 • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71
SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23
VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published Jun 6 • 22
MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Paper • 2406.05338 • Published Jun 8 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Paper • 2406.06523 • Published Jun 10 • 50
Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Paper • 2406.07792 • Published Jun 12 • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Paper • 2406.07686 • Published Jun 11 • 14
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Paper • 2406.08656 • Published Jun 12 • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Paper • 2406.08845 • Published Jun 13 • 8
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

Paper • 2406.14130 • Published Jun 20 • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21 • 14
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24 • 28
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models

Paper • 2407.01519 • Published Jul 1 • 22
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

Paper • 2407.00367 • Published Jun 29 • 9
VIMI: Grounding Video Generation through Multi-modal Instruction

Paper • 2407.06304 • Published Jul 8 • 9
VEnhancer: Generative Space-Time Enhancement for Video Generation

Paper • 2407.07667 • Published Jul 10 • 12
Still-Moving: Customized Video Generation without Customized Video Data

Paper • 2407.08674 • Published Jul 11 • 12
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Paper • 2407.06188 • Published Jul 8 • 1
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

Paper • 2407.09012 • Published Jul 12 • 8
Video Occupancy Models

Paper • 2407.09533 • Published Jun 25 • 6
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

Paper • 2407.10285 • Published Jul 14 • 4
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Paper • 2407.12781 • Published Jul 17 • 12
Towards Understanding Unsafe Video Generation

Paper • 2407.12581 • Published Jul 17
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18 • 17
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Paper • 2407.15642 • Published Jul 22 • 10
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23 • 27
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Paper • 2407.14505 • Published Jul 19 • 23
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29 • 47
Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31 • 25
Fine-gained Zero-shot Video Sampling

Paper • 2407.21475 • Published Jul 31 • 5
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion

Paper • 2408.00458 • Published Aug 1 • 10
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

Paper • 2408.00762 • Published Aug 1 • 9
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Paper • 2408.02629 • Published Aug 5 • 13
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

Paper • 2408.03284 • Published Aug 6 • 9
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

Paper • 2408.04631 • Published Aug 8 • 8
Kalman-Inspired Feature Propagation for Video Face Super-Resolution

Paper • 2408.05205 • Published Aug 9 • 8
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

Paper • 2408.08189 • Published Aug 15 • 14
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

Paper • 2408.10119 • Published Aug 19 • 15
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21 • 54
TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Paper • 2408.11475 • Published Aug 21 • 16
Real-Time Video Generation with Pyramid Attention Broadcast

Paper • 2408.12588 • Published Aug 22 • 14
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

Paper • 2408.13239 • Published Aug 23 • 10
Training-free Long Video Generation with Chain of Diffusion Model Experts

Paper • 2408.13423 • Published Aug 24 • 20
TVG: A Training-free Transition Video Generation Method with Diffusion Models

Paper • 2408.13413 • Published Aug 24 • 13
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Paper • 2408.15239 • Published Aug 27 • 27
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2 • 12
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

Paper • 2409.01055 • Published Sep 2 • 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4 • 89
OSV: One Step is Enough for High-Quality Image to Video Generation

Paper • 2409.11367 • Published Sep 17 • 13
Towards Diverse and Efficient Audio Captioning via Diffusion Models

Paper • 2409.09401 • Published Sep 14 • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models

Paper • 2409.12960 • Published Sep 19 • 22
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation

Paper • 2409.12532 • Published Sep 19 • 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published Sep 24 • 32
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Paper • 2409.18964 • Published Sep 27 • 25
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Paper • 2410.04364 • Published Oct 6 • 26
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Paper • 2410.03051 • Published Oct 4 • 3
Pyramidal Flow Matching for Efficient Video Generative Modeling

Paper • 2410.05954 • Published Oct 8 • 37
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

Paper • 2410.05677 • Published Oct 8 • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3 • 36
Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published 27 days ago • 50
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Paper • 2410.10774 • Published 26 days ago • 23
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Paper • 2410.10816 • Published 26 days ago • 19
Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published 23 days ago • 86
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Paper • 2410.13830 • Published 23 days ago • 23
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published 18 days ago • 24
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

Paper • 2410.19355 • Published 16 days ago • 20
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

Paper • 2410.20280 • Published 14 days ago • 21
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Paper • 2410.23277 • Published 10 days ago • 7
Fashion-VDM: Video Diffusion Model for Virtual Try-On

Paper • 2411.00225 • Published 9 days ago • 7
Adaptive Caching for Faster Video Generation with Diffusion Transformers

Paper • 2411.02397 • Published 5 days ago • 17
