- Large-Scale Automatic Audiobook Creation
  Paper • 2309.03926 • Published • 53
- BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
  Paper • 2402.08093 • Published • 54
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
  Paper • 2403.03100 • Published • 34
- VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
  Paper • 2406.05370 • Published • 14
Collections including paper arxiv:2403.03100

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 14
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 7
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
  Paper • 2312.02087 • Published • 20
- FaceStudio: Put Your Face Everywhere in Seconds
  Paper • 2312.02663 • Published • 30
- Orthogonal Adaptation for Modular Customization of Diffusion Models
  Paper • 2312.02432 • Published • 12
- ReconFusion: 3D Reconstruction with Diffusion Priors
  Paper • 2312.02981 • Published • 8

- A survey on Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
  Paper • 2009.10521 • Published • 1
- Kornia: an Open Source Differentiable Computer Vision Library for PyTorch
  Paper • 1910.02190 • Published • 1
- Learning Symmetrization for Equivariance with Orbit Distance Minimization
  Paper • 2311.07143 • Published • 1
- GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
  Paper • 2311.11700 • Published • 4