-
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Paper • 2409.00750 • Published • 2 -
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Paper • 2410.06885 • Published • 40 -
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Paper • 2409.10058 • Published • 1 -
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Paper • 2406.18009 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2406.18009
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 19 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 8 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 27 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 118
-
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Paper • 2406.06525 • Published • 64 -
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Paper • 2406.04314 • Published • 26 -
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching
Paper • 2405.11252 • Published • 12 -
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment
Paper • 2406.15193 • Published • 12
-
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper • 2406.06469 • Published • 23 -
Mixture-of-Agents Enhances Large Language Model Capabilities
Paper • 2406.04692 • Published • 55 -
CRAG -- Comprehensive RAG Benchmark
Paper • 2406.04744 • Published • 41 -
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper • 2406.04325 • Published • 71
-
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 188 -
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Paper • 2403.10493 • Published • 16 -
Music Consistency Models
Paper • 2404.13358 • Published • 12 -
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper • 2406.02430 • Published • 29
-
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Paper • 2402.07383 • Published • 13 -
Matcha-TTS: A fast TTS architecture with conditional flow matching
Paper • 2309.03199 • Published • 11 -
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Paper • 2402.01912 • Published • 11 -
Fast Timing-Conditioned Latent Audio Diffusion
Paper • 2402.04825 • Published • 7