BigVGAN Collection • BigVGAN is a universal neural vocoder that generates audio waveforms from mel spectrogram input. • 11 items • Updated Oct 1
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4
Standard-format-preference-dataset Collection • We collect open-source preference datasets and process them into a standard format. • 14 items • Updated May 8
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23
Seamless Communication Collection • A significant step towards removing language barriers through expressive, fast and high-quality AI translation. • 16 items • Updated Jan 16
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition Paper • 2309.15223 • Published Sep 26, 2023
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models Paper • 2309.11674 • Published Sep 20, 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent Paper • 2309.12311 • Published Sep 21, 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions Paper • 2309.10150 • Published Sep 18, 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023