Llama 3.1 Collection • This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 609
Proactive Detection of Voice Cloning with Localized Watermarking Paper • 2401.17264 • Published Jan 30 • 16
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 41
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 258
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit Paper • 2312.09911 • Published Dec 15, 2023 • 53
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning Paper • 2312.06134 • Published Dec 11, 2023 • 2
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration Paper • 2311.04257 • Published Nov 7, 2023 • 20
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Paper • 2312.03491 • Published Dec 6, 2023 • 34
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models Paper • 2312.03632 • Published Dec 6, 2023 • 4
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138
Merlin: Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 24
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis Paper • 2311.12454 • Published Nov 21, 2023 • 29
UniAudio: An Audio Foundation Model Toward Universal Audio Generation Paper • 2310.00704 • Published Oct 1, 2023 • 19
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 43