-
Music ControlNet: Multiple Time-varying Controls for Music Generation
Paper • 2311.07069 • Published • 43 -
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 16 -
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
Paper • 2310.11954 • Published • 24 -
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Paper • 2308.01546 • Published • 17
Collections
Discover the best community collections!
Collections including paper arxiv:2311.01615
-
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Paper • 2311.00430 • Published • 56 -
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription
Paper • 2108.02625 • Published • 1 -
FLAP: Fast Language-Audio Pre-training
Paper • 2311.01615 • Published • 16 -
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Paper • 2402.01831 • Published • 13
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 53 -
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Paper • 2310.00704 • Published • 19 -
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Paper • 2309.11977 • Published • 2 -
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Paper • 2308.16692 • Published • 1
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045 • Published • 14 -
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566 • Published • 25 -
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355 • Published • 6 -
Conditional Diffusion Distillation
Paper • 2310.01407 • Published • 20
-
NExT-GPT: Any-to-Any Multimodal LLM
Paper • 2309.05519 • Published • 78 -
Large Language Model for Science: A Study on P vs. NP
Paper • 2309.05689 • Published • 20 -
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Paper • 2309.06126 • Published • 16 -
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 22
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 74 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 41 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 40
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 22 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 16 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 9 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 8