Collections

Collections including paper arxiv:2311.07919

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9
mozilla-foundation/common_voice_17_0

Viewer • Updated Jun 16 • 13M • 23.4k • 167
Stable Audio Open

Paper • 2407.14358 • Published Jul 19 • 23
fnlp/AnyGPT-chat

Text Generation • Updated Jun 5 • 1.04k • 15

ModaVerse: Efficiently Transforming Modalities with LLMs

Paper • 2401.06395 • Published Jan 12 • 3
Boosting Large Language Model for Speech Synthesis: An Empirical Study

Paper • 2401.00246 • Published Dec 30, 2023 • 10
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Paper • 2312.03668 • Published Dec 6, 2023 • 1
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

Paper • 2311.06753 • Published Nov 12, 2023 • 6

Papers - Audio - Captions

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9
Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11 • 15

Papers - Audio - Understanding

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9

Papers - Audio - Text to Speech

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

Papers - Audio - TTS

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 2
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Paper • 2403.16973 • Published Mar 25 • 2
High Fidelity Neural Audio Compression

Paper • 2210.13438 • Published Oct 24, 2022 • 3
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4 • 7

Qwen Technical Report

Paper • 2309.16609 • Published Sep 28, 2023 • 34
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9
Qwen-VL-Plus

Space • Running • 193

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 19
Structural Similarities Between Language Models and Neural Response Measurements

Paper • 2306.01930 • Published Jun 2, 2023 • 2
Streaming Transformer ASR with Blockwise Synchronous Beam Search

Paper • 2006.14941 • Published Jun 25, 2020 • 2
NU-GAN: High resolution neural upsampling with GAN

Paper • 2010.11362 • Published Oct 22, 2020 • 2

Models - Audio - Translation

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 9

Papers - Synthetic Data

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

Paper • 2402.10379 • Published Feb 16 • 29
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping

Paper • 1709.07857 • Published Sep 22, 2017 • 2
Simple synthetic data reduces sycophancy in large language models

Paper • 2308.03958 • Published Aug 7, 2023 • 21
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
