Collections

Collections including paper arxiv:2311.00430

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Paper • 2306.17103 • Published Jun 29, 2023 • 1

Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 40
Data Filtering Networks

Paper • 2309.17425 • Published Sep 29, 2023 • 6
FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 35
E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Paper • 2311.00945 • Published Nov 2, 2023 • 14

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

Paper • 2309.15701 • Published Sep 27, 2023 • 2
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

Paper • 2309.07707 • Published Sep 14, 2023 • 1
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Paper • 2309.13876 • Published Sep 25, 2023 • 1

Speech Translation

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
Efficient yet Competitive Speech Translation: FBK@IWSLT2022

Paper • 2205.02629 • Published May 5, 2022 • 1
Speechformer: Reducing Information Loss in Direct Speech Translation

Paper • 2109.04574 • Published Sep 9, 2021 • 1
Joint Speech Translation and Named Entity Recognition

Paper • 2210.11987 • Published Oct 21, 2022 • 1

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription

Paper • 2108.02625 • Published Aug 5, 2021 • 1
FLAP: Fast Language-Audio Pre-training

Paper • 2311.01615 • Published Nov 2, 2023 • 16
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Paper • 2402.01831 • Published Feb 2 • 13

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
Idempotent Generative Network

Paper • 2311.01462 • Published Nov 2, 2023 • 24

Distil-Whisper Models

The first version of the Distil-Whisper models released with the Distil-Whisper paper.

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
distil-whisper/distil-large-v2

Automatic Speech Recognition • Updated Mar 21 • 67.2k • 505
distil-whisper/distil-medium.en

Automatic Speech Recognition • Updated Mar 25 • 555k • 119
distil-whisper/distil-small.en

Automatic Speech Recognition • Updated Mar 25 • 23.3k • 90

Detecting Pretraining Data from Large Language Models

Paper • 2310.16789 • Published Oct 25, 2023 • 10
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

Paper • 2310.13671 • Published Oct 20, 2023 • 18
AutoMix: Automatically Mixing Language Models

Paper • 2310.12963 • Published Oct 19, 2023 • 14
An Emulator for Fine-Tuning Large Language Models using Small Language Models

Paper • 2310.12962 • Published Oct 19, 2023 • 14

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 53
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Paper • 2309.11977 • Published Sep 21, 2023 • 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Paper • 2308.05734 • Published Aug 10, 2023 • 36

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 53
UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Paper • 2310.00704 • Published Oct 1, 2023 • 19
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Paper • 2309.11977 • Published Sep 21, 2023 • 2
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
