---
license: mit
datasets:
- mozilla-foundation/common_voice_13_0
language:
- ca
- bg
- cs
- fi
- gl
- hi
- hu
- pl
- ro
- sk
- ta
- th
tags:
- automatic-speech-recognition
inference: false
pipeline_tag: automatic-speech-recognition
---

## About

Multilingual DistilWhisper improves ASR performance in the target languages by adding lightweight conditional language-specific routing (CLSR) modules on top of whisper-small.
These modules are trained with a combination of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher.
More details are in the ICASSP 2024 paper: https://arxiv.org/abs/2311.01070
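
To make the idea of a mixed objective concrete, here is a minimal, dependency-free sketch of how a cross-entropy term and a distillation term can be combined for a single output token. It is an illustration only, not the repository's implementation: the function names, the weight `alpha`, and the choice of KL divergence as the distillation term are all assumptions made for this example; see the paper for the exact losses and weighting.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, target_index):
    """ASR term: negative log-probability the student gives the reference token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])

def kl_divergence(teacher_logits, student_logits):
    """Distillation term (assumed here to be KL(teacher || student))."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(student_logits, teacher_logits, target_index, alpha=0.5):
    """Weighted mix of both terms; `alpha` is a hypothetical weight for this sketch."""
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(teacher_logits, student_logits)
    return alpha * ce + (1.0 - alpha) * kd

if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # toy logits over a 3-token vocabulary
    teacher = [3.0, 0.0, -2.0]   # toy teacher logits for the same position
    print(combined_loss(student, teacher, target_index=0))
```

When the student matches the teacher exactly, the distillation term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this kind of objective.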

## Inference

Code for training and inference is available at: https://github.com/naver/multilingual-distilwhisper

## Citation
```
@inproceedings{ferraz2024distilwhisper,
  title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
  author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2024},
  organization={IEEE}
}
```