---
license: mit
datasets:
- mozilla-foundation/common_voice_13_0
language:
- ca
- bg
- cs
- fi
- gl
- hi
- hu
- pl
- ro
- sk
- ta
- th
tags:
- automatic-speech-recognition
inference: false
pipeline_tag: automatic-speech-recognition
---
## About
Multilingual DistilWhisper improves ASR performance in the target languages by adding lightweight CLSR (conditional language-specific routing) modules on top of whisper-small.
These modules are trained with a combination of a cross-entropy (ASR) loss and a knowledge distillation loss, with whisper-large-v2 as the teacher.
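
To make the training objective concrete, here is a minimal, framework-free sketch of a combined cross-entropy and distillation loss for a single output token. It is an illustration under stated assumptions, not the repository's implementation: the function names, the weighting factor `alpha`, the `temperature` parameter, and the use of KL divergence as the distillation term are all hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, target_index):
    """ASR loss: negative log-probability of the reference token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def kl_divergence(teacher_logits, student_logits, temperature=1.0):
    """Distillation loss: KL(teacher || student).

    KL divergence is an assumption made for this sketch; the actual
    training code may use a different divergence.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(student_logits, teacher_logits, target_index,
                  alpha=0.5, temperature=1.0):
    """Cross-entropy plus a weighted distillation term.

    `alpha` is a hypothetical weighting factor, not a documented value.
    """
    ce = cross_entropy(student_logits, target_index)
    kd = kl_divergence(teacher_logits, student_logits, temperature)
    return ce + alpha * kd
```

When the student's logits match the teacher's, the distillation term vanishes and the combined loss reduces to the plain cross-entropy loss. In a real setup the loss would be averaged over all tokens in a batch, with the student being whisper-small plus the CLSR modules and the teacher being whisper-large-v2.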
## Inference
Code for training and inference is available at: https://github.com/naver/multilingual-distilwhisper
## Citation
```
@inproceedings{ferraz2024distilwhisper,
title={Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts},
author={Ferraz, Thomas Palmeira and Boito, Marcely Zanon and Brun, Caroline and Nikoulina, Vassilina},
booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024},
organization={IEEE}
}
```