Massively Multilingual Speech (MMS) - Finetuned ASR - ALL
This is a checkpoint from the MMS Zero-shot project, a model that transcribes speech in almost any language using only a small amount of unlabeled text in the new language. The approach is based on a multilingual acoustic model trained on data in 1,150 languages (leveraging the MMS data), which outputs transcriptions in an intermediate representation (uroman tokens). A small amount of text in the new, unseen language is then mapped to this same intermediate representation; at inference time, this mapping, together with an optional language model, enables transcription of the new language.
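The text-to-intermediate-representation mapping described above can be pictured as a lexicon that pairs each word of the new language with a sequence of romanized character tokens. The following is a minimal sketch only: `toy_romanize` and `build_lexicon` are hypothetical helpers, and `toy_romanize` is a crude stand-in (it strips diacritics and keeps only a-z and the apostrophe) rather than the real uroman tool or the project's actual preprocessing.

```python
import unicodedata


def toy_romanize(word):
    """Crude stand-in for uroman (assumption, not the real tool):
    lowercase, decompose accented characters, keep only a-z and apostrophe."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z" or ch == "'")


def build_lexicon(text):
    """Map each distinct word in the text to its list of romanized character tokens."""
    lexicon = {}
    for word in text.split():
        tokens = list(toy_romanize(word))
        if tokens:  # skip words that romanize to nothing
            lexicon[word] = tokens
    return lexicon


# A few words of unlabeled text in the "new" language (illustrative sample).
sample_text = "caf\u00e9 d\u00e9j\u00e0 vu caf\u00e9"
lexicon = build_lexicon(sample_text)

# Print one "word token token ..." entry per distinct word.
for word, tokens in lexicon.items():
    print(word, " ".join(tokens))
```

At decoding time, a lexicon of this kind would let a decoder turn the acoustic model's romanized token output back into words written in the original script, optionally rescored by a language model.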
Example
Please have a look at the official space for an example of how to use the model.
Model details
Developed by: Jinming Zhao et al.
Model type: Scaling A Simple Approach to Zero-Shot Speech Recognition
License: CC-BY-NC 4.0
Num parameters: 300 million
Cite as:
@article{zhao2024scaling,
  title={Scaling A Simple Approach to Zero-Shot Speech Recognition},
  author={Zhao, Jinming and Pratap, Vineel and Auli, Michael},
  journal={arXiv preprint arXiv:2407.17852},
  year={2024}
}