Zerpal Collection • The largest open-source Udmurt monolingual corpora and pre-trained language models • 14 items • Updated Jun 14 • 1
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 57
SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis Paper • 1912.09723 • Published Dec 20, 2019 • 2
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset Paper • 2309.04662 • Published Sep 9, 2023 • 22