---
library_name: fasttext
tags:
- text-classification
- language-identification
---

This is a fastText-based language identification model from the paper [The first neural machine translation system for the Erzya language](https://arxiv.org/abs/2209.09368).

It supports 323 languages used in Wikipedia (as of July 2022) and has extended support for the Erzya (`myv`) and Moksha (`mdf`) languages.

Example usage:

```python
import os
import urllib.request

import fasttext

model_path = 'lid.323.ftz'
url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'

# Download the model once (or just download it manually from the URL above)
if not os.path.exists(model_path):
    urllib.request.urlretrieve(url, model_path)

model = fasttext.load_model(model_path)
languages, scores = model.predict("эрзянь кель", k=3)  # k is the number of returned hypotheses
print(languages, scores)
```
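
`model.predict` returns a tuple of label strings together with an array of probabilities. fastText labels normally carry the `__label__` prefix; the sketch below assumes this model uses that default and defines a hypothetical helper, `top_languages`, that turns the raw output into `(language code, probability)` pairs.

```python
from typing import List, Sequence, Tuple

# fastText's default label prefix; assumed (not confirmed) to be used by this model
LABEL_PREFIX = "__label__"


def top_languages(labels: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each predicted label with its probability, stripping the label prefix."""
    pairs = []
    for label, score in zip(labels, scores):
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((code, float(score)))  # float() converts NumPy scalars to plain floats
    return pairs


# Usage with the model loaded in the example above:
# languages, scores = model.predict("эрзянь кель", k=3)
# for code, prob in top_languages(languages, scores):
#     print(f"{code}\t{prob:.3f}")
```

Labels without the prefix are passed through unchanged, so the helper also works if the model was trained with a different labelling convention.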

The model was trained on the texts of articles randomly sampled from Wikipedia. It works better on sentences and longer texts than on single words, and it may be sensitive to noisy input.
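
Because short and noisy inputs are less reliable, it can help to clean the text and skip very short strings before calling `predict`. The official fastText Python bindings also reject input containing a newline (`predict` processes one line at a time), so newlines must be removed. The helper below, `prepare_text`, is a hypothetical sketch: the URL stripping and the `min_chars=20` threshold are illustrative assumptions, not recommendations from the model authors.

```python
import re
from typing import Optional

_URL_RE = re.compile(r"https?://\S+")  # crude URL matcher (heuristic assumption)
_SPACE_RE = re.compile(r"\s+")         # any run of whitespace, including newlines and tabs


def prepare_text(text: str, min_chars: int = 20) -> Optional[str]:
    """Clean text for language identification; return None if it is too short.

    min_chars is an illustrative threshold, not a value tuned for this model.
    """
    text = _URL_RE.sub(" ", text)            # drop URLs, which carry little language signal
    text = _SPACE_RE.sub(" ", text).strip()  # collapse whitespace; predict() rejects "\n"
    if len(text) < min_chars:
        return None
    return text


# Usage with the model loaded in the example above:
# cleaned = prepare_text(raw_text)
# if cleaned is not None:
#     languages, scores = model.predict(cleaned, k=3)
```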