---
library_name: fasttext
tags:
- text-classification
- language-identification
---
This is a fastText-based language identification model from the paper [The first neural machine translation system for the Erzya language](https://arxiv.org/abs/2209.09368).

It supports 323 languages used in Wikipedia (as of July 2022), with extended support for the Erzya (`myv`) and Moksha (`mdf`) languages.

Example usage:

```Python
import os
import urllib.request

import fasttext

model_path = 'lid.323.ftz'
url = 'https://huggingface.co/slone/fastText-LID-323/resolve/main/lid.323.ftz'

# Download the compressed model once (or just download it manually)
if not os.path.exists(model_path):
    urllib.request.urlretrieve(url, model_path)

model = fasttext.load_model(model_path)

# k is the number of returned hypotheses
languages, scores = model.predict("эрзянь кель", k=3)
```
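
For a single string, `predict` returns a tuple of label strings and a NumPy array of probabilities. The helper below is a minimal sketch for turning that output into plain language codes. It assumes the model uses fastText's default `__label__` label prefix, which this card does not state; `model.get_labels()` shows the actual labels. The function name `to_language_codes` is hypothetical.

```python
def to_language_codes(labels, scores, prefix="__label__"):
    """Pair each predicted label with its probability, stripping the label prefix.

    Assumes fastText's default '__label__' prefix (hypothetical for this model);
    labels without that prefix are returned unchanged.
    """
    results = []
    for label, score in zip(labels, scores):
        code = label[len(prefix):] if label.startswith(prefix) else label
        results.append((code, float(score)))
    return results


# Usage with the output of the example above:
# to_language_codes(languages, scores)
```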

The model was trained on the texts of randomly sampled Wikipedia articles. It works better on sentences and longer texts than on single words, and it may be sensitive to noise.
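
Because the model is more reliable on longer inputs and may be sensitive to noise, it can help to clean the text before prediction and to treat very short or low-confidence inputs as undecided. The sketch below shows one way to do this. The URL and digit filtering, the `threshold` and `min_chars` values, and the function names are hypothetical choices, not part of this model. Newlines are collapsed as well, because fastText's Python `predict` handles one line at a time and raises an error on strings containing `\n`.

```python
import re

URL_RE = re.compile(r"https?://\S+")
DIGITS_RE = re.compile(r"\d+")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Remove URLs and digits, then collapse all whitespace (including newlines)."""
    text = URL_RE.sub(" ", text)
    text = DIGITS_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def identify(model, text, threshold=0.5, min_chars=10, prefix="__label__"):
    """Return the top language code, or None if the input is too short or uncertain.

    `threshold` and `min_chars` are illustrative values, not tuned for this model;
    the '__label__' prefix is fastText's default and is assumed here.
    """
    cleaned = clean_text(text)
    if len(cleaned) < min_chars:
        return None
    labels, scores = model.predict(cleaned, k=1)
    if len(labels) == 0 or scores[0] < threshold:
        return None
    label = labels[0]
    return label[len(prefix):] if label.startswith(prefix) else label


# Usage, with `model` loaded as in the example above:
# identify(model, "эрзянь кель\nhttps://example.org")
```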