---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---

<h1>Kyrgyz Named Entity Recognition</h1>
This model is `bert-base-multilingual-cased` fine-tuned on the WikiANN dataset to perform named entity recognition (NER) in Kyrgyz.

WARNING: this model is not usable in practice (see the metrics below) and was built only as a proof of concept.
I plan to update it after cleaning up the Kyrgyz (`ky`) part of WikiANN, which contains only 100 items in each of its train, validation, and test splits, or after building an entirely new dataset.
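This card does not describe the training setup. One common step when fine-tuning a BERT-style model for NER is projecting word-level labels onto subword tokens, because WikiANN labels whole words while mBERT's WordPiece tokenizer may split a word into several pieces. The sketch below shows one such scheme, under stated assumptions: only the first subword of each word keeps the word's label, and special tokens and continuation subwords get `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `word_ids` list mimics what a Hugging Face fast tokenizer's `word_ids()` returns; the helper name and the subword split are hypothetical, and this is not necessarily how this model was trained.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips this label value by default


def align_labels_with_subwords(
    word_ids: List[Optional[int]], word_labels: List[int]
) -> List[int]:
    """Project word-level label IDs onto subword tokens (hypothetical helper).

    word_ids has one entry per subword token: None for special tokens such as
    [CLS]/[SEP], otherwise the index of the word the token came from. Only the
    first subword of each word keeps that word's label; everything else gets
    IGNORE_INDEX so the loss ignores it.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token
        elif word_id != previous_word:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword
        previous_word = word_id
    return aligned


# Hypothetical split of "Жусуп Мамай" into [CLS] Жу ##суп Мама ##й [SEP];
# the real split depends on the mBERT vocabulary.
word_ids = [None, 0, 0, 1, 1, None]
word_labels = [1, 2]  # B-PER, I-PER according to the label table
print(align_labels_with_subwords(word_ids, word_labels))
# → [-100, 1, -100, 2, -100, -100]
```

A frequently used alternative labels continuation subwords with the matching `I-` tag instead of ignoring them.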


## Label IDs and their corresponding label names

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
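The labels follow the IOB2 scheme: `B-` marks the first word of an entity, `I-` a word inside the same entity, and `O` a word outside any entity. As a minimal sketch (the helper `bio_to_spans` is hypothetical and not part of this model or of `transformers`), word-level label IDs from the table can be grouped into entity spans as follows. The word sequence and its labels are made up for illustration, not model output.

```python
from typing import Dict, List, Tuple

# Label ID -> label name, copied from the table above.
ID2LABEL: Dict[int, str] = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}


def bio_to_spans(label_ids: List[int]) -> List[Tuple[str, int, int]]:
    """Group per-word label IDs into (entity_type, start, end) spans, end exclusive.

    An I- tag that does not continue an entity of the same type starts a new
    entity (a lenient choice; a strict IOB2 decoder would reject it).
    """
    spans = []
    current_type = None
    start = 0
    for i, label_id in enumerate(label_ids):
        prefix, _, entity_type = ID2LABEL[label_id].partition("-")
        continues = prefix == "I" and entity_type == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))  # close the open entity
            current_type = None
        if prefix in ("B", "I") and not continues:
            current_type, start = entity_type, i  # open a new entity
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans


# Made-up sequence combining the two widget examples ("жана" = "and").
words = ["Бириккен", "Улуттар", "Уюму", "жана", "Жусуп", "Мамай"]
labels = [3, 4, 4, 0, 1, 2]
for entity_type, start, end in bio_to_spans(labels):
    print(entity_type, " ".join(words[start:end]))
# → ORG Бириккен Улуттар Уюму
# → PER Жусуп Мамай
```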

<h1>Results</h1>

| Split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ----- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
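The card does not state how these scores were computed. A common definition for NER, used in the CoNLL shared tasks and by the `seqeval` library, is entity-level F1 in which a predicted entity counts as correct only if both its type and its boundaries exactly match a gold entity. The sketch below assumes that definition; `entity_f1` and the example spans are hypothetical. Pooling all types gives a micro-averaged overall score, and per-type columns such as LOC F1 correspond to filtering both sets to one type before scoring.

```python
from typing import Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start_word, end_word), end exclusive


def entity_f1(gold: Set[Span], pred: Set[Span],
              entity_type: Optional[str] = None) -> float:
    """Exact-match entity-level F1, optionally restricted to one entity type."""
    if entity_type is not None:
        gold = {s for s in gold if s[0] == entity_type}
        pred = {s for s in pred if s[0] == entity_type}
    true_positives = len(gold & pred)  # same type and same boundaries
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans for one sentence: the ORG prediction stops one word early.
gold = {("ORG", 0, 3), ("PER", 4, 6)}
pred = {("ORG", 0, 2), ("PER", 4, 6)}
print(entity_f1(gold, pred))         # 0.5 (precision 1/2, recall 1/2)
print(entity_f1(gold, pred, "PER"))  # 1.0
print(entity_f1(gold, pred, "ORG"))  # 0.0
```

When scoring a whole corpus, each span would also need a sentence index so that identical offsets from different sentences are not merged by the set.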


## Example
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Token-classification pipeline; without an aggregation_strategy it returns one entry per token
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```
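If a model's config has no `id2label` mapping, `transformers` falls back to generic names such as `LABEL_1`, and the pipeline's default filtering of `O` tokens (which compares label names) would then let `LABEL_0` entries through. This card does not say which case applies here, so the sketch below assumes the generic names, maps them back using the label table, and drops `O` tokens. The helper name `rename_generic_labels` and the sample input, shaped like the pipeline's default per-token output for the text "Жусуп Мамай." with made-up scores, are illustrative.

```python
import re
from typing import Dict, List

# Label ID -> label name, copied from the label table in this card.
ID2LABEL: Dict[int, str] = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG",
                            4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}


def rename_generic_labels(results: List[dict]) -> List[dict]:
    """Replace 'LABEL_<n>' with the table's label name and drop 'O' tokens.

    Entries that already carry a real label name pass through unchanged.
    """
    renamed = []
    for item in results:
        match = re.fullmatch(r"LABEL_(\d+)", item["entity"])
        name = ID2LABEL[int(match.group(1))] if match else item["entity"]
        if name != "O":
            renamed.append({**item, "entity": name})  # copy, don't mutate input
    return renamed


# Illustrative input; scores are invented, not produced by this model.
sample = [
    {"entity": "LABEL_1", "score": 0.9, "index": 1, "word": "Жусуп", "start": 0, "end": 5},
    {"entity": "LABEL_2", "score": 0.8, "index": 2, "word": "Мамай", "start": 6, "end": 11},
    {"entity": "LABEL_0", "score": 0.7, "index": 3, "word": ".", "start": 11, "end": 12},
]
for item in rename_generic_labels(sample):
    print(item["word"], item["entity"])
# → Жусуп B-PER
# → Мамай I-PER
```

With an `aggregation_strategy` set, the pipeline groups tokens and reports `entity_group` instead of `entity`, so this sketch only applies to the default ungrouped output.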