+---
+language: ky
+datasets:
+- wikiann
+examples:
+widget:
+- text: "Бириккен Улуттар Уюму"
+  example_title: "Sentence_1"
+- text: "Жусуп Мамай"
+  example_title: "Sentence_2"
+---
+<h1>Kyrgyz Named Entity Recognition</h1>
+
+This model is `bert-base-multilingual-cased` fine-tuned on the WikiANN dataset for named entity recognition (NER) in Kyrgyz.
+
+WARNING: This model is not usable yet (see the metrics below). I'll update it after cleaning up the WikiANN dataset and retraining.
+
+## Label IDs and label names
+
+| Label ID | Label Name |
+| -------- | ---------- |
+| 0 | O |
+| 1 | B-PER |
+| 2 | I-PER |
+| 3 | B-ORG |
+| 4 | I-ORG |
+| 5 | B-LOC |
+| 6 | I-LOC |
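
The label IDs above follow the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. Below is a minimal sketch of turning a sequence of label IDs into entity spans. The `decode_entities` helper and the sample predictions are illustrative assumptions, not part of this model's code or real model output.

```python
# Label map copied from the table above.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}


def decode_entities(tokens, label_ids):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: a stray I- tag whose type differs from the
    current span is treated as the start of a new span.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span being built, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label == "O":
            flush()
            current_type, current_tokens = None, []
        elif label.startswith("B-") or label[2:] != current_type:
            flush()
            current_type, current_tokens = label[2:], [token]
        else:
            # I- tag that continues the current span.
            current_tokens.append(token)
    flush()
    return entities


# Hypothetical predictions for illustration only.
print(decode_entities(["Жусуп", "Мамай"], [1, 2]))  # [('PER', 'Жусуп Мамай')]
```
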
+<h1>Results</h1>
+
+| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
+| ---- | -------- | ----- | ---- | ---- |
+| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
+| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
+| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |
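
This card does not say how the F1 scores were computed. Assuming they are entity-level, exact-match scores (the convention used by tools such as seqeval), the calculation can be sketched as follows. The `entity_f1` function and the sample spans are hypothetical and are not taken from the training code.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 with exact matching on (type, start, end) spans.

    Assumption: a prediction counts as correct only if its type and
    both boundaries match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up spans: one exact match, one span with the wrong type.
gold = [("PER", 0, 2), ("LOC", 5, 6)]
predicted = [("PER", 0, 2), ("ORG", 5, 6)]
print(entity_f1(gold, predicted))  # 0.5
```

Under this definition, a span with correct boundaries but the wrong type counts as both a false positive and a false negative, which is one reason noisy training labels can pull the score down sharply.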
+<h1>Example</h1>
+
+```py
+from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
+model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")
+# Build a token-classification (NER) pipeline around them.
+nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+example = "Жусуп Мамай"
+ner_results = nlp(example)
+# Each result is a dict describing one tagged token.
+print(ner_results)
+```
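
Without an aggregation strategy, the `ner` pipeline returns one dict per tagged subword token, with keys such as `entity`, `score`, `word`, `start`, and `end`. The sketch below merges such tokens into whole entities using their character offsets. The `merge_tokens` helper is hypothetical, and the `sample_results` values are invented for illustration; they are not real output from this model.

```python
def merge_tokens(text, results):
    """Merge per-token pipeline results into (entity_type, text) pairs.

    Hypothetical helper: an I- token extends the previous span when the
    entity types match; anything else starts a new span.
    """
    spans = []
    for item in results:
        prefix, _, entity_type = item["entity"].partition("-")
        if spans and prefix == "I" and spans[-1]["type"] == entity_type:
            # Extend the open span to cover this token.
            spans[-1]["end"] = item["end"]
        else:
            spans.append({"type": entity_type, "start": item["start"], "end": item["end"]})
    # Slice the original text so subword markers such as "##" never appear.
    return [(s["type"], text[s["start"]:s["end"]]) for s in spans]


text = "Жусуп Мамай"
# Invented token-level results in the shape the pipeline returns.
sample_results = [
    {"entity": "B-PER", "score": 0.90, "index": 1, "word": "Жусуп", "start": 0, "end": 5},
    {"entity": "I-PER", "score": 0.80, "index": 2, "word": "Мама", "start": 6, "end": 10},
    {"entity": "I-PER", "score": 0.80, "index": 3, "word": "##й", "start": 10, "end": 11},
]
print(merge_tokens(text, sample_results))  # [('PER', 'Жусуп Мамай')]
```

Passing `aggregation_strategy="simple"` to `pipeline(...)` asks the library to do similar grouping itself, which is usually preferable to hand-rolled merging.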