---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---

<h1>Kyrgyz Named Entity Recognition</h1>

This model fine-tunes `bert-base-multilingual-cased` on the WikiANN dataset to perform named entity recognition (NER) on Kyrgyz text.

WARNING: this model is not usable in practice (see the metrics below) and was built only as a proof of concept.

I'll update the model after cleaning up the `ky` part of the WikiANN dataset, which contains only 100 train/validation/test items, or after coming up with a completely new dataset.

## Label ID and its corresponding label name

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |

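
The table above can be written down as a plain lookup when decoding raw model predictions. The following is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode_label_ids` are illustrative and not part of the model's code, and the example IDs are made up rather than real model output.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
# Reverse mapping, useful when preparing training labels.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode_label_ids(label_ids):
    """Map a sequence of integer label IDs to their label names."""
    return [ID2LABEL[i] for i in label_ids]


# Hypothetical prediction for a two-token input (not real model output).
example_ids = [1, 2]
print(decode_label_ids(example_ids))  # ['B-PER', 'I-PER']
```

The `B-` prefix marks the first token of an entity and `I-` marks a continuation, so the two IDs above describe a single two-token person name.
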
<h1>Results</h1>

| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ---- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |

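
This card does not say how the F1 scores were computed. As a rough illustration only, the sketch below assumes exact-match, entity-level scoring over BIO tags, a common convention for NER. The function names and the toy tag sequences are hypothetical and are not taken from the actual evaluation.

```python
def extract_spans(tags):
    """Return a set of (entity_type, start, end) spans from BIO tags (end exclusive)."""
    spans = set()
    start, ent_type = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if start is not None and not continues:
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        if tag.startswith("B-"):
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient choice: a stray I- tag opens a new span.
            start, ent_type = i, tag[2:]
    return spans


def entity_f1(gold_tags, pred_tags):
    """Exact-match entity-level F1 for one pair of tag sequences."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)  # spans with matching type and boundaries
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one of two entities is predicted with the wrong type.
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(entity_f1(gold, pred))  # 0.5
```

Under this convention a prediction counts only if both its type and its boundaries match the gold entity, which is part of why scores on a very small dataset can look low.
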
<h1>Example</h1>

```py
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Build a token-classification (NER) pipeline around them.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Жусуп Мамай"
ner_results = nlp(example)
print(ner_results)
```
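
By default the `ner` pipeline returns one dictionary per token, with keys such as `entity`, `word`, `start`, and `end`. The sketch below shows one possible way to merge those per-token results into whole entities using the character offsets. The helper `group_entities` is hypothetical, and the token list is hand-written for illustration, not actual output of this model.

```python
def group_entities(text, token_results):
    """Merge per-token NER results into (entity_type, surface_text) pairs."""
    groups = []  # each item is [entity_type, start, end]
    for tok in token_results:
        if tok["entity"] == "O":
            continue
        prefix, _, ent_type = tok["entity"].partition("-")
        # Extend the previous group only for an adjacent I- token of the same type.
        if (
            prefix == "I"
            and groups
            and groups[-1][0] == ent_type
            and tok["start"] - groups[-1][2] <= 1
        ):
            groups[-1][2] = tok["end"]
        else:
            groups.append([ent_type, tok["start"], tok["end"]])
    return [(ent_type, text[start:end]) for ent_type, start, end in groups]


text = "Жусуп Мамай"
# Hand-written stand-in for pipeline output (assumed, not produced by the model).
tokens = [
    {"entity": "B-PER", "word": "Жусуп", "start": 0, "end": 5},
    {"entity": "I-PER", "word": "Мамай", "start": 6, "end": 11},
]
print(group_entities(text, tokens))  # [('PER', 'Жусуп Мамай')]
```

The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping internally, which may be preferable to a hand-rolled helper.
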