---
language: ky
datasets:
- wikiann
examples:
widget:
- text: "Бириккен Улуттар Уюму"
  example_title: "Sentence_1"
- text: "Жусуп Мамай"
  example_title: "Sentence_2"
---

<h1>Kyrgyz Named Entity Recognition</h1>

This model is bert-base-multilingual-cased fine-tuned on the WikiANN dataset to perform named entity recognition (NER) on Kyrgyz text.

WARNING: this model is not usable yet (see the metrics below). I'll update it after cleaning up the WikiANN dataset and retraining.

## Label IDs and their corresponding label names

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |

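The labels follow the IOB2 scheme: `B-` marks the first word of an entity, `I-` marks a continuation, and `O` marks words outside any entity. The sketch below shows one way to turn word-level label IDs into entity spans using the mapping from the table. It is an illustration only: `ID2LABEL` and `bio_to_spans` are hypothetical names, and the code assumes you already have one label ID per word (for example, from an argmax over the model's logits after aligning subword tokens back to words).

```python
# Label mapping copied from the table (ID -> IOB2 tag).
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}


def bio_to_spans(words, label_ids):
    """Group word-level IOB2 label IDs into (entity_type, text) spans.

    Hypothetical helper for illustration; assumes one label ID per word.
    """
    spans = []
    current_type, current_words = None, []
    for word, label_id in zip(words, label_ids):
        tag = ID2LABEL[label_id]
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts here (a stray I- tag is treated as a start).
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_words.append(word)
        else:
            # "O" closes any open entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Illustrative label IDs, not actual model output.
print(bio_to_spans(["Жусуп", "Мамай"], [1, 2]))
```

Treating a stray `I-` tag as the start of a new entity is a common lenient choice; a stricter decoder could drop such tags instead.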
<h1>Results</h1>

| Split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ----- | ---------- | ------ | ------ | ------ |
| Train set | 0.595683 | 0.570312 | 0.687179 | 0.549180 |
| Validation set | 0.461333 | 0.551181 | 0.401913 | 0.425087 |
| Test set | 0.442622 | 0.456852 | 0.469565 | 0.413114 |

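The card does not say how these F1 scores were computed. Assuming they are exact-match, entity-level scores (the usual convention for WikiANN NER, as implemented by tools such as seqeval), the sketch below shows the calculation. `entity_f1` and the `(type, start, end)` span tuples are hypothetical; `start` and `end` are word indices. Under this assumption the overall score is micro-averaged over all entity types, so it need not equal the mean of the per-type columns.

```python
def entity_f1(gold, pred, entity_type=None):
    """Exact-match entity F1 over (type, start, end) spans.

    Hypothetical helper for illustration. With entity_type=None the score is
    micro-averaged over all types; otherwise only spans of that type count.
    """
    if entity_type is not None:
        gold = {span for span in gold if span[0] == entity_type}
        pred = {span for span in pred if span[0] == entity_type}
    gold, pred = set(gold), set(pred)
    true_pos = len(gold & pred)  # spans matching in type and boundaries
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative spans, not taken from the WikiANN evaluation.
gold = {("PER", 0, 2), ("LOC", 3, 4)}
pred = {("PER", 0, 2), ("ORG", 3, 4)}
print(entity_f1(gold, pred))         # overall
print(entity_f1(gold, pred, "PER"))  # PER only
```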
## Example

```py
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("murat/kyrgyz_language_NER")
model = AutoModelForTokenClassification.from_pretrained("murat/kyrgyz_language_NER")

# Wrap them in an NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Жусуп Мамай"
ner_results = nlp(example)
# One dict per token tagged as part of an entity (predicted tag, score, token, offsets)
print(ner_results)
```