Update README.md
README.md CHANGED
@@ -6,6 +6,9 @@ datasets:
 - universal_dependencies
 metrics:
 - accuracy
+- recall
+- precision
+- f1
 model-index:
 - name: wangchanberta-ud-thai-pud-upos
   results:
@@ -22,6 +25,9 @@ model-index:
     - name: Accuracy
       type: accuracy
       value: 0.9883334914161055
+language:
+- th
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -42,17 +48,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
+This model is trained on the Thai UD PUD corpus with Universal Part-of-Speech (UPOS) tags, to support part-of-speech tagging for the Thai language.
 
+## Example
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline
 
+model = AutoModelForTokenClassification.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
+tokenizer = AutoTokenizer.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
 
+pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
+outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
+print(outputs)
+# [{'entity_group': 'NOUN', 'score': 0.419697, 'word': '', 'start': 0, 'end': 1}, {'entity_group': 'PROPN', 'score': 0.8809489, 'word': 'ประเทศไทย', 'start': 0, 'end': 9}, {'entity_group': 'VERB', 'score': 0.7754166, 'word': 'อยู่ใน', 'start': 9, 'end': 16}, {'entity_group': 'NOUN', 'score': 0.9976932, 'word': 'ทวีป', 'start': 16, 'end': 21}, {'entity_group': 'PROPN', 'score': 0.97770107, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
+```
 
 ### Training hyperparameters
 
@@ -86,4 +95,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.1
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
+- Tokenizers 0.14.1
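The grouped pipeline output added in this commit is a list of entity dicts. For plain POS tagging it can be reduced to (word, tag) pairs; a minimal post-processing sketch, where `to_pairs` is a hypothetical helper (not part of the card), the sample data is the output shown in the example above, and empty sub-word spans like the leading `''` entry are dropped:

```python
# Sample output copied from the model card's example above.
outputs = [
    {'entity_group': 'NOUN', 'score': 0.419697, 'word': '', 'start': 0, 'end': 1},
    {'entity_group': 'PROPN', 'score': 0.8809489, 'word': 'ประเทศไทย', 'start': 0, 'end': 9},
    {'entity_group': 'VERB', 'score': 0.7754166, 'word': 'อยู่ใน', 'start': 9, 'end': 16},
    {'entity_group': 'NOUN', 'score': 0.9976932, 'word': 'ทวีป', 'start': 16, 'end': 21},
    {'entity_group': 'PROPN', 'score': 0.97770107, 'word': 'เอเชีย', 'start': 21, 'end': 28},
]

def to_pairs(entities):
    """Keep non-empty word spans and return (surface word, UPOS tag) pairs."""
    return [(e['word'].strip(), e['entity_group']) for e in entities if e['word'].strip()]

print(to_pairs(outputs))
# [('ประเทศไทย', 'PROPN'), ('อยู่ใน', 'VERB'), ('ทวีป', 'NOUN'), ('เอเชีย', 'PROPN')]
```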