Update README.md
README.md CHANGED
@@ -6,6 +6,9 @@ datasets:
 - universal_dependencies
 metrics:
 - accuracy
+- recall
+- precision
+- f1
 model-index:
 - name: wangchanberta-ud-thai-pud-upos
   results:
@@ -22,6 +25,9 @@ model-index:
     - name: Accuracy
       type: accuracy
       value: 0.9883334914161055
+language:
+- th
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -42,17 +48,20 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
+This model is trained on the Thai UD PUD corpus with Universal Part-of-Speech (UPOS) tags, to support part-of-speech tagging for the Thai language.
 
+## Example
+```python
+from transformers import AutoModelForTokenClassification, AutoTokenizer, TokenClassificationPipeline
 
+model = AutoModelForTokenClassification.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
+tokenizer = AutoTokenizer.from_pretrained("Pavarissy/wangchanberta-ud-thai-pud-upos")
 
+pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer, grouped_entities=True)
+outputs = pipeline("ประเทศไทย อยู่ใน ทวีป เอเชีย")
+print(outputs)
+# [{'entity_group': 'NOUN', 'score': 0.419697, 'word': '', 'start': 0, 'end': 1}, {'entity_group': 'PROPN', 'score': 0.8809489, 'word': 'ประเทศไทย', 'start': 0, 'end': 9}, {'entity_group': 'VERB', 'score': 0.7754166, 'word': 'อยู่ใน', 'start': 9, 'end': 16}, {'entity_group': 'NOUN', 'score': 0.9976932, 'word': 'ทวีป', 'start': 16, 'end': 21}, {'entity_group': 'PROPN', 'score': 0.97770107, 'word': 'เอเชีย', 'start': 21, 'end': 28}]
+```
 
 ### Training hyperparameters
 
@@ -86,4 +95,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.1
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.14.1
+- Tokenizers 0.14.1
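The grouped pipeline output added in this commit is a list of entity dicts. For plain POS tagging it can be reduced to (word, tag) pairs; a minimal post-processing sketch, where `to_pairs` is a hypothetical helper (not part of the card), the sample data is the output shown in the example above, and empty sub-word spans like the leading `''` entry are dropped:

```python
# Sample output copied from the model card's example above.
outputs = [
    {'entity_group': 'NOUN', 'score': 0.419697, 'word': '', 'start': 0, 'end': 1},
    {'entity_group': 'PROPN', 'score': 0.8809489, 'word': 'ประเทศไทย', 'start': 0, 'end': 9},
    {'entity_group': 'VERB', 'score': 0.7754166, 'word': 'อยู่ใน', 'start': 9, 'end': 16},
    {'entity_group': 'NOUN', 'score': 0.9976932, 'word': 'ทวีป', 'start': 16, 'end': 21},
    {'entity_group': 'PROPN', 'score': 0.97770107, 'word': 'เอเชีย', 'start': 21, 'end': 28},
]

def to_pairs(entities):
    """Keep non-empty word spans and return (surface word, UPOS tag) pairs."""
    return [(e['word'].strip(), e['entity_group']) for e in entities if e['word'].strip()]

print(to_pairs(outputs))
# [('ประเทศไทย', 'PROPN'), ('อยู่ใน', 'VERB'), ('ทวีป', 'NOUN'), ('เอเชีย', 'PROPN')]
```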