metadata
language:
  - it
  - en
  - de
  - fr
  - es
  - multilingual
license:
  - mit
datasets:
  - xtreme
metrics:
  - precision: 0.874
  - recall: 0.88
  - f1: 0.877
  - accuracy: 0.943
inference:
  parameters:
    aggregation_strategy: first

gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from xlm-roberta-base on a subset of the XTREME dataset.

The XTREME subsets used are PAN-X.{lang}, with Italian, English, German, French, and Spanish used for training and validation.

Only 75% of the full dataset was used.
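Here is a minimal sketch of the data selection described above. It assumes the configuration names follow the PAN-X.{lang} pattern stated in this card and that the 75% subset was drawn uniformly at random. The card does not say how the subset was chosen, so the helper subsample, its seed, and the sampling scheme are all hypothetical. The helper works on any Python list of examples.

```python
import random

# Languages used for training/validation, as listed in this card.
LANGS = ["it", "en", "de", "fr", "es"]

# XTREME configuration names follow the PAN-X.{lang} pattern.
CONFIGS = [f"PAN-X.{lang}" for lang in LANGS]


def subsample(examples, fraction=0.75, seed=42):
    """Return a reproducible random subset holding `fraction` of `examples`.

    Hypothetical helper: the sampling scheme and seed are assumptions,
    not the procedure used to train this model.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = int(len(examples) * fraction)
    rng = random.Random(seed)
    # Sample indices, then sort them so the original order is preserved.
    indices = sorted(rng.sample(range(len(examples)), k))
    return [examples[i] for i in indices]


# Toy stand-in for one language's training split.
toy_split = [f"sentence {i}" for i in range(100)]
subset = subsample(toy_split)
print(CONFIGS)      # ['PAN-X.it', 'PAN-X.en', 'PAN-X.de', 'PAN-X.fr', 'PAN-X.es']
print(len(subset))  # 75
```

Sorting the sampled indices keeps the subset in corpus order, and the fixed seed makes the selection repeatable across runs.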

Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition (NER) in Italian (it), English (en), German (de), French (fr), and Spanish (es).

Training and evaluation data

Training dataset: xtreme

Training results

It achieves the following results on the evaluation set:

  • Precision: 0.8744154472771157
  • Recall: 0.8791424269015351
  • F1: 0.8767725659462058
  • Accuracy: 0.9432040948504613
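The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the precision and recall values listed above. It only confirms that the reported numbers are arithmetically consistent. It is not the evaluation code used for this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall scores reported on the evaluation set.
PRECISION = 0.8744154472771157
RECALL = 0.8791424269015351

f1 = f1_score(PRECISION, RECALL)
print(round(f1, 4))  # 0.8768
```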

Per-label results:

Label    Precision  Recall  F1-Score  Support
PER      0.922      0.908   0.915     26639
LOC      0.880      0.906   0.892     37623
ORG      0.821      0.816   0.818     28045
Overall  0.874      0.879   0.877     92307
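Recall for each label is TP / support. If you multiply each per-label recall by its support and divide by the total support, you recover the overall recall in the last row. The sketch below applies this to the table. The per-label values are rounded to three decimals, so the result matches the overall row only approximately. Precision cannot be recovered in the same way, because the table does not list how many entities were predicted for each label.

```python
# Per-label results from the table above: (precision, recall, f1, support).
PER_LABEL = {
    "PER": (0.922, 0.908, 0.915, 26639),
    "LOC": (0.880, 0.906, 0.892, 37623),
    "ORG": (0.821, 0.816, 0.818, 28045),
}

total_support = sum(s for _, _, _, s in PER_LABEL.values())

# recall_i * support_i is (approximately) the true positives for label i,
# so the support-weighted mean of recalls gives the overall (micro) recall.
true_positives = sum(r * s for _, r, _, s in PER_LABEL.values())
micro_recall = true_positives / total_support

print(total_support)           # 92307
print(round(micro_recall, 3))  # 0.879
```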

Usage

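Set the aggregation strategy as described in the Transformers documentation for the token-classification (NER) pipeline. The example below uses "first", which is also the value set for the inference widget in the metadata.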

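from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "gunghio/xlm-roberta-base-finetuned-panx-ner"

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "first": each word takes the label of its first sub-word token, and
# adjacent words continuing the same entity are merged into one entity.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

# Each result is a dict with entity_group, score, word, start and end keys.
ner_results = nlp(example)
print(ner_results)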
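With aggregation_strategy="first", the pipeline first groups sub-word tokens into words and gives each word the label of its first sub-word token. It then merges consecutive words that continue the same entity. The sketch below imitates this behaviour on hand-written token labels, assuming IOB2 tags such as B-PER and I-LOC. It is a simplified illustration, not the Transformers implementation, and aggregate_first is a hypothetical name. The real pipeline rebuilds words with the tokenizer and also returns scores and character offsets.

```python
from itertools import groupby


def aggregate_first(tokens):
    """Group (word_id, piece, label) sub-word tokens into entities.

    Simplified, hypothetical imitation of aggregation_strategy="first";
    not the Transformers implementation.
    """
    # 1. Collapse sub-word tokens into words. Each word keeps the label of
    #    its first sub-word token; labels of later pieces are ignored.
    words = []
    for _, group in groupby(tokens, key=lambda tok: tok[0]):
        pieces = list(group)
        text = "".join(piece for _, piece, _ in pieces)
        words.append((text, pieces[0][2]))

    # 2. Merge words into entities. "O" closes the open entity; a B- tag
    #    or a change of entity type starts a new one; otherwise the word
    #    extends the open entity.
    entities = []
    current = None  # [entity_type, [words]] of the open entity, if any
    for text, label in words:
        if label == "O":
            current = None
            continue
        prefix, _, ent_type = label.partition("-")
        if current is None or prefix == "B" or ent_type != current[0]:
            current = [ent_type, [text]]
            entities.append(current)
        else:
            current[1].append(text)
    return [{"entity_group": t, "word": " ".join(ws)} for t, ws in entities]


# Hand-written sub-word split and labels for the example sentence.
tokens = [
    (0, "My", "O"), (1, "name", "O"), (2, "is", "O"),
    (3, "Wolf", "B-PER"), (3, "gang", "I-PER"),
    (4, "and", "O"), (5, "I", "O"), (6, "live", "O"), (7, "in", "O"),
    (8, "Ber", "B-LOC"), (8, "lin", "O"),  # second piece disagrees; ignored
]

print(aggregate_first(tokens))
# [{'entity_group': 'PER', 'word': 'Wolfgang'}, {'entity_group': 'LOC', 'word': 'Berlin'}]
```

Look at the last word, "Berlin". Its two pieces carry different labels, but it still becomes a single LOC entity, because only the first piece's label counts. This is the ambiguity that "first" is meant to resolve.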