metadata

language:
  - it
  - en
  - de
  - fr
  - es
  - multilingual
license:
  - mit
datasets:
  - xtreme
metrics:
  - precision: 0.874
  - recall: 0.879
  - f1: 0.877
  - accuracy: 0.943
inference:
  parameters:
    aggregation_strategy: first

gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from xlm-roberta-base on a subset of the xtreme dataset.

The xtreme subsets used are PAN-X.{lang}. The languages used for training and validation are Italian, English, German, French, and Spanish.

Only 75% of the whole dataset was used.
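
The card does not say how the 75% subset was drawn, so the following is only a rough sketch of one way to build such a subset with the datasets library. The helper names (panx_configs, load_panx_fraction), the random shuffle, the seed, and applying the fraction to the train split are all assumptions made for illustration, not the procedure actually used for this model.

```python
LANGS = ["it", "en", "de", "fr", "es"]


def panx_configs(langs):
    """Build the xtreme configuration names PAN-X.{lang} for each language."""
    return [f"PAN-X.{lang}" for lang in langs]


def load_panx_fraction(lang, fraction=0.75, seed=0):
    """Load one PAN-X train split and keep a random fraction of it.

    Assumption: shuffling with a fixed seed and truncating is just one
    possible way to sample; the card does not describe the real method.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    ds = load_dataset("xtreme", f"PAN-X.{lang}", split="train")
    keep = int(len(ds) * fraction)
    return ds.shuffle(seed=seed).select(range(keep))


configs = panx_configs(LANGS)
print(configs)
```

Calling load_panx_fraction("it") downloads the data, so it is left out of the snippet's top level; the per-language results could then be combined with datasets.concatenate_datasets.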

Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition in Italian, English, German, French, and Spanish (it, en, de, fr, es).

Training and evaluation data

Training dataset: xtreme

Training results

The model achieves the following results on the evaluation set:

Precision: 0.8744154472771157
Recall: 0.8791424269015351
F1: 0.8767725659462058
Accuracy: 0.9432040948504613

Details:

Label	Precision	Recall	F1-Score	Support
PER	0.922	0.908	0.915	26639
LOC	0.880	0.906	0.892	37623
ORG	0.821	0.816	0.818	28045
Overall	0.874	0.879	0.877	92307
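
As a quick consistency check on the figures above, the short sketch below recomputes the overall F1 and the total support. It assumes the reported F1 is the harmonic mean of the reported overall precision and recall; the card does not state which scorer produced the numbers.

```python
# Per-label support counts copied from the table above.
support = {"PER": 26639, "LOC": 37623, "ORG": 28045}
total_support = sum(support.values())

# Overall precision and recall as reported on the evaluation set.
precision = 0.8744154472771157
recall = 0.8791424269015351

# Assumption: F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(total_support)  # 92307
print(round(f1, 3))   # 0.877
```

Both values agree with the Overall row of the table.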

Usage

Set the aggregation strategy according to the transformers pipeline documentation.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model_id = "gunghio/xlm-roberta-base-finetuned-panx-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="first" labels each word with the tag of its first sub-token.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

# Each result is a dict describing one detected entity.
ner_results = nlp(example)
print(ner_results)
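
To make the "first" aggregation strategy more concrete, here is a simplified, self-contained illustration of the idea: when a word is split into several sub-tokens, the word takes the tag predicted for its first sub-token. This is not the transformers implementation. The function name aggregate_first, the token splits, and the predicted tags are all hypothetical.

```python
from typing import Dict, List, Tuple

# (sub-token text, index of the word it belongs to, predicted tag)
TokenPred = Tuple[str, int, str]


def aggregate_first(token_preds: List[TokenPred]) -> List[Tuple[str, str]]:
    """Give each word the tag predicted for its first sub-token."""
    pieces: Dict[int, List[str]] = {}
    tags: Dict[int, str] = {}
    for piece, word_id, tag in token_preds:
        pieces.setdefault(word_id, []).append(piece)
        # setdefault keeps only the first tag seen for each word.
        tags.setdefault(word_id, tag)
    return [("".join(pieces[i]), tags[i]) for i in sorted(pieces)]


# Hypothetical predictions: the second piece of "Wolfgang" disagrees with the first.
preds = [
    ("Wolf", 0, "B-PER"),
    ("gang", 0, "O"),
    ("lives", 1, "O"),
    ("in", 2, "O"),
    ("Ber", 3, "B-LOC"),
    ("lin", 3, "I-LOC"),
]

result = aggregate_first(preds)
print(result)
```

In this toy input the word "Wolfgang" is tagged B-PER even though its second piece was predicted as O, which is the kind of disagreement the strategy is meant to resolve.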