SpanMarker

This is a SpanMarker model that can be used for Named Entity Recognition.

Model Details

Model Description

Model Type: SpanMarker
Maximum Sequence Length: 256 tokens
Maximum Entity Length: 6 words
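
To make the "Maximum Entity Length" setting concrete, here is a minimal sketch of how a span-based tagger might enumerate candidate spans. It assumes every contiguous run of up to 6 words is a candidate. The helper name `candidate_spans` and the sample sentence are hypothetical and are not taken from the SpanMarker library, whose internal implementation may differ.

```python
def candidate_spans(words, max_entity_length=6):
    """Return (start, end) word-index pairs (end exclusive) for every
    contiguous span of at most `max_entity_length` words."""
    spans = []
    for start in range(len(words)):
        # The span may not run past the sentence or exceed the length limit.
        last_end = min(start + max_entity_length, len(words))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


# Hypothetical sample sentence, used only to illustrate the enumeration.
words = "Tom lives in New York City".split()
spans = candidate_spans(words, max_entity_length=6)
print(len(spans))
print([" ".join(words[s:e]) for s, e in spans if e - s == 3])
```

Under this assumption, an entity longer than 6 words never appears among the candidates, so it cannot be predicted as a single span.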

Model Sources

Repository: SpanMarker on GitHub
Thesis: SpanMarker For Named Entity Recognition

Model Labels

Label	Examples
ANIM	"vertebrate", "moth", "G. firmus"
BIO	"Aspergillus", "Cladophora", "Zythiostroma"
CEL	"pulsar", "celestial bodies", "neutron star"
DIS	"social anxiety disorder", "insulin resistance", "Asperger syndrome"
EVE	"Spanish Civil War", "National Junior Angus Show", "French Revolution"
FOOD	"Neera", "Bellini ( cocktail )", "soju"
INST	"Apple II", "Encyclopaedia of Chess Openings", "Android"
LOC	"Kīlauea", "Hungary", "Vienna"
MEDIA	"CSI : Crime Scene Investigation", "Big Comic Spirits", "American Idol"
MYTH	"Priam", "Oźwiena", "Odysseus"
ORG	"San Francisco Giants", "Arm Holdings", "RTÉ One"
PER	"Amelia Bence", "Tito Lusiardo", "James Cameron"
PLANT	"vernal squill", "Sarracenia purpurea", "Drosera rotundifolia"
TIME	"prehistory", "Age of Enlightenment", "annual paid holiday"
VEHI	"Short 360", "Ferrari 355 Challenge", "Solution F / Chretien Helicopter"

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("Ann Patchett ’s novel \" Bel Canto \", was another creative influence that helped her manage a plentiful cast of characters.")
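
The predictions can then be post-processed with plain Python. The sketch below assumes that `predict` returns a list of dictionaries with at least `"span"`, `"label"` and `"score"` keys. The `group_by_label` helper is hypothetical, and `example_entities` is a hand-written stand-in, not real output from this model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Map each label to the list of span texts predicted with that label."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written stand-in for a `model.predict(...)` result (NOT real model
# output); the scores are placeholders.
example_entities = [
    {"span": "Ann Patchett", "label": "PER", "score": 1.0},
    {"span": "Bel Canto", "label": "MEDIA", "score": 1.0},
]
print(group_by_label(example_entities))
```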

Downstream Use

You can fine-tune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Sentence length	2	21.6493	237
Entities per sentence	0	1.5369	36

Training Hyperparameters

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
mixed_precision_training: Native AMP
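
The `linear` scheduler with a warmup ratio of 0.1 can be illustrated with a short sketch. It assumes the common definition: the learning rate rises linearly from 0 to `learning_rate` over the first 10% of steps, then decays linearly back to 0. The function name and the `total_steps` value are hypothetical. The real total step count is not stated here, and the training framework may round the warmup boundary differently.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=1e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical value, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```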

Training Results

Epoch	Step	Validation Loss	Validation Precision	Validation Recall	Validation F1	Validation Accuracy
0.0576	1000	0.0142	0.8714	0.7729	0.8192	0.9698
0.1153	2000	0.0107	0.8316	0.8815	0.8558	0.9744
0.1729	3000	0.0092	0.8717	0.8797	0.8757	0.9780
0.2306	4000	0.0082	0.8811	0.8886	0.8848	0.9798
0.2882	5000	0.0084	0.8523	0.9163	0.8831	0.9790
0.3459	6000	0.0079	0.8700	0.9113	0.8902	0.9802
0.4035	7000	0.0070	0.9107	0.8859	0.8981	0.9822
0.4611	8000	0.0069	0.9259	0.8797	0.9022	0.9827
0.5188	9000	0.0067	0.9061	0.8965	0.9013	0.9829
0.5764	10000	0.0066	0.9034	0.8996	0.9015	0.9829
0.6341	11000	0.0064	0.9160	0.8996	0.9077	0.9839
0.6917	12000	0.0066	0.8952	0.9121	0.9036	0.9832
0.7494	13000	0.0062	0.9165	0.9009	0.9086	0.9841
0.8070	14000	0.0062	0.9010	0.9121	0.9065	0.9835
0.8647	15000	0.0062	0.9084	0.9127	0.9105	0.9842
0.9223	16000	0.0060	0.9151	0.9098	0.9125	0.9846
0.9799	17000	0.0060	0.9149	0.9113	0.9131	0.9848
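
The F1 column is the harmonic mean of the precision and recall columns. The sketch below recomputes it for the final row as a sanity check. The helper name `f1_score` is introduced only for illustration. Because the table values are rounded to four decimals, a recomputed value may differ in the last digit for some rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final row of the table: precision 0.9149, recall 0.9113.
print(round(f1_score(0.9149, 0.9113), 4))  # 0.9131, matching the table
```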

Framework Versions

Python: 3.8.16
SpanMarker: 1.5.0
Transformers: 4.29.0.dev0
PyTorch: 1.10.1
Datasets: 2.15.0
Tokenizers: 0.13.2

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}

Aaron-96/multiNERD_fine-tuned_only_English_roberta
