---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- generated_from_keras_callback
datasets:
- Babelscape/multinerd
metrics:
- seqeval
base_model: distilbert-base-uncased
pipeline_tag: token-classification
widget:
- text: After months of meticulous review and analysis, I am proud to present a study
    that explores the deep connections between Epstein-Barr virus (EBV), Long COVID
    and Myalgic Encephalomyelitis.
  example_title: Example 1
- text: Is it dangerous for a tarantula to live in a paludarium?
  example_title: Example 2
- text: Billionaire Charlie Munger, Warren Buffet's right hand man, dies at 99.
  example_title: Example 3
model-index:
- name: i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B
  results:
  - task:
      type: token-classification
      name: ner
    dataset:
      name: Babelscape/multinerd (modified version)
      type: Babelscape/multinerd
      split: test
    metrics:
    - type: seqeval
      value: 0.9362959157462112
      name: precision
    - type: seqeval
      value: 0.9524846478811898
      name: recall
    - type: seqeval
      value: 0.9443209050281742
      name: f1
    - type: seqeval
      value: 0.9913435631438657
      name: accuracy
---
<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->
# i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B
This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the English subset of the [Babelscape/multinerd](https://huggingface.co/datasets/Babelscape/multinerd) NER dataset.
It achieves the following results on the validation set:
- Train Loss: 0.0084
- Validation Loss: 0.0587
- Train Precision: 0.9185
- Train Recall: 0.9240
- Train F1: 0.9213
- Train Accuracy: 0.9857
- Epoch: 2
## Model description
[distilbert-base-uncased-finetuned-ner-exp_B](https://huggingface.co/i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B) is a named entity recognition (NER) model fine-tuned from [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased).
This model is uncased, so it makes no distinction between "sarah" and "Sarah".
The dataset used for fine-tuning was modified so that only five entity types were kept: Person (PER), Animal (ANIM), Organization (ORG), Location (LOC), and Disease (DIS).
All other named entity tags were replaced with the `O` tag (label ID `0`), and the remaining label IDs were re-indexed.
## Training and evaluation data
This model was evaluated on the English subset of the Babelscape/multinerd test split, modified so that every tag other than Person (PER), Animal (ANIM), Organization (ORG), Location (LOC), and Disease (DIS) was replaced with the `O` tag.
The label indices were then re-indexed and the dataset was transformed accordingly. You can preprocess the dataset in the same way, with any custom set of tags, using the script in this [GitHub repository](https://github.com/i-be-snek/rise-assignment-ner-finetune).
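The tag filtering and re-indexing described above can be sketched as follows. This is an independent, minimal sketch and not the script from the linked repository: the function names `build_label_map` and `remap_tags` are hypothetical, and the label list in the usage lines is a toy subset, not the full MultiNERD scheme. For this model the `keep` set would be `{"PER", "ANIM", "ORG", "LOC", "DIS"}`.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_label_map(
    original_labels: Sequence[str], keep: Set[str]
) -> Tuple[List[str], Dict[int, int]]:
    """Map each original IOB2 label ID to a new, compact label ID.

    Tags whose entity type is not in `keep` collapse to "O" (new ID 0);
    kept tags receive consecutive IDs in their original order.
    """
    new_labels: List[str] = ["O"]
    old_to_new: Dict[int, int] = {}
    for old_id, tag in enumerate(original_labels):
        # "B-PER" -> "PER"; the plain "O" tag has no entity type.
        entity_type = tag.split("-", 1)[1] if "-" in tag else None
        if entity_type not in keep:
            old_to_new[old_id] = 0
            continue
        old_to_new[old_id] = len(new_labels)
        new_labels.append(tag)
    return new_labels, old_to_new


def remap_tags(tag_ids: Sequence[int], old_to_new: Dict[int, int]) -> List[int]:
    """Rewrite one sentence's tag IDs using the compact label scheme."""
    return [old_to_new[t] for t in tag_ids]


# Hypothetical toy label list; the real dataset defines more entity types.
labels = ["O", "B-PER", "I-PER", "B-FOOD", "I-FOOD", "B-DIS", "I-DIS"]
new_labels, mapping = build_label_map(labels, keep={"PER", "DIS"})
print(new_labels)  # ['O', 'B-PER', 'I-PER', 'B-DIS', 'I-DIS']
print(remap_tags([1, 2, 0, 3, 4, 5, 6], mapping))  # [1, 2, 0, 0, 0, 3, 4]
```

Because dropped tags are mapped to `0` rather than removed, sentence lengths are unchanged, so the remapped tags still align one-to-one with the original words.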
## Evaluation results
| metric | value |
|:----------|---------:|
| precision | 0.936296 |
| recall | 0.952485 |
| f1 | 0.944321 |
| accuracy | 0.991344 |

| metric/tag | ANIM | DIS | LOC | ORG | PER |
|:-----------|---------:|---------:|---------:|---------:|---------:|
| precision | 0.674603 | 0.695304 | 0.966669 | 0.954712 | 0.989048 |
| recall | 0.794888 | 0.799736 | 0.967232 | 0.942883 | 0.994872 |
| f1 | 0.729823 | 0.743873 | 0.966951 | 0.948761 | 0.991952 |
| number | 3208 | 1518 | 24048 | 6618 | 10530 |
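The overall precision, recall, and F1 above are consistent with entity-level micro-averaging over the five tags. As a sanity check, the sketch below recovers approximate entity counts from the per-tag table (true positives = recall × number of gold entities, predicted entities = true positives / precision) and pools them. The function name `micro_average` is illustrative. Token-level accuracy cannot be reproduced this way, since the table only reports entity-level counts.

```python
from typing import Dict, Tuple

# Per-tag test-set scores copied from the table above: (precision, recall, number).
PER_TAG: Dict[str, Tuple[float, float, int]] = {
    "ANIM": (0.674603, 0.794888, 3208),
    "DIS": (0.695304, 0.799736, 1518),
    "LOC": (0.966669, 0.967232, 24048),
    "ORG": (0.954712, 0.942883, 6618),
    "PER": (0.989048, 0.994872, 10530),
}


def micro_average(per_tag: Dict[str, Tuple[float, float, int]]) -> Tuple[float, float, float]:
    """Pool entity counts across tags, then compute precision, recall, and F1."""
    true_pos = predicted = gold = 0
    for tag_precision, tag_recall, support in per_tag.values():
        tp = round(tag_recall * support)  # recall = TP / gold entities
        true_pos += tp
        gold += support
        predicted += round(tp / tag_precision)  # precision = TP / predicted entities
    p = true_pos / predicted
    r = true_pos / gold
    return p, r, 2 * p * r / (p + r)


precision, recall, f1 = micro_average(PER_TAG)
print(f"precision={precision:.6f} recall={recall:.6f} f1={f1:.6f}")
```

Micro-averaging weights every entity equally, which is why the overall scores sit close to the LOC and PER scores: those two tags account for most of the entities, while the weaker ANIM and DIS scores move the totals only slightly.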
## Training procedure
All scripts for training can be found in this [GitHub repository](https://github.com/i-be-snek/rise-assignment-ner-finetune).
Training used early stopping that monitored `val_loss`.
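In the Training results table, validation loss is lowest at epoch 0 and rises afterwards. The sketch below reimplements the basic rule of a `val_loss` early-stopping callback in plain Python to show how that trajectory ends training after epoch 2. The patience value is an assumption, not a documented setting: with `min_delta = 0`, `patience=2` is the value that stops training exactly after epoch 2 on these losses. The function name `early_stopping_epoch` is hypothetical.

```python
from typing import Optional, Sequence

# Validation losses per epoch, copied from the Training results table.
VAL_LOSSES = [0.0401, 0.0451, 0.0587]


def early_stopping_epoch(
    val_losses: Sequence[float], patience: int, min_delta: float = 0.0
) -> Optional[int]:
    """Return the 0-based epoch after which training stops, or None.

    A counter increments on every epoch without improvement over the best
    loss so far, resets on improvement, and training stops once it reaches
    `patience`.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch(VAL_LOSSES, patience=2))  # 2
```

Note that stopping does not by itself roll the weights back to the best epoch; the final metrics reported at the top of this card match the epoch 2 row.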
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer:
  ```python
  {
      "name": "AdamWeightDecay",
      "learning_rate": 2e-05,
      "decay": 0.0,
      "beta_1": 0.9,
      "beta_2": 0.999,
      "epsilon": 1e-07,
      "amsgrad": False,
      "weight_decay_rate": 0.0,
  }
  ```
- training_precision: float32
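With `weight_decay_rate` and `decay` both set to 0.0, this optimizer should behave like plain Adam at a constant learning rate of 2e-5. The pure-Python sketch below applies one textbook AdamW update with the hyperparameters above, to show what each setting does. It is an illustration, not the `transformers`/TensorFlow implementation: TensorFlow's Adam folds bias correction into the step size, so epsilon enters slightly differently. The function name `adamw_step` is hypothetical.

```python
import math
from typing import List, Tuple

# Hyperparameters copied from the optimizer config above.
LEARNING_RATE = 2e-05
BETA_1 = 0.9
BETA_2 = 0.999
EPSILON = 1e-07
WEIGHT_DECAY_RATE = 0.0  # 0.0 disables the decoupled weight-decay term


def adamw_step(
    params: List[float], grads: List[float], m: List[float], v: List[float], t: int
) -> Tuple[List[float], List[float], List[float]]:
    """One textbook AdamW update; `t` is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = BETA_1 * m_i + (1 - BETA_1) * g  # first-moment estimate
        v_i = BETA_2 * v_i + (1 - BETA_2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - BETA_1 ** t)  # bias correction
        v_hat = v_i / (1 - BETA_2 ** t)
        update = m_hat / (math.sqrt(v_hat) + EPSILON)
        update += WEIGHT_DECAY_RATE * p  # no-op here because the rate is 0.0
        new_params.append(p - LEARNING_RATE * update)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


new_params, new_m, new_v = adamw_step([1.0, -0.5], [0.3, -2.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(new_params)  # roughly [0.99998, -0.49998]
```

On the first step the bias-corrected update is close to the sign of the gradient, so each weight moves by about the learning rate (2e-5) regardless of the gradient's magnitude.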
### Training results
| Train Loss | Validation Loss | Train Precision | Train Recall | Train F1 | Train Accuracy | Epoch |
|:----------:|:---------------:|:---------------:|:------------:|:--------:|:--------------:|:-----:|
| 0.0426 | 0.0401 | 0.9159 | 0.9284 | 0.9221 | 0.9860 | 0 |
| 0.0163 | 0.0451 | 0.9275 | 0.9235 | 0.9255 | 0.9865 | 1 |
| 0.0084 | 0.0587 | 0.9185 | 0.9240 | 0.9213 | 0.9857 | 2 |
#### Epoch 0
| Named Entity | precision | recall | f1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.661526 | 0.741578 | 0.699269 |
| DIS | 0.722194 | 0.763900 | 0.742462 |
| LOC | 0.965829 | 0.974215 | 0.970004 |
| ORG | 0.949038 | 0.906056 | 0.927049 |
| PER | 0.988075 | 0.989184 | 0.988629 |
#### Epoch 1
| Named Entity | precision | recall | f1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.704151 | 0.646720 | 0.674214 |
| DIS | 0.750533 | 0.756379 | 0.753445 |
| LOC | 0.969905 | 0.973037 | 0.971468 |
| ORG | 0.930323 | 0.932971 | 0.931645 |
| PER | 0.991814 | 0.989082 | 0.990446 |
#### Epoch 2
| Named Entity | precision | recall | f1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.689281 | 0.675532 | 0.682337 |
| DIS | 0.703917 | 0.786731 | 0.743024 |
| LOC | 0.975097 | 0.960122 | 0.967552 |
| ORG | 0.928553 | 0.931677 | 0.930112 |
| PER | 0.992620 | 0.988163 | 0.990387 |
### Framework versions
- Transformers 4.35.2
- TensorFlow 2.14.0
- Datasets 2.15.0
- Tokenizers 0.15.0