---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- generated_from_keras_callback
datasets:
- Babelscape/multinerd
metrics:
- seqeval
base_model: distilbert-base-uncased
pipeline_tag: token-classification
widget:
- text: After months of meticulous review and analysis, I am proud to present a study that explores the deep connections between Epstein-Barr virus (EBV), Long COVID and Myalgic Encephalomyelitis.
  example_title: Example 1
- text: Is it dangerous for a tarantula to live in a paludarium?
  example_title: Example 2
- text: Billionaire Charlie Munger, Warren Buffet's right hand man, dies at 99.
  example_title: Example 3
model-index:
- name: i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B
  results:
  - task:
      type: token-classification
      name: ner
    dataset:
      name: Babelscape/multinerd with only 5 tags
      type: Babelscape/multinerd
      split: test
    metrics:
    - type: seqeval
      value: 0.9362959157462112
      name: precision
    - type: seqeval
      value: 0.9524846478811898
      name: recall
---

# i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the English subset of the [Babelscape/multinerd](https://huggingface.co/datasets/Babelscape/multinerd) NER dataset. On the test split (restricted to five entity types), it reaches a seqeval precision of 0.9363 and a recall of 0.9525.

It reports the following metrics at the end of training (final epoch):
- Train Loss: 0.0084
- Validation Loss: 0.0587
- Train Precision: 0.9185
- Train Recall: 0.9240
- Train F1: 0.9213
- Train Accuracy: 0.9857
- Epoch: 2

All training scripts are available in this [GitHub repository](https://github.com/i-be-snek/rise-assignment-ner-finetune).

## Model description

[distilbert-base-uncased-finetuned-ner-exp_B](https://huggingface.co/i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B) is a Named Entity Recognition (NER) model fine-tuned from [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased). The model is uncased, so it makes no distinction between "sarah" and "Sarah".
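A minimal inference sketch, assuming `transformers` and `tensorflow` are installed and the checkpoint can be downloaded from the Hugging Face Hub. Because the card is tagged `generated_from_keras_callback`, the sketch assumes the repository holds Keras weights and requests the TensorFlow backend. It runs the second widget example and checks the uncased behavior: "Sarah" and "sarah" produce the same input IDs, so the predicted labels should match.

```python
# Assumes: pip install transformers tensorflow, plus network access to the Hugging Face Hub.
from transformers import pipeline

# The card is tagged generated_from_keras_callback, so load the TensorFlow weights.
ner = pipeline(
    "token-classification",
    model="i-be-snek/distilbert-base-uncased-finetuned-ner-exp_B",
    framework="tf",
    aggregation_strategy="simple",  # merge sub-word pieces into whole-entity spans
)

# Second widget example from the model card metadata.
entities = ner("Is it dangerous for a tarantula to live in a paludarium?")
for ent in entities:
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))

# The uncased tokenizer lowercases its input, so casing should not change the labels.
cased = [e["entity_group"] for e in ner("Sarah lives in Paris")]
uncased = [e["entity_group"] for e in ner("sarah lives in paris")]
print(cased == uncased)
```

With `aggregation_strategy="simple"`, adjacent pieces of the same entity are merged into one span, and the `word` field comes back lowercased because the tokenizer is uncased.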
The fine-tuning dataset was modified before training. Only five entity types were kept: Person (PER), Animal (ANIM), Organization (ORG), Location (LOC), and Disease (DIS). Every other named entity was relabeled with label ID `0` (the `O`, or "outside", tag), and the label IDs of the remaining tags were re-indexed.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.0}
- training_precision: float32

### Training results

| Train Loss | Validation Loss | Train Precision | Train Recall | Train F1 | Train Accuracy | Epoch |
|:----------:|:---------------:|:---------------:|:------------:|:--------:|:--------------:|:-----:|
| 0.0426 | 0.0401 | 0.9159 | 0.9284 | 0.9221 | 0.9860 | 0 |
| 0.0163 | 0.0451 | 0.9275 | 0.9235 | 0.9255 | 0.9865 | 1 |
| 0.0084 | 0.0587 | 0.9185 | 0.9240 | 0.9213 | 0.9857 | 2 |

#### Epoch 0

| Named Entity | Precision | Recall | F1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.661526 | 0.741578 | 0.699269 |
| DIS | 0.722194 | 0.763900 | 0.742462 |
| LOC | 0.965829 | 0.974215 | 0.970004 |
| ORG | 0.949038 | 0.906056 | 0.927049 |
| PER | 0.988075 | 0.989184 | 0.988629 |

#### Epoch 1

| Named Entity | Precision | Recall | F1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.704151 | 0.646720 | 0.674214 |
| DIS | 0.750533 | 0.756379 | 0.753445 |
| LOC | 0.969905 | 0.973037 | 0.971468 |
| ORG | 0.930323 | 0.932971 | 0.931645 |
| PER | 0.991814 | 0.989082 | 0.990446 |

#### Epoch 2

| Named Entity | Precision | Recall | F1 |
|:------------:|:---------:|:--------:|:--------:|
| ANIM | 0.689281 | 0.675532 | 0.682337 |
| DIS | 0.703917 | 0.786731 | 0.743024 |
| LOC | 0.975097 | 0.960122 | 0.967552 |
| ORG | 0.928553 | 0.931677 | 0.930112 |
| PER | 0.992620 | 0.988163 | 0.990387 |

### Framework versions

- Transformers 4.35.2
- TensorFlow 2.14.0
- Datasets 2.15.0
- Tokenizers 0.15.0
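The dataset modification described under Model description (keep five entity types, send every other entity to label ID `0`, re-index the rest) can be sketched as follows. This is an illustration, not the code from the linked repository: the helper names are hypothetical, the sample `id2label` table is made up for the example (it is not MultiNERD's real tag table), and the sketch assumes label ID `0` is the `O` tag and that surviving tags keep their original relative order.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tag table for illustration only; NOT the real MultiNERD mapping.
SAMPLE_ID2LABEL: Dict[int, str] = {
    0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC", 7: "B-ANIM", 8: "I-ANIM",
    9: "B-BIO", 10: "I-BIO", 11: "B-DIS", 12: "I-DIS",
}
KEEP: Set[str] = {"PER", "ORG", "LOC", "ANIM", "DIS"}


def build_reduced_labels(
    id2label: Dict[int, str], keep: Set[str]
) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Hypothetical helper: return (new_id2label, old_to_new).

    Tags whose entity type is not in `keep` map to new ID 0 (assumed "O").
    Kept tags are re-indexed densely, preserving their original order.
    """
    new_id2label: Dict[int, str] = {0: "O"}
    old_to_new: Dict[int, int] = {}
    for old_id in sorted(id2label):
        label = id2label[old_id]
        # "B-PER" -> "PER"; the "O" tag has no entity type.
        entity = label.split("-", 1)[1] if "-" in label else None
        if entity in keep:
            new_id = len(new_id2label)  # next free dense ID
            new_id2label[new_id] = label
            old_to_new[old_id] = new_id
        else:
            old_to_new[old_id] = 0  # dropped entity types become "O"
    return new_id2label, old_to_new


def remap_tags(tags: List[int], old_to_new: Dict[int, int]) -> List[int]:
    """Hypothetical helper: rewrite one sentence's tag IDs with the new mapping."""
    return [old_to_new[t] for t in tags]


new_id2label, old_to_new = build_reduced_labels(SAMPLE_ID2LABEL, KEEP)
print(new_id2label)
print(remap_tags([1, 2, 0, 9, 10, 11, 12], old_to_new))
```

In this sample, the `BIO` tags (IDs 9 and 10) collapse to `0`, and the `DIS` tags move down from 11 and 12 to 9 and 10, which is the kind of re-indexing the card refers to.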