---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_ggponc_medbertde
  results:
  - task:
      name: NER (fine-grained, nested spans)
      type: token-classification
    metrics:
    - name: F1 score (Test set, fine-grained, nested spans)
      type: f_score
      value: 0.7415
    - name: Precision (Test set, fine-grained, nested spans)
      type: precision
      value: 0.7304
    - name: Recall (Test set, fine-grained, nested spans)
      type: recall
      value: 0.7529
datasets:
- bigbio/ggponc2
library_name: spacy
---

A clinical NER model for German, built on spaCy's SpanCategorizer implementation and [medBERT.de](https://huggingface.co/GerMedBERT/medbert-512).

Usage:

```python
# Notebook cells: in a regular shell, run these two commands without the leading "!".
!huggingface-cli download phlobo/de_ggponc_medbertde de_ggponc_medbertde-1.0.0-py3-none-any.whl --local-dir .
!pip install de_ggponc_medbertde-1.0.0-py3-none-any.whl

import spacy

nlp = spacy.load("de_ggponc_medbertde")
doc = nlp("allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan")

# The predicted spans (which may overlap or nest) are stored under the "entities" key.
for span in doc.spans["entities"]:
    print(span, span.label_)
```

yields:

```
Oxaliplatin Clinical_Drug
Irinotecan Clinical_Drug
Versagen einer Behandlung Other_Finding
Behandlung mit Oxaliplatin und Irinotecan Therapeutic
```
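
The output above contains spans that sit inside other spans as well as spans that only partly overlap. The following is a minimal sketch of how such relations could be detected from character offsets. It does not load the model: the offsets are recomputed by hand from the example sentence, and the helper names (`locate`, `contains`, `overlaps`) are hypothetical. With a loaded pipeline, the same tuples could presumably be built from each span's `start_char`, `end_char` and `label_` attributes.

```python
# Minimal sketch: classify pairs of predicted spans as nested or partially overlapping.
# The (start_char, end_char, label) tuples are built by hand to mirror the output above.

text = "allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan"


def locate(phrase, label):
    """Return (start_char, end_char, label) for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


spans = [
    locate("Oxaliplatin", "Clinical_Drug"),
    locate("Irinotecan", "Clinical_Drug"),
    locate("Versagen einer Behandlung", "Other_Finding"),
    locate("Behandlung mit Oxaliplatin und Irinotecan", "Therapeutic"),
]


def contains(outer, inner):
    """True if inner lies completely within outer and the two are not identical."""
    return outer[0] <= inner[0] and inner[1] <= outer[1] and outer[:2] != inner[:2]


def overlaps(a, b):
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


# Pairs where one span is fully enclosed by another.
nested = [(inner, outer) for outer in spans for inner in spans if contains(outer, inner)]

# Pairs that share characters without either enclosing the other.
partial = [
    (a, b)
    for i, a in enumerate(spans)
    for b in spans[i + 1:]
    if overlaps(a, b) and not contains(a, b) and not contains(b, a)
]

for inner, outer in nested:
    print(f"{text[inner[0]:inner[1]]!r} ({inner[2]}) is nested in {text[outer[0]:outer[1]]!r} ({outer[2]})")
for a, b in partial:
    print(f"{text[a[0]:a[1]]!r} ({a[2]}) partially overlaps {text[b[0]:b[1]]!r} ({b[2]})")
```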

The model was trained on the gold-standard labels in GGPONC 2.0 (https://aclanthology.org/2022.lrec-1.389/).

It detects the following fine-grained entity classes, grouped into three categories:

- Findings: Diagnosis / Pathology and Other Findings
- Substances: Clinical Drug, Nutrients / Body Substances, External Substances
- Procedures: Therapeutic, Diagnostic

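
When only the three top-level groups are needed, the fine-grained labels can be collapsed with a lookup table. The sketch below is an illustration under assumptions: only `Clinical_Drug`, `Other_Finding` and `Therapeutic` are confirmed by the example output, while the other label strings are guessed spellings that should be checked against the labels the installed pipeline reports (for instance via the `labels` property of the `spancat` component). The names `COARSE_GROUP` and `coarse_label` are hypothetical.

```python
# Minimal sketch: collapse fine-grained labels into the three top-level groups.
# Entries marked "assumed" use guessed label spellings and are not confirmed.
COARSE_GROUP = {
    "Diagnosis_or_Pathology": "Findings",        # assumed spelling
    "Other_Finding": "Findings",
    "Clinical_Drug": "Substances",
    "Nutrient_or_Body_Substance": "Substances",  # assumed spelling
    "External_Substance": "Substances",          # assumed spelling
    "Therapeutic": "Procedures",
    "Diagnostic": "Procedures",                  # assumed spelling
}


def coarse_label(fine_label):
    """Map a fine-grained label to its group, or 'Unknown' if it is not listed."""
    return COARSE_GROUP.get(fine_label, "Unknown")


# Labels taken from the example output, in the order they were printed.
predicted = ["Clinical_Drug", "Clinical_Drug", "Other_Finding", "Therapeutic"]
print([coarse_label(label) for label in predicted])
```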
The configuration for training the model is available here: https://github.com/hpi-dhc/ggponc

When using the model, please cite the following publication:

```bibtex
@inproceedings{borchert-etal-2022-ggponc,
    title = "{GGPONC} 2.0 - The {G}erman Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline {NER} Taggers",
    author = "Borchert, Florian and
      Lohr, Christina and
      Modersohn, Luise and
      Witt, Jonas and
      Langer, Thomas and
      Follmann, Markus and
      Gietzelt, Matthias and
      Arnrich, Bert and
      Hahn, Udo and
      Schapranow, Matthieu-P.",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    pages = "3650--3660"
}
```

| Feature | Description |
| --- | --- |
| **Name** | `de_ggponc_medbertde` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **Components** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **License** | The model may be used for non-commercial research activities only, see also the Terms of Use of GGPONC: https://www.leitlinienprogramm-onkologie.de/projekte/ggponc-english |
| **Author** | [Florian Borchert](https://florianborchert.de) |

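
The spaCy requirement `>=3.4.4,<3.5.0` in the table can be checked before loading the model. The sketch below is one simple way to do so, assuming plain `major.minor.patch` version strings; the helper names `parse_version` and `is_supported` are hypothetical, and pre-release suffixes are not handled. In practice the string to test would typically come from `spacy.__version__`.

```python
# Minimal sketch: check a spaCy version string against the range from the table.
# Only plain "major.minor.patch" strings are supported by this simplified parser.

def parse_version(version):
    """Turn a string such as '3.4.4' into the tuple (3, 4, 4)."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_supported(version, lower="3.4.4", upper="3.5.0"):
    """True if lower <= version < upper, mirroring '>=3.4.4,<3.5.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Illustrative inputs around both ends of the allowed range.
for candidate in ["3.4.3", "3.4.4", "3.5.0"]:
    print(candidate, is_supported(candidate))
```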