---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_ggponc_medbertde
  results:
  - task:
      name: NER (fine-grained, nested spans)
      type: token-classification
    metrics:
    - name: F1 score (Test set, fine-grained, nested spans)
      type: f_score
      value: 0.7415
    - name: Precision (Test set, fine-grained, nested spans)
      type: precision
      value: 0.7304
    - name: Recall (Test set, fine-grained, nested spans)
      type: recall
      value: 0.7529
datasets:
- bigbio/ggponc2
library_name: spacy
---
Clinical NER model using spaCy's SpanCategorizer implementation and [medBERT.de](https://huggingface.co/GerMedBERT/medbert-512).
Usage:
```python
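# Notebook syntax: in a plain shell, run these two commands without the leading "!".
!huggingface-cli download phlobo/de_ggponc_medbertde de_ggponc_medbertde-1.0.0-py3-none-any.whl --local-dir .
!pip install de_ggponc_medbertde-1.0.0-py3-none-any.whl

import spacy

# Load the installed pipeline by its package name.
nlp = spacy.load('de_ggponc_medbertde')
d = nlp("allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan")

# Predicted spans (which may overlap or nest) are stored under the 'entities' key.
for e in d.spans['entities']:
    print(e, e.label_)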
```
yields:
```
Oxaliplatin Clinical_Drug
Irinotecan Clinical_Drug
Versagen einer Behandlung Other_Finding
Behandlung mit Oxaliplatin und Irinotecan Therapeutic
```
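
Because the model predicts nested spans, the same token can belong to several entities: in the output above, both drug mentions lie inside the `Therapeutic` span, while `Versagen einer Behandlung` merely overlaps it. The following is a minimal sketch of how such containment could be detected. It works on plain `(start, end, label)` tuples so that it runs without the model; the helper name `nested_pairs` and the token offsets (derived here from whitespace tokenization of the example sentence) are assumptions for illustration, not part of the published pipeline.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies strictly inside outer."""
    pairs = []
    for i, outer in enumerate(spans):
        for j, inner in enumerate(spans):
            if i == j:
                continue
            same_extent = (outer[0], outer[1]) == (inner[0], inner[1])
            contained = outer[0] <= inner[0] and inner[1] <= outer[1]
            if contained and not same_extent:
                pairs.append((outer, inner))
    return pairs


# Assumed token offsets for:
# "allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan"
example_spans = [
    (6, 7, "Clinical_Drug"),   # Oxaliplatin
    (8, 9, "Clinical_Drug"),   # Irinotecan
    (2, 5, "Other_Finding"),   # Versagen einer Behandlung
    (4, 9, "Therapeutic"),     # Behandlung mit Oxaliplatin und Irinotecan
]

pairs = nested_pairs(example_spans)
for outer, inner in pairs:
    print(inner, "is nested inside", outer)
```

With the real pipeline, the tuples could be built from the predicted spans, for example as `[(e.start, e.end, e.label_) for e in d.spans['entities']]`.
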
The model was trained on the gold-standard labels of [GGPONC 2.0](https://aclanthology.org/2022.lrec-1.389/).
It detects the following 8 entity classes:
- Findings: Diagnosis / Pathology and Other Findings
- Substances: Clinical Drug, Nutrients / Body Substances, External Substances
- Procedures: Therapeutic, Diagnostic
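
For downstream use it can be convenient to collapse the fine-grained labels into the three groups listed above. The sketch below shows one way to do this. Only `Clinical_Drug`, `Other_Finding` and `Therapeutic` appear in the example output of this README; the remaining label strings, the `COARSE_CLASS` mapping and the helper `group_by_coarse_class` are assumptions and should be checked against the labels the installed model actually emits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Fine-grained label -> coarse group. Entries marked "assumed" use label
# strings that are guesses based on the class names listed in this README.
COARSE_CLASS = {
    "Diagnosis_or_Pathology": "Findings",        # assumed label string
    "Other_Finding": "Findings",
    "Clinical_Drug": "Substances",
    "Nutrient_or_Body_Substance": "Substances",  # assumed label string
    "External_Substance": "Substances",          # assumed label string
    "Therapeutic": "Procedures",
    "Diagnostic": "Procedures",                  # assumed label string
}


def group_by_coarse_class(
    labelled_spans: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (text, fine_label) pairs by coarse class; unmapped labels go to 'Unknown'."""
    groups = defaultdict(list)
    for text, label in labelled_spans:
        groups[COARSE_CLASS.get(label, "Unknown")].append(text)
    return dict(groups)


# The predictions shown in the usage example of this README.
predictions = [
    ("Oxaliplatin", "Clinical_Drug"),
    ("Irinotecan", "Clinical_Drug"),
    ("Versagen einer Behandlung", "Other_Finding"),
    ("Behandlung mit Oxaliplatin und Irinotecan", "Therapeutic"),
]

grouped = group_by_coarse_class(predictions)
for coarse, texts in grouped.items():
    print(coarse, texts)
```

Sending unmapped labels to an `Unknown` bucket rather than raising an error keeps the sketch usable even if the assumed label strings turn out to differ.
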
The configuration for training the model is available here: https://github.com/hpi-dhc/ggponc
When using the model, please cite the following publication:
```bibtex
@inproceedings{borchert-etal-2022-ggponc,
title = "{GGPONC} 2.0 - The {G}erman Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline {NER} Taggers",
author = "Borchert, Florian and
Lohr, Christina and
Modersohn, Luise and
Witt, Jonas and
Langer, Thomas and
Follmann, Markus and
Gietzelt, Matthias and
Arnrich, Bert and
Hahn, Udo and
Schapranow, Matthieu-P.",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
pages = "3650--3660"
}
```
| Feature | Description |
| --- | --- |
| **Name** | `de_ggponc_medbertde` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **Components** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **License** | The model may be used for non-commercial research activities only, see also the Terms of Use of GGPONC: https://www.leitlinienprogramm-onkologie.de/projekte/ggponc-english |
| **Author** | [Florian Borchert](https://florianborchert.de) |
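
The table above pins the compatible spaCy range to `>=3.4.4,<3.5.0`, so loading the model under a newer spaCy release may fail or produce warnings. As a minimal sketch, the range can be checked before loading. The helpers `parse_release` and `is_supported_spacy` are hypothetical names introduced here, and the parser deliberately ignores pre-release suffixes such as `rc1`.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string; a missing patch counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_supported_spacy(version: str) -> bool:
    """True if the version falls in the range >=3.4.4,<3.5.0 stated in the table."""
    return (3, 4, 4) <= parse_release(version) < (3, 5, 0)


for candidate in ["3.4.3", "3.4.4", "3.5.0"]:
    print(candidate, is_supported_spacy(candidate))
```

In practice, the installed version could be passed in as `is_supported_spacy(spacy.__version__)`.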