---
tags:
- spacy
- token-classification
language:
- de
model-index:
- name: de_ggponc_medbertde
  results:
  - task:
      name: NER (fine-grained, nested spans)
      type: token-classification
    metrics:
    - name: F1 score (Test set, fine-grained, nested spans)
      type: f_score
      value: 0.7415
    - name: Precision (Test set, fine-grained, nested spans)
      type: precision
      value: 0.7304
    - name: Recall (Test set, fine-grained, nested spans)
      type: recall
      value: 0.7529
datasets:
- bigbio/ggponc2
library_name: spacy
---

Clinical NER model using spaCy's SpanCategorizer implementation and [medBERT.de](https://huggingface.co/GerMedBERT/medbert-512).

Usage:

```python
# The two "!" lines are notebook (Jupyter/Colab) shell commands;
# in a terminal, run them without the leading "!".
!huggingface-cli download phlobo/de_ggponc_medbertde de_ggponc_medbertde-1.0.0-py3-none-any.whl --local-dir .
!pip install de_ggponc_medbertde-1.0.0-py3-none-any.whl

import spacy

nlp = spacy.load('de_ggponc_medbertde')
d = nlp("allein nach Versagen einer Behandlung mit Oxaliplatin und Irinotecan")

# Predicted spans (possibly nested) are stored under the 'entities' key
for e in d.spans['entities']:
    print(e, e.label_)
```

yields:

```
Oxaliplatin Clinical_Drug
Irinotecan Clinical_Drug
Versagen einer Behandlung Other_Finding
Behandlung mit Oxaliplatin und Irinotecan Therapeutic
```

The model has been trained on gold standard labels in GGPONC 2.0 (https://aclanthology.org/2022.lrec-1.389/).
It detects the following 8 entity classes:
- Findings: Diagnosis / Pathology and Other Findings
- Substances: Clinical Drug, Nutrients / Body Substances, External Substances
- Procedures: Therapeutic, Diagnostic

The configuration for training the model is available here: https://github.com/hpi-dhc/ggponc

When using the model, please cite the following publication:

```bibtex
@inproceedings{borchert-etal-2022-ggponc,
    title = "{GGPONC} 2.0 - The {G}erman Clinical Guideline Corpus for Oncology: Curation Workflow, Annotation Policy, Baseline {NER} Taggers",
    author = "Borchert, Florian and
      Lohr, Christina and
      Modersohn, Luise and
      Witt, Jonas and
      Langer, Thomas and
      Follmann, Markus and
      Gietzelt, Matthias and
      Arnrich, Bert and
      Hahn, Udo and
      Schapranow, Matthieu-P.",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    pages = "3650--3660"
}
```

| Feature | Description |
| --- | --- |
| **Name** | `de_ggponc_medbertde` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.4.4,<3.5.0` |
| **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **Components** | `transformer`, `morphologizer`, `parser`, `transformer_spancat`, `spancat` |
| **License** | The model may be used for non-commercial research activities only, see also the Terms of Use of GGPONC: https://www.leitlinienprogramm-onkologie.de/projekte/ggponc-english |
| **Author** | [Florian Borchert](https://florianborchert.de) |
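
Because the model predicts nested spans, one text region can appear in several spans with different labels. Grouping the predictions by label can make them easier to work with. The following is a minimal sketch, not part of the model package: `group_spans_by_label` and `FakeSpan` are hypothetical names introduced for illustration, and the stand-in spans reuse the example output from the usage section so the snippet runs without the model installed.

```python
from collections import defaultdict, namedtuple


def group_spans_by_label(spans):
    """Map each label to the list of span texts carrying it, in input order."""
    grouped = defaultdict(list)
    for span in spans:
        # Only .text and .label_ are read; spaCy Span objects provide both.
        grouped[span.label_].append(span.text)
    return dict(grouped)


# Hypothetical stand-in for spaCy spans (assumption for this sketch);
# with the model loaded, pass d.spans['entities'] instead.
FakeSpan = namedtuple("FakeSpan", ["text", "label_"])
example = [
    FakeSpan("Oxaliplatin", "Clinical_Drug"),
    FakeSpan("Irinotecan", "Clinical_Drug"),
    FakeSpan("Versagen einer Behandlung", "Other_Finding"),
    FakeSpan("Behandlung mit Oxaliplatin und Irinotecan", "Therapeutic"),
]

grouped = group_spans_by_label(example)
for label, texts in grouped.items():
    print(label, texts)
```

The helper relies on duck typing, so the same function should work on the real span group as long as each element exposes `text` and `label_`.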