---
tags:
- spacy
- arxiv:2408.06930
- medical
language:
- nl
license: cc-by-sa-4.0
model-index:
- name: Echocardiogram_SpanCategorizer_aortic_regurgitation
  results: 
  - task: 
      type: token-classification
    dataset:
      type: test
      name: "internal test set"
    metrics:
    - name: "Weighted f1"
      type: f1
      value: 0.897
      verified: false 
    - name: "Weighted precision"
      type: precision
      value: 0.944
      verified: false
    - name: "Weighted recall"
      type: recall
      value: 0.853
      verified: false
    
pipeline_tag: token-classification
metrics:
- f1
- precision
- recall
---

# Description
This model is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from electronic health records. The publication describing the span classification task is available at https://arxiv.org/abs/2408.06930. The config file used to train the model is available at https://github.com/umcu/echolabeler.

# Minimum working example
```python
# Notebook (IPython/Jupyter) syntax; in a regular shell, drop the leading "!"
!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-aortic-regurgitation/resolve/main/nl_Echocardiogram_SpanCategorizer_aortic_regurgitation-any-py3-none-any.whl
```
```python
import spacy
nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_aortic_regurgitation")
```
```python
# Run the pipeline on an example Dutch echocardiogram report
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en matige aortaklepinsufficientie. Geringe M.I.")
# Predicted spans are stored under the "sc" key, with their scores in the span group's attrs
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
```
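
When a single label per report is needed, one option is to keep only the highest-scoring span. The sketch below is an illustration under stated assumptions, not the document-level aggregation used in the paper: `top_prediction` and its `min_score` parameter are hypothetical, and the function works on plain `(text, label, score)` tuples that could be collected from the loop above.

```python
from typing import List, Optional, Tuple

# (span text, label, score); a plain tuple so the sketch runs without the model
Prediction = Tuple[str, str, float]


def top_prediction(
    predictions: List[Prediction], min_score: float = 0.5
) -> Optional[Prediction]:
    """Return the highest-scoring prediction at or above min_score, or None.

    Hypothetical helper: the 0.5 default is an arbitrary illustrative cut-off,
    not a threshold taken from the model or the paper.
    """
    candidates = [p for p in predictions if p[2] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p[2])


# Made-up values for illustration only, not real model output
example = [
    ("matige aortaklepinsufficientie", "aortic_valve_native_regurgitation_moderate", 0.91),
    ("placeholder span", "aortic_valve_native_regurgitation_mild", 0.55),
]
print(top_prediction(example))
```

Returning `None` when nothing passes the cut-off keeps "no confident span found" distinct from the `not_present` label, which the model predicts explicitly.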

# Label Scheme

<details>

<summary>View label scheme (4 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`spancat`** | `aortic_valve_native_regurgitation_not_present`, `aortic_valve_native_regurgitation_mild`, `aortic_valve_native_regurgitation_moderate`, `aortic_valve_native_regurgitation_severe` |

</details>
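
All four labels share the prefix `aortic_valve_native_regurgitation_`, so the severity grade can be read off the suffix. The following is a minimal sketch with hypothetical helper names (`severity`, `severity_rank`); the ordinal ranks are an assumption added for illustration and are not part of the model.

```python
LABEL_PREFIX = "aortic_valve_native_regurgitation_"

# Ordinal ranks are an illustrative assumption, not something the model outputs
SEVERITY_RANK = {"not_present": 0, "mild": 1, "moderate": 2, "severe": 3}


def severity(label: str) -> str:
    """Strip the shared prefix from a spancat label, e.g. '..._mild' -> 'mild'."""
    if not label.startswith(LABEL_PREFIX):
        raise ValueError(f"Unexpected label: {label!r}")
    grade = label[len(LABEL_PREFIX):]
    if grade not in SEVERITY_RANK:
        raise ValueError(f"Unknown severity grade: {grade!r}")
    return grade


def severity_rank(label: str) -> int:
    """Map a spancat label to an integer rank so grades can be compared."""
    return SEVERITY_RANK[severity(label)]


print(severity("aortic_valve_native_regurgitation_moderate"))
print(severity_rank("aortic_valve_native_regurgitation_severe"))
```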


# Intended use
The model is intended for span classification of Dutch clinical text. Because it is a domain-specific model trained on medical data, it is meant for Dutch medical NLP tasks.

# Data
The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The data was anonymized before training.

| Feature | Description |
| --- | --- |
| **Name** | `Echocardiogram_SpanCategorizer_aortic_regurgitation` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.7.4,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `spancat` |
| **Components** | `tok2vec`, `spancat` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | `cc-by-sa-4.0` |
| **Author** | Bauke Arends |
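
The spaCy constraint in the table (`>=3.7.4,<3.8.0`) can be checked before loading the model. The sketch below uses only the standard library; `parse_version` and `is_compatible` are hypothetical helpers, and the parsing is deliberately simplified (it looks only at the first three numeric components and ignores pre-release suffixes).

```python
import re

# Bounds taken from the table above: >=3.7.4,<3.8.0
MIN_VERSION = (3, 7, 4)
MAX_VERSION_EXCLUSIVE = (3, 8, 0)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '3.7.5' into a tuple of ints, (3, 7, 5)."""
    parts = re.findall(r"\d+", version)[:3]
    return tuple(int(p) for p in parts)


def is_compatible(version: str) -> bool:
    """Check whether a spaCy version string falls inside the supported range."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION_EXCLUSIVE


print(is_compatible("3.7.5"))
print(is_compatible("3.8.0"))
```

With spaCy installed, the check could be applied as `is_compatible(spacy.__version__)`.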

# Contact
If you run into problems with this model, please open an issue in our GitHub repository: https://github.com/umcu/echolabeler/issues

# Usage
If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930

# References
Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv. https://arxiv.org/abs/2408.06930