miguel6nunes committed on
Commit 88ece8f
1 Parent(s): ecf439a
Update README.md
README.md CHANGED
@@ -44,13 +44,13 @@ The first publicly available medical language model trained with real European P

 MediAlbertina is a family of encoders from the Bert family, DeBERTaV2-based, resulting from the continuation of the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models with Electronic Medical Records shared by Portugal's largest public hospital.

-Like its antecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-
+Like its antecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m_NER/blob/main/LICENSE).



 # Model Description

-**MediAlbertina PT-PT 900M NER
+**MediAlbertina PT-PT 900M NER** was created through fine-tuning of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
 - **Diagnostico (D)**: All types of diseases and conditions following the ICD-10-CM guidelines.
 - **Sintoma (S)**: Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
 - **Medicamento (M)**: Something that is administrated to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
@@ -60,7 +60,7 @@ Like its antecessors, MediAlbertina models are distributed under the [MIT licens
 - **Resultado (R)**: Results can be associated with Medical Procedures and Vital Signs. It can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor to indicate the result (e.g., positive/negative, functional).
 - **Progresso (P)**: Describes the progress of patient’s condition. Typically, it includes verbs like improving, evolving, or regressing and mentions to patient’s stability.

-**MediAlbertina PT-PT 900M NER
+**MediAlbertina PT-PT 900M NER** achieved superior results to the same adaptation made on a non-medical Portuguese language model, demonstrating the effectiveness of this domain adaptation, and its potential for medical AI in Portugal.

 | Model | B-D | I-D | B-S | I-S | B-PM | I-PM | B-SV | I-SV | B-R | I-R | B-M | I-M | B-DO | I-DO | B-P | I-P |
 |-------------------------|:----:|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
@@ -74,7 +74,7 @@ Like its antecessors, MediAlbertina models are distributed under the [MIT licens

 ## Data

-**MediAlbertina PT-PT 900M NER
+**MediAlbertina PT-PT 900M NER** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).


 ## How to use
@@ -82,7 +82,7 @@ Like its antecessors, MediAlbertina models are distributed under the [MIT licens
 ```Python
 from transformers import pipeline

-ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-
+ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')
 sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
 entities = ner_pipeline(sentence)
 for entity in entities:
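The changed line sets `aggregation_strategy='average'`, so the pipeline returns one dict per merged entity span rather than one per token. The following is a minimal sketch of consuming that kind of output, assuming the usual `entity_group`, `score`, and `word` keys of aggregated `transformers` NER results. The helper name `group_by_label`, the label strings, and the sample spans are hypothetical and hand-written for illustration; they are not real model output.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group aggregated NER spans by label, keeping (text, score) pairs.

    `entities` is expected to look like the list returned by a transformers
    NER pipeline with an aggregation strategy set: dicts carrying at least
    'entity_group', 'score' and 'word'.
    """
    grouped = defaultdict(list)
    for ent in entities:
        score = float(ent["score"])
        if score >= min_score:
            grouped[ent["entity_group"]].append((ent["word"], round(score, 3)))
    return dict(grouped)


# Hypothetical spans written by hand (NOT real model output); the label
# strings "PM", "D" and "S" are assumptions borrowed from the README's tags.
sample = [
    {"entity_group": "PM", "score": 0.98, "word": "procedimento endoscópico", "start": 10, "end": 34},
    {"entity_group": "D", "score": 0.95, "word": "pólipos", "start": 54, "end": 61},
    {"entity_group": "S", "score": 0.30, "word": "cólon", "start": 65, "end": 70},
]

if __name__ == "__main__":
    # Spans scoring below the threshold are dropped before grouping.
    for label, spans in group_by_label(sample, min_score=0.5).items():
        print(label, spans)
```

With a real model, the list returned by `ner_pipeline(sentence)` could be passed in place of `sample`; the score threshold is an arbitrary choice for the sketch, not a recommended value.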