CORe Model - Clinical Diagnosis Prediction

Model description

The CORe (Clinical Outcome Representations) model was introduced in the paper "Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration". It is based on BioBERT and was further pre-trained on clinical notes, disease descriptions and medical articles with a specialised Clinical Outcome Pre-Training objective.

This model checkpoint is fine-tuned for diagnosis prediction. The model takes patient admission notes as input and outputs multi-label ICD-9 code predictions.
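This card does not specify the exact note layout used during fine-tuning. As a hedged illustration only, the sketch below assembles a note in the style of the inference example in this card: upper-case "HEADER: text" sections separated by blank lines. The helper build_admission_note and the section names are hypothetical and are not part of the CORe release.

```python
def build_admission_note(sections):
    """Join {header: text} pairs into a single admission-note string.

    Hypothetical helper: mirrors the "HEADER: text" layout with blank lines
    between sections used in the inference example of this card.
    """
    parts = []
    for header, text in sections.items():  # dicts keep insertion order
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        if text:  # skip sections that are empty or whitespace-only
            parts.append(f"{header.upper()}: {text}")
    return "\n\n".join(parts)


note = build_admission_note({
    "Chief complaint": "Headaches",
    "Present illness": "58yo man w/ hx of hypertension,\n AFib on coumadin.",
    "Allergies": "  ",
})
print(note)
```

Collapsing whitespace inside each section is a design choice of this sketch; BERT-style tokenizers treat newlines and spaces alike, so it mainly keeps the note readable.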

Model Predictions

The model makes predictions over a total of 9237 labels. These comprise 3- and 4-digit ICD-9 codes as well as textual descriptions of these codes. The 4-digit codes and textual descriptions help incorporate additional topical and hierarchical information into the model during training (see Section 4.2, "ICD+: Incorporation of ICD Hierarchy", in our paper). We recommend using only the 3-digit code predictions at inference time, because only those have been evaluated in our work.
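To follow the recommendation above, the predicted labels can be filtered down to 3-digit codes. The sketch below assumes that code labels are stored as plain ICD-9 strings without a decimal point (e.g. "401" for the category, "4019" for a 4-digit code), as in MIMIC-III; this is an assumption, so inspect model.config.id2label before relying on it. The helper name keep_three_digit_codes is hypothetical.

```python
import re

# Assumption: 3-digit ICD-9 categories look like "401", "V30" or "E880"
# (no decimal point); 4-digit codes and textual descriptions do not match.
THREE_DIGIT_ICD9 = re.compile(r"(?:\d{3}|V\d{2}|E\d{3})")


def keep_three_digit_codes(labels):
    """Keep only labels that are 3-digit ICD-9 category codes."""
    return [label for label in labels if THREE_DIGIT_ICD9.fullmatch(label)]


print(keep_three_digit_codes(["401", "4019", "V30", "V3000", "Essential hypertension"]))
```

Applied to the inference example, keep_three_digit_codes(predicted_labels) would drop 4-digit codes and textual descriptions, provided the label format matches the assumption above.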

How to use CORe Diagnosis Prediction

You can load the model via the transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction")

The following code shows an inference example:

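import torch

# Example admission note; "text" avoids shadowing Python's built-in input()
text = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life."

# BERT-based models accept at most 512 tokens, so truncate longer notes
tokenized_input = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():  # no gradients are needed for inference
    output = model(**tokenized_input)

# Multi-label output: a sigmoid per label, then each probability is thresholded independently
predictions = torch.sigmoid(output.logits)
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()]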

Note: For best performance, we recommend determining the threshold (0.3 in this example) individually for each label.
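The card does not say how per-label thresholds should be chosen. One common approach, shown here as a sketch and not necessarily the authors' procedure, is to pick, for each label, the threshold on a fixed grid that maximises F1 on a held-out validation set. The function tune_thresholds, the grid and the fallback value are illustrative assumptions.

```python
import numpy as np


def tune_thresholds(probs, gold, candidates=None, fallback=0.3):
    """Choose one decision threshold per label by maximising F1 on held-out data.

    probs: array of shape (n_notes, n_labels) with sigmoid outputs
    gold:  binary array of the same shape with the true labels
    Labels whose F1 is 0 at every candidate keep `fallback`.
    """
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # grid 0.05, 0.10, ..., 0.95
    probs = np.asarray(probs, dtype=float)
    gold = np.asarray(gold, dtype=bool)
    n_labels = probs.shape[1]
    best_thr = np.full(n_labels, fallback, dtype=float)
    best_f1 = np.zeros(n_labels)
    for t in candidates:
        pred = probs > t  # same comparison as in the inference example
        tp = (pred & gold).sum(axis=0)
        fp = (pred & ~gold).sum(axis=0)
        fn = (~pred & gold).sum(axis=0)
        # F1 = 2TP / (2TP + FP + FN); a zero denominator implies TP = 0, so F1 = 0
        f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
        better = f1 > best_f1  # strict ">" keeps the lowest threshold with the best F1
        best_f1[better] = f1[better]
        best_thr[better] = t
    return best_thr


val_probs = np.array([[0.9, 0.22], [0.12, 0.7], [0.8, 0.43]])
val_gold = np.array([[1, 0], [0, 1], [1, 0]])
thresholds = tune_thresholds(val_probs, val_gold)
print(thresholds)
```

With thresholds tuned on validation notes, the fixed 0.3 in the inference example could be replaced by a per-label comparison such as predictions.detach().numpy() > thresholds, which broadcasts one threshold per label column. With 9237 labels, many codes are rare in any validation set, so the fallback value matters for labels that never reach a positive F1.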

More Information

For further details about CORe and contact information, please visit CORe.app.datexis.com.

Cite

@inproceedings{vanaken21,
  author    = {Betty van Aken and
               Jens-Michalis Papaioannou and
               Manuel Mayrdorfer and
               Klemens Budde and
               Felix A. Gers and
               Alexander Löser},
  title     = {Clinical Outcome Prediction from Admission Notes using Self-Supervised
               Knowledge Integration},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the
               Association for Computational Linguistics: Main Volume, {EACL} 2021,
               Online, April 19 - 23, 2021},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
}