This model continues pre-training RoBERTa-base on discharge summaries from the MIMIC-III dataset.
Details can be found in the following paper:
Xiang Dai, Ilias Chalkidis, Sune Darkner, and Desmond Elliott. 2022. Revisiting Transformer-based Models for Long Document Classification. (https://arxiv.org/abs/2204.06683)
- Important hyper-parameters

| Hyper-parameter | Value |
| --- | --- |
| Max sequence length | 4,096 |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Training epochs | 6 |
| Training time | 130 GPU-hours |
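
- How to use

A minimal sketch of loading the checkpoint with the Hugging Face `transformers` library for masked-language-modeling inference. The model identifier below is a placeholder and should be replaced with this repository's actual Hub id; the example sentence is synthetic, not drawn from MIMIC-III.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Placeholder id -- replace with this repository's id on the Hugging Face Hub.
MODEL_ID = "your-username/roberta-base-mimic"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Synthetic discharge-summary style sentence with a masked token.
text = "The patient was discharged home in stable <mask> with follow-up in two weeks."

# The model was continued pre-trained with sequences up to 4,096 tokens,
# so inputs can be truncated at that length.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
outputs = model(**inputs)
```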