---
license: gpl-3.0
language:
- en
metrics:
- accuracy
base_model: dmis-lab/ANGEL_pretrained
---

# Model Card for ANGEL_bc5cdr

This model card describes ANGEL_bc5cdr, a model for biomedical entity linking.

# Model Details

#### Model Description

- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
- **Model type:** Generative Biomedical Entity Linking Model
- **Language(s):** English
- **License:** GPL-3.0
- **Finetuned from model:** BART-large (Base architecture)

#### Model Sources

- **Github Repository:** https://github.com/dmis-lab/ANGEL
- **Paper:** https://arxiv.org/pdf/2408.16493

# Direct Use

ANGEL_bc5cdr is designed for biomedical entity linking, with a focus on identifying and linking disease mentions in the BC5CDR dataset.

To use this model, you need to set up a virtual environment and the inference code.
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
Then, run the following script to set up the environment:

```bash
bash script/environment/set_environment.sh
```

To run the model on a single sample, no preprocessing is required.
Simply execute the `run_sample.sh` script:

```bash
bash script/inference/run_sample.sh bc5cdr
```

To replace the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section in our GitHub repository.
If you are interested in training or evaluating the model, see the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.

# Training

#### Training Data

The model was trained on the BC5CDR dataset, which includes annotated disease entities.

#### Training Procedure

- **Positive-only Pre-training:** Initial training used only positive examples, following the standard approach.
- **Negative-aware Training:** Subsequent training incorporated negative examples to improve the model's discriminative capabilities.

# Evaluation

### Testing Data

The model was evaluated on the BC5CDR dataset.

### Metrics

**Accuracy at Top-1 (Acc@1):** Measures the percentage of mentions for which the model's top prediction matches the correct entity.

### Scores

| Dataset | BioSYN (Sung et al., 2020) | SapBERT (Liu et al., 2021) | GenBioEL (Yuan et al., 2022b) | ANGEL (Ours) |
|---|---|---|---|---|
| BC5CDR | - | - | 93.1 | 94.5 |

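
As a rough illustration of how an Acc@1 value like those reported above is defined, the sketch below counts a mention as correct when its top-ranked prediction is one of the gold identifiers. This is not the repository's evaluation script: the function name `accuracy_at_1`, the input layout (a ranked candidate list and a set of gold identifiers per mention), and the toy identifiers are assumptions made for illustration only.

```python
from typing import Sequence, Set


def accuracy_at_1(ranked_predictions: Sequence[Sequence[str]],
                  gold_ids: Sequence[Set[str]]) -> float:
    """Fraction of mentions whose top-ranked prediction is a gold identifier.

    Hypothetical helper: the input layout is an assumption, not the
    format used by the ANGEL evaluation code.
    """
    if len(ranked_predictions) != len(gold_ids):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold_ids:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_predictions, gold_ids)
        # Only the first (highest-ranked) candidate counts for Acc@1.
        if candidates and candidates[0] in gold
    )
    return hits / len(gold_ids)


# Toy data with placeholder identifiers (not real knowledge-base entries).
predictions = [["id_a", "id_b"], ["id_c"], ["id_d", "id_e"]]
gold = [{"id_a"}, {"id_c"}, {"id_e"}]

# Two of the three top-1 predictions are correct.
print(f"Acc@1 = {accuracy_at_1(predictions, gold):.3f}")  # Acc@1 = 0.667
```

Multiplying the returned fraction by 100 gives a percentage on the same scale as the scores in the table.
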
The GenBioEL scores were reproduced.
We excluded the performance of BioSYN and SapBERT because they were evaluated separately on the chemical and disease subsets, which differs from our setting.

# Citation

If you use the ANGEL_bc5cdr model, please cite:

```bibtex
@article{kim2024learning,
  title={Learning from Negative Samples in Generative Biomedical Entity Linking},
  author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2408.16493},
  year={2024}
}
```

# Contact

For questions or issues, please contact chanhwi_kim@korea.ac.kr.