MUSK: A Vision-Language Foundation Model for Precision Oncology
(Nature. 2024. In press)
Jinxi Xiang‡, Xiyue Wang‡, Xiaoming Zhang, Yinghua Xi, Feyisope Eweje, Yijiang Chen, Yuchen Li, Colin Bergstrom, Matthew Gopaulchan, Ted Kim, Kun-Hsing Yu, Sierra Willens, Francesca Maria Olguin, Jeffrey J. Nirschl, Joel Neal, Maximilian Diehn, Sen Yang+, Ruijiang Li+ (‡Equal Contribution)
Lead Contact: Ruijiang Li, Ph.D.
Stanford University, Harvard University
Installation
First clone the repo and cd into the directory:
git clone https://github.com/lilab-stanford/MUSK
cd MUSK
Create a new environment with Anaconda:
conda create -n musk python=3.10 -y --no-default-packages
conda activate musk
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
Model Download
You need to agree to the terms to access the models and log in with your Hugging Face write token:
from huggingface_hub import login
login("<huggingface write token>")  # paste your access token here
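If you prefer, you can authenticate once from the shell instead, using the standard Hugging Face CLI (this is generic `huggingface_hub` tooling, not MUSK-specific):
huggingface-cli login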
Basic Usage: MUSK as a Vision-Language Encoder
- Load the MUSK model
import torch
from musk import utils, modeling
from timm.models import create_model

# build the MUSK architecture and load the pretrained weights from the Hugging Face Hub
model = create_model("musk_large_patch16_384", vocab_size=64010)
utils.load_model_and_may_interpolate("hf_hub:xiangjx/musk", model, 'model|module', '')
model.to(device="cuda", dtype=torch.float32)
model.eval()
- Encode image with MUSK (refer to `demo.ipynb` for the complete implementation)
with torch.inference_mode():
    image_embeddings = model(
        image=img_tensor.to(device="cuda", dtype=torch.float32),
        with_head=True,
        out_norm=True
    )[0]  # the model returns (vision_cls, text_cls); take the vision embedding
The `with_head` parameter controls the projection head at the last layer. Set it to `True` when performing image-text retrieval; for tasks like image classification or multiple instance learning (MIL), disable it by setting it to `False`. The `out_norm` parameter controls output normalization and is enabled by default (`True`).
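For example, a minimal sketch of one possible configuration for extracting features for classification or MIL (assuming `img_tensor` is a preprocessed image batch, as in `demo.ipynb`):
with torch.inference_mode():
    features = model(
        image=img_tensor.to(device="cuda", dtype=torch.float32),
        with_head=False,  # skip the retrieval projection head
        out_norm=False    # keep unnormalized backbone features
    )[0]  # (vision_cls, text_cls); take the vision features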
- Encode text with MUSK (refer to `demo.ipynb` for the complete implementation)
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer("./musk/models/tokenizer.spm")
text = 'histopathology image of lung adenocarcinoma'
txt_ids, pad = utils.xlm_tokenizer(text, tokenizer, max_len=100)
txt_ids = torch.tensor(txt_ids).unsqueeze(0).to("cuda")  # add a batch dimension
pad = torch.tensor(pad).unsqueeze(0).to("cuda")

with torch.inference_mode():
    text_embeddings = model(
        text_description=txt_ids,
        padding_mask=pad,
        with_head=True,
        out_norm=True
    )[1]  # the model returns (vision_cls, text_cls); take the text embedding
Both `with_head` and `out_norm` should use the same settings as those used in image encoding.
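Because both embeddings are L2-normalized when `out_norm=True`, cosine similarity for retrieval reduces to a dot product. A minimal sketch using the embeddings computed above:
with torch.inference_mode():
    # cosine similarity between the normalized image and text embeddings
    similarity = image_embeddings @ text_embeddings.T
print(similarity)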
Model Pretraining
- Masked pretraining instructions.
- Contrastive pretraining instructions.
Evaluation on Cancer Diagnosis/Detection
Please refer to `./benchmarks/demo.ipynb` for a demonstration.
This section reproduces the results of the cancer diagnosis/detection benchmarks, including image-text retrieval, image classification, image-image retrieval, and more. The all-in-one evaluation code is adapted from the CLIP Benchmark.
The evaluated datasets include:
- PathMMU is available at https://huggingface.co/datasets/jamessyx/PathMMU.
- BookSet and PubmedSet are available at https://warwick.ac.uk/fac/cross_fac/tia/data/arch.
- PatchCamelyon can be accessed at https://patchcamelyon.grand-challenge.org/.
- NCT-CRC-HE-100K dataset is available at https://zenodo.org/record/1214456.
- SICAPv2 can be downloaded from https://data.mendeley.com/datasets/9xxm58dvs3/1.
- Osteo dataset is available at https://www.cancerimagingarchive.net/collection/osteosarcoma-tumor-assessment/.
- RenalCell can be downloaded from https://zenodo.org/records/6528599.
- SkinCancer is accessible at https://www.isic-archive.com/.
- LC25000 dataset is available for download at https://github.com/tampapath/lung_colon_image_set.
- PanNuke can be accessed at https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke.
- UniToPatho dataset is available at https://ieee-dataport.org/open-access/unitopatho.
- WSSS4LUAD can be downloaded from https://wsss4luad.grand-challenge.org/WSSS4LUAD/.
- BRACS datasets for 3 and 6 classes are available for download at https://www.bracs.icar.cnr.it/.
First, download the necessary datasets. For demonstration, we provide the example datasets here. Download and unzip them to a local path, for example `/root/user/data/downstreams_demo`, and then set `dataset_root=/root/user/data/downstreams_demo`. The code will automatically extract features and perform the evaluations.
The main entry point is `clip_benchmark.cli`, which includes the following options:
- `--pretrained_model`: Specifies the model name and the path to its weights.
- `--dataset`: Indicates the evaluation dataset(s); multiple datasets can be specified.
- `--dataset_root`: The root directory of the datasets.
- `--task`: Defines the evaluation task.
- `--batch_size`: Sets the batch size for feature extraction.
- `--output`: Specifies where to save the output results.
Set up the `models.txt` file with entries in the format `(model_name, model_path)`. For example, to run both MUSK and CONCH for comparison, your `models.txt` might look like this:
musk_large_patch16_384,hf_hub:xiangjx/musk
conch,/path/to/conch.pt
Alternatively, you can remove the CONCH entry and run MUSK alone.
Here are some example commands:
# >>>>>>>>>>> zero-shot cross-modal retrieval >>>>>>>>>>> #
python3 -m clip_benchmark.cli eval --pretrained_model models.txt \
--dataset "pathmmu_retrieval" \
--task "zeroshot_retrieval" \
--batch_size 512 \
--num_workers 16 \
--seed 42 \
--recall_k 1 10 50 \
--dataset_root "/root/user/data/downstreams_demo" \
--output "./results/benchmark_mm_retrieval.json"
# >>>>>>>>>>> few-shot linear probe >>>>>>>>>>> #
# example sweep settings (placeholders; adjust as needed)
shot_list=(1 2 4 8 16)
seed_list=(1 2 3)

for k_shot in "${shot_list[@]}"
do
for seed in "${seed_list[@]}"
do
python3 -m clip_benchmark.cli eval --pretrained_model models.txt \
--dataset "nct_crc" "pcam" "skin" "sicap" "pannuke" "unitopatho" "wsss4luad" "osteo" "lc25" "renal_cell" "bracs6cls" "bracs3cls" \
--task "linear_probe" \
--batch_size 512 \
--num_workers 16 \
--fewshot_k $k_shot \
--seed $seed \
--dataset_root "/root/user/data/downstreams_demo" \
--output "./results/benchmark_fs_${k_shot}shot_seed${seed}.json"
done
done
# >>>>>>>>>>> zero-shot image2image retrieval >>>>>>>>>>> #
python3 -m clip_benchmark.cli eval --pretrained_model models.txt \
--dataset "unitopatho_retrieval" "bracs_retrieval" \
--task "image_retrieval" \
--batch_size 512 \
--num_workers 16 \
--seed 41 \
--dataset_root "/root/user/data/downstreams_demo" \
--output "./results/benchmark_image_retrieval.json"
More tasks are available in `./benchmarks/demo.ipynb`.
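Each command writes its metrics to the JSON file given by `--output`. A minimal sketch for inspecting a result file, assuming it holds a single JSON object as produced by the retrieval command above:
import json

# load and pretty-print the retrieval benchmark results
with open("./results/benchmark_mm_retrieval.json") as f:
    results = json.load(f)
print(json.dumps(results, indent=2))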
Acknowledgements
The project was built on top of many open-source repositories such as Quilt1M (training data image-text pairs), torchscale (model implementation), accelerate (model pretraining), deepspeed (model pretraining), pytorch-lightning (downstream finetuning), and CLIP Benchmark (model evaluation). We thank the authors and developers for their contributions.
Issues
- Please open a new issue thread, or address questions to [email protected] or [email protected].
License
This model and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the MUSK model and its derivatives, which include models trained on outputs from the MUSK model or datasets created from the MUSK model, is prohibited and requires prior approval. By downloading this model, you agree not to distribute, publish, or reproduce a copy of the model. If you are a commercial entity, please contact the corresponding author.