

PanNuke Preparation

The original PanNuke dataset ships each dataset split (fold) as three large NumPy arrays:

├── fold0
│   ├── images.npy
│   ├── masks.npy
│   └── types.npy
├── fold1
│   ├── images.npy
│   ├── masks.npy
│   └── types.npy
└── fold2
    ├── images.npy
    ├── masks.npy
    └── types.npy
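
To see why the single-array layout is heavy on memory, the fold can be inspected without reading it fully into RAM. The snippet below is a minimal sketch, not part of the repository: `describe_fold` is a hypothetical helper, and it assumes that `types.npy` holds a plain string array (if it stored Python objects, `np.load` would need `allow_pickle=True`).

```python
from pathlib import Path

import numpy as np


def describe_fold(fold_dir):
    """Return (shape, dtype) for each array of one original PanNuke fold.

    Hypothetical helper for illustration only.
    """
    fold_dir = Path(fold_dir)
    summary = {}
    for name in ("images", "masks"):
        # mmap_mode="r" maps the file lazily instead of loading it into RAM.
        array = np.load(fold_dir / f"{name}.npy", mmap_mode="r")
        summary[name] = (array.shape, str(array.dtype))
    # The type array is small, so it is loaded normally.
    # Assumption: it is a plain string array, not an object array.
    types = np.load(fold_dir / "types.npy")
    summary["types"] = (types.shape, str(types.dtype))
    return summary
```

Calling `describe_fold("fold0")` on a downloaded fold returns the array shapes, which makes it easy to confirm that every sample lives in one big file.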

For memory efficiency and to make use of multi-threaded data loading with our augmentation pipeline, we reassemble the dataset into the following structure:

├── fold0
│   ├── cell_count.csv      # cell-count for each image to be used in sampling
│   ├── images              # H&E Image for each sample as .png files
│   │   ├── 0_0.png
│   │   ├── 0_1.png
│   │   ├── 0_2.png
...
│   ├── labels              # label as .npy arrays for each sample
│   │   ├── 0_0.npy
│   │   ├── 0_1.npy
│   │   ├── 0_2.npy
...
│   └── types.csv           # csv file with type for each image
├── fold1
│   ├── cell_count.csv
│   ├── images
│   │   ├── 1_0.png
...
│   ├── labels
│   │   ├── 1_0.npy
...
│   └── types.csv
├── fold2
│   ├── cell_count.csv
│   ├── images
│   │   ├── 2_0.png
...
│   ├── labels
│   │   ├── 2_0.npy
...
│   └── types.csv
├── dataset_config.yaml     # dataset config with dataset information
└── weight_config.yaml      # config file for our sampling
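
As an illustration of what a per-image cell count could look like, the following sketch derives the counts from the mask array. It is not the logic of the preparation script. The assumptions are that the first five mask channels are instance maps for the listed cell types (with background in the sixth channel), that the file names follow the fold_index pattern shown in the tree, and that the CSV column names are as given here; the file written by the actual script may differ.

```python
import csv

import numpy as np

# Assumed channel order of the mask array; the sixth channel would be background.
CELL_TYPES = ["Neoplastic", "Inflammatory", "Connective", "Dead", "Epithelial"]


def count_cells(mask):
    """Count distinct non-zero instance IDs in each cell-type channel."""
    counts = []
    for channel in range(len(CELL_TYPES)):
        ids = np.unique(mask[..., channel])
        # 0 marks "no nucleus", so only non-zero IDs are counted.
        counts.append(int(np.count_nonzero(ids)))
    return counts


def write_cell_count(masks, fold_index, out_csv):
    """Write one row per image: image name followed by per-type counts."""
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Image", *CELL_TYPES])  # hypothetical header
        for i, mask in enumerate(masks):
            writer.writerow([f"{fold_index}_{i}.png", *count_cells(mask)])
```

Counts of this kind are what a sampling strategy can use to weight images with rare cell types more heavily.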

We provide all configuration files for the PanNuke dataset in the configs/datasets/PanNuke folder. Please copy them into your dataset folder. Images and masks must be extracted with the cell_segmentation/datasets/prepare_pannuke.py script.
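
Once a fold has been extracted, a quick consistency check can catch incomplete runs. The sketch below is a hypothetical helper, not part of the repository; it only relies on the file and folder names shown in the reassembled structure and verifies that every image has a label with the same stem.

```python
from pathlib import Path

EXPECTED_FILES = ("cell_count.csv", "types.csv")


def check_fold(fold_dir):
    """Return a list of problems found in one reassembled fold directory."""
    fold_dir = Path(fold_dir)
    problems = [
        f"missing {name}"
        for name in EXPECTED_FILES
        if not (fold_dir / name).is_file()
    ]
    # Compare file stems such as "0_0" between images/ and labels/.
    images = {p.stem for p in (fold_dir / "images").glob("*.png")}
    labels = {p.stem for p in (fold_dir / "labels").glob("*.npy")}
    problems += [f"image without label: {s}" for s in sorted(images - labels)]
    problems += [f"label without image: {s}" for s in sorted(labels - images)]
    return problems
```

An empty list means the fold matches the expected layout; any other result names the files that need attention.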