aimagelab / mil4wsi

DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs

License: MIT License

Introduction

mil4wsi is an open-source framework that collects implementations of state-of-the-art Multiple Instance Learning (MIL) models for gigapixel whole slide images (WSIs). It is intended for researchers and developers who want to explore, compare, and build on these MIL techniques.

Installation

conda create -n wsissl python=3.9
conda activate wsissl
conda env update --file environment.yml

Data Preprocessing

This work uses CLAM to filter out background patches. After the .h5 coordinate generation, use:

H5-to-jpg: converts the .h5 coordinates into jpg images
Sort images: reorganizes the patches into hierarchical folders
DINO Training: given the patches, trains DINO with the vit_small option
Feature Extraction: extracts patch features and adjacency matrices
Geometric Dataset Conversion: converts the data so it can be used with graph architectures and PyTorch Geometric
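
To give an intuition for the "Sort images" step, here is a minimal sketch of how a patch could be placed in a nested folder under its coarser-scale ancestors. It is not the repository's sort_hierarchy.py: the scale names, the footprint values, the folder naming scheme, and the assumption that every patch grid is aligned to the slide origin are all illustrative.

```python
from pathlib import PurePosixPath

# Hypothetical footprint (in level-0 pixels) covered by one patch at each scale.
# These values are illustrative assumptions, not taken from the repository.
FOOTPRINT = {"x5": 2048, "x10": 1024, "x20": 512}
SCALES = ["x5", "x10", "x20"]  # ordered from coarse to fine


def snap(value: int, step: int) -> int:
    """Snap a coordinate down to the nearest multiple of `step`."""
    return (value // step) * step


def hierarchical_path(x: int, y: int, scale: str) -> PurePosixPath:
    """Return a nested folder path for the patch whose top-left corner,
    in level-0 pixels, is (x, y) at the given scale.

    Each ancestor is the coarser patch whose footprint contains (x, y),
    assuming every grid is aligned to the slide origin.
    """
    depth = SCALES.index(scale)
    parts = []
    for ancestor in SCALES[: depth + 1]:
        step = FOOTPRINT[ancestor]
        parts.append(f"{ancestor}_{snap(x, step)}_{snap(y, step)}")
    return PurePosixPath(*parts)


if __name__ == "__main__":
    # An x20 patch nested under its x10 and x5 ancestors.
    print(hierarchical_path(1536, 512, "x20"))
```

With this layout, listing an x5 folder yields all the x10 and x20 patches it spatially contains, which is the kind of parent-child relation a multi-scale graph needs.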

Available Models

MaxPooling
MeanPooling
ABMIL
DSMIL
DASMIL
BUFFERMIL
TRANSMIL
HIPT
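
As a point of reference for the two simplest baselines in the list, the sketch below shows max and mean pooling over per-patch scores to obtain a single bag (slide) score. It is a plain-Python illustration of the pooling idea only, under the assumption that each patch already has a scalar score; the models in this repository operate on learned feature tensors and are not reproduced here.

```python
from statistics import fmean
from typing import Sequence


def max_pooling_bag_score(instance_scores: Sequence[float]) -> float:
    """MaxPooling: the bag score is that of its highest-scoring instance."""
    return max(instance_scores)


def mean_pooling_bag_score(instance_scores: Sequence[float]) -> float:
    """MeanPooling: the bag score is the average over all instances."""
    return fmean(instance_scores)


if __name__ == "__main__":
    # Hypothetical per-patch tumor scores for one slide.
    scores = [0.1, 0.2, 0.9]
    print(max_pooling_bag_score(scores))
    print(mean_pooling_bag_score(scores))
```

Max pooling reacts to a single suspicious patch, while mean pooling dilutes it across the slide; attention-based models such as ABMIL learn a weighting that lies between these two extremes.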

DASMIL

@inproceedings{Bontempo2023_MICCAI,
    author={Bontempo, Gianpaolo and Porrello, Angelo and Bolelli, Federico and Calderara, Simone and Ficarra, Elisa},
    title={{DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs}},
    booktitle={Medical Image Computing and Computer Assisted Intervention – MICCAI 2023},
    pages={248--258},
    year=2023,
    month={Oct},
    publisher={Springer},
    doi={https://doi.org/10.1007/978-3-031-43907-0_24},
    isbn={978-3-031-43906-3}
}


@ARTICLE{Bontempo2024_TMI,
  author={Bontempo, Gianpaolo and Bolelli, Federico and Porrello, Angelo and Calderara, Simone and Ficarra, Elisa},
  journal={IEEE Transactions on Medical Imaging}, 
  title={A Graph-Based Multi-Scale Approach With Knowledge Distillation for WSI Classification}, 
  year={2024},
  volume={43},
  number={4},
  pages={1412-1421},
  keywords={Feature extraction;Proposals;Spatial resolution;Knowledge engineering;Graph neural networks;Transformers;Prediction algorithms;Whole slide images (WSIs);multiple instance learning (MIL);(self) knowledge distillation;weakly supervised learning},
  doi={10.1109/TMI.2023.3337549}}

Training

python main.py --datasetpath DATASETPATH --dataset [cam or lung]

Reproducibility

Pretrained models

DINO Camelyon16	DINO LUNG
x5 ~0.65GB	x5 ~0.65GB
x10 ~0.65GB	x10 ~0.65GB
x20 ~0.65GB	x20 ~0.65GB

DASMIL Camelyon16	DASMIL LUNG
model ~9MB	model ~15MB
ACC: 0.945	ACC: 0.92
AUC: 0.967	AUC: 0.966
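
For readers comparing their own runs against the ACC and AUC values above, the following sketch shows how these two metrics are commonly computed from bag-level labels and predicted probabilities. It is a generic pure-Python illustration, not the evaluation code in eval.py; the 0.5 decision threshold and the example data are assumptions.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of bags whose thresholded prediction matches the label."""
    preds = [int(p >= threshold) for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """AUC as the probability that a random positive bag is scored above
    a random negative bag; ties count as half a win."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative bag")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and probabilities for four slides.
    y_true = [1, 0, 1, 0]
    y_prob = [0.9, 0.4, 0.35, 0.1]
    print(accuracy(y_true, y_prob))
    print(roc_auc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, whereas AUC depends only on the ranking of the scores, so the two can move independently between runs.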

PyTorch Geometric - Extracted Features

Camelyon16	LUNG
Dataset ~4.25GB	Dataset ~17.5GB

Eval

Set the checkpoint and dataset paths in utils/experiment.py, then run:

python eval.py --datasetpath DATASETPATH --checkpoint CHECKPOINTPATH --dataset [cam or lung]

Contributing

We encourage and welcome contributions from the community to help improve the MIL Models Framework and make it even more valuable for the entire machine-learning community.

mil4wsi's Issues

About TCGA-Lung dataset

Thank you for the excellent work you have posted. Could you please tell me what models.selectModel is?

PREPROCESSING OF CAMELYON16 DATASET USING CLAM

Hi, I was trying to reproduce the results you have shown on the CAMELYON16 dataset. I used the configuration you provided in 0-extract_patches to run create_patches_fp.py:

seg_level,sthresh,mthresh,close,use_otsu,a_t,a_h,max_n_holes,vis_level,line_thickness,white_thresh,black_thresh,use_padding,contour_fn,keep_ids,exclude_ids
-1,8,7,4,TRUE,25,4,8,-1,100,5,50,TRUE,four_pt,none,none

However, I am getting around 1200 patches at level 1, compared to the 5771 reported in the paper. Similarly, for level 2 I got 400 images, compared to the 1528 mentioned in the paper. Am I doing something wrong? Do I need to change the config file?

Also, I think the CLAM repo has been modified, so I am getting an error when using convert_h5_to_jpg.py with CLAM. Can you look into that and help me?

Downloading Weights and Error 403

All the links to your weights and embeddings return error 403, so the files cannot be downloaded.

Question about preprocessing

Hello, I'm trying to preprocess my private dataset. Before that, I had a try on TCGA-78-8662-01Z-00-DX1, which is already processed in lungGraph_13/processed/train/data_1.

Based on the properties of the svs file, I reran CLAM using patch sizes of 512, 1024 and 2048.

However, the h5 file generated with patch size 512 contains 21281 patches, which differs from your patch count for level 3.

Could you help me?

ISSUE WITH 1-sort_images for CAMELYON16 dataset

I am trying to sort the images using sort_hierarchy.py, but it shows the following error:

/home/thomas/.conda/envs/wsissl/lib/python3.10/site-packages/submitit/core/core.py:628: UserWarning: Received an empty job array
warnings.warn("Received an empty job array")

1. Does this code require a slide_properties.csv file to be present in the same directory? It seems to be missing. Or do we have to create it from the dataset metadata?
2. Is this an issue related to the format of the output patches? I am attaching a screenshot of the output patch format after converting to jpg.

Data preprocessing issues

Hello, I encounter the following error when running convert_h5_to_jpg.py: submitit.core.utils.UncompletedJobError: Job not requeued because: timed-out and not checkpointable. How can I resolve this issue?

Question about reproducing

Thank you for your great work. I'm trying to reproduce the exact results using your pre-generated pt files, but I could not reach the reported accuracy.
This is the script I used:

CUDA_VISIBLE_DEVICES=1 python main.py --datasetpath /home/wangb/projects/20240226_reproduce_das_mil/data/camGraph_23 --dataset cam

Are there any suggestions about how to improve it? Thank you.

data preprocessing/Dino part

First of all thanks for your informative and unique method in your recent paper. I am wondering if you could answer my question.
Thanks in advance. :)

In the "Data Preprocessing" section, under "Extract Dino Features", what are "pretrained_weights1", "pretrained_weights2" and "pretrained_weights3" in the run_with_submitit.py script?
Are they the weights pretrained on Camelyon16 and LUNG? If we want to use different pretrained weights, such as ResNet50, how can we use them in this script (run_with_submitit.py)?
Moreover, should we also alter main_dino.py for ResNet (as you mentioned in the code for DINO training)?

FROC metric calculation

Hello! Thanks for your great work!

Could you share code for generating an FROC curve? Additionally, I'm interested in understanding whether the FROC calculation involves plotting curves for all bags and then averaging them or directly calculating FROC for individual instances across all bags.

solve virtual environment

Thank you for your work and contribution. Following the installation steps in the README, I ran the command "mamba env update --file environment.yml". Some packages could not be resolved or did not exist when solving the environment dependencies. How should I fix this?