bmirds / deepslide

Code for the Nature Scientific Reports paper "Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks." A sliding window framework for classification of high resolution whole-slide images, often microscopy or histopathology images.

Home Page: https://www.nature.com/articles/s41598-019-40041-7

License: GNU General Public License v3.0

Topics: histopathology-images, microscopy, sliding-windows, medical-image-analysis, resnet, wsi, lung, cancer, pathology-image

deepslide's Introduction

DeepSlide: A Sliding Window Framework for Classification of High Resolution Microscopy Images (Whole-Slide Images)

This repository is a sliding window framework for classification of high resolution whole-slide images, often called microscopy or histopathology images. This is also the code for the paper Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks. For a practical guide and implementation tips, see the Medium post Classification of Histopathology Images with Deep Learning: A Practical Guide.

We have publicly released the 143 digitized high-resolution histology slides of lung adenocarcinoma in our test set, along with their predominant subtypes as determined by the consensus opinion of three pathologists at Dartmouth-Hitchcock Medical Center. More information about this dataset and instructions for downloading it are provided on the dataset webpage.

For questions about our code, please open an issue on this code repository.


Requirements

Installing Dependencies (Recommended method)

conda env create --file setup/conda_env.yaml

This command creates a conda environment called 'deepslide_env' with Python 3.9 and PyTorch with CUDA 11.3. Please modify the environment file(s) for other versions.

In addition, install_openslide.sh installs the dependencies of the OpenSlide package on Ubuntu. For other platforms, please visit the official OpenSlide website for more information.

Usage

Take a look at code/config.py before you begin to get a feel for what parameters can be changed.

1. Train-Val-Test Split:

Splits the data into training, validation, and test sets. The default is 20 validation whole-slide images (WSIs) per class and 30 test WSIs per class. You can change these numbers with the --val_wsi_per_class and --test_wsi_per_class flags at runtime. You can skip this step if you have already done a custom split (for example, if you need to split by patient).

python code/1_split.py

If you do not want to duplicate the data, append --keep_orig_copy False to the above command.

Inputs: all_wsi

Outputs: wsi_train, wsi_val, wsi_test, labels_train.csv, labels_val.csv, labels_test.csv

Note that all_wsi must contain subfolders of images labeled by class. For instance, if your two classes are a and n, you must have a/*.jpg with the images in class a and n/*.jpg with images in class n.
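For a quick sanity check of this layout, here is a minimal Python sketch (hypothetical, not part of this repository; it assumes JPEG inputs) that lists the class subfolders before splitting:

    from pathlib import Path

    all_wsi = Path("all_wsi")
    for class_dir in sorted(p for p in all_wsi.iterdir() if p.is_dir()):
        num_images = len(list(class_dir.glob("*.jpg")))  # one file per whole-slide image
        print(f"class {class_dir.name}: {num_images} whole-slide images")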

If you already have a patch-based preprocessed dataset, you may skip to Stage 3 for model training. Please make sure that, at a minimum:

  1. all_wsi has a folder for each class as a placeholder (they can be empty).

  2. Both train_folder/train and train_folder/val folders contain a folder for each class and each slide that belongs to its partition. The slide folder should contain at least one patch extracted from the slide (e.g., train_folder/train/<class_name>/<slide_name>/<patch_file>).

  3. Review code/config.py and make any changes necessary for your dataset.

Example

python code/1_split.py --val_wsi_per_class 10 --test_wsi_per_class 20

2. Data Processing

  • Generate patches for the training set.
  • Balance the class distribution for the training set.
  • Generate patches for the validation set.
  • Generate patches by folder for WSI in the validation set.
  • Generate patches by folder for WSI in the testing set.
python code/2_process_patches.py

Note that this step takes a significant amount of disk space. Reduce --num_train_per_class if you do not wish to generate as many windows. If your histopathology images are H&E-stained, whitespace is automatically filtered out; turn this off with --type_histopath False. The default overlap is 1/3 for test slides; use 1 or 2 if your images are very large. You can change this with the --slide_overlap option.
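For intuition, patch generation is essentially a sliding window over each slide. Below is a minimal sketch of the idea (illustrative only; the repository's actual logic, including the whitespace filtering, lives in code/utils_processing.py):

    from PIL import Image

    def sliding_window_patches(image_path, patch_size=224, inverse_overlap=3):
        # Stride is patch_size // inverse_overlap, so a factor of 3 advances
        # a third of a patch at a time (i.e., neighboring windows overlap).
        image = Image.open(image_path)
        stride = patch_size // inverse_overlap
        width, height = image.size
        for x in range(0, width - patch_size + 1, stride):
            for y in range(0, height - patch_size + 1, stride):
                yield x, y, image.crop((x, y, x + patch_size, y + patch_size))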

Inputs: wsi_train, wsi_val, wsi_test

Outputs: train_folder (fed into the model for training), patches_eval_val (for validation, sorted by WSI), patches_eval_test (for testing, sorted by WSI)

Example

python code/2_process_patches.py --num_train_per_class 20000 --slide_overlap 2

3. Model Training

CUDA_VISIBLE_DEVICES=0 python code/3_train.py

We recommend using ResNet-18 if you are training on a relatively small histopathology dataset. You can change hyperparameters using the argparse flags. There is an option to retrain from a previous checkpoint. Model checkpoints are saved by default every epoch in checkpoints.
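For orientation, building such a model with torchvision looks roughly like the sketch below (an illustration, not the repository's create_model; num_classes is an assumption):

    import torch.nn as nn
    import torchvision

    def build_resnet18(num_classes: int) -> nn.Module:
        model = torchvision.models.resnet18(weights=None)  # older torchvision: pretrained=False
        # Swap the final fully connected layer for one matching our class count.
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model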

Inputs: train_folder

Outputs: checkpoints, logs

Example

CUDA_VISIBLE_DEVICES=0 python code/3_train.py --batch_size 32 --num_epochs 100 --save_interval 5

4. Testing on WSI

Run the model on all the patches for each WSI in the validation and test set.

CUDA_VISIBLE_DEVICES=0 python code/4_test.py

We automatically choose the model with the best validation accuracy; you can also specify your own checkpoint. You can change the thresholds used in the grid search via the threshold_search variable in code/config.py.

Inputs: patches_eval_val, patches_eval_test

Outputs: preds_val, preds_test

Example

CUDA_VISIBLE_DEVICES=0 python code/4_test.py --auto_select False

5. Searching for Best Thresholds

The simplest way to make a whole-slide prediction is to choose the class with the most patch predictions. We can also apply thresholding at the patch level to throw out noise. To find the best thresholds, we perform a grid search. This script generates CSV files for each WSI with the predictions for each patch.
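A minimal sketch of this aggregation idea (a hypothetical helper; the repository's actual grid search is in code/5_grid_search.py):

    from collections import Counter

    def predict_slide(patch_predictions, thresholds):
        # patch_predictions: (class_name, confidence) pairs, one per patch.
        # Drop patches below the per-class threshold, then majority-vote.
        votes = Counter(cls for cls, conf in patch_predictions
                        if conf >= thresholds.get(cls, 0.0))
        return votes.most_common(1)[0][0] if votes else None

For example, predict_slide([("tumor", 0.9), ("benign", 0.4)], {"tumor": 0.7, "benign": 0.7}) returns "tumor", because the low-confidence benign patch is discarded before the vote.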

python code/5_grid_search.py

Inputs: preds_val, labels_val.csv

Outputs: inference_val

Example

python code/5_grid_search.py --preds_val different_labels_val.csv

6. Visualization

A good way to see what the network is looking at is to visualize the predictions for each class.
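Conceptually, the visualization overlays one color per predicted class on the slide. A minimal sketch of the idea (illustrative only; code/6_visualize.py implements the repository's version, and the palette here is an assumption):

    from PIL import Image, ImageDraw

    def overlay_predictions(wsi_path, patch_preds, patch_size=224, palette=None):
        # patch_preds: iterable of (x, y, class_name) in slide coordinates.
        palette = palette or {"benign": "green", "tumor": "red"}
        image = Image.open(wsi_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for x, y, cls in patch_preds:
            draw.rectangle([x, y, x + patch_size, y + patch_size],
                           outline=palette.get(cls, "white"), width=8)
        return image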

python code/6_visualize.py

Inputs: wsi_val, preds_val

Outputs: vis_val

You can change the colors via the colors variable in code/config.py.


Example

python code/6_visualize.py --vis_test different_vis_test_directory

7. Final Testing

Run the final test to compute the confusion matrix on the test set.

python code/7_final_test.py

Inputs: preds_test, labels_test.csv, inference_val and labels_val (for the best thresholds)

Outputs: inference_test and confusion matrix to stdout
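The matrix itself is simple to compute once slide-level predictions and labels are in hand; a sketch with scikit-learn (assumed to be installed; the toy labels are illustrative):

    from sklearn.metrics import confusion_matrix

    y_true = ["benign", "tumor", "tumor", "benign"]   # e.g., parsed from labels_test.csv
    y_pred = ["benign", "tumor", "benign", "benign"]  # e.g., parsed from inference_test
    print(confusion_matrix(y_true, y_pred, labels=["benign", "tumor"]))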

Example

python code/7_final_test.py --labels_test different_labels_test.csv

Best of luck.

Quick Run

If you want to run all of the code, overriding the default parameters in code/config.py, run

sh code/run_all.sh

and change the desired flags on each line of the code/run_all.sh script.

Pre-Processing Scripts

See code/z_preprocessing for some code to convert images from svs into jpg. This uses OpenSlide and takes a while. How much to compress the images depends on the resolution at which they were originally scanned, but a guideline that has worked for us is 3-5 MB per WSI.
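Conceptually, the conversion reads the slide at a chosen pyramid level and saves a JPEG. A minimal OpenSlide sketch (the path, level, and quality are illustrative; see code/z_preprocessing for the repository's version):

    import openslide

    slide = openslide.OpenSlide("example.svs")
    level = min(2, slide.level_count - 1)  # pick a lower-resolution pyramid level
    region = slide.read_region((0, 0), level, slide.level_dimensions[level])
    region.convert("RGB").save("example.jpg", quality=90)  # drop alpha before saving JPEG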

Known Issues and Limitations

  • Only 1 GPU is supported.
  • Should work on Windows, but this has not been tested.
  • In cases where no crops are found for an image, empty directories are created. The current workaround uses try/except statements to catch errors.
  • The image-reading code expects colors to be in RGB space. The current workaround is to keep the first 3 channels.
  • This code will likely work better when the labels are at the tissue level. It will still work on the entire WSI, but results may vary.

Still not working? Consider the following...

  • Ask a pathologist to look at your visualizations.
  • Make your own heuristic for aggregating patch predictions to determine the WSI-level classification; often, a slide that's 20% abnormal and 80% normal should be classified as abnormal (see the sketch after this list).
  • If each WSI can have multiple types of lesions/labels, you may need to annotate bounding boxes around these.
  • Did you pre-process your images? If you used raw .svs files that are more than 1 GB in size, it's likely that the patches are far too zoomed in to show any cell structures.
  • If you have fewer than 10 WSIs per class in the training set, obtain more.
  • Feel free to view our end-to-end attention-based model in JAMA Network Open: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2753982.
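As an example of such a heuristic (a sketch only; the labels and the 20% cutoff are illustrative):

    def classify_slide(patch_classes, abnormal_label="abnormal", cutoff=0.2):
        # Call the slide abnormal if at least `cutoff` of its patches are abnormal.
        frac = sum(c == abnormal_label for c in patch_classes) / len(patch_classes)
        return abnormal_label if frac >= cutoff else "normal"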

Future Work

  • Contributions to this repository are welcome.
  • Code for generating patches on the fly, instead of storing them all on disk for training and testing, would save a lot of disk space.
  • If you have issues, please post in the issues section and we will do our best to help.

Citations

DeepSlide is an open-source library and is licensed under the GNU General Public License (v3). If you are using this library, please cite:

Jason Wei, Laura Tafe, Yevgeniy Linnik, Louis Vaickus, Naofumi Tomita, Saeed Hassanpour, "Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks", Scientific Reports 9:3358 (2019).

deepslide's People

Contributors

behigit, henriqk0, jasonwei20, jerryweiai, josephdipalma, mbluestone, ntomita, saeedhassanpour


deepslide's Issues

Data Preprocessing error

Hi Guys, thank you for making this wonderful resource available.
I organized my slides into wsi_train, wsi_val, wsi_test using 1_split.py which ran fine. However I keep getting this error when I run code/2_process_patches.py:

wsi_train/neg: 35359.581701MB, 201 images, overlap_factor=1.00
wsi_train/pos: 92751.49675MB, 451 images, overlap_factor=1.00

getting small crops from 201 images in wsi_train/neg with inverse overlap factor 1.00 outputting in train_folder/train/neg
Traceback (most recent call last):
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/PIL/ImageFile.py", line 101, in __init__
    self._open()
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/PIL/TiffImagePlugin.py", line 979, in _open
    self._seek(0)
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/PIL/TiffImagePlugin.py", line 1046, in _seek
    self._setup()
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/PIL/TiffImagePlugin.py", line 1170, in _setup
    self._compression = COMPRESSION_INFO[self.tag_v2.get(COMPRESSION, 1)]
KeyError: 33003

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "deepslide/2_process_patches.py", line 20, in <module>
    type_histopath=config.args.type_histopath)
  File "/athena/marchionnilab/scratch/lab_data/Mohamed/pca_outcome/deepslide/utils_processing.py", line 155, in gen_train_patches
    type_histopath=type_histopath)
  File "/athena/marchionnilab/scratch/lab_data/Mohamed/pca_outcome/deepslide/utils_processing.py", line 364, in produce_patches
    uri=image_loc if by_folder else input_folder.joinpath(image_loc))
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/imageio/core/functions.py", line 265, in imread
    reader = read(uri, format, "i", **kwargs)
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/imageio/core/format.py", line 170, in get_reader
    return self.Reader(self, request)
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/imageio/core/format.py", line 221, in __init__
    self._open(**self.request.kwargs.copy())
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/imageio/plugins/pillow.py", line 125, in _open
    self._im = factory(self._fp, "")
  File "/home/mao4005/.conda/envs/deepslide/lib/python3.6/site-packages/PIL/ImageFile.py", line 110, in __init__
    raise SyntaxError(v)
SyntaxError: 33003

These are the packages in my conda env:
_libgcc_mutex 0.1 main conda-forge
_tflow_select 2.3.0 mkl
attrs 19.3.0 py_0 conda-forge
blas 1.0 mkl conda-forge
bzip2 1.0.8 h7b6447c_0
c-ares 1.15.0 h7b6447c_1001
ca-certificates 2021.9.30 h06a4308_1
cairo 1.16.0 h18b612c_1001 conda-forge
certifi 2021.5.30 py36h06a4308_0
cloudpickle 2.0.0 pyhd3eb1b0_0
cpuonly 2.0 0 pytorch
cudatoolkit 10.1.243 h6bb024c_0
cycler 0.10.0 pypi_0 pypi
cytoolz 0.11.0 py36h7b6447c_0
dask-core 2021.3.0 pyhd3eb1b0_0
dataclasses 0.8 pyh4f3eec9_6
dbus 1.13.12 h746ee38_0
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
entrypoints 0.3 py36_0
et_xmlfile 1.1.0 py36h06a4308_0
expat 2.2.6 he6710b0_0
ffmpeg 4.2.2 h20bf706_0
fontconfig 2.13.1 he4413a7_1000 conda-forge
freetype 2.9.1 h8a8886c_1
fribidi 1.0.9 h516909a_0 conda-forge
gast 0.3.3 py_0 conda-forge
gdk-pixbuf 2.38.2 h3f25603_4 conda-forge
glib 2.63.1 h5a9c865_0
gmp 6.1.2 h6c8ec71_1
gnutls 3.6.15 he1e5248_0
gobject-introspection 1.56.1 py36hbc4ca2d_2
google-pasta 0.2.0 py_0
graphite2 1.3.13 h23475e2_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.10.0 py36h7918eee_0
harfbuzz 2.4.0 h37c48d4_1 conda-forge
hdf5 1.10.4 hb1b8bf9_0
icu 58.2 h9c2bf20_1
imageio 2.9.0 pyhd3eb1b0_0
importlib-metadata 4.8.1 py37h89c1867_0 conda-forge
intel-openmp 2019.4 243
ipython_genutils 0.2.0 pyhd3eb1b0_1
jinja2 2.11.1 py_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
jupyter_client 6.1.0 py_0 conda-forge
jupyter_core 4.8.1 py36h06a4308_0
keras-applications 1.0.8 py_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.3.1 py36h2531618_0
lame 3.100 h7b6447c_0
ld_impl_linux-64 2.33.1 h53a641e_7 conda-forge
libblas 3.8.0 14_mkl conda-forge
libcroco 0.6.13 h8d621e5_0 conda-forge
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libiconv 1.15 h516909a_1006 conda-forge
libidn2 2.3.2 h7f8727e_0
liblapack 3.8.0 14_mkl conda-forge
libopus 1.3.1 h7b6447c_0
libpng 1.6.37 hbc83047_0
libprotobuf 3.11.4 hd408876_0
librsvg 2.46.2 h33a7fed_1 conda-forge
libsodium 1.0.16 h1bed415_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtasn1 4.16.0 h27cfd23_0
libtiff 4.1.0 h2733197_0
libunistring 0.9.10 h27cfd23_0
libuuid 2.32.1 h14c3975_1000 conda-forge
libuv 1.40.0 h7b6447c_0
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
markupsafe 2.0.1 py36h27cfd23_0
matplotlib 3.3.4 pypi_0 pypi
matplotlib-base 3.2.1 py36hef1b27d_0
mkl 2019.4 243
mkl-service 2.3.0 py36he8ac12f_0
mkl_fft 1.3.0 py36h54f3939_0
mkl_random 1.1.0 py36hd6b4f25_0
nbconvert 5.6.1 py37_0 conda-forge
nbformat 5.0.4 py_0 conda-forge
ncurses 6.2 he6710b0_0
nettle 3.7.3 hbbd107a_1
networkx 2.5.1 pyhd3eb1b0_0
notebook 6.0.1 py37_0 conda-forge
numpy 1.19.5 pypi_0 pypi
numpy-base 1.19.2 py36hfa32c7d_0
olefile 0.46 pyhd3eb1b0_0
openh264 2.1.0 hd408876_0
openjpeg 2.3.1 h981e76c_3 conda-forge
openpyxl 3.0.9 pyhd3eb1b0_0
openslide 3.4.1 h8137273_0 conda-forge
openssl 1.1.1l h7f8727e_0
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandoc 2.2.3.2 0
pango 1.42.4 h7062337_3 conda-forge
parso 0.6.2 py_0 conda-forge
pcre 8.43 he6710b0_0
pillow 5.3.0 py36h34e0f95_0
pip 21.2.4 pyhd8ed1ab_0 conda-forge
pixman 0.38.0 h7b6447c_0
prometheus_client 0.7.1 py_0 conda-forge
prompt-toolkit 3.0.4 py_0 conda-forge
prompt_toolkit 3.0.4 0 conda-forge
protobuf 3.11.4 py36he6710b0_0
pycparser 2.20 py_0 conda-forge
pygments 2.6.1 py_0 conda-forge
pyparsing 2.4.6 py_0 conda-forge
python 3.6.10 hcf32534_1
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.6 2_cp36m conda-forge
pytorch 1.10.0 py3.6_cpu_0 pytorch
pytorch-mutex 1.0 cpu pytorch
pytz 2019.3 py_0 conda-forge
pywavelets 1.1.1 py36h7b6447c_2
pyyaml 5.4.1 py36h27cfd23_1
pyzmq 18.1.1 py36he6710b0_0
qt 5.9.7 h5867ecd_1
readline 8.0 h7b6447c_0
scikit-image 0.17.2 pypi_0 pypi
scipy 1.5.2 py36h0b6359f_0
setuptools 58.0.4 py36h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.31.1 h7b6447c_0
testpath 0.4.4 py_0 conda-forge
tifffile 2020.9.3 pypi_0 pypi
tk 8.6.8 hbc83047_0
toolz 0.11.1 pyhd3eb1b0_0
torchaudio 0.10.0 py36_cpu [cpuonly] pytorch
torchvision 0.11.1 py36_cpu [cpuonly] pytorch
tornado 6.1 py36h27cfd23_0
traitlets 4.3.3 py36h06a4308_0
typing_extensions 3.10.0.2 pyh06a4308_0
wcwidth 0.1.8 py_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 1.0.0 py_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
x264 1!157.20191217 h7b6447c_0
xlrd 2.0.1 pyhd3eb1b0_0
xorg-kbproto 1.0.7 h14c3975_1002 conda-forge
xorg-libice 1.0.10 h516909a_0 conda-forge
xorg-libsm 1.2.3 h84519dc_1000 conda-forge
xorg-libx11 1.6.9 h516909a_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-renderproto 0.11.1 h14c3975_1002 conda-forge
xorg-xextproto 7.3.0 h14c3975_1002 conda-forge
xorg-xproto 7.0.31 h14c3975_1007 conda-forge
xz 5.2.4 h14c3975_4
yaml 0.2.5 h7b6447c_0
zeromq 4.3.1 he6710b0_3
zipp 2.2.0 py_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0

Extra JPG tile creation when width/height of SVS divisible by window size

I ran into an error while creating JPG tiles from one of my SVS images in 2_svs_to_jpg_tiles.py. One dimension of my image is divisible by the window size for the JPG tile output, so the code was trying to create an extra tile with nothing in it (e.g. begin_x = 20000 and end_x = 20000).

The error seems to stem from these lines in the code, where the number of x and y increments are found:

increment_x = int(width/window_size) + 1
increment_y = int(height/window_size) + 1

If either the width or the height of the SVS image is divisible by the window size then there will be one extra x increment or y increment.

I was thinking these cases could be accounted for by adding:

increment_x = int(width/window_size) + 1 if width%window_size != 0 else int(width/window_size)
increment_y = int(height/window_size) + 1 if height%window_size != 0 else int(height/window_size)

Seems like an easy fix, but let me know if I'm missing something - thanks!
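For what it's worth, ceiling division expresses the same fix more compactly and is equivalent to the conditionals above:

    import math

    increment_x = math.ceil(width / window_size)
    increment_y = math.ceil(height / window_size)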

FileNotFoundError: [WinError 3]

I am using Windows 10 (Xeon processor, 64 GB RAM, Quadro P4000 graphics) and cloned this repository into a conda environment created especially for PyTorch.

When I run the 1_split.py file, I get the following error:

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'all_wsi'

Where do I need to store the WSI data? I can't figure it out.

Single Layer TIF format

What would you recommend for splitting up a single-layer TIF file into multiple JPEG tiles? I'm trying to run your preprocessing script, but am getting:

openslide.OpenSlide('X.tif')
Traceback (most recent call last):
...
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

I read somewhere that this command would still work for multilayer TIFs.

Thanks!

Crop the WSI

Before the pathologists annotated the crops, how did you crop the whole-slide image? Was any tool used?

Model checkpoints

I know your patient data is sensitive and not so easy to share. Would you be willing to share a model checkpoint? (With mean/std normalization)

the rest of patches

There must be some patches in a whole-slide image that are not one of the five subtypes; how do you deal with these patches?

Datasets request

Hi, I've read your paper, and I'm really interested in the classification of histologic patterns in lung adenocarcinoma.
I'm wondering if you could share the anonymized version of this dataset. Thanks a lot.

question about method

First, thanks for your hard work and for sharing the entire code on GitHub. I have just begun studying this process and have gained much from your work!
I tried to run your code, and I see that you don't use the lesion-level labeling information. Is every patch of a tumor WSI defined as tumor? But most of a tumor WSI is normal tissue. Is that what you do?

Run the code on CPUs

Hi! Nice work! I would like to know if I could try your code on a cluster without GPUs. Is there any parameter that I should change? Thanks!

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

When I'm running 2_process_patches.py, I'm getting the following runtime error. I've tried running both:

$ python code/2_process_patches.py --num_workers 1
$ python code/2_process_patches.py --num_workers 0

I solved an issue in 1_split.py by setting num_workers=0 in line 99 of compute_stats.py but when I run 2_process_patches.py with num_workers as 0, the code just doesn't move after it displays "Generating training patches".

Do you have any suggestions as to what I can do about this error?

+++++ Running 2_process_patches.py +++++

----- Generating training patches -----
[WindowsPath('wsi_train/0'), WindowsPath('wsi_train/1')] subfolders found from wsi_train
wsi_train\0: 3394.777501MB, 9 images, overlap_factor=1.00
wsi_train\1: 6016.922732MB, 16 images, overlap_factor=1.00

getting small crops from 9 images in wsi_train\0 with inverse overlap factor 1.00 outputting in train_folder\train\0
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
libifcoremd.dll 00007FFA2772DF54 Unknown Unknown Unknown
KERNELBASE.dll 00007FFA637929E3 Unknown Unknown Unknown
KERNEL32.DLL 00007FFA645F7344 Unknown Unknown Unknown
ntdll.dll 00007FFA65C426B1 Unknown Unknown Unknown
PS E:\Image Databases\Deepslide Project\deepslide> python code/2_process_patches.py --num_workers 1
############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 1
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 80000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['0', '1']
num_classes: 2
train_patches: train_folder\train
val_patches: train_folder\val
path_mean: [0.0, 0.0, 0.0]
path_std: [0.0, 0.0, 0.0]
resume_checkpoint_path: checkpoints\xyz.pt
log_csv: logs\log_9172023_182335.csv
eval_model: checkpoints\xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

C:\Anaconda3\Lib\site-packages\paramiko\transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,

+++++ Running 2_process_patches.py +++++

----- Generating training patches -----
[WindowsPath('wsi_train/0'), WindowsPath('wsi_train/1')] subfolders found from wsi_train
wsi_train\0: 3394.777501MB, 9 images, overlap_factor=1.00
wsi_train\1: 6016.922732MB, 16 images, overlap_factor=1.00

getting small crops from 9 images in wsi_train\0 with inverse overlap factor 1.00 outputting in train_folder\train\0
############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 1
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 80000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['0', '1']
num_classes: 2
train_patches: train_folder\train
val_patches: train_folder\val
path_mean: [-5.959930872658708e+24, 6.978466352337589e-43, 5.739718509874451e-42]
path_std: [nan, 8.353721441928427e-22, 2.3957709225658984e-21]
resume_checkpoint_path: checkpoints\xyz.pt
log_csv: logs\log_9172023_182351.csv
eval_model: checkpoints\xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

C:\Anaconda3\Lib\site-packages\paramiko\transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,

+++++ Running 2_process_patches.py +++++

----- Generating training patches -----
[WindowsPath('wsi_train/0'), WindowsPath('wsi_train/1')] subfolders found from wsi_train
wsi_train\0: 3394.777501MB, 9 images, overlap_factor=1.00
wsi_train\1: 6016.922732MB, 16 images, overlap_factor=1.00

getting small crops from 9 images in wsi_train\0 with inverse overlap factor 1.00 outputting in train_folder\train\0
Traceback (most recent call last):
File "", line 1, in
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 120, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 129, in _main
prepare(preparation_data)
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 240, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 291, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 291, in run_path
File "", line 98, in _run_module_code
File "", line 88, in _run_code
File "E:\Image Databases\Deepslide Project\deepslide\code\2_process_patches.py", line 13, in
gen_train_patches(input_folder=config.args.wsi_train,
File "E:\Image Databases\Deepslide Project\deepslide\code\utils_processing.py", line 144, in gen_train_patches
produce_patches(input_folder=input_subfolder,
File "E:\Image Databases\Deepslide Project\deepslide\code\utils_processing.py", line 408, in produce_patches
p.start()
File "C:\Anaconda3\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 158, in get_preparation_data
_check_not_importing_main()
File "C:\Anaconda3\Lib\multiprocessing\spawn.py", line 138, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

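For reference, the idiom the error message asks for looks like the following on Windows, where multiprocessing spawns fresh interpreter processes (a generic sketch, not a tested patch to 2_process_patches.py):

    from multiprocessing import freeze_support

    def main():
        ...  # kick off the patch-generation worker processes here

    if __name__ == "__main__":
        freeze_support()  # needed only for frozen executables; harmless otherwise
        main()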

Training Issues

@JosephDiPalma I was trying to train my model; however, every time I attempt to train it, it appears to think that all the data belongs to the same class. I am not sure why this is occurring; I have tried training and testing numerous times, and it is the same result every time. I am also not able to generate a confidence value.

Any ideas as to why this may be happening?

Thank you!

A problem for input

Hello! I'm a new learner in this area. I have a question about the input to this ResNet: what is the input to the net? I currently have 1,070 .svs folders which contain many patches (for example, 1_1.tiff); how should I change my folders and patches to fit your net? Thanks for your reply.

3_train.py

It seems the network is not learning; accuracy is just 0.5 for both training and validation.
The same dataset with another network gives 0.85 accuracy.
What would be your suggestion?

############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 8000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['benign', 'tumor']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [3.2604160257049264e-12, 1.7743442033415775e+28, 2.053507277676774e-19]
path_std: [nan, nan, nan]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_262022_213727.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

+++++ Running 3_train.py +++++
2 classes: ['benign', 'tumor']
num train images 1056
num val images 48
CUDA is_available: False
train_folder: train_folder
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
resume_checkpoint_path (only if resume_checkpoint is true): checkpoints/xyz.pt
save_interval: 1
output in checkpoints_folder: checkpoints
pretrain: False
log_csv: logs/log_262022_213727.csv

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 0 with lr 0.000850000000000: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 1 with lr 0.000722500000000: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 2 with lr 0.000614125000000: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 3 with lr 0.000522006250000: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 4 with lr 0.000443705312500: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 5 with lr 0.000377149515625: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 6 with lr 0.000320577088281: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 7 with lr 0.000272490525039: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 8 with lr 0.000231616946283: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 9 with lr 0.000196874404341: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 10 with lr 0.000167343243690: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 11 with lr 0.000142241757136: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 12 with lr 0.000120905493566: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 13 with lr 0.000102769669531: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 14 with lr 0.000087354219101: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 15 with lr 0.000074251086236: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 16 with lr 0.000063113423301: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 17 with lr 0.000053646409806: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 18 with lr 0.000045599448335: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Predicted benign tumor
Actual
benign 1.00 0.00
tumor 1.00 0.00
Epoch 19 with lr 0.000038759531085: t_loss: nan t_acc: 0.5000 v_loss: nan v_acc: 0.5000

training complete in 53.00 minutes
+++++ Finished running 3_train.py +++++

About running code from 1st one, I got a problem.

Hi,
I am trying to go through the codes, and try it using my wsi images.
But when I input my wsi folder path and run 1_split.py just like:
python code/1_split.py --all_wsi deepslide-master\HE_datasets --val_wsi_per_class 10 --test_wsi_per_class 20
the result is:
[screenshot of the error]
It always prepends wsi_train to the head of the image path and tells me "FileNotFoundError: [Errno 2] No such file or directory:".
What should I do? I am new to coding.
Or could you give me specific instructions (with an example) for running the code?
Thanks!

About Dartmouth Lung Cancer Histology Dataset

I downloaded the dataset but failed to find the annotations. There are only whole-slide images (WSIs) without annotations. How can I get the annotations made by pathologists?

Data preprocessing problem

[attached image]

Hi,

I am having trouble with the preprocessing split/merge scripts. As you can see in the attached image, the jpg tiles are not merged properly. All the images that I checked behave similarly. Maybe there is some problem in 3_repiece_jpg_tiles.py?

Thanks

inference instructions; pretrained weights (see https://github.com/BMIRDS/deepslide/issues/35)

Hello!

Thank you for making this repository! Do you think you could provide more detailed instructions on how to run the inference? I found the instructions here https://github.com/BMIRDS/deepslide#4-testing-on-wsi a bit too concise. In particular, I wanted to try running the inference step on one WSI from the dataset to see that everything works with my software/hardware combination.

I would really appreciate it if you could release the pre-trained weights as asked in Issue #35. This will allow just trying your code on one example to see if everything works just on one WSI without training the models from scratch, which in turn will be a good indicator of how much code I can reuse from this repository.

Best wishes,
George Batchkala

Data acquisition help

Hi,
I am a graduate student interested in pathology. I wrote an email to “[email protected]” for the anonymized data, but there was no reply. Can you tell me where I can get the data used in the paper?

Thanks.

issue about all_wsi

Hi, would you explain the structure of all_wsi or give an example? I am confused about the input data. Thanks a lot.

NameError:

When I run 2_process_patches.py, the error is NameError: name 'config' is not defined.

Create subfolders in 'all_wsi'

Hi Wei,

I am a little bit confused about the input for '1_split.py'.

In the script you mention that all_wsi needs to include subfolders of images labeled by class. What do you mean by 'class' here? Do you mean training, testing, or validation?

And as for the image format, I assume it should be .svs, not .jpg, right?

Parallelize and pipeline preprocessing scripts

Hey there! Awesome package. For researchers with access to a large number of slides, some of the tiling / patch creation will take a while if done in series. Just tiling ~150 slides may take a few days, and adding patches even more; preprocessing on the order of one week just to split the images is a bit extreme. I recommend adding the option to parallelize most of the preprocessing scripts, as well as automating them into a preprocessing pipeline for deployment (I'm sure other groups could build their own internal pipelines).

Could be useful as these datasets become larger.

I'd be happy to help PR.

Bug in print statements

In utils_model.py I think the logging statement should be

writer.write(f"{epoch},{train_loss:.4f},"
                     f"{train_acc:.4f},{val_loss:.4f},{val_acc:.4f}\n")

Recover Original SVS Coordinates from Patches

Hey there, if I wanted to recover the original SVS coordinates for each patch, would this be the right way to go about it? I don't think I've been recovering the right coordinates.

xc and yc are the original compression factors, xl and yl are the side lengths of one of the jpg tiles before patching, xi and yi are the coordinates with respect to each tile, nx and ny are the number of tiles in each direction, and dimx and dimy are the sizes of the patches. I'm not sure if my formula is correct.

import numpy as np

def image2coords(image_file):
    # Patch filenames look like Sample1_<nx>_<ny>_<xi>_<yi>.jpg.
    nx, ny, xi, yi = map(int, image_file.split('/')[-1].split('.')[0].split('_')[1:])
    return return_image_coord(nx=nx, ny=ny, xi=xi, yi=yi)

def return_image_coord(nx=0, ny=0, xl=3333, yl=3333, xi=0, yi=0, xc=3, yc=3, dimx=224, dimy=224):
    # Patch center in SVS coordinates: compression factor * (tile offset + patch offset + half patch).
    return (np.array([xc, yc]) * np.array([nx*xl + xi + dimx/2, ny*yl + yi + dimy/2])).tolist()

I know you have some functions to stitch the image back together and add predictions; I'm just looking for a more straightforward way, perhaps something useful to add to utils.

e.g. patch name: Sample1_0_1_2016_2688.jpg

pretrained model

Hello and thank you for sharing your work!
Could you please provide the last checkpoint for the pretrained model?
Thank you in advance, Lucia

Data size

  1. For each class (Lepidic, Acinar, Papillary, Micropapillary, Solid, and Benign), how many patches did you use for training?
  2. Your paper says "For the training set, pathologists annotated 4,161 crops from 245 images, about 17 crops per image. These rectangular crops varied in size (mean: 718×771 pixels, standard deviation: 645×701 pixels, median: 429×473 pixels)". How would the crop shape be rectangular? What is the context of the mean, standard deviation, and median?
  3. Your paper says "For the development set, our pathologists annotated 1,068 square patches of 224×224 pixels for classic examples of each pattern." Did the pathologists annotate the patches, or should that be crops? If it was at the patch level, why was the development set treated differently from the training set?

Thank you so much!!

Dataset Download Problem

Hello,

I am not able to download the dataset. The dataset links arrive as "undefined" in the email, like below:

DHMC_wsi_1.zip
File size: 16.2 GB
undefined

DHMC_wsi_2.zip
File size: 13.18 GB
undefined

DHMC_wsi_3.zip
File size: 13.96 GB
undefined

DHMC_wsi_4.zip
File size: 6.7 GB
undefined

MetaData_Release_1.0.csv
File size: 48.09 KB
undefined

MD5SUMS.txt
File size: 196 Bytes
undefined

Dataset Research Use Agreement.pdf
File size: 23.36 KB
undefined

Please let me know how can I download the dataset.

Integrating my own model

Hello! I'm just starting to dive into this area, so I apologize in advance for the probably obvious question.

You use a ResNet model in your utils_model.py. Is it possible to integrate my own model into the framework? I am trying to implement transfer learning for histopathology.

Will the pipeline work OK if I just rewrite the create_model function?

Attention based classification in ResNet18

Sorry for the spam; I am now digging a little bit deeper into your code.
I realize your code does not implement the attention-based method that you published previously; instead, it uses a sliding-window approach, right?

I was wondering if you have tried adding an attention-based pooling layer to the ResNet architecture.
Thanks!

About anonymized version of this dataset

Hi, your work is excellent and I'm really interested in it! I'm currently working on lung cancer histopathology image classification but don't have enough images yet. I am wondering whether there is a way to access the anonymized version of this dataset. That would help a lot.

4_test.py

When I run 4_test.py, it throws an error: ValueError: max() arg is an empty sequence.
What would be your suggestion?

############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 8000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['benign', 'tumor']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [167953056.0, 4.742346702121871e+30, 4.739335267905211e+30]
path_std: [nan, nan, nan]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_272022_101945.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

+++++ Running 4_test.py +++++

----- Finding validation patch predictions -----
Traceback (most recent call last):
File "code/4_test.py", line 21, in
pretrain=config.args.pretrain)
File "/home/SebliLo/deepslide/code/utils_model.py", line 564, in get_predictions
checkpoints_folder=checkpoints_folder) if auto_select else eval_model
File "/home/SebliLo/deepslide/code/utils_model.py", line 534, in get_best_model
key=operator.itemgetter(1))[0]
ValueError: max() arg is an empty sequence

Validation Confusion Matrix on 3_train.py

Hi, I tested the repository some months ago. I only have 2 different classes of tissue. I started testing the code with only 1 WSI per class, and it ran without any problem. Now I have started using 20 WSIs per class, but when the validation confusion matrix is printed, only one class appears with a result; the other class only prints 0.00 0.00. I have tried different WSIs, 20 in every experiment, but that only changes which class prints 0.00 0.00. I attach some images.

I have verified that train_folder/val/Class1 and train_folder/val/Class2 contain the corresponding images of each class. Even the total number of validation images matches what is printed at the beginning of the execution.

I also received some warnings, but I consider them not relevant to the experiment.

One last question: I cannot use a 32-layer CNN. Is it not available? I can only experiment with 18 and 50.

Regards! Thanks for your great contribution. :D


CUDA_VISIBLE_DEVICES=0 python code/4_test.py --auto_select False

Hi, thanks,
You built a great pipeline.
Did I miss something? It shows the following:
path_mean: [0.0, 0.0, 0.0]
path_std: [0.0, 0.0, 0.0]

I had the same mistake at code/3_train.py; I re-pasted the WSI files into the all_wsi folder (which I had previously deleted).
However, at this step it still shows me the error.
Thanks, looking forward to hearing from you.

(tf2) qiang@Qiang:~/Desktop/pcr/deepslide-master$ CUDA_VISIBLE_DEVICES=0 python code/4_test.py --auto_select False
############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 80000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cuda:0
classes: ['a', 's']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [0.0, 0.0, 0.0]
path_std: [0.0, 0.0, 0.0]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_3292022_233617.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

+++++ Running 4_test.py +++++

----- Finding validation patch predictions -----
model loaded from checkpoints/resnet18_e13_va0.59313.pt
testing on 63680 crops from patches_eval_val/DHMC_0046/DHMC_0046
Traceback (most recent call last):
File "code/4_test.py", line 8, in
get_predictions(patches_eval_folder=config.args.patches_eval_val,
File "/home/qiang/Desktop/pcr/deepslide-master/code/utils_model.py", line 623, in get_predictions
for batch_num, (test_inputs, test_labels) in enumerate(dataloader):
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 180, in getitem
sample = self.transform(sample)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 60, in call
img = t(img)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 221, in forward
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "/home/qiang/environments/tf2/lib/python3.8/site-packages/torchvision/transforms/functional.py", line 331, in normalize
raise ValueError('std evaluated to zero after conversion to {}, leading to division by zero.'.format(dtype))
ValueError: std evaluated to zero after conversion to torch.float32, leading to division by zero.

Overfitting when trying to reproduce on 143 histology slides.

Hello,
Thank you for making this wonderful resource available. I have downloaded the 143 histology slides from the provided link and want to obtain a reliable baseline for further study, but I found that this network overfits easily in five-category classification.
The data were first divided into training, testing, and validation sets at a 7:2:1 ratio within every category, and 'num_train_per_class' and 'slide_overlap' were set to 17 and 1 respectively so as not to generate too many windows (but nearly 3.95 million patches were still generated in train_folder). Then, in order to avoid the nan std calculated by compute_stats.py, I changed 'path_mean' and 'path_std' in config.py (https://github.com/BMIRDS/deepslide/blob/6102a69d8bab0c5cb5deab70e8650f757e7d7db5/code/config.py#LL368C1-L369C62) to [0,0,0] and [1,1,1].
Finally, I set 'batch_size' and 'num_epochs' to 80 and 100 for training. During training, 'train_acc' gets higher but 'val_acc' stays around 0.4. Is this problem related to the partitioning or to the amount of data for five-category classification? And are the settings of 'path_mean', 'path_std', 'num_train_per_class', and 'slide_overlap' reasonable?
log_5102023_17555.csv

6_visualize.py

I am running the code on CentOS Linux 7, which is based on RHEL.
How can I see visualization figures like the ones you showed in the "6. Visualization" section of your README (Deep Learning Model figures B.i - B.iv)?

(py3) [SebliLo@tesla deepslide]$ python code/6_visualize.py
############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 8000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['benign', 'tumor']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [0.16684997, 0.16684997, 0.16684997]
path_std: [0.40847272, 0.40847272, 0.40847272]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_2212022_231057.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

+++++ Running 6_visualize.py +++++

----- Visualizing validation set -----
40 whole slides found from wsi_val
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_13_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_14_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_15_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_16_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_17_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_18_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_67_22_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_68_11_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_68_12_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_68_15_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_68_16_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_68_22_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_12_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_13_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_14_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_15_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_16_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_17_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_18_0.png of shape (256, 256, 3)
visualizing wsi_val/benign/Copy of 6 3412 2-L-1 HCC_69_19_0.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_104_65_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_104_66_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_62_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_63_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_64_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_65_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_66_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_67_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_105_68_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_49_58_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_50_56_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_51_56_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_52_55_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_59_68_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_59_69_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_60_43_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_60_44_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_60_45_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_60_67_1.png of shape (256, 256, 3)
visualizing wsi_val/tumor/Copy of 6 3412 2-L-1 HCC_61_44_1.png of shape (256, 256, 3)
find the visualizations in vis_val
----- Finished visualizing validation set -----

----- Visualizing test set -----
60 whole slides found from wsi_test
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_12_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_14_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_15_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_16_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_17_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_18_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_70_19_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_13_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_14_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_15_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_16_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_17_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_18_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_71_19_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_14_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_15_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_16_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_18_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_19_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_20_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_72_21_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_73_15_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_73_18_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_73_19_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_73_20_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_74_16_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_74_17_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_74_18_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_75_16_0.png of shape (256, 256, 3)
visualizing wsi_test/benign/Copy of 6 3412 2-L-1 HCC_75_17_0.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_61_45_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_61_46_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_61_66_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_62_42_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_62_43_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_62_46_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_63_41_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_63_42_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_63_44_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_63_45_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_64_40_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_64_43_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_65_42_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_65_51_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_65_52_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_66_50_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_66_51_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_66_52_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_67_50_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_75_8_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_75_9_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_76_6_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_77_7_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_77_8_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_89_64_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_91_64_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_92_62_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_92_63_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_92_64_1.png of shape (256, 256, 3)
visualizing wsi_test/tumor/Copy of 6 3412 2-L-1 HCC_93_65_1.png of shape (256, 256, 3)
find the visualizations in vis_test
----- Finished visualizing test set -----

+++++ Finished running 6_visualize.py +++++
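
The figures are written as ordinary image files into the vis_val and vis_test folders named in the configuration above, so nothing needs a display to generate them; they only need to be opened afterwards. A minimal sketch for previewing a few of them on a headless CentOS box, assuming the outputs are .png files like the inputs in the log:

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from pathlib import Path

for p in sorted(Path("vis_val").rglob("*.png"))[:4]:  # preview the first few
    plt.figure()
    plt.imshow(mpimg.imread(p))
    plt.title(p.name)
    plt.axis("off")
    plt.savefig(f"preview_{p.name}", dpi=150)
    plt.close()

Alternatively, copying the folder to a local machine (e.g., with scp) and opening the images in any viewer works just as well.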

IndexError: max(): Expected reduction dim 1 to have non-zero size

Hi,

I followed your code and generated patches from your sample data.
Now that I have train_folder, I tried to run 3_train.py, but I got the following error:

Traceback (most recent call last):
  File "/beegfs/scratch/ric.ostuni/ric.ostuni/DP_Carlo/deepslide/code/3_train.py", line 6, in <module>
    train_resnet(batch_size=config.args.batch_size,
  File "/beegfs/scratch/ric.ostuni/ric.ostuni/DP_Carlo/deepslide/code/utils_model.py", line 483, in train_resnet
    train_helper(model=model,
  File "/beegfs/scratch/ric.ostuni/ric.ostuni/DP_Carlo/deepslide/code/utils_model.py", line 265, in train_helper
    __, train_preds = torch.max(train_outputs, dim=1)
IndexError: max(): Expected reduction dim 1 to have non-zero size.

do you have any suggestion on how to proceed?

best,
carlo
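
This error usually means the model received an empty batch, most often because the patch folders that 3_train.py reads from contain no images. A hedged diagnostic sketch, assuming the default train_folder layout and jpg patches, that counts patches per class before training:

from pathlib import Path

# Count patches per class under train_folder/<split>/<class>/<slide>/...
for split in ("train", "val"):
    for class_dir in sorted(Path("train_folder", split).iterdir()):
        if not class_dir.is_dir():
            continue
        n = sum(1 for _ in class_dir.rglob("*.jpg"))
        print(f"{split}/{class_dir.name}: {n} patches")
        if n == 0:
            print("  -> empty class folder; DataLoader batches will be empty")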

Use pre-trained ResNet with frozen layers

Hi!
As transfer learning seems to be quite a big topic these days, I was wondering whether you have tried training ResNet-18 from an ImageNet-pretrained model while freezing all the layers except the last one.

Also, I found the documentation in PyTorch here, but I am not sure where I should put these lines of code. Should they go in the create_model function?
Thanks!
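
For what it's worth, a minimal sketch of the frozen-backbone variant, assuming it would be wired into a create_model-style helper (an illustration, not deepslide's actual code):

import torch.nn as nn
import torchvision.models as models

def create_frozen_resnet18(num_classes: int) -> nn.Module:
    # Load ImageNet weights (newer torchvision prefers the weights= argument).
    model = models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False  # freeze the backbone
    # The replacement head is trainable by default.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

When doing this, pass only the trainable parameters to the optimizer, e.g. optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3).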

Newer scipy versions: cannot import name 'imsave' from 'scipy' (in preprocessing script 2)

The z_preprocessing script 2. svs_to_jpeg_tiles.py does not work with newer versions of scipy: scipy.misc.imsave was removed in scipy 1.2.0 (the current release is 1.4.0). Maybe it could be fixed by using imageio.imwrite instead?

It raises an error at line 8 of the script:

from scipy.misc import imsave
Traceback (most recent call last):
  File "<pyshell#44>", line 1, in <module>
    from scipy import imsave
ImportError: cannot import name 'imsave' from 'scipy'
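
A minimal shim along the lines suggested above (hedged: imageio must be installed separately), which keeps the rest of the script unchanged:

# Replacement for the removed scipy.misc.imsave (gone since scipy 1.2.0);
# imageio.imwrite takes (path, array) in the same order.
import imageio

def imsave(path, arr):
    imageio.imwrite(path, arr)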

Question about path_mean and path_std

Hello! Thanks for your code! I've run into a confusing problem with path_mean and path_std. During training I noticed their values looked wrong, yet the model could still be trained, just very slowly. What is going wrong? I'd appreciate your help, thank you!

path_mean: [-4.436188726613424e-32, 3.0744488307286487e-41, -4.436184024636021e-32] path_std: [nan, 5.54477138001159e-21, nan]

############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 80000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 100
num_layers: 18
learning_rate: 0.001
batch_size: 128
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cuda:0
classes: ['N', 'Y']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [-4.436188726613424e-32, 3.0744488307286487e-41, -4.436184024636021e-32]
path_std: [nan, 5.54477138001159e-21, nan]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_12182020_123013.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')
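
The huge magnitudes and nan entries above look like floating-point accumulation going wrong (or empty patch folders). A hedged sketch of a numerically stable recomputation, accumulating in float64 over a sample of patches, with the folder layout assumed from the configuration above:

from pathlib import Path

import numpy as np
from PIL import Image

total, total_sq, n = np.zeros(3), np.zeros(3), 0
for p in sorted(Path("train_folder/train").rglob("*.jpg"))[:2000]:  # sample
    x = np.asarray(Image.open(p).convert("RGB"), dtype=np.float64) / 255.0
    total += x.sum(axis=(0, 1))
    total_sq += (x ** 2).sum(axis=(0, 1))
    n += x.shape[0] * x.shape[1]
assert n > 0, "no patches found under train_folder/train"
mean = total / n
std = np.sqrt(total_sq / n - mean ** 2)
print("path_mean:", mean, "path_std:", std)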

2_process_patches.py

Please see the output of 1_split.py and 2_process_patches.py below.
The first step looks fine, but the second throws an error (IndexError: list index out of range) and also creates 0 patches (which might be the cause of the error). How can we fix it?

"finished patches from wsi_train/benign with inverse overlap factor 108.81 in 1.98 seconds outputting in train_folder/train/benign for 0 patches"

+++++ Running 1_split.py +++++
class benign #train=14 #val=10 #test=20
class tumor #train=14 #val=10 #test=20
+++++ Finished running 1_split.py +++++

############### CONFIGURATION ###############
all_wsi: all_wsi
val_wsi_per_class: 20
test_wsi_per_class: 30
keep_orig_copy: True
num_workers: 8
patch_size: 224
wsi_train: wsi_train
wsi_val: wsi_val
wsi_test: wsi_test
labels_train: labels_train.csv
labels_val: labels_val.csv
labels_test: labels_test.csv
train_folder: train_folder
patches_eval_train: patches_eval_train
patches_eval_val: patches_eval_val
patches_eval_test: patches_eval_test
num_train_per_class: 80000
type_histopath: True
purple_threshold: 100
purple_scale_size: 15
slide_overlap: 3
gen_val_patches_overlap_factor: 1.5
image_ext: jpg
by_folder: True
color_jitter_brightness: 0.5
color_jitter_contrast: 0.5
color_jitter_saturation: 0.5
color_jitter_hue: 0.2
num_epochs: 20
num_layers: 18
learning_rate: 0.001
batch_size: 16
weight_decay: 0.0001
learning_rate_decay: 0.85
resume_checkpoint: False
save_interval: 1
checkpoints_folder: checkpoints
checkpoint_file: xyz.pt
pretrain: False
log_folder: logs
auto_select: True
preds_train: preds_train
preds_val: preds_val
preds_test: preds_test
inference_train: inference_train
inference_val: inference_val
inference_test: inference_test
vis_train: vis_train
vis_val: vis_val
vis_test: vis_test
device: cpu
classes: ['benign', 'tumor']
num_classes: 2
train_patches: train_folder/train
val_patches: train_folder/val
path_mean: [167953056.0, 4.742346702121871e+30, 4.739335267905211e+30]
path_std: [nan, nan, nan]
resume_checkpoint_path: checkpoints/xyz.pt
log_csv: logs/log_252022_213829.csv
eval_model: checkpoints/xyz.pt
threshold_search: (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors: ('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################

+++++ Running 2_process_patches.py +++++

----- Generating training patches -----
[PosixPath('wsi_train/benign'), PosixPath('wsi_train/tumor')] subfolders found from wsi_train
wsi_train/benign: 2.002104MB, 14 images, overlap_factor=108.81
wsi_train/tumor: 2.111021MB, 14 images, overlap_factor=104.57

getting small crops from 14 images in wsi_train/benign with inverse overlap factor 108.81 outputting in train_folder/train/benign
finished patches from wsi_train/benign with inverse overlap factor 108.81 in 1.98 seconds outputting in train_folder/train/benign for 0 patches

getting small crops from 14 images in wsi_train/tumor with inverse overlap factor 104.57 outputting in train_folder/train/tumor
finished patches from wsi_train/tumor with inverse overlap factor 104.57 in 2.24 seconds outputting in train_folder/train/tumor for 833 patches

finished all folders

----- Finished generating training patches -----

----- Balancing the training patches -----
Traceback (most recent call last):
  File "code/2_process_patches.py", line 24, in <module>
    balance_classes(training_folder=config.train_patches)
  File "/home/sebliLo/deepslide/code/utils_processing.py", line 251, in balance_classes
    n=biggest_size)
  File "/home/sebliLo/deepslide/code/utils_processing.py", line 214, in duplicate_until_n
    print(f"balancing {image_paths[0].parent} by duplicating {num_dupls}")
IndexError: list index out of range
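
The empty benign folder (0 patches) is what makes image_paths[0] fail in duplicate_until_n; a likely root cause is type_histopath=True discarding tiles that do not pass the purple (H&E) filter. A hedged guard sketch, with deepslide's surrounding logic assumed rather than reproduced:

def duplicate_until_n(image_paths, n):
    # Guard: an empty class (e.g., 0 patches kept by the purple filter on
    # non-H&E or grayscale images) has nothing to duplicate.
    if not image_paths:
        print("no patches found for this class; skipping balancing")
        return
    # ... original duplication logic continues here ...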

Fixing print statement

@JosephDiPalma

If I'm not wrong, you need a \n at the end of this print statement in config.py:

f"{chr(10).join(f'{k}:{chr(9)}{v}' for k, v in vars(args).items())}"

Pytorch CPU and conda environment conflicts

Not a bug report per se, but I thought it would be useful to add this to the issues list in case other people have this problem.

When trying to set up the conda environment, I had trouble installing a CUDA-compatible version of PyTorch. The CPU version was automatically installed when using the default conda_env.yaml file, and whenever I modified the .yaml file to install the GPU version, the environment would take an extremely long time to solve and would contain conflicts. To get a working environment I had to make the following changes to conda_env.yaml:

#$ conda env create --file conda-env.yaml
name: deepslide_env
channels:
  - pytorch
  - nvidia 
  - conda-forge
dependencies:
  - python=3.11
  - torchvision
  - pytorch-cuda=11.7
  - pytorch
  - pandas
  - matplotlib
  - scikit-learn
  - scikit-image
  - pip
  - pip:
    - -r pip-requirements.txt

One consequence of this was using a newer version of pandas, which meant the following line of code needed changing:
utils_model.py (line 63): cm.style.hide_index() ---> cm.style.hide(axis='index')
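
A version-tolerant variant of that line, hedged for environments that may have either pandas API (hide(axis=...) arrived in pandas 1.4; hide_index() was removed in 2.0):

import pandas as pd

cm = pd.DataFrame([[5, 1], [2, 7]])  # stand-in confusion matrix
try:
    styled = cm.style.hide(axis="index")  # pandas >= 1.4
except AttributeError:
    styled = cm.style.hide_index()        # older pandas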

ZeroDivisionError: float division by zero

python code/2_process_patches.py 
###############     CONFIGURATION     ###############
all_wsi:	all_wsi
val_wsi_per_class:	20
test_wsi_per_class:	30
keep_orig_copy:	True
num_workers:	8
patch_size:	224
wsi_train:	wsi_train
wsi_val:	wsi_val
wsi_test:	wsi_test
labels_train:	labels_train.csv
labels_val:	labels_val.csv
labels_test:	labels_test.csv
train_folder:	train_folder
patches_eval_train:	patches_eval_train
patches_eval_val:	patches_eval_val
patches_eval_test:	patches_eval_test
num_train_per_class:	80000
type_histopath:	True
purple_threshold:	100
purple_scale_size:	15
slide_overlap:	3
gen_val_patches_overlap_factor:	1.5
image_ext:	jpg
by_folder:	True
color_jitter_brightness:	0.5
color_jitter_contrast:	0.5
color_jitter_saturation:	0.5
color_jitter_hue:	0.2
num_epochs:	20
num_layers:	18
learning_rate:	0.001
batch_size:	16
weight_decay:	0.0001
learning_rate_decay:	0.85
resume_checkpoint:	False
save_interval:	1
checkpoints_folder:	checkpoints
checkpoint_file:	xyz.pt
pretrain:	False
log_folder:	logs
auto_select:	True
preds_train:	preds_train
preds_val:	preds_val
preds_test:	preds_test
inference_train:	inference_train
inference_val:	inference_val
inference_test:	inference_test
vis_train:	vis_train
vis_val:	vis_val
vis_test:	vis_test
device:	cuda:0
classes:	['0', '1', '2', '3', '4', '5']
num_classes:	6
train_patches:	train_folder/train
val_patches:	train_folder/val
path_mean:	[-2.5605530957943825e-20, 3.067582468253457e-41, 0.0]
path_std:	[nan, 0.0, 0.0]
resume_checkpoint_path:	checkpoints/xyz.pt
log_csv:	logs/log_8242021_121821.csv
eval_model:	checkpoints/xyz.pt
threshold_search:	(0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
colors:	('red', 'white', 'blue', 'green', 'purple', 'orange', 'black', 'pink', 'yellow')

#####################################################





+++++ Running 2_process_patches.py +++++

----- Generating training patches -----
[PosixPath('wsi_train/0'), PosixPath('wsi_train/1'), PosixPath('wsi_train/2'), PosixPath('wsi_train/3'), PosixPath('wsi_train/4'), PosixPath('wsi_train/5')] subfolders found from wsi_train
wsi_train/0: 106686.839468MB, 2842 images, overlap_factor=1.00
wsi_train/1: 103738.586958MB, 2616 images, overlap_factor=1.00
wsi_train/2: 46213.862613MB, 1293 images, overlap_factor=1.00
wsi_train/3: 33736.711066MB, 1192 images, overlap_factor=1.00
wsi_train/4: 18611.279527MB, 594 images, overlap_factor=1.00
Traceback (most recent call last):
  File "code/2_process_patches.py", line 20, in <module>
    type_histopath=config.args.type_histopath)
  File "/mnt/sda1/deepslide/code/utils_processing.py", line 140, in gen_train_patches
    subfolders=subfolders, desired_crops_per_class=num_train_per_class)
  File "/mnt/sda1/deepslide/code/utils_processing.py", line 107, in get_subfolder_to_overlap
    math.sqrt(desired_crops_per_class / (subfolder_size / 0.013)),
ZeroDivisionError: float division by zero

You may need to fix the division occurring in utils_processing.py, line 107.
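
A hedged guard sketch for that spot; the constant 0.013 and the names come from the traceback above, while the surrounding logic of get_subfolder_to_overlap is assumed. The crash is consistent with wsi_train/5 containing no image bytes, which makes subfolder_size zero:

import math

def safe_overlap_factor(desired_crops_per_class: int, subfolder_size: float) -> float:
    if subfolder_size == 0:
        # Empty class folder: nothing to crop, so fall back to no extra overlap.
        return 1.0
    return max(1.0, math.sqrt(desired_crops_per_class / (subfolder_size / 0.013)))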

AttributeError: Can't pickle local object 'compute_stats.<locals>.MyDataset'

Hi Wei!
I am new to deep-learning classification; thank you for your code and explanations, they are very clear. But I keep running into errors like this:

(Torch) C:\Windows>Traceback (most recent call last):

  File "<string>", line 1, in <module>
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 99, in spawn_main
    new_handle = reduction.steal_handle(parent_pid, pipe_handle)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\reduction.py", line 82, in steal_handle
    _winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect
C:/Users/Liang/Anaconda3/envs/Torch/python.exe //scallop/User/Liang/Code/Python/deepslide-master/code/config.py
Traceback (most recent call last):
  File "//scallop/User/Liang/Code/Python/deepslide-master/code/config.py", line 369, in <module>
    image_ext=args.image_ext)
  File "\scallop\User\Liang\Code\Python\deepslide-master\code\compute_stats.py", line 98, in compute_stats
    shuffle=False))
  File "\scallop\User\Liang\Code\Python\deepslide-master\code\compute_stats.py", line 83, in online_mean_and_sd
    for data in loader:
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'compute_stats.<locals>.MyDataset'

(Torch) C:\Windows>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\scallop\User\Liang\Code\Python\deepslide-master\code\config.py", line 369, in <module>
    image_ext=args.image_ext)
  File "\scallop\User\Liang\Code\Python\deepslide-master\code\compute_stats.py", line 98, in compute_stats
    shuffle=False))
  File "\scallop\User\Liang\Code\Python\deepslide-master\code\compute_stats.py", line 83, in online_mean_and_sd
    for data in loader:
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\Liang\Anaconda3\envs\Torch\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I could only fix the RuntimeError by adding if __name__ == '__main__':, but I cannot fix the AttributeError: Can't pickle local object 'compute_stats.<locals>.MyDataset'.

Could you help me with this issue?
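
Both errors come from Windows starting DataLoader workers with spawn: the worker processes must pickle the dataset, and a class defined inside compute_stats is not picklable. A hedged sketch of the usual fix, moving the dataset to module scope and guarding the entry point (the names below are illustrative, not deepslide's exact code; setting num_workers=0 is a quicker workaround):

from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

class PatchDataset(torch.utils.data.Dataset):  # module level, hence picklable
    def __init__(self, folder: str, image_ext: str = "jpg") -> None:
        self.paths = sorted(Path(folder).rglob(f"*.{image_ext}"))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return transforms.ToTensor()(Image.open(self.paths[idx]).convert("RGB"))

if __name__ == "__main__":  # required under spawn-based multiprocessing
    loader = torch.utils.data.DataLoader(
        PatchDataset("train_folder/train"), batch_size=64, num_workers=2)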
