Metastasis instance segmentation produced with a multilevel model.
Deep learning algorithms have proven efficient and accurate at detecting metastases in hematoxylin and eosin-stained tissue, with performance comparable to that of an expert pathologist. However, many tumor-detecting deep learning algorithms focus on local features within small image patches, which leaves out potentially relevant features from the surroundings.[1]
- Does including information from the surrounding area improve the performance of a deep learning tumor detection algorithm?
- What features will a deep neural network focus on at different scales when it is trained to detect tumors?
A deep neural network will learn to use information from a wider receptive field, and this will improve detection performance. The high-zoom parts of the network will focus on detailed structures, while the low-zoom parts will focus more on regional structures.
- 64bit Ubuntu 16.04.6 LTS (Xenial Xerus) GNU/Linux (virtual)
- 2x Intel Xeon Platinum 8160 CPU @ 2.10GHz
- 4x Nvidia Tesla V100, 32GB
- 1510GB RAM
- 6.4T NVMe SSD
- PyTorch 1.1.0
- TorchVision 0.3.0
- Fastai 1.0.52
- OpenSlide 3.4.1 (ASAP 1.8 depends on libopenslide)
- ASAP 1.8 (1.9 does not support the Philips scanner TIFF file format of center_4)
- OpenCV 4.1.0
The Camelyon17 training data set was divided by medical center into a test part (center_4) and a training part (center_0, center_1, center_2, center_3). Whole slide image (WSI) tissue areas were sampled into 256x256 overlapping tiles, where the corners of each tile were the centers of its neighboring tiles. Otsu thresholding was used to find the tissue areas.
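A minimal sketch of the tissue-masking step with OpenCV's Otsu implementation. Thresholding the saturation channel is a common choice for separating tissue from glass background, but it is an assumption here; the project may threshold a different channel.

```python
import cv2
import numpy as np

def tissue_mask(wsi_rgb: np.ndarray) -> np.ndarray:
    """Binary tissue mask (0 = background, 255 = tissue) via Otsu thresholding.

    wsi_rgb is a downsampled uint8 RGB overview of the slide. Using the
    saturation channel is an assumption: glass background is nearly
    unsaturated, so tissue stands out in it.
    """
    hsv = cv2.cvtColor(wsi_rgb, cv2.COLOR_RGB2HSV)
    saturation = hsv[:, :, 1]
    # THRESH_OTSU picks the threshold automatically from the histogram.
    _, mask = cv2.threshold(saturation, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```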
A tumor coverage percentage was calculated for each tile, and a 75% threshold was selected for labeling a tile as tumor or normal. Tiles were undersampled from each medical center so that tumor and normal tiles were represented in equal amounts.
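A sketch of the labeling and per-center balancing step in pandas. The column names (`center`, `tumor_percentage`) are assumptions based on the tile dataframe described later in this document.

```python
import pandas as pd

def balance_tiles(df: pd.DataFrame) -> pd.DataFrame:
    """Label tiles by tumor coverage and balance classes per medical center.

    Column names are assumptions; 75% coverage marks a tile as tumor.
    """
    df = df.copy()
    df["label"] = (df["tumor_percentage"] >= 75).astype(int)

    def undersample(group: pd.DataFrame) -> pd.DataFrame:
        n = group["label"].value_counts().min()  # minority class size
        return group.groupby("label", group_keys=False).apply(
            lambda c: c.sample(n=n, random_state=0))

    return df.groupby("center", group_keys=False).apply(undersample)
```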
Image crops of size 256x256 were sampled around each tile's center point at downsampling rates of 1, 2, 4, and 8, so every crop at every rate shared the tile's center point.
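A sketch of co-centered multi-scale crop extraction with OpenSlide, assuming the slide pyramid has power-of-two level downsamples matching the requested rates; the helper name is illustrative.

```python
import openslide

def multiscale_crops(slide_path, cx, cy, size=256, rates=(1, 2, 4, 8)):
    """Extract co-centered size x size crops at several downsampling rates.

    (cx, cy) is the tile center in level-0 pixel coordinates. Assumes the
    pyramid provides levels at (or near) the requested downsamples.
    """
    slide = openslide.OpenSlide(slide_path)
    crops = {}
    for rate in rates:
        level = slide.get_best_level_for_downsample(rate)
        # read_region takes the top-left corner in level-0 coordinates;
        # a size x size crop at this rate spans size * rate level-0 pixels.
        half = size * rate // 2
        region = slide.read_region((cx - half, cy - half), level, (size, size))
        crops[rate] = region.convert("RGB")
    slide.close()
    return crops
```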
Downsampling rates with the tumor area annotated in green. A 75% threshold at downsampling rate 1 is used to decide the tumor label.
A normalized copy was made of each image crop. Normalization used color deconvolution to separate the haematoxylin and eosin components and scaled their amounts to match a reference.
The two leftmost columns show tissue samples before and after staining normalization. The two rightmost columns show the separated haematoxylin and eosin stains.
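A minimal sketch of deconvolution-based stain normalization using scikit-image's `rgb2hed`/`hed2rgb`. The percentile-matching rule and the `reference_hed` statistic are assumptions; the project may use a different reference scheme (e.g. Macenko normalization).

```python
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def normalize_he(rgb: np.ndarray, reference_hed: np.ndarray) -> np.ndarray:
    """Normalize H&E staining via color deconvolution (sketch).

    rgb is a float image in [0, 1]; reference_hed holds per-channel
    99th-percentile stain amounts from a reference crop (an assumption).
    """
    hed = rgb2hed(rgb)  # separate haematoxylin, eosin, and DAB channels
    # Scale each stain channel so its 99th percentile matches the reference.
    for c in range(3):
        p = np.percentile(hed[:, :, c], 99)
        if p > 0:
            hed[:, :, c] *= reference_hed[c] / p
    return np.clip(hed2rgb(hed), 0, 1)
```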
Baseline models were trained for binary classification (tumor/normal) of crop images, with ROC AUC as the performance metric. Medical-center-fold cross-validation was used to search for the optimal CNN architecture, learning rate cycle, and training augmentations. Cross-validation was done within the training set of four medical centers, using only crop images with downsampling rate 1. The effectiveness of stain normalization was determined by training models with either normalized or original images.
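A sketch of the leave-one-center-out evaluation loop. `train_model` and `predict_probs` are hypothetical placeholders for the fastai training and inference steps, not names from the project code.

```python
from sklearn.metrics import roc_auc_score

centers = ["center_0", "center_1", "center_2", "center_3"]

# Leave-one-center-out cross-validation over the training centers.
fold_aucs = []
for val_center in centers:
    train_df = df[df["center"] != val_center]
    val_df = df[df["center"] == val_center]
    model = train_model(train_df)          # hypothetical training helper
    probs = predict_probs(model, val_df)   # hypothetical inference helper
    fold_aucs.append(roc_auc_score(val_df["label"], probs))

print(f"mean AUC over folds: {sum(fold_aucs) / len(fold_aucs):.4f}")
```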
Multi-input (multilevel) models were assembled from the best-performing baseline model architectures. These models took images at two different downsampling rates as input. Both downsampling rates shared the same center point, so the models looked at the same spot at two different zoom scales. Multilevel models consisted of two separate CNN base architectures: one for the context (lower zoom and wider receptive field) and a deeper architecture for the focus (highest zoom). The output vectors from the last convolutional layers of both architectures were combined with linear layers to produce a single binary output.
Multilevel architecture
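A sketch of the two-branch fusion in PyTorch. Torchvision ResNets stand in for the SE-ResNeXt backbones actually used in the project (SE-ResNeXt is not in torchvision 0.3); the point is the feature concatenation and linear head, not the exact backbones.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultilevelNet(nn.Module):
    """Context CNN on a low-zoom crop plus a deeper focus CNN on the
    highest-zoom crop, fused with linear layers into one binary logit."""

    def __init__(self, n_features=512):
        super().__init__()
        context = models.resnet18(pretrained=True)  # stand-in context model
        focus = models.resnet50(pretrained=True)    # stand-in focus model
        # Drop the classification heads; keep globally pooled conv features.
        self.context = nn.Sequential(*list(context.children())[:-1])
        self.focus = nn.Sequential(*list(focus.children())[:-1])
        self.head = nn.Sequential(
            nn.Linear(512 + 2048, n_features),
            nn.ReLU(inplace=True),
            nn.Linear(n_features, 1),  # single binary output
        )

    def forward(self, x_focus, x_context):
        f = self.focus(x_focus).flatten(1)      # (B, 2048)
        c = self.context(x_context).flatten(1)  # (B, 512)
        return self.head(torch.cat([f, c], dim=1))
```

Both inputs are 256x256 crops sharing the same center point: the focus branch receives the downsampling rate 1 crop and the context branch a lower-zoom crop.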
Five replicates of each of the best-performing baseline and multilevel models were trained on all training folds, and their performance was measured on the test fold.
Three test set WSIs containing tumor regions were re-sampled to cover the whole tissue region, and the best-performing models were used to generate tile-level tumor-probability heatmaps. Tumor regions were thresholded from the heatmaps; each model's threshold value was selected as the one giving the highest F0.5 score on the training folds.
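A sketch of the threshold search with scikit-learn's `fbeta_score`; the threshold grid is an assumption. F0.5 weights precision higher than recall, which favors conservative tumor-region proposals.

```python
import numpy as np
from sklearn.metrics import fbeta_score

def best_f05_threshold(y_true, probs, thresholds=np.linspace(0.05, 0.95, 19)):
    """Pick the heatmap threshold with the highest F0.5 score.

    y_true and probs are tile labels and predicted tumor probabilities
    from the training folds.
    """
    scores = [fbeta_score(y_true, (probs >= t).astype(int), beta=0.5)
              for t in thresholds]
    return thresholds[int(np.argmax(scores))]
```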
Leave-one-center-out cross-validation.
- Fold_0: Train={center_1, center_2, center_3}, Validation={center_0}
- Fold_1: Train={center_0, center_2, center_3}, Validation={center_1}
- Fold_2: Train={center_0, center_1, center_3}, Validation={center_2}
- Fold_3: Train={center_0, center_1, center_2}, Validation={center_3}
id suffix N = Trained and tested on normalized data
id suffix A = Trained on heavily color augmented data
Average AUCs over all four folds. The red dotted line is the best baseline average AUC.
- Train={center_0, center_1, center_2, center_3}, Test={center_4}
id suffix N = Trained and tested on normalized data
id suffix A = Trained on heavily color augmented data
The red dotted line is the best baseline run. The two baseline models use the SE-ResNeXt101 32x4d architecture. All multilevel models use SE-ResNeXt50 32x4d as the context model and SE-ResNeXt101 32x4d as the focus model.
The red dotted line is the best average baseline AUC.
Tumor masks were produced from the three test fold WSIs that had tumor regions: patient_081_node_4, patient_088_node_1, and patient_099_node_4.
Tumor segmentation of the three patient WSIs (patient_081_node_4, patient_088_node_1, and patient_099_node_4) with model 19A.
This section describes the project structure and notebook contents.
This project assumes that the Camelyon17 training data set is downloaded and unzipped in the following way:
data/
|_annotations/
|_patient_004_node.xml
|_...
|_training/
|_center_0
|_patient_000.tif
|_...
|_...
|_stage_labels.csv
The notebooks should be run in order, as the preprocessing steps are prerequisites for the later notebooks.
- Preprocessing - Convert lesion annotations to masks
  - Camelyon17 annotations are stored as polygon representations (XML). This notebook converts them to TIFF pixel image masks where a value of 1 means tumor and 2 means normal (see the conversion sketch after this list).
- Preprocessing - View tumor annotations and create tissue masks
  - 16x downsampled tissue masks are stored as binary (0 = background, 255 = tissue) uint8 NumPy arrays for each WSI.
- Preprocessing - Create dataframes
  - The dataframe contains the center coordinates, tissue percentage, tumor percentage, file information, and label of every tissue tile.
- Statistics - Patch stats
- Dataset - Sampling splits
- Dataset - Creating patches
- Normalization - Normalize H&E staining in patches to compare models with and without normalization.
- Baseline - Baseline models - hyperparameter optimization
- Multilevel - Multilevel models - hyperparameter optimization
- Pretraining modules - Pretraining multilevel CNN modules
- Pretrained multilevel - Multilevel models with autoencoder pretrained context encoders
- Test - Test set performance
- Threshold selection - Search for the best WSI heatmap binary threshold with the training set.
- WSI heatmap - Tumor heatmaps for the test set tumor WSIs
- Conclusion - Overview and analysis of the results
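The annotation-to-mask conversion referenced above can be sketched with ASAP's Python bindings (`multiresolutionimageinterface`), following the pattern of the public CAMELYON17 tutorial code; the paths below are illustrative placeholders to adjust to the layout shown earlier.

```python
import multiresolutionimageinterface as mir

slide_path = 'data/training/center_0/patient_000.tif'  # illustrative path
xml_path = 'data/annotations/patient_000.xml'          # matching annotation
mask_path = 'patient_000_mask.tif'

reader = mir.MultiResolutionImageReader()
mr_image = reader.open(slide_path)

# Load the polygon annotations from the XML file.
annotation_list = mir.AnnotationList()
xml_repository = mir.XmlRepository(annotation_list)
xml_repository.setSource(xml_path)
xml_repository.load()

# Rasterize: 1 = tumor ("metastases"), 2 = normal, as described above.
label_map = {'metastases': 1, 'normal': 2}
conversion_order = ['metastases', 'normal']
mask_writer = mir.AnnotationToMask()
mask_writer.convert(annotation_list, mask_path,
                    mr_image.getDimensions(), mr_image.getSpacing(),
                    label_map, conversion_order)
```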
[1] B. E. Bejnordi, M. Veta, P. J. van Diest, B. van Ginneken, N. Karssemeijer, G. Litjens, J. A. W. M. van der Laak, and the CAMELYON16 Consortium. (2017) Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22):2199–2210. doi: 10.1001/jama.2017.14585
[2] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. (2017) Grad-CAM: Visual explanations from deep networks via gradient-based localization. arXiv:1610.02391