
ncrf's Introduction


NCRF

This repository contains the code and data to reproduce the main results from the paper:

Yi Li and Wei Ping. Cancer Metastasis Detection With Neural Conditional Random Field. Medical Imaging with Deep Learning (MIDL), 2018.

If you find the code/data useful, please cite the above paper:

@inproceedings{li2018cancer,
    title={Cancer Metastasis Detection With Neural Conditional Random Field},
    booktitle={Medical Imaging with Deep Learning},
    author={Li, Yi and Ping, Wei},
    year={2018}
}

If you have any questions, please post them on GitHub issues or email [email protected]

Prerequisites

  • Python (3.6)

  • Numpy (1.14.3)

  • Scipy (1.0.1)

  • PyTorch (0.3.1)/CUDA 8.0. The specific binary wheel file is cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl. I haven't tested other versions, especially 0.4+, and wouldn't recommend using them.

  • torchvision (0.2.0)

  • PIL (5.1.0)

  • scikit-image (0.13.1)

  • OpenSlide (3.4.1)/openslide-python (1.1.0). Please don't use OpenSlide 3.4.0, as some potential issues have been found with that version.

  • matplotlib (2.2.2)

  • tensorboardX. A standalone TensorBoard that also works with PyTorch. This is mostly used to monitor the training curves.

  • QuPath. Although not directly relevant to training/testing models, I found it very useful for visualizing whole slide images.

Most of the dependencies can be installed through pip install with a version number, e.g.

pip install 'numpy==1.14.3'

Or just simply

pip install numpy

A requirements.txt file is also provided, so that you can install most of the dependencies at once:

pip install -r requirements.txt -i https://pypi.python.org/simple/

For PyTorch, please consider downloading the specific binary wheel and installing it with

pip install torch-0.3.1-cp36-cp36m-linux_x86_64.whl

Data

Whole slide images

The main data are the whole slide images (WSI) in *.tif format from the Camelyon16 challenge. You need to apply to Camelyon16 for data access, and once it's approved, you can download from either Google Drive or Baidu Pan. Note that one slide is usually about 100K x 100K pixels at level 0 and takes 1GB+ on disk. There are 400 slides in total, about 700GB+ altogether, so make sure you have enough disk space. The tumor slides for training are named Tumor_XXX.tif, where XXX ranges from 001 to 110. The normal slides for training are named Normal_XXX.tif, where XXX ranges from 001 to 160. The slides for testing are named Test_XXX.tif, where XXX ranges from 001 to 130.

Once you have downloaded all the slides, please put all the tumor and normal slides for training under the same directory, e.g. named /WSI_TRAIN/.

Update

It seems the whole slide image *.tif files can now be downloaded from GigaDB without an application. But please still contact the Camelyon16 organizers regarding data usage.

Annotations

The Camelyon16 organizers also provide annotations of tumor regions for each tumor slide in xml format. I've converted them into a somewhat simpler json format, located under NCRF/jsons. Each annotation is a list of polygons, where each polygon is represented by its vertices. In particular, positive polygons denote tumor regions and negative polygons denote normal regions. You can also use the following command to convert the xml format into the json format:

python NCRF/wsi/bin/camelyon16xml2json.py Tumor_001.xml Tumor_001.json
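
For illustration, a hypothetical Tumor_001.json might look like the example below. The vertex values are made up, and the exact field names are an assumption based on the description above (lists of positive and negative polygons, each given by its vertices):

{
    "positive": [
        {"name": "Annotation 0",
         "vertices": [[25417, 127565], [25890, 127823], [26013, 128411]]}
    ],
    "negative": []
}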

Patch images

Although the original 400 WSI files contain all the necessary information, they are not directly suitable for training a deep CNN. Therefore, we have to sample much smaller image patches, e.g. 256x256, that a typical deep CNN can handle. Efficiently sampling informative and representative patches is one of the most critical parts of achieving good tumor detection performance. To ease this process, I have included in this repo the coordinates of the pre-sampled patches used in the paper for training. They are located at NCRF/coords. Each one is a csv file, where each line is in a format like Tumor_024,25417,127565, in which the last two numbers are the (x, y) coordinates of the center of a patch at level 0. tumor_train.txt and normal_train.txt contain 200,000 coordinates each, and tumor_valid.txt and normal_valid.txt contain 20,000 coordinates each. Note that coordinates of hard negative patches, typically around tissue boundary regions, are also included within normal_train.txt and normal_valid.txt. With the original WSI and pre-sampled coordinates, we can now generate image patches for training deep CNN models. Run the four commands below to generate the corresponding patches:

python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/tumor_train.txt /PATCHES_TUMOR_TRAIN/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/normal_train.txt /PATCHES_NORMAL_TRAIN/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/tumor_valid.txt /PATCHES_TUMOR_VALID/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/normal_valid.txt /PATCHES_NORMAL_VALID/

where /WSI_TRAIN/ is the path to the directory where you put all the WSI files for training as mentioned above, and /PATCHES_TUMOR_TRAIN/ is the path to the directory where generated tumor patches for training are stored. The same naming applies to /PATCHES_NORMAL_TRAIN/, /PATCHES_TUMOR_VALID/ and /PATCHES_NORMAL_VALID/. By default, each command generates patches of size 768x768 at level 0 using 5 processes, where the center of each patch corresponds to the given coordinates. Each 768x768 patch is further split into a 3x3 grid of 256x256 patches when the CRF-based training algorithm comes into play (see the sketch below).

Note that generating 200,000 768x768 patches using 5 processes took me about 4.5 hours, and the result takes about 202GB on disk.
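
For intuition, here is a minimal stand-alone sketch of what a single patch-extraction step does; it is not the actual patch_gen.py, and the file name and output path are illustrative:

import openslide

PATCH_SIZE = 768  # default patch size at level 0

def extract_patch(wsi_path, x_center, y_center, patch_size=PATCH_SIZE):
    # the coords files store patch centers, so shift to the top-left corner
    x = int(x_center) - patch_size // 2
    y = int(y_center) - patch_size // 2
    slide = openslide.OpenSlide(wsi_path)
    return slide.read_region((x, y), 0, (patch_size, patch_size)).convert('RGB')

# e.g. the line Tumor_024,25417,127565 from tumor_train.txt
patch = extract_patch('/WSI_TRAIN/Tumor_024.tif', 25417, 127565)
patch.save('/PATCHES_TUMOR_TRAIN/Tumor_024_25417_127565.png')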

Model

The core idea of NCRF is to take a grid of patches as input, e.g. 3x3, use a CNN module to extract patch embeddings, and use a CRF module to model their spatial correlations. The CNN module is adapted from the standard ResNet released by torchvision (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py). The major differences are in the forward pass: 1. the input tensor has one extra dimension; 2. the CRF module is used to smooth the logit of each patch using the patch embeddings.

def forward(self, x):
    """
    Args:
        x: 5D tensor with shape of
        [batch_size, grid_size, 3, crop_size, crop_size],
        where grid_size is the number of patches within a grid (e.g. 9 for
        a 3x3 grid); crop_size is 224 by default for ResNet input;
    Returns:
        logits, 2D tensor with shape of [batch_size, grid_size], the logit
        of each patch within the grid being tumor
    """
    batch_size, grid_size, _, crop_size = x.shape[0:4]
    # flatten grid_size dimension and combine it into batch dimension
    x = x.view(-1, 3, crop_size, crop_size)

    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)

    x = self.avgpool(x)
    # feats means features, i.e. patch embeddings from ResNet
    feats = x.view(x.size(0), -1)
    logits = self.fc(feats)

    # restore grid_size dimension for CRF
    feats = feats.view((batch_size, grid_size, -1))
    logits = logits.view((batch_size, grid_size, -1))

    if self.crf:
        logits = self.crf(feats, logits)

    logits = torch.squeeze(logits)

    return logits

The CRF module has only one trainable parameter, W, for the pairwise potential between patches. You can plot W from the ckpt file (see the next section) of a trained CRF model with

python NCRF/wsi/bin/plot_W.py /PATH_TO_MODEL/best.ckpt

When the CRF model is well trained, W typically reflects the relative spatial positions between different patches within the input grid. For more details about the model, please refer to our paper.
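
For intuition, here is a simplified sketch of what the CRF smoothing can look like: a mean-field-style update whose pairwise term matches the expression probs * pairwise_potential - (1 - probs) * pairwise_potential quoted in the issues below. The actual module in this repo may differ in details:

import torch
import torch.nn as nn

class CRF(nn.Module):
    def __init__(self, num_nodes=9, iteration=10):
        super(CRF, self).__init__()
        # W: the only trainable parameter, pairwise weights between patches
        self.W = nn.Parameter(torch.zeros(1, num_nodes, num_nodes))
        self.iteration = iteration

    def forward(self, feats, logits):
        # feats: [batch_size, grid_size, dim]; logits: [batch_size, grid_size, 1]
        feats_norm = torch.norm(feats, p=2, dim=2, keepdim=True)
        # cosine similarity between patch embeddings, scaled by symmetrized W
        pairwise_sim = torch.bmm(feats, feats.transpose(1, 2)) / \
            torch.bmm(feats_norm, feats_norm.transpose(1, 2))
        pairwise_potential = pairwise_sim * (self.W + self.W.transpose(1, 2)) / 2
        unary = logits.clone()
        for _ in range(self.iteration):
            probs = torch.sigmoid(logits).transpose(1, 2)  # [batch_size, 1, grid_size]
            # expected pairwise energy under the current marginal probabilities
            pairwise_E = torch.sum(
                probs * pairwise_potential - (1 - probs) * pairwise_potential,
                dim=2, keepdim=True)
            logits = unary + pairwise_E
        return logits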

Training

With the generated patch images, we can now train the model with the following command:

python NCRF/wsi/bin/train.py /CFG_PATH/cfg.json /SAVE_PATH/

where /CFG_PATH/ is the path to the config file in json format, and /SAVE_PATH/ is where you want to save your model in checkpoint (ckpt) format. Two config files are provided at NCRF/configs; one is for ResNet-18 with CRF:

{
 "model": "resnet18",
 "use_crf": true,
 "batch_size": 10,
 "image_size": 768,
 "patch_size": 256,
 "crop_size": 224,
 "lr": 0.001,
 "momentum": 0.9,
 "data_path_tumor_train": "/PATCHES_TUMOR_TRAIN/",
 "data_path_normal_train": "/PATCHES_NORMAL_TRAIN/",
 "data_path_tumor_valid": "/PATCHES_TUMOR_VALID/",
 "data_path_normal_valid": "/PATCHES_NORMAL_VALID/",
 "json_path_train": "NCRF/jsons/train",
 "json_path_valid": "NCRF/jsons/valid",
 "epoch": 20,
 "log_every": 100
}

Please modify /PATCHES_TUMOR_TRAIN/, /PATCHES_NORMAL_TRAIN/, /PATCHES_TUMOR_VALID/ and /PATCHES_NORMAL_VALID/ to your own paths of generated patch images. Please also modify NCRF/jsons/train and NCRF/jsons/valid to reflect the full path of the NCRF repo on your machine. The other config file is for ResNet-18 without CRF (the baseline model).

By default, train.py uses 1 GPU (GPU_0) to train the model, 2 processes to load tumor patch images, and 2 processes to load normal patch images. On one GTX 1080Ti, it took about 5 hours to train 1 epoch, and 4 days to finish 20 epochs. You can also use TensorBoard to monitor the training process:

tensorboard --logdir /SAVE_PATH/

Typically, you will observe that the CRF model consistently achieves higher training accuracy than the baseline model.

train.py will generate a train.ckpt, which is the most recently saved model, and a best.ckpt, which is the model with the best validation accuracy. We also provide the best.ckpt of the pretrained resnet18_base and resnet18_crf models at NCRF/ckpt.

Testing

Tissue mask

The main testing result from a trained model for WSI analysis is the probability map, which represents where on the WSI the model thinks the tumor regions are. Naively, we could use a sliding-window approach that predicts the probability of every patch across the whole slide image being tumor or not. But since most of a WSI is actually white background, a lot of computation is wasted this way. Instead, we first compute a binary tissue mask that indicates whether each patch is tissue or background, and then perform tumor prediction only on the tissue regions. A typical WSI and its tissue mask are shown in the Test_026 figure. To obtain the tissue mask of a given input WSI, e.g. Test_026.tif, run the following command:

python NCRF/wsi/bin/tissue_mask.py /WSI_PATH/Test_026.tif /MASK_PATH/Test_026.npy

where /WSI_PATH/ is the path to the WSI you are interested in, and /MASK_PATH/ is the path where you want to save the generated tissue mask in numpy format. By default, the tissue mask is generated at level 6, corresponding to an inference stride of 64, i.e. making a prediction every 64 pixels at level 0.
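
For intuition, here is a minimal sketch of how such a tissue mask can be computed, assuming a simple Otsu threshold on the saturation channel in HSV space; the actual tissue_mask.py may combine more channels:

import numpy as np
import openslide
from skimage.color import rgb2hsv
from skimage.filters import threshold_otsu

slide = openslide.OpenSlide('/WSI_PATH/Test_026.tif')
level = 6
# read the whole slide at the mask level as a small RGB image
img = np.array(slide.read_region((0, 0), level,
                                 slide.level_dimensions[level]).convert('RGB'))
# tissue is saturated (pink/purple), background is close to white
saturation = rgb2hsv(img)[:, :, 1]
tissue_mask = saturation > threshold_otsu(saturation)
# store as (X, Y), hence the transpose note below when plotting
np.save('/MASK_PATH/Test_026.npy', tissue_mask.transpose())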

The tissue mask Test_026_tissue_mask.npy at level 6 is attached for comparison. Note that when you plot it using matplotlib.pyplot.imshow, you should transpose it, e.g.:
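
import numpy as np
import matplotlib.pyplot as plt

mask = np.load('Test_026_tissue_mask.npy')
plt.imshow(mask.transpose())  # transpose so the mask lines up with the WSI
plt.show()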

Probability map

With the generated tissue mask, we can now obtain the probability map of a given WSI, e.g. Test_026.tif, using a trained model:

python NCRF/wsi/bin/probs_map.py /WSI_PATH/Test_026.tif /CKPT_PATH/best.ckpt /CFG_PATH/cfg.json /MASK_PATH/Test_026.npy /PROBS_MAP_PATH/Test_026.npy

where /WSI_PATH/ is the path to the WSI you are interested in. /CKPT_PATH/ is where you saved your trained model, and best.ckpt corresponds to the model with the best validation accuracy. /CFG_PATH/ is the path to the config file of the trained model in json format, and is typically the same as /CKPT_PATH/. /MASK_PATH/ is where you saved the generated tissue mask. /PROBS_MAP_PATH/ is where you want to save the generated probability map in numpy format.

By default, probs_map.py uses GPU_0 for inference and 5 processes for data loading. Note that although we load a grid of patches, e.g. 3x3, only the predicted probability of the center patch is retained, for ease of implementation. Because of this heavy computational overhead, it takes 0.5-1 hour to obtain the probability map of one WSI. We are thinking about developing a more efficient inference algorithm for obtaining probability maps. The probability-map figure shows the probability maps of Test_026 under different settings: (a) original WSI, (b) ground truth annotation, (c) baseline method, (d) baseline method with hard negative mining, (e) NCRF with hard negative mining. We can see that the probability map from the baseline method typically has lots of isolated false positives. Hard negative mining significantly reduces the number of false positives for the baseline method, but the probability density within the ground truth tumor regions is also decreased, which reduces model sensitivity. NCRF with hard negative mining not only achieves few false positives but also maintains high probability density within the ground truth tumor regions, with sharp boundaries.

The probability map Test_026_probs_map.npy at level 6 is attached for comparison. Note that when you plot it using matplotlib.pyplot.imshow, you should transpose it as well.

Tumor localization

We use a non-maximal suppression (nms) algorithm to obtain the coordinates of each detected tumor region at level 0, given a probability map:

python NCRF/wsi/bin/nms.py /PROBS_MAP_PATH/Test_026.npy /COORD_PATH/Test_026.csv

where /PROBS_MAP_PATH/ is where you saved the generated probability map, and /COORD_PATH/ is where you want to save the generated coordinates of each tumor region at level 0 in csv format. There is an optional argument --level with default value 6; make sure it's consistent with the level used for the corresponding tissue mask and probability map.
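
For intuition, here is a minimal sketch of the greedy NMS idea on a probability map; the radius and threshold values are illustrative, and the actual nms.py may use different defaults:

import numpy as np

def nms(probs_map, level=6, radius=12, prob_thred=0.5):
    # Greedy non-maximal suppression: repeatedly take the global maximum as
    # a detected tumor region, then zero out its neighborhood so nearby
    # responses are suppressed. Coordinates are scaled back to level 0 by
    # the downsample factor 2^level.
    coords = []
    probs_map = probs_map.copy()
    while probs_map.max() > prob_thred:
        prob = probs_map.max()
        x, y = np.unravel_index(probs_map.argmax(), probs_map.shape)
        coords.append((prob, x * 2 ** level, y * 2 ** level))
        x_min, y_min = max(x - radius, 0), max(y - radius, 0)
        probs_map[x_min:x + radius, y_min:y + radius] = 0
    return coords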

FROC evaluation

With the coordinates of tumor regions for each test WSI, we can finally evaluate the average FROC score of tumor localization.

python NCRF/wsi/bin/Evaluation_FROC.py /TEST_MASK/ /COORD_PATH/

/TEST_MASK/ is where you put the ground truth tif mask files of the test set, and /COORD_PATH/ is where you saved the generated tumor coordinates. Evaluation_FROC.py is based on the evaluation code provided by the Camelyon16 organizers, with minor modifications. Note that Test_049 and Test_114 are excluded from the evaluation, as noted by the Camelyon16 organizers.
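
For reference, the reported score is the average sensitivity at six average-false-positive rates (0.25, 0.5, 1, 2, 4 and 8 per slide), as in the Camelyon16 protocol. A minimal sketch of the final averaging step, with hypothetical sensitivity values:

import numpy as np

# sensitivities measured at average FPs of 0.25, 0.5, 1, 2, 4 and 8 per slide
# (hypothetical values, for illustration only)
sensitivities = [0.60, 0.67, 0.73, 0.78, 0.81, 0.86]

avg_froc = np.mean(sensitivities)
print('Avg FROC score: {:.4f}'.format(avg_froc))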


ncrf's Issues

device-side assert triggered

I got RuntimeError: cuda runtime error (59): device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:70 when calculating the loss (loss = loss_fn(output, target)). Is something wrong with my labels?

Is Linux a requirement?

  1. Do I have to use Linux to run this, or can I use Windows? (It throws an error when I try to run it on Windows.)

generate the corresponding patches

I got an error when I ran this command to generate the corresponding patches. How can I solve it? Thank you!

[123@533 bin]$ python patch_gen.py /home/tom/CAMELYON16/training/WSI_TRAIN  /home/tom/Downloads/NCRF/coords/tumor_train.txt /home/tom/CAMELYON16/PATCHES_TUMOR_TRAIN
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "patch_gen.py", line 38, in process
    slide = openslide.OpenSlide(wsi_path)
  File "/usr/local/lib64/python3.6/site-packages/openslide/__init__.py", line 153, in __init__
    self._osr = lowlevel.open(filename)
  File "/usr/local/lib64/python3.6/site-packages/openslide/lowlevel.py", line 137, in _check_open
    "Unsupported or missing image file")
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "patch_gen.py", line 81, in <module>
    main()
  File "patch_gen.py", line 77, in main
    run(args)
  File "patch_gen.py", line 72, in run
    pool.map(process, opts_list)
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

About the mask.tif files of the test set

Hello, could you provide the annotation masks of the test set in tif format? My email is [email protected]
P.S. I have been using the .xml annotations with cv2.fillPoly to generate single-resolution tif masks, but I feel such annotation masks may not be accurate.

CRF implementation in Keras is not giving good results

Hi,
I am a student working on Camelyon'16 as my Master's project.
I was going through your very impressive paper: Yi Li and Wei Ping. Cancer Metastasis Detection With Neural Conditional Random Field. Medical Imaging with Deep Learning (MIDL), 2018.
I found that you implemented a CRF on top of ResNet-18. So far I have been using ResNet-50, but my FROC score is not going above 0.55.

So, I decided to use your approach and re-implemented your code in Keras (TensorFlow backend). But the performance of the trained model is not even close to your results: the best FROC of my trained ResNet-18+CRF model is 0.55, and a lot of false positives are coming up. My ResNet-18 is taken from https://github.com/raghakot/keras-resnet

My queries-

  • My training loss (BCE) started from 1.16 and finally settled at 0.8639, with a validation loss of 0.8528. Is this right, or should the loss go further down? I have run for 30+ epochs, but the loss remains the same (plateau). I don't know why (please refer to the attached image below).

  • The weight plot doesn't look close to what you have shown in your paper. In your case, all the positional patch weights W[0,0,0,0] < 0, W[0,1,0,1] < 0, ..., W[2,2,2,2] < 0, but I am not seeing that. As per the equation, these will not affect the final predictions. (Please refer to the attached image below.)

Can you help me understand why the loss is not going further down? It plateaus after a certain number of epochs, and after that nothing has an effect, even a cyclic learning rate [1e-4, 1e-5, 1e-7]. This behavior is common across many models: resnet50/101/18 + Inception V3.
Please help me solve these problems. I shall be thankful to you.

Training configuration (screenshot attached)

TensorBoard ACC/LOSS plots (images attached)

Validation loss and training loss after 16 epochs (orange is validation loss, blue is training BCE loss; image attached)

Weight plots: the plot across epochs, and the epoch-16 weight map from which the heatmap is generated (images attached)

Heatmaps for Test_001.tiff (Camelyon16 test set): results from my trained model at level 8, and results from your model at level 6 (images attached)

Clearly, your model performs far better than my trained model.

I have matched my CRF implementation in Keras against yours in PyTorch; for the same input, both models give the same output.

Please help me reproduce your results in Keras + TF.

Docker Container

If I build and write up the requirements and installation into a Dockerfile (for easier adoption), would you be willing to accept that pull request into this repo? If so, I'll gladly do that!

When I use probs_map.py, it works, but just as it is about to finish it reports the following error

INFO:root:2020-05-25 00:44:36, flip : NONE, rotate : NONE, batch : 34369/36683, Run Time : 0.96
INFO:root:2020-05-25 00:44:37, flip : NONE, rotate : NONE, batch : 34370/36683, Run Time : 0.97
Traceback (most recent call last):
  File "NCRF/wsi/bin/probs_map.py", line 163, in <module>
    main()
  File "NCRF/wsi/bin/probs_map.py", line 159, in main
    run(args)
  File "NCRF/wsi/bin/probs_map.py", line 116, in run
    probs_map = get_probs_map(model, dataloader)
  File "NCRF/wsi/bin/probs_map.py", line 53, in get_probs_map
    for (data, x_mask, y_mask) in dataloader:
  File "D:\python\lib\site-packages\torch\utils\data\dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "D:\python\lib\site-packages\torch\utils\data\dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "C:\Users\Administrator\Desktop\NCRF\NCRF\wsi\bin/../..\wsi\data\wsi_producer.py", line 85, in __getitem__
    (x, y), 0, (self._image_size, self._image_size)).convert('RGB')
  File "D:\python\lib\site-packages\openslide\__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "D:\python\lib\site-packages\openslide\lowlevel.py", line 222, in read_region
    _read_region(slide, buf, x, y, level, w, h)
  File "D:\python\lib\site-packages\openslide\lowlevel.py", line 159, in _check_error
    raise OpenSlideError(err)
openslide.lowlevel.OpenSlideError: Not a JPEG file: starts with 0xff 0xc0

I have checked my OpenSlide installation:
C:\Users\Administrator>openslide-show-properties --version
openslide-show-properties.exe 3.4.1, using OpenSlide 3.4.1
Copyright (C) 2007-2015 Carnegie Mellon University and others

OpenSlide is free software: you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License, version 2.1.
http://gnu.org/licenses/lgpl-2.1.html

OpenSlide comes with NO WARRANTY, to the extent permitted by law. See the
GNU Lesser General Public License for more details.

Code for coordinate generation

Could you release the code for coordinate generation? Currently the patch size is fixed at 768x768; once it changes, we need to generate new coordinates, and also extract and save a new set of patch images, which is expensive in disk space.

How to select a good model

Hello, the performance I get running your code is quite far from the paper's. I would like to know: during training, did you select models based on the validation set? If so, based on which metrics? I find that sometimes two models have similar validation accuracy but quite different test scores.

An issue about the data

For the .tif images, the level 6 image and the level 0 image have different downsampling ratios along their two axes, which causes an error at line 52 of wsi_producer.py. Has anyone else run into this problem? Thanks!

Did these results achieve the SOTA before 2018?

Thanks for your great work! I found that Google's result was 0.87 in 2017, which is higher than yours. Are the two results comparable, or are they in different settings? Also, your result is 0.79 in the paper but 0.80 in the presentation; what is the difference between the two?

Why is there no NCRF without hard negative mining in the probability map?

Thanks for your great work. I want to see the probability map of NCRF without hard negative mining, which I think is more comparable with the baseline, but it seems you didn't show this in the paper or the README. So I wonder why there is no NCRF without hard negative mining among the probability maps?

slide/mask mismatch

If I follow the instructions for testing as given in the README, I get the following error

Traceback (most recent call last):
  File "wsi/bin/probs_map.py", line 163, in <module>
    main()
  File "wsi/bin/probs_map.py", line 159, in main
    run(args)
  File "wsi/bin/probs_map.py", line 115, in run
    args, cfg, flip='NONE', rotate='NONE')
  File "wsi/bin/probs_map.py", line 87, in make_dataloader
    flip=flip, rotate=rotate),
  File "/media/udion/a2c5c487-f939-4b82-a348-86b3d1bdb024/udion_home/Projects/NCRF/wsi/bin/../../wsi/data/wsi_producer.py", line 42, in __init__
    self._preprocess()
  File "/media/udion/a2c5c487-f939-4b82-a348-86b3d1bdb024/udion_home/Projects/NCRF/wsi/bin/../../wsi/data/wsi_producer.py", line 55, in _preprocess
    .format(X_slide, X_mask, Y_slide, Y_mask))
Exception: Slide/Mask dimension does not match , X_slide / X_mask : 98304 / 1536, Y_slide / Y_mask : 103936 / 2048

What's the issue?

Some questions about the paper

I have read your paper and have one point of confusion; I would appreciate your guidance. I am still a student, so please forgive any mistakes!

  • About mean-field theory: Q(Y) estimates the joint probability density, but for each Q_i(y_i), the difference from the true marginal density P_i(y_i) can be large. One shouldn't use Q_i(y_i) to estimate the true marginal density, just as in a Bayesian network you shouldn't use it to infer the state of a single node.
    For example, consider a standard joint Gaussian P(μ, x) and the optimal mean-field Gaussian estimate Q(μ, x). Q chooses a Gaussian within its own family and thus becomes very narrow, so the marginal density Q_x(x) becomes very small and completely different from P_x(x).

Evaluation Mask Level in Evaluation_FROC

Can the Evaluation_Mask_Level in Evaluation_FROC be different from the level at which the probability map was generated?
I mean, you generated the probability maps at level 6, but the evaluation is done at level 5. Any particular reason behind this?
P.S. I generated the ground truth tif mask at level 6. Does it matter?
Thank you.

EVALUATION_MASK_LEVEL = 5 # Image level at which the evaluation is done

Question about the NCRF model parameters

Testing with the provided model parameters, I found the FROC is only 0.39 and the AUC is only 0.9660. May I ask how the authors raised the FROC to 0.79? The performance gap seems quite large at the moment.

Request for the ground truth mask of test_021.tif

Before we can calculate the FROC score, it takes a lot of time to convert the xml files of the Camelyon16 test set to mask.tif files, maybe 7+ days for test_021.tif.
Could you please send me the mask tif file of test_021.tif? Thanks a lot.
My email address is [email protected]

When I exclude test_021.tif from the test set, I get this FROC:

Avg FP = 0.25
Sensitivity = 0.5981735159817352
Avg FP = 0.5
Sensitivity = 0.6712328767123288
Avg FP = 1
Sensitivity = 0.726027397260274
Avg FP = 2
Sensitivity = 0.7808219178082192
Avg FP = 4
Sensitivity = 0.8127853881278538
Avg FP = 8
Sensitivity = 0.8584474885844748
Avg Sensivity = 0.7412480974124809

which is lower than the paper's reported 0.7825 for resnet18_baseline.

Did you use transfer learning?

Hi, can you tell whether you used transfer learning or not? If yes, what were the details, especially regarding fine-tuning? Thanks in advance.

downsampling level selection

The heatmap generated from a mask downsampled at level 8 and the heatmap generated from a mask downsampled at level 6 differ a lot. How should the downsampling level be chosen when generating the heatmap, and what causes this difference? A little confused. Thank you!

Some issues about reproducing the results

When I tried to reproduce your code, I got less-than-perfect results.
For example, the raw Test_001 tiff is attached (image).
After the whole training with ResNet18-CRF, I got the attached test probability map (image),
while the ground-truth mask looks like the attached image. (Since the Camelyon16 organizers no longer provide the test ground truth in tiff format, I manually converted the raw test tiff and xml files to tiff masks with the ASAP software.)

I then followed your test steps and evaluated the average FROC score for the whole test set (output attached).

However, the result is not at all satisfying.

Is there any other trick in your preprocessing, postprocessing, or training process?

The probability map of Test_026 is attached as well.

An idea about pathological sections

I see you use a 3x3 grid of patches as input; how about 5x5 or 7x7?
I know the neighboring patches are also important for classification.
But pathological classification covers both histology and cytology; if the problem is cytology, will this technique still work?

Pickle issue with prob_maps while generating heat maps

Hi,
I am running probs_map.py to generate the heatmaps for the 130 test images to reproduce the FROC.
I started with the normal configuration: --GPU 0,1 and --num_workers 5.
I am seeing the following error on Windows.

Traceback (most recent call last):
  for (data, x_mask, y_mask) in dataloader:
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
    return _DataLoaderIter(self)
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    w.start()
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Miniconda3\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

But when I changed the settings to --GPU 0 and --num_workers 0, it started running, but very slowly: a single heatmap at level 6 is showing 21 hours.

Can you help me on this?

mask of Test_026 inference stride

In the Tissue mask part, you said you generate the mask of Test_026.tif at level 6, corresponding to an inference stride of 64, but I found that the level 6 downsample factor is 57.375. Does that mean the inference stride is nearly 57 instead of 64?

coords from jsons

Hi,

I was wondering how the coords are made from the json files?

Kind regards,

Taran

About how to generate Normal_*.json

Hi,

I was wondering how the Normal_*.json files are generated.

Were they generated from the normal tissue masks? Can you share the details of the process? Thank you!

Kind regards,

Cannot get correct FROC with resnet18 baseline ckpt

I tried to calculate FROC using your resnet18_base.ckpt, and I got:
Avg FP = 0.25
Sensitivity = 0.6061946902654868
Avg FP = 0.5
Sensitivity = 0.6769911504424779
Avg FP = 1
Sensitivity = 0.7345132743362832
Avg FP = 2
Sensitivity = 0.7876106194690266
Avg FP = 4
Sensitivity = 0.8185840707964602
Avg FP = 8
Sensitivity = 0.8628318584070797
Avg Sensivity = 0.7477876106194691

I have excluded test_114.tif from the test set, but there is a gap between my results and what the paper reports (0.7825).
However, I got the correct FROC using resnet18_crf.ckpt.
Is the resnet18 baseline ckpt provided in the project the same one you used to calculate the FROC in the paper?
Thanks a lot.

openslide.lowlevel.OpenSlideError: TIFFRGBAImageGet failed

JPEGLib: Not a JPEG file: starts with 0x00 0x00.
Traceback (most recent call last):
  File "wsi/bin/patch_gen.py", line 48, in <module>
    (args.patch_size, args.patch_size)).convert('RGB')
  File "/home/list-2018/.conda/envs/tensorflow/lib/python3.5/site-packages/openslide/__init__.py", line 223, in read_region
    level, size[0], size[1])
  File "/home/list-2018/.conda/envs/tensorflow/lib/python3.5/site-packages/openslide/lowlevel.py", line 259, in read_region
    _read_region(slide, buf, x, y, level, w, h)
  File "/home/list-2018/.conda/envs/tensorflow/lib/python3.5/site-packages/openslide/lowlevel.py", line 196, in _check_error
    raise OpenSlideError(err)
openslide.lowlevel.OpenSlideError: TIFFRGBAImageGet failed

About the backend

I notice you use PyTorch; can you provide a TF or Keras backend? I think it would be good for study.

about how to compute pairwise_potential_E

Hi, may I ask why you compute pairwise_potential_E with a minus sign?

pairwise_potential_E = torch.sum(
    probs * pairwise_potential - (1 - probs) * pairwise_potential,
    dim=2, keepdim=True)

If I want to use it for multi-class classification, how should I modify it? Thanks.

question about "probs * pairwise_potential - (1 - probs) * pairwise_potential"

Thank you very much for the opportunity to read your paper and code. While looking at the CRF layer, I don't fully understand the formula "probs * pairwise_potential - (1 - probs) * pairwise_potential" (I know from #13 that it is explained in detail in the code, and I also noticed this while reading the code).
Is this formula an expectation? My understanding: probs is the probability that patch i is tumor, and pairwise_potential is W scaled by embedding similarity; if two embedding vectors are very similar, the angle between them is 0, the cosine is 1, and their potential is +W, while if the angle is 180 degrees the situation is completely opposite and the potential is -W. Among the four cases 1. i=T, j=T; 2. i=N, j=T; 3. i=T, j=N; 4. i=N, j=N, in case 1 the event +pairwise_potential occurs with probability probs, hence 'probs * +pairwise_potential'; similarly, in case 4, i is normal with probability 1 - probs and the event is the opposite, so case 4 gives '(1 - probs) * -pairwise_potential'.
Multiplying each event by its probability gives the expectation, and cases 2 and 3 are eliminated by label compatibility, which I haven't studied yet. I would like to verify whether the above understanding is correct. I have been studying this for a week and suddenly felt today that I understood it, but I'm not sure; I hope you can clarify.

the coordinates of pre-sampled patches

How did you get the coordinates of the pre-sampled patches in tumor_train.txt, normal_train.txt, tumor_valid.txt and normal_valid.txt?

python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/tumor_train.txt /PATCHES_TUMOR_TRAIN/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/normal_train.txt /PATCHES_NORMAL_TRAIN/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/tumor_valid.txt /PATCHES_TUMOR_VALID/
python NCRF/wsi/bin/patch_gen.py /WSI_TRAIN/ NCRF/coords/normal_valid.txt /PATCHES_NORMAL_VALID/

Cannot compute the FROC correctly at test time using the Annotations provided by CAMELYON16

Every mask.tif generated from an xml returns only one tumor after computeEvaluationMask.
E.g., test_004.xml has 3 tumor annotations, but the generated mask tif file returns only 1 tumor.

xml to tif:
reader = mir.MultiResolutionImageReader()
mr_image = reader.open('../images/test_021.tif')
annotation_list = mir.AnnotationList()
xml_repository = mir.XmlRepository(annotation_list)
xml_repository.setSource('test_021.xml')
xml_repository.load()
annotation_mask = mir.AnnotationToMask()
camelyon17_type_mask = False
label_map = {'metastases': 1, 'normal': 2} if camelyon17_type_mask else {'_0': 1, '_1': 1, '_2': 0}
conversion_order = ['metastases', 'normal'] if camelyon17_type_mask else ['_0', '_1', '_2']
output_path = "test_021_mask.tif"
annotation_mask.convert(annotation_list, output_path, mr_image.getDimensions(), mr_image.getSpacing(), label_map, conversion_order)

evaluation mask:

def computeEvaluationMask(maskDIR, resolution, level):
    """Computes the evaluation mask.

    Args:
        maskDIR:    the directory of the ground truth mask
        resolution: Pixel resolution of the image at level 0
        level:      The level at which the evaluation mask is made

    Returns:
        evaluation_mask
    """
    slide = openslide.open_slide(maskDIR)
    dims = slide.level_dimensions[level]
    pixelarray = np.zeros(dims[0]*dims[1], dtype='uint')
    pixelarray = np.array(slide.read_region((0,0), level, dims))
    distance = nd.distance_transform_edt(255 - pixelarray[:,:,0])
    Threshold = 75/(resolution * pow(2, level) * 2)  # 75µm is the equivalent size of 5 tumor cells
    binary = distance < Threshold
    filled_image = nd.morphology.binary_fill_holes(binary)
    evaluation_mask = measure.label(filled_image, connectivity=2)
    return evaluation_mask

Question about hard negative mining

Hello, I see you used hard negative mining during training. In Google's paper that I read earlier, misclassified images are trained on multiple times. How is your hard negative mining implemented? Do you crop boundary regions from the tumor slides as negative samples?

Confusion about hard example extraction

As the README says, 'coordinates of hard negative patches, typically around tissue boundary regions, are also included within normal_train.txt and normal_valid.txt', and the repo directly provides the coordinate lists containing hard negative patches.
But I could not find how these coordinates were generated. I'm not sure whether I missed something in the code about this.

probability map semantics

The command for generating a probability map, like

python NCRF/wsi/bin/probs_map.py /WSI_PATH/Test_012.tif /CKPT_PATH/best.ckpt /CFG_PATH/cfg.json /MASK_PATH/Test_012.npy /PROBS_MAP_PATH/Test_012.npy

generates a numpy file for the probability map, but the semantics of the numpy file are a bit odd. How do I interpret the values?

For example, in the above case, if I print the max and min values of Test_012.npy, I get
0.05732796713709831 and 0.0.
This seems odd to me, as I was expecting the max probability to be 1 (or close to it). Does such a low value mean that at every pixel the probability of detecting cancer is that small?

The naming of coordinates used in the normal_train.txt file is confusing.

Hello,
The coords that you have shared with us are superb; they could not be more helpful.
However, in the normal_train.txt file, the naming used for the whole slide images is a bit confusing. Although it is the normal part, we can see Tumor_094,80227,34836 in this file. I would appreciate it if you could help me understand whether it is a mistake and I should change the name to Normal_094,80227,34836, or whether there is another reason for it.
Many thanks in advance for your help.
