spreka / biomagdsb

This repository contains the codes to run the nuclei segmentation pipeline of the BIOMAG group inspired by Kaggle's Data Science Bowl 2018 competition


biomagdsb's Introduction

Intro


Some resulting masks obtained by our method:

Prerequisites

Please see requirements.txt, which can also be run as a bash script (Linux). Alternatively, copy the install commands into the console for your system (command prompt on Windows, terminal on Linux) and execute them.

Install CUDA 9.0 and cuDNN 7.0, as well as MATLAB* (Release 2017a or later), appropriate for your system. Currently, Linux and Windows implementations are provided.

*: MATLAB is not required for fast prediction.
MATLAB* toolboxes used by the repository are:
- Image Processing Toolbox
- Parallel Computing Toolbox
- Statistics and Machine Learning Toolbox
- Curve Fitting Toolbox
- Global Optimization Toolbox
- Optimization Toolbox
See requirements.txt for the Python packages to install.
Download Matterport's Mask R-CNN GitHub repository, or clone it directly with git and check out the commit our method uses:

	git clone https://github.com/matterport/Mask_RCNN.git
	cd Mask_RCNN
	git checkout 53afbae5c5159b5a10ecd024a72b883a2b058314

You will need to set the path of your cloned Mask R-CNN folder in the scripts below.
See the .pdf documentation for how to use the functionalities of our pipeline.

Data

Our method expects images to be 8-bit, 3-channel RGB images in .png format. See our script to convert your images.
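As a rough illustration of the expected input format, the sketch below converts an in-memory image array of any bit depth (grayscale, RGB or RGBA) to 8-bit, 3-channel RGB with NumPy. The function name `to_rgb8` and the min-max intensity stretch are assumptions for illustration, not the behavior of the repository's own conversion script.

```python
import numpy as np


def to_rgb8(image):
    """Convert a grayscale/RGB/RGBA array of any dtype to 8-bit, 3-channel RGB."""
    img = np.asarray(image)
    if img.ndim == 2:                            # grayscale: replicate into 3 channels
        img = np.stack([img] * 3, axis=-1)
    elif img.ndim == 3 and img.shape[-1] == 1:   # single-channel stack
        img = np.repeat(img, 3, axis=-1)
    elif img.ndim == 3 and img.shape[-1] >= 3:   # RGB(A): drop alpha/extra channels
        img = img[..., :3]
    else:
        raise ValueError(f"unsupported image shape {img.shape}")

    if img.dtype == np.uint8:
        return np.ascontiguousarray(img)

    # Assumption: min-max stretch to 0..255 (the repository's script may scale differently).
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi > lo:
        img = (img - lo) / (hi - lo) * 255.0
    else:
        img = np.zeros_like(img)
    return np.round(img).astype(np.uint8)
```

For example, `skimage.io.imsave("out.png", to_rgb8(skimage.io.imread("in.tif")))` would write a PNG in the expected format. A per-image min-max stretch maximizes contrast; if intensities must stay comparable across images, a fixed scale (for example dividing 16-bit data by 257) is the more conservative choice.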

Prediction

Download our pre-trained models from our Google Drive.

Make sure you have mask_rcnn_coco.h5, mask_rcnn_presegmentation.h5 and mask_rcnn_final.h5 in the folder \kaggle_workflow\maskrcnn\model\
Make sure you have UNet_sigma0.0_1\UNet_sigma0.0_1 and the other U-Net models in the folder \kaggle_workflow\unet\
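Before starting a prediction run, a quick check like the one below can confirm that the downloaded models are in place. It is a sketch: `missing_models` is a hypothetical helper, it only looks for the files named above (the names of the other U-Net models are not listed here), and it assumes the paths are relative to the repository root.

```python
from pathlib import Path

# File names as listed in the README; paths are relative to the repository root.
MASKRCNN_MODELS = ["mask_rcnn_coco.h5", "mask_rcnn_presegmentation.h5", "mask_rcnn_final.h5"]
UNET_MODELS = ["UNet_sigma0.0_1/UNet_sigma0.0_1"]  # other U-Net model names not listed here


def missing_models(repo_root):
    """Return the expected model paths (relative to repo_root) that do not exist."""
    root = Path(repo_root)
    expected = [root / "kaggle_workflow" / "maskrcnn" / "model" / name for name in MASKRCNN_MODELS]
    expected += [root / "kaggle_workflow" / "unet" / name for name in UNET_MODELS]
    return [str(path.relative_to(root)) for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_models("."):
        print("missing:", path)
```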

You can choose either full prediction with post-processing or fast prediction; the former takes longer to complete and requires more VRAM.

Full prediction pipeline with post-processing

Predicts nuclei first with a presegmenter Mask R-CNN model, estimates cell sizes, predicts with multiple U-Net models and ensembles the results, then uses all of the above in a final post-processing step to refine the contours. To predict nuclei on images please edit either

start_prediction_full.bat (Windows) or
start_prediction_full.sh (Linux)

and specify the following 3 directories with their corresponding full paths on your system:

Mask R-CNN
root_dir
images_dir

Note: pre-processing scripts are provided to convert your test images. See further details in the documentation.

Fast prediction

Predicts nuclei with a presegmenter Mask R-CNN model that generalizes and performs well on varying image types. It produces fast results that can be improved with the post-processing option above. To run fast prediction, follow the steps in the "Full prediction pipeline with post-processing" section above, but edit one of these files instead:

start_prediction_fast.bat (Windows) or
start_prediction_fast.sh (Linux)

See further details in the documentation.

Custom validation

To use your own folder of images for validation, please run the following script for your operating system:

runGenerateValidationCustom.bat (Windows)
runGenerateValidationCustom.sh (Linux)

See further details in the documentation.

Training

Obtain our pre-trained classifier pretrainedDistanceLearner.mat for training by either:

Downloading it from our Google Drive. Make sure you have pretrainedDistanceLearner.mat in the folder \kaggle_workflow\inputs\clustering\

or
Installing Git LFS (Large File Storage) by following the instructions in its installation guide or on its GitHub page for your operating system. Make sure you set it up after installation:
```
git lfs install
```

WARNING: it is possible to overwrite our provided trained models in this step. See documentation for details.

We include a .mat file with the validation image names we used for the Kaggle DSB2018 competition. If you would like to use your own images for this purpose, see Custom validation above.

WARNING: training will overwrite the U-Net models we provide, so we advise you to make a copy of them first from the following relative path: \kaggle_workflow\unet\

To train on your own images please run the following script according to your operating system:

start_training.bat (Windows)
start_training.sh (Linux)

NOTE: On Windows, you need to edit start_training.bat and set your Python virtual environment path as indicated before running the script. The script opens a second command prompt that runs the server required by pix2pix; this window must remain open until all pix2pix code has finished executing, which is indicated by the message "STYLE TRANSFER DONE:" in the command prompt.

See further details in the documentation.

Parameter search for post-processing

A generally optimal set of parameters is provided in the scripts as the default. However, you can run our parameter optimizer to best fit your image set.

To find the optimal parameters, please run the following script for your operating system:

start_parameterSearch.bat (Windows)
start_parameterSearch.sh (Linux)

and see the found parameters in the text file \kaggle_workflow\outputsValidation\paramsearch\paramsearchresult.txt

See further details in the documentation.

Prepare style transfer input for single experiment

To prepare style transfer on your own images coming from the same experiment please run the following script according to your operating system:

start_singleExperimentPreparation.bat (Windows)
start_singleExperimentPreparation.sh (Linux)

After this, you should run these training scripts instead of the ones above:

start_training_singleExperiment.bat (Windows)
start_training_singleExperiment.sh (Linux)

as these scripts use the single-experiment data for style transfer learning.

WARNING: If you do not provide your own mask folder for this step, the default is \kaggle_workflow\outputs\presegment, which is created by the fast segmentation step of our pipeline. Please run fast prediction before this step to avoid 'file not found' errors.

NOTE: This option should only be used if all your images come from the same experiment. If you provide mixed data, subsequent style transfer learning will result in flawed models and failed synthetic images.

Preprocess test images

If your test images are 16-bit, you may want to convert them to 8-bit, 3-channel images with either

start_image_preprocessing.bat (Windows)
start_image_preprocessing.sh (Linux)

Citation

Please cite our paper if you use our method:

Reka Hollandi, Abel Szkalisity, Timea Toth, Ervin Tasnadi, Csaba Molnar, Botond Mathe, Istvan Grexa, Jozsef Molnar, Arpad Balind, Mate Gorbe, Maria Kovacs, Ede Migh, Allen Goodman, Tamas Balassa, Krisztian Koos, Wenyu Wang, Juan Carlos Caicedo, Norbert Bara, Ferenc Kovacs, Lassi Paavolainen, Tivadar Danka, Andras Kriston, Anne Elizabeth Carpenter, Kevin Smith, Peter Horvath (2020): “nucleAIzer: a parameter-free deep learning framework for nucleus segmentation using image style transfer”, Cell Systems, Volume 10, Issue 5, 20 May 2020, Pages 453-458.e6


biomagdsb's Issues

Very different output from the GitHub code and the online API on the same image when detecting nuclei

Hi,
Thank you for sharing code for the opensource community.
I am trying to modify your source code for my needs, but I am seeing very weird behavior.
I am trying to detect nuclei boundaries on the same image using https://www.nucleaizer.org/ and the GitHub source code (start_prediction_full.bat).

I have a few questions in this context. Please answer them:

Why is there so much difference in the output? Are you using different trained models online and in the GitHub files (downloaded from Google Drive as suggested on the website)?
What is the correct image size to run the script? I was running the online API on 3000x3000 and 1500x1500 images and all of them failed. Now I am trying 512x512 images and it works.

Where should I put the images and masks if I am training the model using my own dataset?

I have read the codes "run_workflow_trainOnly.sh" and "start_training.sh".

I am confused about $IMAGES_DIR, $ORIGINAL_DATA, $TEST1, $TRAIN_UNET, $TRAIN_MASKRCNN.

After many trials and errors, I am able to run "start_training.sh" without error when I did the following:

Split my data into train, val and test folders
runGenerateValidationCustom.sh using my val data
copy both train and val data into $TRAIN_UNET and $TRAIN_MASKRCNN
copy test data into $TEST1
copy a few images out of test into $IMAGES_DIR
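The split in step 1 can be made reproducible rather than done by hand. The sketch below is an illustration only: the helper name `split_dataset`, the default 10%/10% validation/test fractions and the fixed seed are assumptions, not values used by the pipeline.

```python
import random


def split_dataset(names, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle image names reproducibly and split them into train/val/test lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = sorted(names)               # sort first so the result ignores input order
    random.Random(seed).shuffle(shuffled)  # seeded RNG: same seed gives the same split
    n_val = int(round(len(shuffled) * val_fraction))
    n_test = int(round(len(shuffled) * test_fraction))
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

The three name lists can then be used to copy the images (and their masks) into whichever folders the training scripts expect.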

However, I am not sure this is what I should be doing.

Why do I need $IMAGES_DIR?

I am using your original $CLUSTER_CONFIG. Is that right?

training after style transfer

Hi again.
You wrote in the README file (single experiment style transfer section):
After this you are ought to run these training scripts instead of the ones above:

start_prediction_fast.bat (Windows)
start_prediction_fast.sh (Linux)

But these scripts are for prediction. Shouldn't they be training scripts, or am I missing something?
As far as I know, we should train the models again with the new augmented data after style transfer. Correct me if I'm wrong.

No provided Mask RCNN and black output

Hello,

I've tried running the start_prediction_full.sh script but I got an error from the python script saying ModuleNotFoundError: No module named 'config'. So I downloaded the Mask RCNN folder from matterport and pasted the required files into ./FinalModel. The script runs fine after that. However, the output I get is always a completely black image.

I've tried to run the detection on the image you see below (its resolution is 256 x 256) as well as the three included test images. The console states that the count of nuclei is greater than 0 but in the output folder under "postprocessing" as well as "presegment" I only find one tiff image per input image that is completely black.

I've tried using only files from the Mask RCNN that are tagged with v2.1 (as that's the version you seem to be using) which did nothing. I've also tried replacing the skimage.io.imsave(...) line with cv2.imwrite(...) but no luck with that either.
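One possible explanation, not confirmed in this thread, is that the TIFFs are label images: each nucleus is stored as a small integer ID (1, 2, 3, ...) on a 0 background, possibly in a 16-bit file, which most image viewers render as almost black. The sketch below, with hypothetical helper names, summarizes such an array and stretches it for viewing, so an empty mask can be told apart from a merely dim one.

```python
import numpy as np


def describe_label_image(labels):
    """Summarize a label image where 0 is background and each nucleus has its own ID."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    ids = ids[ids != 0]
    return {"dtype": str(labels.dtype), "max_value": int(labels.max()), "n_objects": int(ids.size)}


def stretch_for_viewing(labels):
    """Map label IDs onto 0..255 so individual nuclei become visible in a viewer."""
    labels = np.asarray(labels)
    peak = int(labels.max())
    if peak == 0:
        return np.zeros(labels.shape, dtype=np.uint8)  # genuinely empty mask
    return (labels.astype(np.float64) / peak * 255).round().astype(np.uint8)
```

Read the TIFF with, for example, `skimage.io.imread` and pass the array to `describe_label_image`. If `n_objects` is greater than 0, the mask is not empty and only needs `stretch_for_viewing` (or a viewer's auto-contrast) to be seen.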

Error for single channel grayscale images

If I run start_prediction_full.bat on a single-channel grayscale image, I get an error:
File "C:\Code\Python3\biomagdsb\UNet\utils.py", line 264, in __getitem__
image = io.imread(image_path)[:, :, :3]
IndexError: too many indices for Array

and later:
Error using imread>get_full_filename (line 481)
File "C:\Code\Python3\biomagdsb\kaggle_workflow\outputs\ensemble\output\BE_300_01_rescaled.png" does not exist.

Error in imread (line 344)
filename = get_full_filename(fid, errmsg, filename);

Error in postProcCodeFINAL (line 104)
probMap = (imread([probFolder filesep imageList(i).name(1:end-5) '.png']));

Error in postProcCodeRunnerFINAL (line 94)
postProcCodeFINAL(inMainFolder,outMainFolder,...

(It works if I convert the image to RGB before starting)

Error in cell size estimation and getting black images as presegmentation ends

Hi, I wanted to run start_prediction_full.sh and followed all the instructions in the documentation and the GitHub README. The first part, presegmentation, started and finished, but the next part, cell size estimation, failed with just this error:

CELL SIZE ESTIMATION:
ERROR: Error during cell size estimation

After this I checked the presegment output and all I saw were black images. What is going wrong? How can I find out where my problem is?

Extracting nucleus size/location data from final output?

I've set up the pipeline to run on my own images, and successfully obtained the final segmentation images. Now, I want to extract the data for nucleus size and location for downstream analysis - ideally in .csv or similar format. I was just wondering if the pipeline includes this data somewhere in the output folders? Thank you!
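The README does not mention a built-in measurement export. Assuming the final segmentation is a label image (0 for background, one integer ID per nucleus), the per-nucleus area and centroid can be computed with NumPy and written with the standard `csv` module. The helper names below are hypothetical.

```python
import csv

import numpy as np


def nucleus_table(labels):
    """Return (label, area_px, centroid_row, centroid_col) for every non-zero label."""
    labels = np.asarray(labels)
    rows = []
    for lab in np.unique(labels):
        if lab == 0:
            continue  # skip background
        rr, cc = np.nonzero(labels == lab)
        rows.append((int(lab), int(rr.size), float(rr.mean()), float(cc.mean())))
    return rows


def write_csv(rows, stream):
    """Write the table with a header row to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["label", "area_px", "centroid_row", "centroid_col"])
    writer.writerows(rows)
```

With a file opened as `open("nuclei.csv", "w", newline="")`, `write_csv(nucleus_table(labels), f)` produces one row per nucleus. If scikit-image is installed, `skimage.measure.regionprops_table(labels, properties=("label", "area", "centroid"))` computes the same quantities. If the output is a binary mask rather than a label image, it would first need to be split into connected components, for example with `skimage.measure.label`.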

Question about custom training

Hello,

We have been using the nucleAIzer approach for the last few months to assist in the segmentation of tumor samples. At this point, we wish to make slight modifications to the network in order to optimize it for our data, primarily due to the presence of dense, blurry nuclei clusters in the samples. We are new to the field of machine learning, and as such were wondering if you could provide some brief guidance on how best to proceed. Specifically:

Would image style transfer, or completely new training, be the best way to proceed here? For both, how much annotated data would generally be required in order to observe a meaningful increase in performance?
Is there some way to manually provide feedback to the network (say, to identify nuclei that are incorrectly segmented) and use this to modify the network so that it is optimized for our situation?

Thank you for developing such a useful tool!

runGenerateValidationCustom.sh

Should I copy my train into $TRAIN_UNET and $TRAIN_MASKRCNN? What about my validation data? (I experimented a few times, but it seems that I have to runGenerateValidationCustom.sh using my validation data and copy both train and validation data into $TRAIN_UNET and $TRAIN_MASKRCNN.)

Yes, copy the train images there. Validation goes in the $VALIDATION folder which will be in kaggle_workflow/outputs/validation by default. Indeed you need to run runGenerateValidationCustom.sh /path/to/validationfolder, the $IMAGES_DIR variable in this script is only used to list the validation images, sorry for the confusing variable names.

Originally posted by @spreka in #14 (comment)

Further elaboration on inputs for neural style transfer

Hi spreka,

I was wondering if you could elaborate a bit more on the following statement from your paper (or point to the code that covers it). In your paper you state:

"The input of the discriminator is an image/mask pair (either a real pair, or a synthetically generated pair)."

I have two questions about this:

What proportion of the neural style "synthetic database" were images generated from real image/mask pairs versus synthetically generated pairs?
What do you mean by "synthetically generated"? Sampled from probability distributions? Randomly generated binary encodings? etc? Could you describe this process a bit more or point me to code/portion of the paper that describes this step (I couldn't find it)?

I'm trying to test out your approach, but a bit more clarity on this section of the paper would be of great help.

Thanks!
Tom O

Error during pre-segmentation

I get the following error when I try to run start_prediction_full.bat:

File "C:\Code\Python3\biomagdsb\FinalModel\segmentation.py", line 119, in Segment
temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
ValueError: not enough values to unpack (expected 3, got 2)
ERROR: "Error during pre-segmentation"

Error in training UNET

Hi again.
I can run prediction_full and the other scripts in your repo, but I got this error from start_training.sh when training the U-Net.

CELL SIZE ESTIMATION ON VALIDATION DONE
INIT UNET ENVIRONMENT:
INIT UNET ENVIRONMENT DONE
UNET TRAIN MODELS:
UNET TRAIN MODELS:
PREDICT UNet_sigma0.0_1 MODEL:
./kaggle_workflow/unet
./kaggle_workflow/outputs/train_unet
./kaggle_workflow/outputs/test1
./kaggle_workflow/outputs
/media/erfan/3AB81DDCB81D9809/Thesis/Projects/Style_project/main/biomagdsb/venv/lib/python3.7/site-packages/skimage/util/dtype.py:503: UserWarning: Downcasting uint16 to uint8 without scaling because max value 255 fits in uint8
return _convert(image, np.uint8, force_copy)
Traceback (most recent call last):
File "./UNet/train_sh.py", line 58, in <module>
model.train_model(train_dataset, n_epochs=args.epochs, n_batch=args.batch, verbose=args.verbose, validation_dataset=val_dataset)
File "/media/erfan/3AB81DDCB81D9809/Thesis/Projects/Style_project/main/biomagdsb/UNet/wrappers.py", line 54, in train_model
epoch_running_loss += training_loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number
ERROR: Error during unet prediction

Can you help me with this?
I ran the code on Ubuntu 16.04 with MATLAB 2018a, in case you need that.

Thanks
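The traceback already names the fix. PyTorch 0.4 introduced 0-dim (scalar) tensors, so a scalar loss can no longer be indexed with `.data[0]`; `.item()` should be used instead. A minimal sketch of the changed line in `UNet/wrappers.py` (line 54 in the traceback) follows. NumPy's 0-d array is used only as a stand-in so the snippet runs without PyTorch, since it exposes the same `.item()` method. Other pre-0.4 idioms elsewhere in the U-Net code may need similar changes; installing an older PyTorch release is the alternative.

```python
import numpy as np


def update_running_loss(epoch_running_loss, training_loss):
    # Replacement for: epoch_running_loss += training_loss.data[0]
    # .item() converts a single-element (0-dim) tensor to a plain Python number.
    return epoch_running_loss + training_loss.item()


# Stand-in for a 0-dim torch tensor; np.ndarray also provides .item().
loss = np.array(0.25)
epoch_running_loss = update_running_loss(0.0, loss)
print(epoch_running_loss)  # 0.25
```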

cv2 version 4.x.x incompatibility

Hi, thank you very much for your great work on this and for sharing it publicly.

I'm having an issue with segmentation.py line 119 when performing pCavityFilling:

	if pCavityFilling:
	    for i in range(count):
	        temp = cv2.bitwise_not(masks[:, :, i])
	        temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
	        masks[:, :, i] = cv2.bitwise_not(temp)

This returns the following error message on cv2 v4.x.x:
temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
ValueError: not enough values to unpack (expected 3, got 2)

In cv2 v3, temp refers to the modified image returned by the function:
	im2, contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
https://docs.opencv.org/3.4.6/d4/d73/tutorial_py_contours_begin.html

In cv2 v4, the modified image is no longer returned; only contours and hierarchy are outputs:
	contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
https://docs.opencv.org/master/d4/d73/tutorial_py_contours_begin.html

Thus, in v4 no output of that function is equivalent to the one used to perform pCavityFilling.
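For code that only needs the contours, `contours, hierarchy = cv2.findContours(...)[-2:]` unpacks correctly on both OpenCV 3.x and 4.x. The cavity-filling step, however, relied on the image output. Assuming pCavityFilling is meant to fill holes inside each binary mask (the name suggests this, but it is not confirmed here), the sketch below computes that directly with a flood fill from the image border, independent of the OpenCV version. `fill_cavities` is a hypothetical helper, not part of segmentation.py.

```python
from collections import deque

import numpy as np


def fill_cavities(mask):
    """Fill holes in a binary mask: background pixels not connected to the border."""
    fg = np.asarray(mask) > 0
    h, w = fg.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()

    def seed(r, c):
        if not fg[r, c] and not outside[r, c]:
            outside[r, c] = True
            queue.append((r, c))

    # Seed the flood fill with every background pixel on the image border.
    for r in range(h):
        seed(r, 0)
        seed(r, w - 1)
    for c in range(w):
        seed(0, c)
        seed(h - 1, c)

    # 4-connected flood fill through background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                seed(nr, nc)

    # Everything not reachable from the border is object or enclosed hole.
    return (~outside).astype(np.uint8) * 255
```

In the loop above, the three OpenCV calls could then be replaced by `masks[:, :, i] = fill_cavities(masks[:, :, i])`, assuming the masks use 0/255 values. `scipy.ndimage.binary_fill_holes`, or drawing the `cv2.RETR_EXTERNAL` contours with `cv2.drawContours(..., thickness=cv2.FILLED)`, should give the same result faster; the pure-Python loop only keeps the example free of dependencies beyond NumPy.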

Network not detecting groups of nuclei

Hello,

I have noticed that the algorithm will fail to detect clusters of nuclei, with the resulting segmentation containing large empty gaps. I have attached two representative images of this situation. Is there any way I can modify the algorithm, or pre-process my image, to prevent this from happening? Thank you!

What MATLAB tools/libraries are needed?

I want to try your pipeline and am about to install MATLAB. Are any toolboxes needed apart from MATLAB itself?

Here is the list of installation options I get:

Product | Size [MB]

Is there a way to return a blank image if no nuclei are detected (invalid prediction)?

My current pipeline involves the generation of many subimages from a larger image, and these subimages are then processed using the segmentation network. However, a few of these subimages invariably do not contain any nuclei, which the segmentation program treats as invalid and does not return anything. Is there a way to make the program return the blank image in this case, so that the segmented subimages can be stitched back together even if there are no nuclei present in a subimage? Thank you so much!
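The README does not describe such an option. A workaround on the stitching side, sketched below under the assumption that tiles are processed in row-major order and share one fixed shape, is to treat a tile with no output as an all-zero block. `stitch_tiles` and its parameters are hypothetical names for illustration.

```python
import numpy as np


def stitch_tiles(tiles, n_rows, n_cols, tile_shape, dtype=np.uint16):
    """Assemble row-major tiles into one image; None entries become blank tiles.

    tiles: list of n_rows * n_cols entries, each a 2-D array of shape tile_shape,
    or None when the segmentation produced no output for that tile.
    """
    th, tw = tile_shape
    if len(tiles) != n_rows * n_cols:
        raise ValueError("expected n_rows * n_cols tiles")
    out = np.zeros((n_rows * th, n_cols * tw), dtype=dtype)
    for idx, tile in enumerate(tiles):
        if tile is None:
            continue  # no nuclei detected: keep the zero-filled block
        r, c = divmod(idx, n_cols)
        out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = tile
    return out
```

A missing output file can be mapped to `None` with a simple existence check before stitching. If each tile is a label image, IDs restart in every tile, so labels from different tiles may collide and would need offsetting before any per-nucleus analysis.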
