
biomagdsb's Introduction

Intro

This repository contains the code to run the nuclei segmentation pipeline of the BIOMAG group, inspired by Kaggle's Data Science Bowl 2018 competition.

Some resulting masks obtained by our method:

[Figure: example nuclei masks produced by the method]

Prerequisites

Please see requirements.txt, which can be run as a bash script on Linux. Alternatively, copy the install commands into the console for your system (command prompt on Windows, terminal on Linux) and execute them.

  • Install CUDA 9.0 and CuDNN 7.0 as well as MATLAB* (Release 2017a or later) appropriate for your system. Currently, Linux and Windows implementations are provided.

    *: MATLAB is not required for fast prediction.

  • MATLAB* toolboxes used by the repository are:

    • Image Processing Toolbox
    • Parallel Computing Toolbox
    • Statistics and Machine Learning Toolbox
    • Curve Fitting Toolbox
    • Global Optimization Toolbox
    • Optimization Toolbox
  • See requirements.txt for python packages to install.

  • Download Matterport's Mask R-CNN github repository or clone directly with git and revert to the commit our method uses:

	git clone https://github.com/matterport/Mask_RCNN.git
	cd Mask_RCNN
	git checkout 53afbae5c5159b5a10ecd024a72b883a2b058314
  • You will need to set the path of your cloned Mask R-CNN folder in the scripts below
  • See documentation in .pdf on how to use functionalities of our pipeline

Data

Our method expects 8-bit, 3-channel RGB images in .png format. See our script to convert your images.

Prediction

Download our pre-trained models from our Google Drive.

  • Make sure you have mask_rcnn_coco.h5, mask_rcnn_presegmentation.h5 and mask_rcnn_final.h5 in the folder \kaggle_workflow\maskrcnn\model\
  • Make sure you have UNet_sigma0.0_1\UNet_sigma0.0_1 and the other U-Net models in the folder \kaggle_workflow\unet\
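Before starting a prediction it can help to confirm that the downloaded models sit where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_model_files` and its `root` argument are hypothetical, and only the files named in the list above are checked.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
MASKRCNN_MODELS = (
    "mask_rcnn_coco.h5",
    "mask_rcnn_presegmentation.h5",
    "mask_rcnn_final.h5",
)


def missing_model_files(root):
    """Return the expected model paths that do not exist under `root`.

    `root` is assumed to be the folder that contains kaggle_workflow.
    """
    workflow = Path(root) / "kaggle_workflow"
    expected = [workflow / "maskrcnn" / "model" / name for name in MASKRCNN_MODELS]
    # Only the U-Net model named above is checked; the other U-Net models are not.
    expected.append(workflow / "unet" / "UNet_sigma0.0_1" / "UNet_sigma0.0_1")
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    for path in missing_model_files("."):
        print("missing:", path)
```

Run from the repository root, it prints one line per missing file and nothing if all four are present.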

You can choose either full prediction with post-processing or fast prediction; the former takes longer to complete and requires more VRAM.

Full prediction pipeline with post-processing

Predicts nuclei first with a presegmenter Mask R-CNN model, estimates cell sizes, predicts with multiple U-Net models and ensembles the results, then uses all of the above in a final post-processing step to refine the contours. To predict nuclei on images please edit either

  • start_prediction_full.bat (Windows) or
  • start_prediction_full.sh (Linux)

and specify the following 3 directories with their corresponding full paths on your system:

  • Mask R-CNN
  • root_dir
  • images_dir

Note: pre-processing scripts are provided to convert your test images. See further details in the documentation.

Fast prediction

Predicts nuclei with a presegmenter Mask R-CNN model that generalizes and performs well on varying image types. It produces fast results that can be improved with the post-processing option above. To run fast prediction, follow the steps of the "Full prediction pipeline with post-processing" section above, but edit one of these files instead:

  • start_prediction_fast.bat (Windows) or
  • start_prediction_fast.sh (Linux)

See further details in the documentation.

Custom validation

To use your custom folder of images as validation please run the following script according to your operating system:

  • runGenerateValidationCustom.bat (Windows)
  • runGenerateValidationCustom.sh (Linux)

See further details in the documentation.

Training

Obtain our pre-trained classifier pretrainedDistanceLearner.mat for training by either:

  • Downloading it from our google drive. Make sure you have pretrainedDistanceLearner.mat in the folder \kaggle_workflow\inputs\clustering\

    or

  • Installing Git LFS (Large File Storage) by following the instructions on their installation guide or their github page according to your operating system. Make sure you set it up after installation:

     	git lfs install
    

WARNING: it is possible to overwrite our provided trained models in this step. See documentation for details.

We include a .mat file with the validation image names we used for the Kaggle DSB2018 competition. If you would like to use your own images for this purpose, see Custom validation above.

WARNING: training will overwrite the U-Net models we provide. We advise you to make a copy of them first from the following relative path: \kaggle_workflow\unet\

To train on your own images please run the following script according to your operating system:

  • start_training.bat (Windows)
  • start_training.sh (Linux)

NOTE: on Windows you need to edit start_training.bat and set your Python virtual environment path as indicated before running the script. The script opens a second command prompt that runs the server required by pix2pix. This window must remain open until all pix2pix code has finished executing, which is indicated by the message "STYLE TRANSFER DONE:" in the command prompt.

See further details in the documentation.

Parameter search for post-processing

A generally suitable set of parameters is provided in the scripts as default. However, you can run our parameter optimizer to fit them to your image set.

To find the optimal parameters please run the following script according to your operating system:

  • start_parameterSearch.bat (Windows)
  • start_parameterSearch.sh (Linux)

and see the found parameters in the text file \kaggle_workflow\outputsValidation\paramsearch\paramsearchresult.txt

See further details in the documentation.

Prepare style transfer input for single experiment

To prepare style transfer on your own images coming from the same experiment please run the following script according to your operating system:

  • start_singleExperimentPreparation.bat (Windows)
  • start_singleExperimentPreparation.sh (Linux)

After this you should run these training scripts instead of the ones above:

  • start_training_singleExperiment.bat (Windows)
  • start_training_singleExperiment.sh (Linux)

as these scripts would use the single experiment data for style transfer learning.

WARNING: If you do not provide your own mask folder for this step the default option will be \kaggle_workflow\outputs\presegment which is created by the fast segmentation step of our pipeline. Please run it prior to this step to avoid 'file not found' errors.

NOTE: This option should only be used if all your images come from the same experiment. If you provide mixed data, subsequent style transfer learning will result in flawed models and failed synthetic images.

Preprocess test images

If your test images are 16-bit you may want to convert them to 8-bit 3 channel images with either

  • start_image_preprocessing.bat (Windows)
  • start_image_preprocessing.sh (Linux)
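For readers who want to see what such a conversion involves, here is a minimal NumPy sketch. It is not the repository's preprocessing script: the function name `to_uint8_rgb` is hypothetical, and min-max intensity scaling is an assumption that may differ from what the provided scripts do.

```python
import numpy as np


def to_uint8_rgb(image):
    """Convert a 2-D or 3-D image array to 8-bit, 3-channel RGB.

    Intensities are min-max scaled to 0..255. This scaling choice is an
    assumption and may differ from the repository's preprocessing script.
    """
    arr = np.asarray(image, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi > lo:
        arr = (arr - lo) / (hi - lo) * 255.0
    else:
        arr = np.zeros_like(arr)  # constant image: nothing to scale
    arr = np.round(arr).astype(np.uint8)
    if arr.ndim == 2:
        # Replicate the single grey channel three times.
        arr = np.stack([arr, arr, arr], axis=-1)
    elif arr.shape[-1] == 1:
        arr = np.repeat(arr, 3, axis=-1)
    else:
        arr = arr[..., :3]  # drop alpha or any extra channels
    return arr
```

The result can then be written as .png with an image library of your choice. Note that min-max scaling also stretches images that are already 8-bit, so it is best applied only to images that need conversion.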

Citation

Please cite our paper if you use our method:

Reka Hollandi, Abel Szkalisity, Timea Toth, Ervin Tasnadi, Csaba Molnar, Botond Mathe, Istvan Grexa, Jozsef Molnar, Arpad Balind, Mate Gorbe, Maria Kovacs, Ede Migh, Allen Goodman, Tamas Balassa, Krisztian Koos, Wenyu Wang, Juan Carlos Caicedo, Norbert Bara, Ferenc Kovacs, Lassi Paavolainen, Tivadar Danka, Andras Kriston, Anne Elizabeth Carpenter, Kevin Smith, Peter Horvath (2020): “nucleAIzer: a parameter-free deep learning framework for nucleus segmentation using image style transfer”, Cell Systems, Volume 10, Issue 5, 20 May 2020, Pages 453-458.e6

biomagdsb's People

Contributors

spreka, szkabel


biomagdsb's Issues

Very different output from GitHub code and online API on the same image when detecting nuclei

Hi,
Thank you for sharing code for the opensource community.
I am trying to modify your source code for my needs, but I am seeing very strange behavior.
I am trying to detect nuclei boundaries on the same image using both https://www.nucleaizer.org/ and the GitHub source code (start_prediction_full.bat).

I have a few questions in this context:

  • Why is there so much difference in the output? Are you using different trained models online and in the GitHub version (downloaded from Google Drive as suggested on the website)?

  • What is the correct image size to run the script? I was running the online API with 3000x3000 and 1500x1500 images and all of them were failing. Now I am trying with 512x512 and it is working.

Where should I put the images and masks if I am training the model using my own dataset?

I have read the codes "run_workflow_trainOnly.sh" and "start_training.sh".

I am confused about $IMAGES_DIR, $ORIGINAL_DATA, $TEST1, $TRAIN_UNET, $TRAIN_MASKRCNN.

After many trials and errors, I am able to run "start_training.sh" without error when I did the following:

  1. Split my data into train, val and test into some folders
  2. runGenerateValidationCustom.sh using my val data
  3. copy both train and val data into $TRAIN_UNET and $TRAIN_MASKRCNN
  4. copy test data into $TEST1
  5. copy a few images out of test into $IMAGES_DIR
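Step 1 of the list above (splitting the data into train, val and test) can be done reproducibly with a few lines of Python. This is only a sketch under stated assumptions: the function name `split_names` and the 10% validation and test fractions are illustrative and do not come from the repository.

```python
import random


def split_names(names, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle file names reproducibly and split them into train/val/test lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

Using a fixed seed means the same file list always produces the same split, which makes later comparisons between training runs easier.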

However, I am not sure this is what I should be doing.

Why do I need $IMAGES_DIR?

I am using your original $CLUSTER_CONFIG. Is that right?

training after style transfer

Hi , again.
you wrote in the readme file that (single style transfer section):
After this you are ought to run these training scripts instead of the ones above:

start_prediction_fast.bat (Windows)
start_prediction_fast.sh (Linux)

but, these scripts are for predictions. they should not be train scripts ?? or am I missing something ?
as I know we should train the models again with new augmented data after style transfer. correct me if I'm wrong.

No provided Mask RCNN and black output

Hello,

I've tried running the start_prediction_full.sh script but I got an error from the python script saying ModuleNotFoundError: No module named 'config'. So I downloaded the Mask RCNN folder from matterport and pasted the required files into ./FinalModel. The script runs fine after that. However, the output I get is always a completely black image.

I've tried to run the detection on the image you see below (its resolution is 256 x 256) as well as the three included test images. The console states that the count of nuclei is greater than 0 but in the output folder under "postprocessing" as well as "presegment" I only find one tiff image per input image that is completely black.

I've tried using only files from the Mask RCNN that are tagged with v2.1 (as that's the version you seem to be using) which did nothing. I've also tried replacing the skimage.io.imsave(...) line with cv2.imwrite(...) but no luck with that either.

[Attached image: he-002-01-01-0]

Error for single channel grayscale images

If I run start_prediction_full.bat on a single channel grayscale image I get an error:
File "C:\Code\Python3\biomagdsb\UNet\utils.py", line 264, in getitem
image = io.imread(image_path)[:, :, :3]
IndexError: too many indices for Array

and later:
Error using imread>get_full_filename (line 481)
File "C:\Code\Python3\biomagdsb\kaggle_workflow\outputs\ensemble\output\BE_300_01_rescaled.png" does not exist.

Error in imread (line 344)
filename = get_full_filename(fid, errmsg, filename);

Error in postProcCodeFINAL (line 104)
probMap = (imread([probFolder filesep imageList(i).name(1:end-5) '.png']));

Error in postProcCodeRunnerFINAL (line 94)
postProcCodeFINAL(inMainFolder,outMainFolder,...

(It works if I convert the image to RGB before starting)

Error in cell size estimation and getting black images as presegmentation ends

Hi, I wanted to run start_prediction_full.sh and followed all the instructions in the documentation and the GitHub readme. The first part, presegmentation, started and finished, but the next part, cell size estimation, failed with just this error:

CELL SIZE ESTIMATION:
ERROR: Error during cell size estimation

After this I checked the presegment output and all I saw was black images. What is going wrong? How can I find out where my problem is?

Extracting nucleus size/location data from final output?

I've set up the pipeline to run on my own images, and successfully obtained the final segmentation images. Now, I want to extract the data for nucleus size and location for downstream analysis - ideally in .csv or similar format. I was just wondering if the pipeline includes this data somewhere in the output folders? Thank you!
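One way to get such a table is to compute it from the output masks directly. The sketch below assumes each mask is a labelled image in which every nucleus carries its own positive integer and the background is 0. That assumption is not confirmed by the text above; if the masks are binary, a connected-component labelling step would be needed first. The function names are hypothetical.

```python
import csv

import numpy as np


def nucleus_table(label_mask):
    """Return one row (label, area in pixels, centroid row, centroid column) per nucleus."""
    mask = np.asarray(label_mask)
    rows = []
    for label in np.unique(mask):
        if label == 0:
            continue  # 0 is assumed to be background
        ys, xs = np.nonzero(mask == label)
        rows.append((int(label), int(ys.size), float(ys.mean()), float(xs.mean())))
    return rows


def write_nucleus_csv(label_mask, csv_path):
    """Write the table produced by nucleus_table to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["label", "area_px", "centroid_row", "centroid_col"])
        writer.writerows(nucleus_table(label_mask))
```

The mask can be loaded with any image reader that preserves integer labels and then passed to `write_nucleus_csv`.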

Question about custom training

Hello,

We have been using the nucleAIzer approach for the last few months to assist in the segmentation of tumor samples. At this point, we wish to perform slight modifications to the network in order to optimize it for our data – primarily due to the presence of dense, blurry nuclei clusters in the sample. We are new to the field of machine learning, and as such were wondering if you could provide some brief guidance on how best to proceed? Specifically –

  1. Would image style transfer, or completely new training, be the best way to proceed here? For both, how much annotated data would generally be required in order to observe a meaningful increase in performance?
  2. Is there some way to manually provide feedback to the network – say, to identify nuclei which are incorrectly segmented – and use this to modify the network so that it is optimized for our situation?

Thank you for developing such a useful tool!

runGenerateValidationCustom.sh

Should I copy my train into $TRAIN_UNET and $TRAIN_MASKRCNN? What about my validation data? (I experimented a few times, but it seems that I have to runGenerateValidationCustom.sh using my validation data and copy both train and validation data into $TRAIN_UNET and $TRAIN_MASKRCNN.)

Yes, copy the train images there. Validation goes in the $VALIDATION folder which will be in kaggle_workflow/outputs/validation by default. Indeed you need to run runGenerateValidationCustom.sh /path/to/validationfolder, the $IMAGES_DIR variable in this script is only used to list the validation images, sorry for the confusing variable names.

Originally posted by @spreka in #14 (comment)

Further elaboration on inputs for neural style transfer

Hi spreka,

I was wondering if you could elaborate a bit more (or point to the code) that covers the following statement from your paper. In your paper you state,

"The input of the discriminator is an image/mask pair (either a real pair, or a synthetically generated pair)."

I have two questions about this:

  1. What proportion of the neural style "synthetic database" were images generated from real image/mask pairs versus synthetically generated pairs?
  2. What do you mean by "synthetically generated"? Sampled from probability distributions? Randomly generated binary encodings? etc? Could you describe this process a bit more or point me to code/portion of the paper that describes this step (I couldn't find it)?

I'm trying to test out your approach, but a bit more clarity on this section of the paper would be of great help.

Thanks!
Tom O

Error during pre-segmentation

I get the following error when I try to run start_prediction_full.bat:

File "C:\Code\Python3\biomagdsb\FinalModel\segmentation.py", line 119, in Segment
temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
ValueError: not enough values to unpack (expected 3, got 2)
ERROR: "Error during pre-segmentation"

Error in training UNET

hi , again.
I can run prediction_full and other scripts in your repo. but I got this error for start_training.sh file for training UNET.

CELL SIZE ESTIMATION ON VALIDATION DONE
INIT UNET ENVIRONMENT:
INIT UNET ENVIRONMENT DONE
UNET TRAIN MODELS:
UNET TRAIN MODELS:
PREDICT UNet_sigma0.0_1 MODEL:
./kaggle_workflow/unet
./kaggle_workflow/outputs/train_unet
./kaggle_workflow/outputs/test1
./kaggle_workflow/outputs
/media/erfan/3AB81DDCB81D9809/Thesis/Projects/Style_project/main/biomagdsb/venv/lib/python3.7/site-packages/skimage/util/dtype.py:503: UserWarning: Downcasting uint16 to uint8 without scaling because max value 255 fits in uint8
return _convert(image, np.uint8, force_copy)
Traceback (most recent call last):
File "./UNet/train_sh.py", line 58, in
model.train_model(train_dataset, n_epochs=args.epochs, n_batch=args.batch, verbose=args.verbose, validation_dataset=val_dataset)
File "/media/erfan/3AB81DDCB81D9809/Thesis/Projects/Style_project/main/biomagdsb/UNet/wrappers.py", line 54, in train_model
epoch_running_loss += training_loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number
ERROR: Error during unet prediction

Can You help me with this ??
I ran the code on ubuntu 16.04 and matlab 2018a version, if you need.

Thanks
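The error message itself names the likely fix: in newer PyTorch releases a scalar loss is a 0-dim tensor, so indexing it with `[0]` fails and `.item()` should be used instead. A minimal sketch of a version-tolerant helper follows; the name `loss_to_float` is hypothetical and the change has not been tested against the repository.

```python
def loss_to_float(loss):
    """Return a plain float from a scalar loss.

    Works for objects exposing .item() (such as 0-dim PyTorch tensors)
    and for ordinary Python numbers.
    """
    if hasattr(loss, "item"):
        return float(loss.item())
    return float(loss)


# In UNet/wrappers.py the failing line could then read (untested against the repo):
#     epoch_running_loss += loss_to_float(training_loss)
```

Replacing `training_loss.data[0]` with `training_loss.item()` directly would have the same effect on PyTorch versions that provide `.item()`.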

cv2 version 4.x.x incompatibility

Hi, thank you very much for your great work on this and for sharing it publicly.

I'm having an issue with segmentation.py line 119, which performs pCavityFilling:

    if pCavityFilling:
        for i in range(count):
            temp = cv2.bitwise_not(masks[:, :, i])
            temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
            masks[:, :, i] = cv2.bitwise_not(temp)

This returns the following error message on cv2 v4.x.x:
temp, _, _ = cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
ValueError: not enough values to unpack (expected 3, got 2)

In cv2 v3, temp refers to a modified image outputted by the function.
im2, contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
https://docs.opencv.org/3.4.6/d4/d73/tutorial_py_contours_begin.html

in cv2 v4, the modified image is no longer an output. Only contours and hierarchy are outputs
contours, hierarchy = cv.findContours(thresh, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
https://docs.opencv.org/master/d4/d73/tutorial_py_contours_begin.html

Thus, there is no equivalent output from that function to perform pCavityFilling
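The unpacking error can be avoided in a way that works with both return layouts described above, because the last two returned items are the contours and the hierarchy in either version. The helper below is a sketch with a hypothetical name. It only removes the ValueError and does not restore the cavity filling, which relied on the returned image and would need a separate hole-filling step.

```python
def contours_and_hierarchy(find_contours_result):
    """Return (contours, hierarchy) from either OpenCV return layout.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns
    (contours, hierarchy). The last two items are the same in both.
    """
    contours, hierarchy = find_contours_result[-2:]
    return contours, hierarchy


# Intended use (requires OpenCV; not executed here):
#     contours, hierarchy = contours_and_hierarchy(
#         cv2.findContours(temp, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE))
```

Pinning OpenCV to a 3.x release in the environment is the other obvious route if the original behaviour must be kept unchanged.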

Network not detecting groups of nuclei

Hello,

I have noticed that the algorithm will fail to detect clusters of nuclei, with the resulting segmentation containing large empty gaps. I have attached two representative images of this situation. Is there any way I can modify the algorithm, or pre-process my image, to prevent this from happening? Thank you!


What MATLAB tools/libraries are needed?

I want to try your pipeline and am about to install MATLAB. Are there any toolboxes that need to be installed apart from MATLAB itself?

Here is the list of installation options I get:

Product | Size [MB]

MATLAB 9.6 | 4095
Simulink 9.3 | 6335
5G Toolbox 1.1 | 16
Aerospace Blockset 4.1 | 164
Aerospace Toolbox 3.1 | 211
Antenna Toolbox 4.0 | 1008
Audio Toolbox 2.0 | 291
Automated Driving Toolbox 2.0 | 521
AUTOSAR Blockset 2.0 | 256
Bioinformatics Toolbox 4.12 | 541
Communications Toolbox 7.1 | 1481
Computer Vision Toolbox 9.0 | 2030
Control System Toolbox 10.6 | 796
Curve Fitting Toolbox 3.5.9 | 52
Database Toolbox 9.1 | 48
Datafeed Toolbox 5.8.1 | 14
Deep Learning Toolbox 12.1 | 804
DSP System Toolbox 9.8 | 434
Econometrics Toolbox 5.2 | 176
Embedded Coder 7.2 | 326
Filter Design HDL Coder 3.1.5 | 13
Financial Instruments Toolbox 2.9 | 111
Financial Toolbox 5.13 | 122
Fixed-Point Designer 6.3 | 340
Fuzzy Logic Toolbox 2.5 | 44
Global Optimization Toolbox 4.1 | 48
HDL Coder 3.14 | 356
Image Acquisition Toolbox 6.0 | 61
Image Processing Toolbox 10.4 | 2877
Instrument Control Toolbox 4.0 | 36
LTE HDL Toolbox 1.3 | 42
LTE Toolbox 3.1 | 99
Mapping Toolbox 4.8 | 515
MATLAB Coder 4.2 | 76
MATLAB Compiler 7.0.1 | 602
MATLAB Compiler SDK 6.6.1 | 129
MATLAB Report Generator 5.6 | 71
Mixed-Signal Blockset 1.0 | 60
Model Predictive Control Toolbox 6.3 | 38
Optimization Toolbox 8.3 | 179
Parallel Computing Toolbox 7.0 | 950
Partial Differential Equation Toolbox 3.2 | 117
Phased Array System Toolbox 4.1 | 142
Powertrain Blockset 1.5 | 178
Predictive Maintenance Toolbox 2.0 | 104
Reinforcement Learning Toolbox 1.0 | 293
RF Blockset 7.2 | 279
RF Toolbox 3.6 | 69
Risk Management Toolbox 1.5 | 22
Robotics System Toolbox 2.2 | 490
Robust Control Toolbox 6.6 | 63
Sensor Fusion and Tracking Toolbox 1.1 | 225
SerDes Toolbox 1.0 | 21
Signal Processing Toolbox 8.2 | 1867
SimBiology 5.8.2 | 94
SimEvents 5.6 | 68
Simscape 4.6 | 122
Simscape Driveline 2.16 | 37
Simscape Electrical 7.1 | 450
Simscape Fluids 2.6 | 68
Simscape Multibody 6.1 | 723
Simulink 3D Animation 8.2 | 106
Simulink Check 4.3 | 158
Simulink Coder 9.1 | 252
Simulink Control Design 5.3 | 120
Simulink Coverage 4.3 | 37
Simulink Design Optimization 3.6 | 138
Simulink Design Verifier 4.1 | 53
Simulink Desktop Real-Time 5.8 | 94
Simulink Report Generator 5.6 | 46
Simulink Requirements 1.3 | 66
Simulink Test 3.0 | 260
Stateflow 10.0 | 119
Statistics and Machine Learning Toolbox 11.5 | 704
Symbolic Math Toolbox 8.3 | 770
System Composer 1.0 | 94
System Identification Toolbox 9.10 | 67
Text Analytics Toolbox 1.3 | 212
Trading Toolbox 3.5.1 | 12
Vehicle Dynamics Blockset 1.2 | 125
Wavelet Toolbox 5.2 | 118
WLAN Toolbox 2.1 | 21

Is there a way to return a blank image if no nuclei are detected (invalid prediction)?

My current pipeline involves the generation of many subimages from a larger image, and these subimages are then processed using the segmentation network. However, a few of these subimages invariably do not contain any nuclei, which the segmentation program treats as invalid and does not return anything. Is there a way to make the program return the blank image in this case, so that the segmented subimages can be stitched back together even if there are no nuclei present in a subimage? Thank you so much!
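Until the pipeline handles this case itself, one workaround is to detect which sub-images produced no output and fill those in before stitching. The sketch below only finds the missing ones. The function name, the matching by file stem, and the `.png`/`.tiff` extensions are assumptions for illustration, not behaviour confirmed by the repository.

```python
from pathlib import Path


def inputs_without_output(input_dir, output_dir, input_ext=".png", output_ext=".tiff"):
    """List input image names that have no matching file in output_dir.

    Matching is by file stem; both extensions are assumptions.
    """
    produced = {p.stem for p in Path(output_dir).glob("*" + output_ext)}
    return sorted(
        p.name for p in Path(input_dir).glob("*" + input_ext) if p.stem not in produced
    )
```

For each name returned, an all-zero mask with the same height and width as the input sub-image can be written to the output folder, so that stitching sees a complete set of tiles.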
