allencellmodeling / pytorch_fnet

Three dimensional cross-modal image inference

License: Other

pytorch_fnet's Introduction

Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy

Support

This code is in active development and is used within our organization. We are currently not supporting this code for external use and are simply releasing the code to the community AS IS. The community is welcome to submit issues, but you should not expect an active response.

For the code corresponding to our Nature Methods paper, please use the release_1 branch here.

System requirements

We recommend installing on Linux with an NVIDIA graphics card that has 12+ GB of memory (e.g., NVIDIA Titan X Pascal) and the latest drivers installed.

Installation

We recommend an environment manager such as Conda.
Install Python 3.6+ if necessary.
All commands listed below assume the bash shell.
Clone and install the repo:

git clone https://github.com/AllenCellModeling/pytorch_fnet.git
cd pytorch_fnet
pip install .

If you would like to instead install for development:

pip install -e .[dev]

If you want to run the demos in the examples directory:

pip install .[examples]

Demo on Canned AICS Data

The following downloads some images from our Integrated Cell Quilt repository and starts training a model:

cd examples
python download_and_train.py

When training is complete, you can predict on the held-out data with:

python predict.py

Command-line tool

Once the package is installed, users can train and use models through the fnet command-line tool. To see what commands are available, use the -h flag.

fnet -h

The -h flag is also available for all fnet commands. For example,

fnet train -h

Train a model

Model training is done through the fnet train command, which requires a json specifying the training parameters, e.g., which dataset to use, where to save the model, and how the hyperparameters should be set. To create a template json:

fnet train --json /path/to/train_options.json

Users are expected to modify this json to suit their needs. At a minimum, users should verify the following json fields and change them if necessary:

"dataset_train": The name of the training dataset.
"path_save_dir": The directory where the model will be saved. We recommend that the model be saved in the same directory as the training options json.

Once any modifications are complete, initiate training by repeating the above command:

fnet train --json /path/to/train_options.json

Since this time the json already exists, training should commence.
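
The edit step between the two invocations can also be scripted. The following is a minimal sketch, assuming the template is a JSON object that contains the "dataset_train" and "path_save_dir" fields named above. The helper name update_train_options is hypothetical and is not part of fnet; it only edits the file and does not start training.

```python
import json
from pathlib import Path


def update_train_options(path_json, dataset_train, path_save_dir):
    """Set the two fields users are asked to verify, then rewrite the file.

    Hypothetical helper: every other field in the template is left untouched.
    """
    path_json = Path(path_json)
    options = json.loads(path_json.read_text())
    options["dataset_train"] = dataset_train
    options["path_save_dir"] = str(path_save_dir)
    path_json.write_text(json.dumps(options, indent=4))
    return options
```

After calling it on /path/to/train_options.json with your own dataset name and save directory, rerun fnet train --json /path/to/train_options.json as described above.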

Perform predictions with a trained model

Users can perform predictions using a trained model with the fnet predict command. A path to a saved model and a data source must be specified. For example:

fnet predict --json path/to/predict_options.json

As above, users are expected to modify this json to suit their needs. At a minimum, populate the following fields and/or copy and paste the corresponding dataset values from /path/to/train_options.json, e.g.:

    "dataset_kwargs": {
        "col_index": "Index",
        "col_signal": "signal",
        "col_target": "target",
        "path_csv": "path/to/my/train.csv",
    ...
    "path_model_dir": [
        "models/model_0"
    ],
    "path_save_dir": "path/to/predictions/dir",

This will use the model saved in models/model_0 to perform predictions on the dataset described by "dataset_kwargs". To see additional command options, use fnet predict -h.
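
Copying the dataset values by hand is error-prone, so that step can be scripted. The following is a minimal sketch, assuming both option files are JSON objects. The key under which the training file stores its dataset arguments is not stated here, so it is passed in as train_key and should be checked against your own train_options.json. The helper name copy_dataset_kwargs is hypothetical and is not part of fnet.

```python
import json


def copy_dataset_kwargs(path_train, path_predict, train_key, predict_key="dataset_kwargs"):
    """Copy the dataset arguments from the training options into the prediction options."""
    with open(path_train) as f:
        train_options = json.load(f)
    with open(path_predict) as f:
        predict_options = json.load(f)
    if train_key not in train_options:
        # train_key is an assumption supplied by the caller, so fail loudly if it is wrong.
        raise KeyError(f"{train_key!r} not found in {path_train}")
    predict_options[predict_key] = dict(train_options[train_key])
    with open(path_predict, "w") as f:
        json.dump(predict_options, f, indent=4)
    return predict_options
```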

Once any modifications are complete, initiate prediction by repeating the above command:

fnet predict --json path/to/predict_options.json

Citation

If you find this code useful in your research, please consider citing our manuscript in Nature Methods:

@article{Ounkomol2018,
  doi = {10.1038/s41592-018-0111-2},
  url = {https://doi.org/10.1038/s41592-018-0111-2},
  year  = {2018},
  month = {sep},
  publisher = {Springer Nature America,  Inc},
  volume = {15},
  number = {11},
  pages = {917--920},
  author = {Chawin Ounkomol and Sharmishtaa Seshamani and Mary M. Maleckar and Forrest Collman and Gregory R. Johnson},
  title = {Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy},
  journal = {Nature Methods}
}

Contact

Gregory Johnson
E-mail: [email protected]

Allen Institute Software License

Allen Institute Software License – This software license is the 2-clause BSD license plus a third clause that prohibits redistribution and use for commercial purposes without further permission.
Copyright © 2018. Allen Institute. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Redistributions and use for commercial purposes are not permitted without the Allen Institute’s written permission. For purposes of this license, commercial purposes are the incorporation of the Allen Institute's software into anything for which you will charge fees or other compensation or use of the software to perform a commercial service for a third party. Contact [email protected] for commercial licensing opportunities.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

pytorch_fnet's Issues

Add tqdm library to docker image

...or remove from fnet/data/chunkdataset.py

Add missing datasets

Need to add Myosin IIB, Fibrillarin. Also DIC?

Make save_progress and save_state functions

Create cross modal registration code

Takes the csv from #64, attempts to run automated registration between the prediction and the MBP image, compares the results to the 'ground truth', and produces metrics of success for making figures.

3D Dataset object

2D model definition

Freeze Saving/Loading

Make evaluation script

Calculate/report loss of model-predicted images.

Add Readme

3D Data collection as .csv

Specify dimension requirement

It looks like there is a requirement on the minimum number of z slices.

"ValueError: Input array must be at least length 32 in first dimension"

When batch-processing on a large set of images, it would be great to skip the "bad" image and raise a flag, rather than assertion fail.
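
One way to get the requested behavior is to check each image's size before it reaches the model and to log the images that are skipped. The following is a minimal sketch, assuming the 32-slice minimum from the error message above and that each image's shape is known up front. The helper name split_by_min_depth is hypothetical and is not part of fnet.

```python
import logging

logger = logging.getLogger(__name__)


def split_by_min_depth(shapes, min_depth=32):
    """Separate images that meet the minimum first-dimension length from those that do not.

    `shapes` maps an image path to its array shape, e.g. (z, y, x).
    """
    usable, skipped = [], []
    for path, shape in shapes.items():
        if shape[0] >= min_depth:
            usable.append(path)
        else:
            # Flag the image instead of aborting the whole batch.
            logger.warning("Skipping %s: %d slices < %d", path, shape[0], min_depth)
            skipped.append(path)
    return usable, skipped
```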

Unit tests for datasets

Have clear, working instructions for how to download and run the first test dataset on a naive machine.

FileNotFoundError: [Errno 2] No such file or directory: '/root/projects/pytorch_fnet/data/dna/3500000913_100X_20170519_4-Scene-18-P38-E07.czi'

there is no description about where to get this data.

Terminate if error function reads "NAN"

training code update for dataloader

Add sklearn to docker image

make scripts/paper folder for paper

Problem when running with another dataset

I was trying to use tiff format images obtained after training and prediction (signal.tiff and target.tiff) to train the model instead of using czi format images. I created a new csv file with path_signal and path_target elements and their corresponding paths below.

But when I ran the training part, this error popped up:
/home/xuecongf/.conda/envs/fnet/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Using existing train/test split.
model loaded from: saved_models/dna/model.p
fnet_nn_3d | {} | iter: 50011
History loaded from: saved_models/dna/losses.csv
/home/.../.conda/envs/fnet/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
buffering images: 0%| | 0/30 [00:00<?, ?it/s]
<fnet.data.tiffdataset.TiffDataset object at 0x2b3ebbc2cf60>

Traceback (most recent call last):
  File "train_model.py", line 161, in <module>
    main()
  File "train_model.py", line 112, in main
    dataloader_train = get_dataloader(n_remaining_iterations, opts)
  File "train_model.py", line 31, in get_dataloader
    **opts.bpds_kwargs,
  File "/home/…/pytorch_fnet/fnet/data/bufferedpatchdataset.py", line 55, in __init__
    datum = dataset[datum_index]
  File "/home/…/pytorch_fnet/fnet/data/tiffdataset.py", line 49, in __getitem__
    im_out[0] = t(im_out[0])
  File "/home/…/pytorch_fnet/fnet/transforms.py", line 193, in __call__
    return scipy.ndimage.zoom(x, (self.factors), mode='nearest')
  File "/home/…/.conda/envs/fnet/lib/python3.6/site-packages/scipy/ndimage/interpolation.py", line 573, in zoom
    zoom = _ni_support._normalize_sequence(zoom, input.ndim)
  File "/home/…/.conda/envs/fnet/lib/python3.6/site-packages/scipy/ndimage/_ni_support.py", line 65, in _normalize_sequence
    raise RuntimeError(err)
RuntimeError: sequence argument must have length equal to input rank

csv&bashscript.zip
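
The last frame shows scipy.ndimage.zoom rejecting its zoom argument: when a sequence is given, it must contain one factor per dimension of the input array. A possible cause here is that resize factors written for one image layout are being applied to images with a different number of dimensions, though the traceback alone does not confirm that. The following is a minimal sketch of a pre-check that reports the mismatch more clearly. The helper name check_zoom_factors is hypothetical and is not part of fnet.

```python
def check_zoom_factors(shape, factors):
    """Raise a descriptive error if `factors` cannot be used to zoom an array of `shape`."""
    if len(factors) != len(shape):
        raise ValueError(
            f"got {len(factors)} zoom factors for a {len(shape)}-D image of shape {shape}; "
            "one factor per dimension is required"
        )
    return tuple(factors)
```

Calling it with the image's shape and the configured factors just before the resize step would show which of the two needs to change.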

TiffDataset needs to return a torch tensor

error when I try to run ./train_model.sh dna 0

I loaded all the data into the /data/ folder and ran ./train_model.sh dna 0, but got the following error. Does anyone know the reason? Thanks.

Using existing train/test split.
DEBUG: Initializing new model!
*** Model ***
fnet.nn_modules.fnet_nn_3d.Net(**{})
iter: 0
gpu: [0]

0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:18<00:00, 18.49s/it]
{'buffer_size': 1, 'npatches': 1200000}
Traceback (most recent call last):
  File "train_model.py", line 147, in <module>
    main()
  File "train_model.py", line 119, in main
    args, n_remaining_iterations, validation=True
  File "train_model.py", line 22, in get_dataloader
    ds = str_to_class(args.dataset_class)(**dataset_kwargs)
  File "/local_disk0/fnet/pytorch_fnet-master/fnet/data/czidataset.py", line 13, in __init__
    super().__init__(**kwargs)
  File "/local_disk0/fnet/pytorch_fnet-master/fnet/data/fnetdataset.py", line 45, in __init__
    self.df = pd.read_csv(self.path_csv)
  File "/databricks/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/databricks/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 424, in _read
    filepath_or_buffer, encoding, compression)
  File "/databricks/python/lib/python3.6/site-packages/pandas/io/common.py", line 218, in get_filepath_or_buffer
    raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
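
The final frames show pd.read_csv receiving None, which means path_csv was never set on the dataset. The following is a minimal sketch of an earlier check that fails with a more readable message. The helper name require_csv_path is hypothetical and is not part of fnet.

```python
import os


def require_csv_path(path_csv):
    """Fail early with a readable message when the dataset CSV path is unset or missing."""
    if path_csv is None:
        raise ValueError("path_csv is not set; pass the dataset CSV path to the dataset")
    if not os.path.isfile(path_csv):
        raise FileNotFoundError(f"dataset CSV not found: {path_csv}")
    return path_csv
```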

Make command-line prediction function

Tests for MultiChTiffDataset

Specifically address #109 (comment)

Fix image order dimensions for CziDataset

add me as a collaborator

Check in Dockerfile definition

Change all random seeds to "0"

Chunk Dataset object

@counkomol too

Make Figure 1

handle missing data column

Make it so that if a target column is missing from the dataset, it simply outputs an empty image?

Make figure/movie utils

Remove old code and files

Dataset.py? Anything else?

2D Dataset object

Add one test .tiff and .czi files to repo

Create a csv for registration testing

EM_image, EM_prediction_image, MBP_registration_target,M00,M10,M01,M11,B0,B1
path to EM input image, path to EM prediction image, path to MBP raw IF image, 'ground truth' registration parameters describing how EM prediction image should be transformed to match MBP image.
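
A minimal sketch of writing such a file with Python's csv module, using the column names listed above. The helper name write_registration_csv is hypothetical, and any paths or parameter values passed to it are the caller's own.

```python
import csv

# Column names taken from the header proposed above.
COLUMNS = [
    "EM_image", "EM_prediction_image", "MBP_registration_target",
    "M00", "M10", "M01", "M11", "B0", "B1",
]


def write_registration_csv(path_csv, rows):
    """Write one row per image triplet; each row is a dict keyed by COLUMNS."""
    with open(path_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```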

Verify Multi-GPU training

Docker build doesn't work, build script also not executable

pytorch-cudnnv6 is not an image on Docker Hub, just a tag referenced from the pytorch repo: https://github.com/apaszke/pytorch-dist#docker-image. The build should be rewritten so that it just works.

"ValueError" in BufferedPatchDataset

Example error message:

Traceback (most recent call last):
  File "train_model.py", line 206, in <module>
    main()
  File "train_model.py", line 95, in main
    for data in ds_patch:
  File "/root/projects/pytorch_fnet/fnet/data/bufferedpatchdataset.py", line 61, in __getitem__
    return self.get_random_patch()
  File "/root/projects/pytorch_fnet/fnet/data/bufferedpatchdataset.py", line 81, in get_random_patch
    starts = np.array([np.random.randint(0, d-p) for d, p in zip(datum[0].size(), self.patch_size)])
  File "/root/projects/pytorch_fnet/fnet/data/bufferedpatchdataset.py", line 81, in <listcomp>
    starts = np.array([np.random.randint(0, d-p) for d, p in zip(datum[0].size(), self.patch_size)])
  File "mtrand.pyx", line 988, in mtrand.RandomState.randint
ValueError: low >= high
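
np.random.randint(0, d-p) raises this error whenever d - p <= 0, that is, whenever the patch is at least as large as the image along some dimension. A patch exactly as large as the image also fails, because the upper bound is exclusive. The following is a minimal sketch of a guarded version using the standard library. The helper name random_patch_starts is hypothetical and is not the fnet implementation.

```python
import random


def random_patch_starts(image_size, patch_size, rng=random):
    """Pick a random start index per dimension so the patch fits inside the image."""
    starts = []
    for d, p in zip(image_size, patch_size):
        if p > d:
            raise ValueError(f"patch size {p} exceeds image size {d}")
        # randrange's upper bound is exclusive, so d - p + 1 allows a patch equal to the image.
        starts.append(rng.randrange(0, d - p + 1))
    return starts
```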

Move BufferedPatchDataset and dummy chunk dataset to their own files.

Syntax error in test script

Hello there, we are experiencing a syntax error while testing the test script. Do you guys have any ideas? Thank you very much in advance
File "scripts/train_model.py", line 3
DATASET = ${1:-dna}
^
SyntaxError: invalid syntax
---- script:
#!/bin/bash -x

DATASET=${1:-dna}
N_ITER=50000
RUN_DIR="saved_models/${DATASET}"
PATH_DATASET_ALL_CSV="data/csvs/${DATASET}.csv"
PATH_DATASET_TRAIN_CSV="data/csvs/${DATASET}/train.csv"
GPU_IDS=${2:-0}

make pre prediction cropping, and post prediction pad with zero

make test time dataset that has cropping and padding options that make the prediction the same size as the output.

Test-time model input modification options: pad/crop width and height parameters (negative to crop, positive to pad). The default is to crop down to the nearest multiple of 16.
At test-time output, apply the reverse of the test-time modifications.

modify dataset object to have these crop options/choices recorded, modify dataset class to have function to reverse this operation.

needs to be done before #62
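
The crop-then-restore idea can be sketched as follows. This is a minimal illustration on 2D nested lists, assuming that "nearest 16" means the nearest smaller multiple of 16 and that the crop is centered; neither choice is specified above. The function names and the record format are hypothetical and are not part of fnet.

```python
def crop_bounds(size, multiple=16):
    """Return (start, stop) that center-crops `size` down to a multiple of `multiple`."""
    cropped = (size // multiple) * multiple
    start = (size - cropped) // 2
    return start, start + cropped


def crop_to_multiple(image, multiple=16):
    """Crop a 2D image and record what was done so it can be reversed later."""
    rows, cols = len(image), len(image[0])
    r0, r1 = crop_bounds(rows, multiple)
    c0, c1 = crop_bounds(cols, multiple)
    cropped = [row[c0:c1] for row in image[r0:r1]]
    record = {"shape": (rows, cols), "offset": (r0, c0)}
    return cropped, record


def pad_back(cropped, record, fill=0):
    """Reverse crop_to_multiple by placing the crop into a zero-filled original-size image."""
    rows, cols = record["shape"]
    r0, c0 = record["offset"]
    out = [[fill] * cols for _ in range(rows)]
    for i, row in enumerate(cropped):
        out[r0 + i][c0:c0 + len(row)] = row
    return out
```

Keeping the record next to the cropped image is what lets the prediction be padded back to the input size. Note that an image smaller than the multiple along a dimension would be cropped to nothing, so such inputs need separate handling.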

Update prediction script to work with many models

Move *.py files (except setup.py) into fnet directory

Make pip installable

Make datasets loadable from command line

2D data collection as .csv

output logs should contain relative paths

pip install doesn't install submodules

pip install git+https://github.com/AllenCellModeling/pytorch_fnet.git will install the base modules but does not install fnet.nn_modules for instance.

import fnet works and imports the module and the defined imports from its __init__.py

import fnet.nn_modules fails due to two things:

setup.py needs to use packages=find_packages(exclude=['doc/*', 'docker/*', 'data/*', 'scripts/*', 'tests/*'])
Create an __init__.py file for the nn_modules submodule, and for all submodules you want to include in the final package structured as so:

from . import fnet_nn_2d
from . import fnet_nn_3d

Discovered while trying to implement something similar to the predict.py script in the base directory.
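
find_packages only picks up directories that contain an __init__.py, which is why both changes are needed. The following is a minimal sketch for spotting such directories before packaging, using only the standard library. The helper name dirs_missing_init is hypothetical.

```python
import os


def dirs_missing_init(package_root):
    """List directories under `package_root` that hold .py files but no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Skip hidden and cache directories such as .git or __pycache__.
        dirnames[:] = [d for d in dirnames if not d.startswith((".", "__"))]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)
```

Running it on the fnet directory of a checkout lists the subpackages that a pip install would leave out.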