in-silico-labeling's Introduction

In silico labeling: Predicting fluorescent labels in unlabeled images

This is the code for In silico labeling: Predicting fluorescent labels in unlabeled images. It is the result of a collaboration between Google Accelerated Science and two external labs: the Lee Rubin lab at Harvard and the Steven Finkbeiner lab at Gladstone. See also our blog post and our full dataset, including many predictions we couldn't fit in the paper.

The code in this repository can be used to run training and inference of our model on a single machine, and can be adapted for distributed training. It also contains a set of weights created by training the model on the data in Conditions A, B, C, and D from the paper.

This README will explain how to:

  1. Restore the model from the provided checkpoint and run inference on an image from our test set.
  2. Train the pre-trained model on a new dataset (Condition E).

Please note, the example in this README deviates from the way transfer learning is demonstrated in the paper. In the paper, the model is trained once on all datasets, including Condition E; this is also called multi-task learning. In this guide, we start with a model pre-trained on Conditions A, B, C, and D and incrementally train it on Condition E. The benefit of the latter technique is that it takes less time to learn a new task, but it is likely to overfit if overtrained (see below).

The model here differs from the model described in the paper in one significant way: this model does an initial linear projection to 16 features per pixel, where the model in the paper does not have this step. This allows the model to take as input z-stacks with varying numbers of z depths; the only thing that needs to be relearned is the initial linear projection. Here, we'll use it to take full advantage of the 26 z depths in Condition E (the other conditions have 13 z depths).
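
To make this concrete, here is a minimal sketch (an illustration, not the code in this repository) of how a learned per-pixel linear projection can map a z-stack with any number of depths to a fixed 16 features per pixel; the function and variable names are ours:

import tensorflow as tf  # TF 1.x API

def project_z_stack(z_stack, num_features=16):
  # z_stack has shape [batch, row, column, num_z]. A 1x1 convolution
  # applies the same linear map at every pixel, producing
  # [batch, row, column, num_features]. Only this layer depends on the
  # number of z depths, so it is the only part that must be relearned
  # when the stack size changes.
  return tf.layers.conv2d(
      z_stack, filters=num_features, kernel_size=1,
      activation=None, name='initial_linear_projection')

# A 26-depth Condition E stack and a 13-depth stack from the other
# conditions both project to 16 features per pixel.
stack_26 = tf.placeholder(tf.float32, [None, 128, 128, 26])
features = project_z_stack(stack_26)  # shape [None, 128, 128, 16]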

You can also find the complete set of preprocessed data used for the paper, along with predictions, here.

This is not an official Google product.

Disclaimers

This code is not being developed

This code exists primarily to let readers reproduce some of the results in the Cell paper. We welcome pull requests, but sadly don't have the time to make improvements ourselves. If you want an easy-to-use, state-of-the-art, image-to-image predictor for your microscopy problem, you should probably look elsewhere.

This code is not suitable for de novo training

This code does CPU fine-tuning on a single machine, while for the paper we used a distributed system consisting of many CPU workers. Training from scratch on a single workstation would take a long time, and in any case training on CPUs is a much worse option now than it used to be; for a network like this it would make more sense to train on a GPU or TPU.

Installation

Code

The code requires Python 3 and depends on NumPy, TensorFlow, and OpenCV. You can install TensorFlow using this guide, which will also install NumPy. If you follow that guide you will have installed pip3 and you'll be able to install OpenCV with the command: pip3 install --upgrade opencv-python.

We're using Bazel to build the code and run tests. Though not strictly necessary, we suggest you install it (the rest of this README will assume you have it).

This code has been tested in Debian 10 with TensorFlow 1.9 on a machine with 64 GB RAM. It is not optimized for memory use, and has been reported to fail with a Python MemoryError on a machine with 16 GB RAM. Based on tests with ulimit, you may be able to squeak by with 32 GB RAM, but some users have reported needing 64 GB.

Update on code installation (October 2020)

As of October 2020, TensorFlow 1.9 is no longer available for installation via pip. However, TensorFlow 1.15.4 is still available, and it appears to be compatible with the code. To install it, you can do something like the following:

  1. Create a virtualenv environment for Python 3.7, which is the latest version for which TensorFlow 1.15.4 is available via pip: virtualenv -p python3.7 my_venv. Then source ./my_venv/bin/activate.
  2. Install the Python packages: pip install --upgrade pip, pip install tensorflow==1.15.4, and pip install --upgrade opencv-python.
  3. Install Bazel as described above.

Data

We'll work with the data sample in data_sample.zip (Warning: 2 GB download) and a checkpoint from the pre-trained model in checkpoints.zip. If you have gsutil installed, you can also use the commands:

gsutil -m cp gs://in-silico-labeling/checkpoints.zip .
gsutil -m cp gs://in-silico-labeling/data_sample.zip .

The rest of this README will assume you've downloaded and extracted these archives to checkpoints/ and data_sample/.

Running the pre-trained model

Training and inference are controlled by the script isl/launch.py. We recommend you invoke the script with Bazel (see below), because it will handle dependency management for you.

The checkpoints.zip file contains parameters from a model trained on Conditions A, B, C, and D. To run the model on a sample from Condition B, run this command from the project root:

export BASE_DIRECTORY=/tmp/isl
bazel run isl:launch -- \
  --alsologtostderr \
  --base_directory $BASE_DIRECTORY \
  --mode EVAL_EVAL \
  --metric INFER_FULL \
  --stitch_crop_size 1500 \
  --restore_directory $(pwd)/checkpoints \
  --read_pngs \
  --dataset_eval_directory $(pwd)/data_sample/condition_b_sample \
  --infer_channel_whitelist DAPI_CONFOCAL,MAP2_CONFOCAL,NFH_CONFOCAL

If you get a syntax error, make sure you're using Python 3, not Python 2.

In the above:

  1. BASE_DIRECTORY is the working directory for the model. It will be created if it doesn't already exist, and it's where the model predictions will be written. You can set it to whatever you want.
  2. alsologtostderr will cause progress information to be printed to the terminal.
  3. stitch_crop_size is the size of the crop for which we'll perform inference. If set to 1500 it may take an hour on a single machine, so try smaller numbers first.
  4. infer_channel_whitelist is the list of fluorescence channels we wish to infer. For the Condition B data, this should be a subset of DAPI_CONFOCAL, MAP2_CONFOCAL, and NFH_CONFOCAL.

If you run this command, you should get a target_error_panel.png that looks like this:

Initial predictions for Condition B

Each row is one of the whitelisted channels you provided; in this case there is one row for each of the DAPI_CONFOCAL, MAP2_CONFOCAL, and NFH_CONFOCAL channels. The boxes with purple borders show the predicted images (in this case the medians of the per-pixel distributions). The boxes with teal borders show the true fluorescence images. The boxes with black borders show errors, with false positives in orange and false negatives in blue.
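
As a rough illustration of how such an error image can be rendered (an assumption about the visualization, not the repository's plotting code), the signed difference between prediction and truth can be mapped to these colors:

import numpy as np

def error_image(predicted, true):
  # predicted and true are grayscale images scaled to [0, 1].
  diff = predicted - true
  orange = np.array([1.0, 0.6, 0.0])
  blue = np.array([0.0, 0.4, 1.0])
  pos = np.clip(diff, 0.0, 1.0)[..., np.newaxis]   # false positives
  neg = np.clip(-diff, 0.0, 1.0)[..., np.newaxis]  # false negatives
  # Correct pixels stay black; overpredictions blend toward orange and
  # underpredictions toward blue, in proportion to the error.
  return pos * orange + neg * blue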

The script will also generate a file called input_error_panel.png, which shows the 26 transmitted light input images along with auto-encoding predictions and errors. For this condition, there were only 13 z-depths, so this visualization will show each z-depth twice.

Training the pre-trained model on a new dataset

Condition E contains a cell type not previously seen by the model (human cancer cells), imaged with a transmitted light modality not previously seen (DIC), and labeled with a marker not previously seen (CellMask). We have two wells in our sample data (B2 and B3), so let's use B2 for training and B3 for evaluation.

To see how well the model can predict labels on the evaluation dataset before training, run:

export BASE_DIRECTORY=/tmp/isl
bazel run isl:launch -- \
  --alsologtostderr \
  --base_directory $BASE_DIRECTORY \
  --mode EVAL_EVAL \
  --metric INFER_FULL \
  --stitch_crop_size 1500 \
  --restore_directory $(pwd)/checkpoints \
  --read_pngs \
  --dataset_eval_directory $(pwd)/data_sample/condition_e_sample_B3 \
  --infer_channel_whitelist DAPI_CONFOCAL,CELLMASK_CONFOCAL \
  --noinfer_simplify_error_panels

This should produce this target error panel:

Initial predictions for Condition E

This is like the error panels above, but the first row is DAPI_CONFOCAL and the second is CELLMASK_CONFOCAL. Because we used noinfer_simplify_error_panels, it includes more statistics of the pixel distribution. Previously, there was one purple-bordered box showing the medians of the pixel distributions. Now there are four purple-bordered boxes which show, in order, the mode, median, mean, and standard deviation. There are now three boxes with black borders, showing the same error visualization as before, but for the mode and mean as well as the median. The white-bordered boxes are a new kind of error visualization, in which colors on the gray line between black and white correspond to correct predictions, orange corresponds to a false positive, and blue corresponds to a false negative. The final mango-bordered box is not informative for this exercise.
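
For intuition, here is a sketch of how these statistics can be read off a per-pixel prediction, assuming the distribution is discretized over intensity bins (the actual binning in this codebase may differ):

import numpy as np

def distribution_statistics(probs):
  # probs has shape [row, column, num_bins] and sums to 1 over the
  # last axis: a predicted intensity distribution at each pixel.
  num_bins = probs.shape[-1]
  values = np.linspace(0.0, 1.0, num_bins)  # bin centers in [0, 1]
  mode = values[np.argmax(probs, axis=-1)]
  cdf = np.cumsum(probs, axis=-1)
  median = values[np.argmax(cdf >= 0.5, axis=-1)]  # first bin with cdf >= 0.5
  mean = np.sum(probs * values, axis=-1)
  std = np.sqrt(np.sum(probs * (values - mean[..., None]) ** 2, axis=-1))
  return mode, median, mean, std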

The pre-trained model hasn't seen the Condition E dataset, so we should expect its predictions to be poor. Note, however, there is some transfer of the nuclear label even before training.

You can find the input images, consisting of a z-stack of 26 images, here (Warning: 100 MB download).

Training

We can train the network on the Condition E data on a single machine using a command like:

export BASE_DIRECTORY=/tmp/isl
bazel run isl:launch -- \
  --alsologtostderr \
  --base_directory $BASE_DIRECTORY \
  --mode TRAIN \
  --metric LOSS \
  --master "" \
  --restore_directory $(pwd)/checkpoints \
  --read_pngs \
  --dataset_train_directory $(pwd)/data_sample/condition_e_sample_B2

By default, this uses the ADAM optimizer with a learning rate of 1e-4. If you wish to visualize training progress, you can run TensorBoard on BASE_DIRECTORY:

tensorboard --logdir $BASE_DIRECTORY
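
For reference, the default Adam configuration mentioned above corresponds to roughly this TensorFlow 1.x setup (a sketch with a stand-in loss; the real loss comes from the model graph in isl/launch.py):

import tensorflow as tf  # TF 1.x API

weights = tf.Variable([1.0, -2.0, 3.0])    # stand-in parameters
loss = tf.reduce_mean(tf.square(weights))  # stand-in loss
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for _ in range(10):
    sess.run(train_op)  # one gradient step per call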

You should eventually see a training curve that looks like this:

Train curve

After 50,000 steps, which takes about a week on a 32-core machine, predictions on the eval data should have substantially improved. You can run this command to generate predictions:

export BASE_DIRECTORY=/tmp/isl
bazel run isl:launch -- \
  --alsologtostderr \
  --base_directory $BASE_DIRECTORY \
  --mode EVAL_EVAL \
  --metric INFER_FULL \
  --stitch_crop_size 1500 \
  --read_pngs \
  --dataset_eval_directory $(pwd)/data_sample/condition_e_sample_B3 \
  --infer_channel_whitelist DAPI_CONFOCAL,CELLMASK_CONFOCAL \
  --noinfer_simplify_error_panels

Note, we've dropped the restore_directory argument, so the model will run inference using the latest checkpoint it finds in BASE_DIRECTORY.
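
This checkpoint discovery is standard TensorFlow behavior; a minimal sketch of what happens (not this repository's exact code):

import tensorflow as tf  # TF 1.x API

base_directory = '/tmp/isl'
# Returns the path of the most recent checkpoint written under the
# directory, or None if no checkpoint exists yet.
latest = tf.train.latest_checkpoint(base_directory)
if latest is None:
  print('No checkpoint found in %s' % base_directory)
else:
  print('Restoring from %s' % latest)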

Here's what the predictions should look like on the evaluation data:

Predictions for the Condition E evaluation well (B3) after 50K steps

Note, there is a bug in the normalization of the DAPI_CONFOCAL channel causing it to have a reduced dynamic range in the ground-truth image. Comparing the initial nuclear predictions with these, it is clear the network learned to match the reduced dynamic range.

For reference, here's what predictions should look like on the train data:

Predictions for the Condition E training well (B2) after 50K steps

Note, if we train too long the model will eventually overfit on the train data and predictions will worsen. This was not an issue in the paper, because there we simultaneously trained on all tasks, so that each task regularized the others.

Citing the code

If you use this code, please cite our paper:

Christiansen E, Yang S, Ando D, Javaherian A, Skibinski G, Lipnick S, Mount E, O'Neil A, Shah K, Lee A, Goyal P, Fedus W, Poplin R, Esteva A, Berndl M, Rubin L, Nelson P, Finkbeiner S. In silico labeling: Predicting fluorescent labels in unlabeled images. Cell. 2018

BibTeX:

@article{christiansen2018isl,
  title={In silico labeling: Predicting fluorescent labels in unlabeled images},
  author={Christiansen, Eric M and Yang, Samuel J and Ando, D Michael and Javaherian, Ashkan and Skibinski, Gaia and Lipnick, Scott and Mount, Elliot and O’Neil, Alison and Shah, Kevan and Lee, Alicia K and Goyal, Piyush and Fedus, William and Poplin, Ryan and Esteva, Andre and Berndl, Marc and Rubin, Lee L and Nelson, Philip and Finkbeiner, Steven},
  journal={Cell},
  year={2018},
  publisher={Elsevier}
}

TODOs

  1. Fix the tests.
  2. Fix the DAPI_CONFOCAL normalization bug for the Condition E data. Note: This bug was introduced after data was generated for the paper.

in-silico-labeling's Issues

Trying to create a new label

Hi,

Hope you are doing well!
I got this when trying to create a new label:
KeyError: 'GFAP_CONFOCAL'

I see the channels you have are ('channel', ('DAPI_CONFOCAL', 'DAPI_WIDEFIELD', 'CELLMASK_CONFOCAL', 'TUJ1_WIDEFIELD', 'NFH_CONFOCAL', 'MAP2_CONFOCAL', 'ISLET_WIDEFIELD', 'DEAD_CONFOCAL', 'NEURITE_CONFOCAL')).

Is there a way to add new ones?
Thanks!

typecheck.Error

I am running the code on an AWS r5a.xlarge machine with the Deep Learning AMI (Ubuntu) Version 14.0 (ami-0089d61bf6a518044). The machine has 32 GB RAM and 4 vCPUs. TensorFlow is installed through conda (I guess Python 3.6 is used). I get this error:

tensorflow.contrib.labeled_tensor.python.ops._typecheck.Error: ['batch', 'row', 'column', ('z', 0.0), ('channel', 'TRANSMISSION'), 'class'] of type <class 'list'> is not an instance of the allowed type Collection(Union((str,), Tuple((str,), collections.abc.Hashable))) for <function expand_dims at 0x7fa78c2bb840>

std::bad_alloc in tensorflow

I am running the code on an AWS r5a.xlarge machine with the Deep Learning AMI (Ubuntu) Version 14.0 (ami-0089d61bf6a518044). The machine has 32 GB RAM and 4 vCPUs. I run into:

2019-05-17 16:24:17.915477: W tensorflow/core/framework/allocator.cc:124] Allocation of 3328000000 exceeds 10% of system memory.

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

and the run seems to stall.

output flag

I'm trying to run the network on a new image and I was wondering if there's a flag that would output just the entire predicted image on its own. I tried changing the mode from EVAL_EVAL to EXPORT but it gave me the same result (a cropped version of the predicted image along with the error image, which is not what I'm looking for).

Run on GPU

Hi

Is there any way I could run this repo on GPU instead of CPU for some speed-up?

ResourceExhaustedError at training with GPU

environment:

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from: pip
  • TensorFlow version: tensorflow_gpu 1.4.0
  • Python version: 3.6.4
  • CUDA version: 8.0.61
  • cuDNN version: 6.0.21
  • GPU model and memory: two or eight Tesla K40c (11 GB memory each)

I made a Docker container and tested a training script with GPU or CPU as below:

python isl/launch.py --alsologtostderr --base_directory . --mode TRAIN --metric LOSS --master "" --restore_directory checkpoints2 --read_pngs --dataset_train_directory data_sample/condition_e_sample_B2 --preprocess_batch_size 2 --preprocess_shuffle_batch_num_threads 2 --preprocess_batch_capacity 8 --loss_crop_size 260

When I used the CPU, there were no issues with training.
When I used the GPU, training started but stopped with:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,218,12,12]
	 [[Node: setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/concat, downscale_4_3/expand_rv2/Conv/weights/read)]]
	 [[Node: Adam/update/_18502 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_40035_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 954, in managed_session
    yield sess
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 763, in train
    sess, train_op, global_step, train_step_kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 487, in train_step
    run_metadata=run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,218,12,12]
	 [[Node: setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/concat, downscale_4_3/expand_rv2/Conv/weights/read)]]
	 [[Node: Adam/update/_18502 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_40035_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/Conv/Conv2D', defined at:
  File "isl/launch.py", line 624, in <module>
    app.run()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "isl/launch.py", line 606, in main
    train(gitapp)
  File "isl/launch.py", line 426, in train
    total_loss_op, _, _ = total_loss(gitapp)
  File "isl/launch.py", line 402, in total_loss
    input_loss_lts, target_loss_lts = controller.setup_losses(gitapp)
  File "/tmp/isl/controller.py", line 399, in setup_losses
    predict_target_lt) = get_input_target_and_predicted(gitapp)
  File "/tmp/isl/controller.py", line 263, in get_input_target_and_predicted
    gitapp.core_model, gitapp.add_head, pp, gitapp.is_train, input_lt)
  File "/tmp/isl/tensorcheck.py", line 179, in new_f
    return f(*new_args, **kwds)
  File "/tmp/isl/controller.py", line 173, in model
    core_model_op = core_model(is_train=is_train, input_op=input_op, name=name)
  File "/tmp/isl/tensorcheck.py", line 179, in new_f
    return f(*new_args, **kwds)
  File "/tmp/isl/models/concordance.py", line 155, in core
    scale_op = foveate(lls[4][8], scale_op, 'downscale_4_3')
  File "/tmp/isl/models/concordance.py", line 129, in foveate
    op, name)
  File "/tmp/isl/models/model_util.py", line 341, in learned_fovea
    name=name)
  File "/tmp/isl/tensorcheck.py", line 179, in new_f
    return f(*new_args, **kwds)
  File "/tmp/isl/models/model_util.py", line 270, in module
    name='expand_rv2')
  File "/tmp/isl/tensorcheck.py", line 179, in new_f
    return f(*new_args, **kwds)
  File "/tmp/isl/models/model_util.py", line 103, in residual_v2_conv
    conv_op = slim.conv2d()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1033, in convolution
    outputs = layer.apply(inputs)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 671, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 575, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 167, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 835, in __call__
    return self.conv_op(inp, filter)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 499, in __call__
    return self.call(inp, filter)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 187, in __call__
    name=self.name)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 631, in conv2d
    data_format=data_format, name=name)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2,218,12,12]
	 [[Node: setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/Conv/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](setup_losses/get_input_target_and_predicted/model/concordance_core/downscale_4_3/expand_rv2/concat, downscale_4_3/expand_rv2/Conv/weights/read)]]
	 [[Node: Adam/update/_18502 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_40035_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "isl/launch.py", line 624, in <module>
    app.run()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "isl/launch.py", line 606, in main
    train(gitapp)
  File "isl/launch.py", line 482, in train
    saver=tf.train.Saver(keep_checkpoint_every_n_hours=2.0),
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 775, in train
    sv.stop(threads, close_summary_writer=True)
  File "/opt/conda/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1231, in _single_operation_run
    target_list_as_strings, status, None)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
	 [[Node: setup_losses/get_input_target_and_predicted/provide_preprocessed_data/cropped_input_and_target/load_image_set_as_tensor/PyFunc_86/_17999 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_723_setup_losses/get_input_target_and_predicted/provide_preprocessed_data/cropped_input_and_target/load_image_set_as_tensor/PyFunc_86", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
	 [[Node: setup_losses/get_input_target_and_predicted/provide_preprocessed_data/setup_losses/get_input_target_and_predicted/provide_preprocessed_data/target_1/_18083 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1319_setup_losses/get_input_target_and_predicted/provide_preprocessed_data/setup_losses/get_input_target_and_predicted/provide_preprocessed_data/target_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
