main_loop_tf's People

Contributors

fral92, fvisin, marcociccone

main_loop_tf's Issues

Swap sess and unhookedsess

People tend to use self.sess in validation, which causes the hooks to be run. When the validation hook is among the hooks, this causes an infinite loop: each run triggered by validation fires the validation hook again.

It's probably better to rename self.sess to self.sess_with_hooks and make self.unhookedsess become self.sess. I will work on it as soon as I can, but I'm happy to accept PRs.
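The failure mode can be reproduced without TensorFlow. The following is a minimal sketch, not the project's code: RawSession, HookedSession and ValidationHook are hypothetical stand-ins for the unhooked session, the hooked session and the validation hook.

```python
class RawSession:
    """Hypothetical stand-in for the unhooked session: runs a fetch, nothing else."""

    def run(self, fetch):
        return "ran " + fetch


class HookedSession:
    """Hypothetical stand-in for the hooked session: fires every hook after each run."""

    def __init__(self, raw_session, hooks):
        self.raw_session = raw_session
        self.hooks = hooks

    def run(self, fetch):
        result = self.raw_session.run(fetch)
        for hook in self.hooks:
            hook.after_run(self)
        return result


class ValidationHook:
    """Runs a validation op after every training run."""

    def __init__(self, use_hooked_session=False):
        self.use_hooked_session = use_hooked_session
        self.calls = 0

    def after_run(self, hooked_session):
        self.calls += 1
        if self.use_hooked_session:
            # Re-enters HookedSession.run, which fires this hook again, and so on.
            hooked_session.run("val_op")
        else:
            # The raw session runs the op without firing any hook.
            hooked_session.raw_session.run("val_op")


safe_hook = ValidationHook(use_hooked_session=False)
train_result = HookedSession(RawSession(), [safe_hook]).run("train_op")

looping_hook = ValidationHook(use_hooked_session=True)
try:
    HookedSession(RawSession(), [looping_hook]).run("train_op")
    recursed = False
except RecursionError:
    recursed = True
```

In the sketch the safe default requires reaching for raw_session explicitly, which is the ergonomic problem the proposed rename is meant to remove.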

Checkpoints saving best/last model

Checkpoints shouldn't be saved at each minibatch but at the end of each training epoch.
Moreover, the best model and last model should be saved with different names.
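A minimal sketch of the requested behaviour, under assumptions that are not in the issue: the helper checkpoint_targets, the file names last.ckpt and best.ckpt, and a validation metric where higher is better are all hypothetical.

```python
import os


def checkpoint_targets(save_dir, val_metric, best_metric):
    """Return the checkpoint paths to write at the end of an epoch.

    The last model is always written; the best model is written only when
    the validation metric improves (higher is assumed to be better).
    """
    paths = [os.path.join(save_dir, "last.ckpt")]
    if best_metric is None or val_metric > best_metric:
        best_metric = val_metric
        paths.append(os.path.join(save_dir, "best.ckpt"))
    return paths, best_metric


best = None
written = []
for epoch, val_metric in enumerate([0.50, 0.62, 0.58]):
    # All minibatches of the epoch would be processed here, with no saving.
    paths, best = checkpoint_targets("ckpt", val_metric, best)
    written.append([os.path.basename(p) for p in paths])
```

Keeping the decision in a small pure function makes it easy to call once per epoch from the training loop, whatever mechanism actually writes the files.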

Training on GPU

I ran into the following memory issue:
OOM when allocating tensor with shape[1,60,60,1024] [[Node: gpu0_train/my_model/resnet_v2_101/block3/unit_8/bottleneck_v2/preact/moments/SquaredDifference = SquaredDifference[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](gpu0_train/my_model/resnet_v2_101/block3/unit_7/bottleneck_v2/add, gpu0_train/my_model/resnet_v2_101/block3/unit_8/bottleneck_v2/preact/moments/StopGradient)]]

This seems to be caused by training on the CPU: the failing node is placed on cpu:0. I changed the devices to GPU, but training still runs on the CPU. I also changed the with tf.device('/cpu:0') line to with tf.device('/gpu:0'), but nothing changed. How can I train on a single GPU device?
Thanks

IoU is not shown correctly after reloading

As reported in #13 (review), there seems to be a problem with the IoU graph when training is restarted. It probably has to do with the way we compute the incremental counter for the x-axis of the graph; this should be verified.

The only thing I noticed is that when you reload the parameters and continue training, the plot of the IoU metrics becomes a mess

Fix wiggle lines in TB when experiment is resumed

The model checkpoints and the TensorBoard events are not saved with the same frequency. When the model is reloaded and training resumes, the main loop writes new events to the event file that have the same global step (x-coordinate) as some earlier events. As a result, the TB graph looks weird. This can be fixed by using SessionLog messages, as suggested in the documentation.

I don't have time to work on this right now, but I'd be happy to review a PR.
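The overlap can be illustrated without TensorFlow. This is a rough sketch with made-up numbers: purge_stale_events is a hypothetical helper that drops the events already logged at or after the first resumed step, which is, as far as I understand, similar in spirit to what TensorBoard does when it sees a SessionLog START message.

```python
def purge_stale_events(events, first_resumed_step):
    """Keep only the (step, value) events logged before the first resumed step."""
    return [(step, value) for step, value in events if step < first_resumed_step]


# Summaries were written at every step, but the last checkpoint is from step 3.
events_before_restart = [(1, 0.90), (2, 0.80), (3, 0.70), (4, 0.65), (5, 0.60)]
first_resumed_step = 4  # the first step executed after reloading the checkpoint

# Resumed training logs steps 4 and 5 again before reaching new territory.
events_after_restart = [(4, 0.66), (5, 0.61), (6, 0.55)]

# Without purging, the x-coordinates go back in time and the curve wiggles.
naive_steps = [s for s, _ in events_before_restart + events_after_restart]

# With purging, every global step appears exactly once and in order.
kept = purge_stale_events(events_before_restart, first_resumed_step)
clean_steps = [s for s, _ in kept + events_after_restart]
```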

[Feature Request] Fix random seed

It could be useful to have the random seed as a FLAG parameter, to make experiments reproducible. If it is not set, the seed could either be fixed to a default value or drawn at random.
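One possible shape for this, as a sketch only: argparse stands in for the project's flag mechanism, and the --seed flag name and the resolve_seed helper are hypothetical. Here an unset seed is drawn at random and returned, so that it can still be logged and reused later.

```python
import argparse
import random


def resolve_seed(seed=None):
    """Return the given seed, or draw a fresh one when none was provided."""
    if seed is None:
        seed = random.SystemRandom().randrange(2 ** 32)
    return seed


parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None,
                    help="Random seed; drawn at random when omitted.")
args = parser.parse_args(["--seed", "1234"])  # explicit argv for the example

seed = resolve_seed(args.seed)

# Two generators built from the same seed produce the same sequence.
rng_a = random.Random(seed)
rng_b = random.Random(seed)
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
```

In a real experiment the resolved seed would also have to be passed to every library that keeps its own random state.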

Expose useful placeholders

As suggested by @marcociccone, self.placeholders is not easy to understand from the user's perspective.

This is what is currently required to access the targets that are actually used at validation time:

val_labels = [el['labels'] for el in self.placeholders[False]]
# Concatenate the per-device labels along the batch axis (axis=0 is assumed)
actual_val_labels = tf.concat(val_labels[:self.sym_num_devs], axis=0)

or more easily use recursive_truncate_dict.

I guess I should:

  1. rename self.placeholders --> self._placeholders or self._per_dev_placeholders
  2. store in self.placeholders the placeholders that can be of some use for the end user
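The two steps above could look roughly like the following sketch. It rests on assumptions: PlaceholderStore is a hypothetical class, plain strings stand in for TensorFlow placeholders, and grouping the per-device entries by name is only one guess at what would be useful to the end user.

```python
class PlaceholderStore:
    """Keeps the per-device structure private and exposes a simpler view."""

    def __init__(self, per_dev_placeholders):
        # {is_training: [one {name: placeholder} dict per device]}
        self._per_dev_placeholders = per_dev_placeholders

    @property
    def placeholders(self):
        """Return {is_training: {name: [placeholder for each device]}}."""
        view = {}
        for is_training, dev_dicts in self._per_dev_placeholders.items():
            grouped = {}
            for dev_dict in dev_dicts:
                for name, placeholder in dev_dict.items():
                    grouped.setdefault(name, []).append(placeholder)
            view[is_training] = grouped
        return view


store = PlaceholderStore({
    False: [{"labels": "val_labels_dev0"}, {"labels": "val_labels_dev1"}],
    True: [{"labels": "train_labels_dev0"}],
})
val_labels = store.placeholders[False]["labels"]
```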
