Comments (9)

arashno avatar arashno commented on July 20, 2024

Thanks for catching this.
I agree. It should be like that.
Although it shouldn't have a significant effect in practice.
What do you mean by "it has error that it is not iteration"?

from tensorflow_multigpu_imagenet.

John1231983 avatar John1231983 commented on July 20, 2024

Hi, thanks for your comment. I added it, and it raises the following error:

    with tf.control_dependencies(batchnorm_updates_op):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3782, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3510, in control_dependencies
    for c in control_inputs:
TypeError: 'Operation' object is not iterable

Thanks

arashno avatar arashno commented on July 20, 2024

I see.
Have you tried wrapping the parameter in a list, i.e. putting it in brackets ([])?

John1231983 avatar John1231983 commented on July 20, 2024

You mean with tf.control_dependencies([batchnorm_updates_op]): ? That works now. Thanks.
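For anyone hitting the same TypeError, here is a minimal sketch of why the brackets help. It does not use TensorFlow: FakeOperation and collect_control_inputs are hypothetical stand-ins that only imitate the `for c in control_inputs:` loop shown in the traceback above.

```python
class FakeOperation:
    """Hypothetical stand-in for a single graph operation (not a TensorFlow class)."""

    def __init__(self, name):
        self.name = name


def collect_control_inputs(control_inputs):
    """Imitates the loop from the traceback: it iterates over its argument."""
    return [op.name for op in control_inputs]


batchnorm_updates_op = FakeOperation("batchnorm_updates")

# Passing the bare object fails, because a single operation is not iterable.
try:
    collect_control_inputs(batchnorm_updates_op)
    bare_call_failed = False
except TypeError:
    bare_call_failed = True

# Wrapping the single operation in a list gives the loop something to iterate over.
names = collect_control_inputs([batchnorm_updates_op])
```

The same reasoning applies to any function that expects a list of operations: a one-element list is accepted, a bare operation is not.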

One more thing: I checked your implementation again, and it may be missing the moving-average op for the trainable variables (variables_averages_op), for example:

variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
batchnorm_updates_op = tf.group(*batchnorm_updates)
with tf.control_dependencies([batchnorm_updates_op]):
    train_op = tf.group(apply_gradient_op, batchnorm_updates_op, variables_averages_op)

Do you think so?

arashno avatar arashno commented on July 20, 2024

Yes, this is exactly what I meant.

What is the purpose of "variables_averages_op"?
I already have the moving average and the moving standard deviation.

John1231983 avatar John1231983 commented on July 20, 2024

For the second point, I took it from Inception v3. I do not know its purpose.

arashno avatar arashno commented on July 20, 2024

They mentioned the following, so I think it is not necessary.

Track the moving averages of all trainable variables.
Note that we maintain a "double-average" of the BatchNormalization
global statistics. This is more complicated than need be but we employ
this for backward-compatibility with our previous models.
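As a rough illustration of what "tracking the moving averages of all trainable variables" amounts to, here is a plain-Python sketch of the exponential moving average update applied to a single scalar weight. The function name ema_update, the decay of 0.9, and the starting shadow value of 0.0 are assumptions chosen for this example, not values taken from either code base.

```python
def ema_update(shadow, value, decay):
    """One exponential-moving-average step: move the shadow copy toward the value."""
    return shadow - (1.0 - decay) * (shadow - value)


decay = 0.9    # assumed decay, for illustration only
shadow = 0.0   # assumed starting point of the shadow (averaged) copy

# Pretend the weight holds the value 1.0 for three training steps.
for value in [1.0, 1.0, 1.0]:
    shadow = ema_update(shadow, value, decay)

# The shadow copy lags behind the weight and approaches it gradually.
```

The shadow copy is a smoothed version of the weight, kept separately from the batch norm moving mean and variance.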

John1231983 avatar John1231983 commented on July 20, 2024

Thanks. Did you try running the ResNet on multiple GPUs and on a single GPU and compare the performance? In my case, working on segmentation, multi-GPU training (2 GPUs) reached 68.9%, while 1 GPU achieved 73.4%. I guess the problem may come from the batch norm layers, which may use the BN statistics of only one tower instead of averaging them across towers.

arashno avatar arashno commented on July 20, 2024

Yes, I have tried that.
When using more GPUs, the accuracy is slightly lower.
You can alleviate this by using slightly more batches per epoch.
I do not think the batch norm layers cause the problem, because the values for all the towers quickly converge to the same value.
Moreover, many other multi-GPU implementations use the same trick.
Averaging and syncing the values between the towers would decrease performance.
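To make the difference between per-tower and synchronized batch norm statistics concrete, here is a small plain-Python sketch. The two towers and their activation values are made up for illustration, and equal tower batch sizes are assumed. It shows only the arithmetic and says nothing about how large the effect is in real training.

```python
def mean_and_variance(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


# Hypothetical activations seen by two towers in one step.
tower_0 = [1.0, 2.0, 3.0, 4.0]
tower_1 = [5.0, 6.0, 7.0, 8.0]

mean_0, var_0 = mean_and_variance(tower_0)
mean_1, var_1 = mean_and_variance(tower_1)

# Synchronized statistics: combine first and second moments across towers
# (equal tower batch sizes assumed), then derive the variance.
sync_mean = (mean_0 + mean_1) / 2
sync_second_moment = ((var_0 + mean_0 ** 2) + (var_1 + mean_1 ** 2)) / 2
sync_var = sync_second_moment - sync_mean ** 2

# Reference: statistics computed directly over the combined batch.
full_mean, full_var = mean_and_variance(tower_0 + tower_1)
```

When the tower means differ, each tower's own variance underestimates the variance of the combined batch. When the towers see similar data, the two sets of statistics are close.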
