Following the note in official TensorFlow, Note: when tr


Missing the condition of batch norm update about tensorflow_multigpu_imagenet HOT 9 CLOSED

arashno commented on July 20, 2024

Missing the condition of batch norm update

from tensorflow_multigpu_imagenet.

Comments (9)

arashno commented on July 20, 2024

Thanks for catching this.
I agree. It should be like that.
Although it shouldn't have a significant effect in practice.
What do you mean by "it has error that it is not iteration"?


John1231983 commented on July 20, 2024

Hi. Thanks for your comment. I added it, and it shows an error:

    with tf.control_dependencies(batchnorm_updates_op):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3782, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3510, in control_dependencies
    for c in control_inputs:
TypeError: 'Operation' object is not iterable

Thanks


arashno commented on July 20, 2024

I see.
Have you tried putting the parameter in brackets ([])?
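
The traceback above ends in the loop "for c in control_inputs", so the argument has to be an iterable of ops, and a single op needs to be wrapped in a list. Below is a minimal pure-Python sketch of that failure mode. Operation and control_dependencies here are hypothetical stand-ins written only for illustration; they are not the TensorFlow implementations.

```python
class Operation:
    """Hypothetical stand-in for a graph op (not tf.Operation)."""

    def __init__(self, name):
        self.name = name


def control_dependencies(control_inputs):
    """Stand-in that mimics the `for c in control_inputs` loop in the traceback."""
    names = []
    for c in control_inputs:  # raises TypeError when given a bare op
        names.append(c.name)
    return names


batchnorm_updates_op = Operation("batchnorm_updates")

try:
    control_dependencies(batchnorm_updates_op)  # bare op: not iterable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the single op in a list makes the argument iterable.
dependency_names = control_dependencies([batchnorm_updates_op])
```

The same reasoning applies to any API that loops over its argument: a one-element list is accepted, a bare object is not.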


John1231983 commented on July 20, 2024

You mean: with tf.control_dependencies([batchnorm_updates_op]): ? It works now. Thanks.

One more thing: I checked your implementation again, and it may be missing the moving-average op variables_averages_op, for example:

variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
batchnorm_updates_op = tf.group(*batchnorm_updates)
with tf.control_dependencies([batchnorm_updates_op]):
    train_op = tf.group(apply_gradient_op, batchnorm_updates_op, variables_averages_op)

Do you think so?


arashno commented on July 20, 2024

Yes, this is exactly what I meant.

What is the purpose of "variables_averages_op"?
I already have a moving average and a moving standard deviation.


John1231983 commented on July 20, 2024

For the second point, I took it from the Inception v3 code. I do not know its purpose.


arashno commented on July 20, 2024

They mentioned the following, so I think it is not necessary.

Track the moving averages of all trainable variables.
Note that we maintain a "double-average" of the BatchNormalization
global statistics. This is more complicated than need be but we employ
this for backward-compatibility with our previous models.
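
For context on what variables_averages_op does: as documented, tf.train.ExponentialMovingAverage keeps a shadow copy of each variable it is applied to and moves that copy toward the current value on every update; when num_updates is given, the effective decay is min(decay, (1 + num_updates) / (10 + num_updates)). This is separate from the moving mean and variance that the batch norm layers track. The pure-Python sketch below illustrates the update rule only. The class name, the dictionary-based variables, and passing the step to apply() are simplifications assumed for illustration, not the TensorFlow API.

```python
class ExponentialMovingAverageSketch:
    """Illustrative shadow-variable averaging (not the TensorFlow class)."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}  # variable name -> averaged value

    def apply(self, variables, num_updates=None):
        """Move each shadow value toward the current variable value."""
        decay = self.decay
        if num_updates is not None:
            # Smaller decay early in training, as described in the TF docs.
            decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
        for name, value in variables.items():
            if name not in self.shadow:
                # Simplification: the shadow starts at the first value seen.
                self.shadow[name] = value
            else:
                self.shadow[name] -= (1.0 - decay) * (self.shadow[name] - value)


ema = ExponentialMovingAverageSketch(decay=0.9)
ema.apply({"w": 1.0})
ema.apply({"w": 2.0})
fixed_decay_result = ema.shadow["w"]

warmup = ExponentialMovingAverageSketch(decay=0.9999)
warmup.apply({"w": 1.0})
warmup.apply({"w": 2.0}, num_updates=0)  # effective decay is 0.1 at step 0
warmup_result = warmup.shadow["w"]
```

The averaged values are typically what gets loaded at evaluation time, which is why the op matters only if the evaluation code restores the shadow variables.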


John1231983 commented on July 20, 2024

Thanks. Did you try running the ResNet on multiple GPUs and on a single GPU and comparing performance? In my case, working on segmentation, multiple GPUs (2 GPUs) got 68.9%, while 1 GPU achieved 73.4%. I guess the problem may come from the batch norm layer, which may use the BN statistics of only one tower instead of averaging them.


arashno commented on July 20, 2024

Yes, I have tried that.
When using more GPUs, the accuracy is slightly lower.
You can alleviate it by using a few more batches per epoch.
I think the batch norm layers do not cause the problem, because the values for all the towers quickly converge to the same value.
Moreover, many other multi-GPU implementations use the same trick.
Taking the average and syncing values between the towers would decrease performance.
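
To make the trade-off discussed here concrete, the sketch below compares the batch statistics of a single tower with the statistics of the combined batch. It is a pure-Python illustration under stated assumptions: every tower sees the same number of samples, population variance is used, and the function name combine_tower_stats and the toy data are made up for this example. It does not describe what this repository or any particular multi-GPU implementation does.

```python
import statistics


def combine_tower_stats(means, variances):
    """Combine per-tower mean/variance, assuming equal samples per tower."""
    global_mean = sum(means) / len(means)
    # Average variance inside the towers.
    within = sum(variances) / len(variances)
    # Variance of the tower means around the global mean.
    between = sum((m - global_mean) ** 2 for m in means) / len(means)
    return global_mean, within + between


# Toy activations for two towers (hypothetical data).
tower_batches = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]
tower_means = [statistics.mean(b) for b in tower_batches]
tower_vars = [statistics.pvariance(b) for b in tower_batches]

combined_mean, combined_var = combine_tower_stats(tower_means, tower_vars)

# Reference: statistics of the full batch across both towers.
full_batch = [x for b in tower_batches for x in b]
full_batch_mean = statistics.mean(full_batch)
full_batch_var = statistics.pvariance(full_batch)
```

When the tower means are close to each other, the between-tower term is small and a single tower's statistics approximate the full-batch ones, which is consistent with the observation that the towers converge to the same values. When they differ, as in the toy data, a single tower underestimates the full-batch variance.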
