Comments (9)
Thanks for catching this.
I agree. It should be like that.
Although it shouldn't have a significant effect in practice.
What do you mean by "it has error that it is not iteration"?
from tensorflow_multigpu_imagenet.
Hi. Thanks for your comment. I have added it, and it shows the following error:
with tf.control_dependencies(batchnorm_updates_op):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3782, in control_dependencies
return get_default_graph().control_dependencies(control_inputs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3510, in control_dependencies
for c in control_inputs:
TypeError: 'Operation' object is not iterable
Thanks
I see.
Have you tried putting the parameter in brackets ([])?
You mean with tf.control_dependencies([batchnorm_updates_op]): ?
It works now. Thanks.
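For anyone hitting the same TypeError: the traceback shows that control_dependencies loops over its argument (for c in control_inputs:), so it expects a list of ops rather than a single op. The sketch below reproduces that behaviour in plain Python. FakeOp and collect_control_inputs are hypothetical stand-ins written for illustration, not TensorFlow classes or functions.

```python
class FakeOp:
    """Hypothetical stand-in for a single graph operation (not a TensorFlow class)."""

    def __init__(self, name):
        self.name = name


def collect_control_inputs(control_inputs):
    # Mirrors the loop seen in the traceback: the argument is iterated directly.
    return [c.name for c in control_inputs]


update_op = FakeOp("batchnorm_updates")

# Passing the bare object fails, because a single op is not iterable.
try:
    collect_control_inputs(update_op)
    bare_result = "ok"
except TypeError as exc:
    bare_result = str(exc)

# Wrapping the op in a list gives the loop something to iterate over.
list_result = collect_control_inputs([update_op])

print(bare_result)
print(list_result)
```

The same reasoning is why tf.control_dependencies([batchnorm_updates_op]) works where the bare op does not.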
One more thing: I checked your implementation again, and it may be missing the moving-average op for the trainable variables (variables_averages_op), such as:
variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())
batchnorm_updates_op = tf.group(*batchnorm_updates)
with tf.control_dependencies([batchnorm_updates_op]):
    train_op = tf.group(apply_gradient_op, batchnorm_updates_op, variables_averages_op)
Do you think so?
Yes, this is exactly what I meant.
What is the purpose of "variables_averages_op"?
I already have a moving average and a moving standard deviation.
For the second point, I took it from the Inception v3 code. I do not know its purpose.
They mentioned the following, so I think it is not necessary.
Track the moving averages of all trainable variables.
Note that we maintain a "double-average" of the BatchNormalization
global statistics. This is more complicated than need be but we employ
this for backward-compatibility with our previous models.
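As a rough illustration of what "track the moving averages of all trainable variables" means: an exponential moving average keeps a smoothed "shadow" copy of each value, updated as shadow = decay * shadow + (1 - decay) * value. The sketch below shows that update in plain Python on a made-up sequence. ema_update and track are hypothetical helper names, and the decay of 0.9 is chosen only to make the effect visible; this is not the Inception code itself.

```python
def ema_update(shadow, value, decay):
    """One moving-average step: decay * shadow + (1 - decay) * value."""
    return decay * shadow + (1.0 - decay) * value


def track(values, decay, initial):
    """Apply the update once per training step and record the shadow value."""
    shadow = initial
    history = []
    for v in values:
        shadow = ema_update(shadow, v, decay)
        history.append(shadow)
    return history


# A made-up weight that oscillates from step to step.
weight_over_steps = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# The shadow value starts at the weight's initial value.
smoothed = track(weight_over_steps, decay=0.9, initial=weight_over_steps[0])

for raw, avg in zip(weight_over_steps, smoothed):
    print(raw, round(avg, 4))
```

The shadow copy moves far less than the raw value. This is separate from the moving mean and variance that batch norm already keeps, which is why the quoted comment calls it a "double-average" for those statistics.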
Thanks. Did you try running the ResNet on multiple GPUs and on a single GPU and comparing the results? In my case (a segmentation task), training on 2 GPUs reached 68.9%, while 1 GPU achieved 73.4%. I guess the problem may come from the batch norm layers, which may use the BN statistics of only one tower instead of averaging them across towers.
Yes,
I have tried that.
When using more GPUs the accuracy is slightly lower.
You can alleviate it by using a few more batches per epoch.
I think the batch norm layers do not cause the problem, because the values for all the towers quickly converge to the same value.
Moreover, many other multi-GPU implementations use the same trick.
Averaging and syncing the values between the towers would decrease performance.
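To make the per-tower question concrete, here is a small plain-Python sketch with made-up activations for two towers. It only illustrates a general statistical fact: with equal-sized towers the mean of the tower means equals the whole-batch mean, while the mean of the tower variances falls short of the whole-batch variance by the variance of the tower means. The numbers are deliberately extreme and are not measurements from this repository.

```python
from statistics import mean, pvariance

# Made-up activations for one channel on two equal-sized towers.
tower_a = [1.0, 2.0, 3.0, 4.0]
tower_b = [5.0, 6.0, 7.0, 8.0]
full_batch = tower_a + tower_b

tower_means = [mean(tower_a), mean(tower_b)]
tower_vars = [pvariance(tower_a), pvariance(tower_b)]

mean_of_means = mean(tower_means)    # averaged per-tower means
mean_of_vars = mean(tower_vars)      # averaged per-tower variances
full_mean = mean(full_batch)         # statistics over the combined batch
full_var = pvariance(full_batch)
between_towers = pvariance(tower_means)  # spread of the tower means

print(mean_of_means, full_mean)
print(mean_of_vars + between_towers, full_var)
```

When each tower receives a shuffled sample of the same data, the tower means are close to each other, so the between-tower term is small. That is in line with the observation above that the per-tower values converge to the same value.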