
Comments (11)

soloice commented on May 24, 2024

I just updated my code to allow passing batch_size as a command-line parameter, tested it, and observed an accuracy of 99.4% when using batch_size=1 in the inference phase, which matches the results obtained with batch sizes of 50 or 100.

In principle, batch size should not affect the model's performance in the inference phase, and my experiments agree with the theory. You can play with my new code to verify this yourself.

If you mean using batch size = 1 in both the training and inference phases, I think it will not work, because during training the BN layer collects batch statistics (mean and variance). A large batch provides useful statistics and helps the model converge, while a batch of size 1 provides no meaningful statistics (the batch mean is the sample itself and the batch variance is zero) and doesn't help at all.

But if you use a large batch size in the training phase and batch size = 1 only in the inference phase, that's fine. In the inference phase no batch statistics are used (instead, the model uses the population statistics estimated during training), so you can feed either a single example or a whole batch, and the results should be the same.
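To make the two phases concrete, here is a minimal TF 1.x sketch (not the actual mnist-bn code; the layer sizes and placeholder names are only for illustration). The same graph serves both phases, and the `is_training` placeholder decides whether batch norm uses batch statistics or the moving averages:

```python
import tensorflow as tf

is_training = tf.placeholder(tf.bool, name="is_training")
x = tf.placeholder(tf.float32, [None, 784])

h = tf.layers.dense(x, 128, use_bias=False)
# Batch statistics when training=True, moving mean/variance when training=False.
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

# Inference: feed is_training=False and any batch size (1, 50, 100, ...);
# the prediction for a given image does not depend on what else is in the batch.
# probs = sess.run(tf.nn.softmax(logits), feed_dict={x: images, is_training: False})
```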


thunguyenphuoc commented on May 24, 2024

Hello,

I realised my problem is not with the batch size per se, but with the way I handle the scope/reuse variables.
My problem is that training works fine, but testing is horrible when I set "is_training" to False. However, when I set it to True during testing, it works, but I think it is then only using the mean and variance computed from the test batch itself.

Some other people also mention the significance of the `reuse` parameter of the batch norm function (False during training and True during testing), but I don't see you dealing with that in your code?

Sorry for asking a lot of questions, but I couldn't find any coherent tutorial on using batch norm in TF...

Thank you :)


soloice commented on May 24, 2024

You shouldn't set is_training to True during testing, because then the model will use statistics computed from the test batch instead of the moving mean/variance.

I didn't use the reuse parameter, because I only set up one model. When testing it on new data, I start a new process and reload the model parameters. But if you build multiple models, e.g. one for training, one for validation, and one for testing, and make them share parameters, you'll need to handle the reuse parameter carefully.
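For readers who do hit the multi-model case, here is a sketch of how the reuse parameter is typically handled with variable scopes (the `build_model` function and shapes are hypothetical, not taken from mnist-bn):

```python
import tensorflow as tf

def build_model(images, is_training, reuse):
    # All variables (including BN's moving mean/variance) live under the "net" scope.
    with tf.variable_scope("net", reuse=reuse):
        h = tf.layers.dense(images, 128, use_bias=False)
        h = tf.layers.batch_normalization(h, training=is_training)
        h = tf.nn.relu(h)
        return tf.layers.dense(h, 10)

train_images = tf.placeholder(tf.float32, [None, 784])
test_images = tf.placeholder(tf.float32, [None, 784])

train_logits = build_model(train_images, is_training=True, reuse=False)  # creates the variables
test_logits = build_model(test_images, is_training=False, reuse=True)    # shares the same variables
```

With a single model driven by a boolean is_training placeholder, as in the earlier sketch, none of this is needed.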


thunguyenphuoc commented on May 24, 2024

I agree that I should not use is_training=True during testing, but setting it to False just generates exactly the same wrong samples, despite my lowering the decay to 0.9 as discussed, too.

I also set up my training op like yours:
```python
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.python.ops import control_flow_ops

optimizer = tf.train.AdamOptimizer(learning_rate=e_eta, beta1=0.5)
global_step = tf.Variable(0, trainable=False)
train_op = slim.learning.create_train_op(BCE_loss, optimizer, global_step=global_step)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # EMA update ops from batch normalisation
if update_ops:
    updates = tf.group(*update_ops)
    BCE_loss = control_flow_ops.with_dependencies([updates], BCE_loss)
```

and during training:

```python
train, _, bceLoss = sess.run([train_op, update_ops, BCE_loss],
                             feed_dict={real_image_in: batch_images, is_training: True})
```
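For comparison, the pattern recommended in the TF 1.x batch norm documentation makes the train op itself depend on the moving-average updates instead of wrapping the loss; a minimal self-contained sketch (the toy model and loss here stand in for whatever your graph defines, they are not your actual network):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

h = tf.layers.batch_normalization(tf.layers.dense(x, 128), training=is_training)
logits = tf.layers.dense(tf.nn.relu(h), 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3, beta1=0.5)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # BN moving mean/variance updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

# A single sess.run(train_op, feed_dict={..., is_training: True}) now also runs
# the moving-average updates, so update_ops does not need to be run separately.
```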

I have been fighting with this for weeks now and it is getting a bit too frustrating :(


soloice commented on May 24, 2024

What's your batch size during training?


thunguyenphuoc commented on May 24, 2024

100 during training vs. 1 during testing, so nothing crazy, I guess?


soloice commented on May 24, 2024

Ummm, that should be fine in principle...


soloice commented on May 24, 2024

Hi, friend,
I read through your previous reply again and noticed you mentioned that setting it to False just generates exactly the same wrong samples. I guess this suggests you are using batch statistics rather than the estimated population statistics. Because your batch size is 1, you always normalize the input to zero mean and unit variance, so the output is the same regardless of your input image.

For a more detailed discussion, see the section "Making predictions with the model" in this blog post.
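As a tiny illustration of that failure mode (plain NumPy, for a fully connected activation; the convolutional case is similar in spirit):

```python
import numpy as np

def bn_with_batch_stats(x, eps=1e-3):
    # x has shape [batch, features]; normalize over the batch dimension,
    # which is what BN does when it is (wrongly) fed batch statistics at test time.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

a = np.random.randn(1, 5)        # one "image" with 5 features
b = 100 * np.random.randn(1, 5)  # a completely different one

print(bn_with_batch_stats(a))  # all zeros
print(bn_with_batch_stats(b))  # also all zeros: the output no longer depends on the input
```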


thunguyenphuoc commented on May 24, 2024

Hello,

Thank you so much for your help. Sorry for the late reply, I have been away from my laptop.
I discovered that blog post around the same time as you, and I think that is the main cause.
I decided to restructure my code and am rewriting it from scratch, just to make sure I did not make any silly but evil mistake. I will keep you posted :)

Thanks again:)


soloice commented on May 24, 2024

Good luck, bro! Waiting for your good news!


Yu-Wu commented on May 24, 2024

@thunguyenphuoc
Hi, I ran into the same problem as you before. Could you share some ideas on how you fixed it?
