
Comments (17)

lukeyeager commented on May 13, 2024

I'm not sure I have enough information to help you. I'm not saying it couldn't be a bug in DIGITS, but it certainly sounds like you may have set up your datasets incorrectly. Can you give me some more detail on how you created your dataset? I assume that you've looked at the instructions here?

Dezmon commented on May 13, 2024

I looked over the instructions and have run both examples, LeNet on MNIST and AlexNet on ImageNet data, and they worked the way they are supposed to.

I'm happy to believe it is me, but something does seem off with the accuracy reported. I have image sequences (medical, video-like, all very similar in appearance), and each sequence is labeled as class one or two. I split the sequences into a training and a validation group (half in each). Finally, the sequences are split into individual frames, and these are my images. Images in the training set come strictly from training sequences, and images in the validation set come strictly from validation sequences.
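
For reference, my split logic is roughly the following (a simplified sketch, not my actual script; the `Sequence` objects with a `.frames` list are stand-ins for how my data is really stored):

```python
import random

def split_by_sequence(sequences, val_fraction=0.5, seed=0):
    """Assign whole sequences to train or val, so that frames from the
    same sequence (which look nearly identical) never end up in both
    sets."""
    seqs = list(sequences)
    random.Random(seed).shuffle(seqs)
    n_val = int(len(seqs) * val_fraction)
    val_seqs, train_seqs = seqs[:n_val], seqs[n_val:]
    train_frames = [f for s in train_seqs for f in s.frames]
    val_frames = [f for s in val_seqs for f in s.frames]
    return train_frames, val_frames
```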

I guess my issue boils down to this: after training, the DIGITS graph shows 100% accuracy, but when I test individual images from the validation set I get misclassifications.

Update: I have now been running with a batch size of 1 as opposed to 'default', and so far training looks a lot more normal (i.e. it is not converging yet, the validation loss is effectively constant, the training loss is oscillating, and accuracy is a little over 50%). Maybe this is related to having a large N compared to the number of classes?

lukeyeager commented on May 13, 2024

maybe this is related to having a large N compared to the number of classes?

That does seem to make sense. Let me know if that fixes the problem for you - I could add a check to save others from this problem in the future.

something does seem off with the accuracy reported

I agree. I'll try to look into it. Hopefully today, if I find the time.

Dezmon commented on May 13, 2024

Great, thank you. I won't have a result with my data for a couple of days, but I can also try to reproduce the problem with a subset of the ImageNet data if that would be helpful.

drozdvadym commented on May 13, 2024

@Dezmon
are you training the network with images of the same size (for example, 256x256)?

Dezmon commented on May 13, 2024

Yes, the images get scaled/padded to 256x256, the same as with the ImageNet data. I am relying on DIGITS to do that for me, but it works fine with the ImageNet JPGs (and when I test single images from my data), so I'm assuming that part is OK.
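
For what it's worth, my mental model of that scale/pad step is something like this Pillow sketch (my own approximation, not DIGITS's actual resize code):

```python
from PIL import Image

def to_square(path, size=256, pad=False):
    """Approximate the two resize modes: 'squash' rescales directly to
    size x size; 'pad' preserves the aspect ratio and fills the rest of
    a square canvas with black."""
    img = Image.open(path).convert('RGB')
    if not pad:
        return img.resize((size, size), Image.BILINEAR)
    img.thumbnail((size, size), Image.BILINEAR)  # resize in place, keep aspect
    canvas = Image.new('RGB', (size, size))      # black background
    w, h = img.size
    canvas.paste(img, ((size - w) // 2, (size - h) // 2))
    return canvas
```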

lukeyeager commented on May 13, 2024

Hmm, I may have messed up my data manipulation for testing somewhere (see BVLC/caffe#2255). I'll get back to you on this.

lukeyeager commented on May 13, 2024

@Dezmon, I just upgraded to a newer version of caffe and changed the way that I do image preprocessing. Will you upgrade your DIGITS and NVIDIA/caffe installations, and then see if that fixes the issue for you?
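
In the meantime, if you want to sanity-check a single image outside of DIGITS, the important thing is to apply the same transforms at test time that the training database saw. Roughly, in pycaffe (the file names and the 'prob' output blob name here are placeholders for whatever your model uses):

```python
import caffe
import numpy as np

net = caffe.Net('deploy.prototxt', 'snapshot.caffemodel', caffe.TEST)

# Mirror the training-time preprocessing: per-channel mean subtraction,
# HWC -> CHW transpose, [0,1] -> [0,255] scaling, RGB -> BGR swap.
mu = np.load('mean.npy').mean(1).mean(1)  # assumes the binaryproto mean
                                          # was converted to .npy first
t = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
t.set_transpose('data', (2, 0, 1))
t.set_mean('data', mu)
t.set_raw_scale('data', 255)
t.set_channel_swap('data', (2, 1, 0))

img = caffe.io.load_image('some_validation_frame.png')  # placeholder path
net.blobs['data'].data[...] = t.preprocess('data', img)
prob = net.forward()['prob'][0]
print(prob.argmax(), prob.max())
```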

Dezmon commented on May 13, 2024

I did a fresh install and build of Caffe (NVIDIA's fork) and DIGITS, and rebuilt the dataset from the raw PNGs, and I'm still seeing the problem. That said, I think it has to do with batch size (and some lack of basic understanding on my part). With batch sizes other than the network default I get very different behavior; mostly the loss explodes, which is not a great outcome but at least is understandable.

[screenshot: DIGITS training plot]

thatguymike commented on May 13, 2024

Can you post your DB build page with the distribution of classes? This looks like an overfitting problem of some sort.

Dezmon commented on May 13, 2024

Sure, I'll post it, and yes, the data is a little unbalanced. But how would overfitting drive the validation accuracy to 100% and the validation loss to 0? Training I would understand, but validation doesn't make sense to me. I am new to ML, though, and open to suggestions.

I have now reproduced it with ImageNet data. I took two classes from the full dataset (n04404412 and n04409515), which are separated into training (2,600) and test (26) directories. I trained using the defaults for AlexNet and got the same behavior: 100% accuracy shown, but testing individual images from the validation directory gives mixed results.

Here is the training plot for the two-class ImageNet data:
[screenshot: two-class ImageNet training plot]

Here is my data:
[screenshot: dataset class distribution]

thatguymike commented on May 13, 2024

That is a tiny amount of data for a pretty large network; you are going to overfit quickly. Better would be to attempt fine-tuning from a fully trained network. More importantly, you have a TINY number of validation images; generally we shoot for >10%, more like 25%, of the number of training images.
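
The fine-tuning itself is cheap to try: initialize the solver's net from an existing caffemodel before training. A rough pycaffe sketch (the paths are placeholders, and you would also rename the final layer in your train prototxt and set its num_output to your two classes so it gets fresh weights):

```python
import caffe

solver = caffe.SGDSolver('solver.prototxt')      # placeholder path
# Layers whose names match the pretrained model are initialized from it;
# renamed layers (e.g. the new final classifier) start from scratch.
solver.net.copy_from('bvlc_alexnet.caffemodel')  # placeholder path
solver.solve()
```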

Dezmon commented on May 13, 2024

Hi Mike, the little set was just to show another example of the problem, so it could be reproduced on a standard image set.

For my data (the first training plot in this thread) I am using 22,838 training images and 25,549 validation images (much more than 25%) per class. Is this still a tiny amount of data? I thought ImageNet used only 1,000 per class.

thatguymike commented on May 13, 2024

That should be working better. My hunch is still that you are overfitting your data. ImageNet has ~1,000 images per class, but ~1.2M base training images. Still, I would expect better performance. For example, we have taken Pascal VOC crops and trained on those from scratch successfully, but starting from a pretrained AlexNet/CaffeNet network does produce better overall results.

Let's look at batch sizes and learning rates carefully. Generally, if you change the batch size you also need to adjust your learning rate and decays. Alex Krizhevsky talks about this in Section 5 of his "One Weird Trick" paper.

Still, you are getting high training accuracy, and your two loss curves look correct. Your training and validation sets have no overlap in samples, correct?
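
To make the batch-size point concrete, here is the heuristic as I read Section 5 (my sketch, not something DIGITS applies for you; the paper suggests multiplying the learning rate by sqrt(k) when the batch size changes by a factor of k, though some people use linear scaling instead):

```python
import math

def scaled_lr(base_lr, base_batch, new_batch, rule='sqrt'):
    """Scale the learning rate when the batch size changes by a factor
    k = new_batch / base_batch. 'sqrt' follows the sqrt(k) heuristic from
    Krizhevsky's "One weird trick" paper; 'linear' is the alternative
    linear-scaling rule."""
    k = new_batch / float(base_batch)
    return base_lr * (math.sqrt(k) if rule == 'sqrt' else k)

# e.g. AlexNet's default batch size of 256 at lr 0.01, run at batch size 1:
print(scaled_lr(0.01, 256, 1))  # 0.01 * sqrt(1/256) = 0.000625
```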

Dezmon commented on May 13, 2024

That is correct. I'm not expecting this to work all that well, and I'll add a lot more data once I understand the tools a little better. I'm just trying to figure out the high reported accuracy and the subsequent poor single-image prediction performance. I will re-read his paper.

Dezmon commented on May 13, 2024

You are correct: adjusting the learning rate up by his recommended sqrt(k) gives very different network performance (exploding training loss, woohoo :/ ). Should I close this, since the problem appears to come up only with pathologically un- or under-trained networks that for some reason report low validation loss and 100% accuracy?

Thank you both very much for your help.

lukeyeager commented on May 13, 2024

No problem. I'll look into handling the learning rate vs. batch size adjustment automatically in the future.

