Comments (17)
I'm not sure I have enough information to help you. I'm not saying it couldn't be a bug in DIGITS, but it certainly sounds like you may have set up your datasets incorrectly. Can you give me some more detail on how you created your dataset? I assume that you've looked at the instructions here?
I looked over the instructions and have run both examples (LeNet on MNIST and AlexNet on ImageNet data), and they worked the way they're supposed to.
I'm happy to believe it is me, but something does seem off with the accuracy reported. I have image sequences (medical, video-like, all very similar in appearance), and each sequence is labeled as class one or two. I split the sequences into a training group and a validation group (half in each). Finally, the sequences are split into individual frames, and these are my images. Images in the training set come strictly from training sequences, and images in the validation set come strictly from validation sequences.
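The sequence-level split described above can be sketched like this (a minimal illustration; the sequence/frame names are made up, not from my actual pipeline):

```python
import random

def split_by_sequence(sequences, val_fraction=0.5, seed=0):
    """Split whole sequences (not individual frames) into train/val,
    so that no frames from the same sequence land in both sets."""
    rng = random.Random(seed)
    seqs = list(sequences)
    rng.shuffle(seqs)
    n_val = int(len(seqs) * val_fraction)
    val_seqs, train_seqs = seqs[:n_val], seqs[n_val:]
    # Only after the split are sequences flattened into frames.
    train_frames = [f for s in train_seqs for f in s["frames"]]
    val_frames = [f for s in val_seqs for f in s["frames"]]
    return train_frames, val_frames
```

Splitting at the sequence level first (rather than shuffling frames) is what keeps near-duplicate frames from leaking between the two sets.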
I guess my issue boils down to this: after training, the DIGITS graph shows 100% accuracy, but when I test individual images from the validation set I get misclassifications.
Update: I have now been running with a batch size of 1 (as opposed to the default), and so far training looks a lot more normal (i.e. it has not converged yet, the validation loss is effectively constant, the training loss is oscillating, and accuracy is a little over 50%). Maybe this is related to having a large N compared to the number of classes?
Maybe this is related to having a large N compared to the number of classes?
That does seem to make sense. Let me know if that fixes the problem for you - I could add a check to save others from this problem in the future.
something does seem off with the accuracy reported
I agree. I'll try to look into it. Hopefully today, if I find the time.
Great, thank you. I won't have a result with my data for a couple of days, but I can also try to reproduce the problem with a subset of the ImageNet data if that would be helpful.
@Dezmon
Are you training the network with images of the same size (for example, 256x256)?
Yes, the images get scaled/padded to 256x256, the same as with the ImageNet data. I am relying on DIGITS to do that for me, but it works fine with the ImageNet JPGs (and when I test single images from my data), so I'm assuming that part is OK.
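For what it's worth, the scale/pad geometry I'm expecting can be sketched like this (a letterbox-style calculation I wrote for checking my inputs, not DIGITS' actual resizing code):

```python
def pad_to_square(width, height, target=256):
    """Compute the resize and padding needed to letterbox an image
    into a target x target square while preserving aspect ratio.
    Returns (new_width, new_height, pad_left, pad_top)."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_left = (target - new_w) // 2  # horizontal padding, split evenly
    pad_top = (target - new_h) // 2   # vertical padding, split evenly
    return new_w, new_h, pad_left, pad_top
```

For example, a 512x256 frame scales down to 256x128 and gets 64 pixels of padding above and below.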
Hmm, I may have messed up my data manipulation for testing somewhere (see BVLC/caffe#2255). I'll get back to you on this.
@Dezmon, I just upgraded to a newer version of caffe and changed the way that I do image preprocessing. Will you upgrade your DIGITS and NVIDIA/caffe installations, and then see if that fixes the issue for you?
I did a fresh install and build of Caffe (NVIDIA's fork) and DIGITS, rebuilt the dataset from the raw PNGs, and I'm still seeing the problem. That said, I think it has to do with batch size (and some lack of basic understanding on my part). With batch sizes other than the network default I get very different behavior: mostly the loss explodes, which is not a great outcome but at least is understandable.
Can you post your DB build page with the distribution of classes? This looks like an overfitting problem of some sort.
Sure, I'll post it, and yes, the data is a little unbalanced. But how would overfitting drive the validation accuracy to 100% and the validation loss to 0? For training I would understand, but validation doesn't make sense to me. I am new to ML, though, and open to suggestions.
I have now reproduced it with ImageNet data. I took two classes from the full dataset (n04404412 and n04409515), separated into training (2600 images) and test (26 images) directories. I trained using the defaults for AlexNet and got the same behavior: 100% accuracy shown, but testing individual images from the validation directory gives mixed results.
Here is the training plot for the two-class ImageNet data:
That is a tiny amount of data for a pretty large network; you are going to overfit quickly. It would be better to fine-tune from a fully trained network. More importantly, you have a TINY number of validation images: generally we shoot for >10%, more like 25%, of the number of training images.
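That rule of thumb could be enforced with a small check (a sketch, not existing DIGITS code; the thresholds are the ones quoted above):

```python
def check_val_size(n_train, n_val, min_fraction=0.10, target_fraction=0.25):
    """Flag validation sets that are too small relative to training:
    warn below min_fraction, note anything below target_fraction."""
    ratio = n_val / n_train
    if ratio < min_fraction:
        return f"warning: validation set is only {ratio:.1%} of training size"
    if ratio < target_fraction:
        return f"ok, but {ratio:.1%} is below the {target_fraction:.0%} target"
    return "ok"
```

For the two-class subset above (2600 train / 26 val) this reports a validation set at 1% of training size, well under the 10% floor.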
Hi Mike, the little set was just to show another example of the problem, so that it could be reproduced on a standard image set.
For my data (the first training plot in this thread) I am using 22,838 training images and 25,549 validation images (much more than 25%) per class. Is this still a tiny amount of data? I thought ImageNet used only 1,000 per class.
That should be working better. My hunch is still that you are overfitting your data. ImageNet has ~1,000 images per class, but ~1.2M base training images. Still, I would expect better performance. We, for example, have taken Pascal VOC crops and trained on those from scratch successfully, but starting from a network pretrained on AlexNet/CaffeNet does produce better overall results.
Let's look at batch sizes and learning rate carefully. Generally, if you change the batch size you also need to adjust your learning rate and decays. Alex Krizhevsky talks about this in Section 5 of his "One Weird Trick" paper.
Still, you are getting high training accuracy and your two loss curves look correct. Your training and validation sets have no overlap in samples, correct?
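The sqrt(k) heuristic from Section 5 of that paper can be sketched as:

```python
import math

def scale_lr_for_batch(base_lr, base_batch, new_batch):
    """Krizhevsky's 'One Weird Trick' (Sec. 5) heuristic: when the
    batch size grows by a factor of k, scale the learning rate by
    sqrt(k) to keep the gradient-update variance roughly constant."""
    k = new_batch / base_batch
    return base_lr * math.sqrt(k)
```

So, for example, going from a batch size of 128 to 512 (k = 4) would double the learning rate.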
That is correct. I'm not expecting this to work all that well, and I'll add a lot more data when I get my understanding of the tools worked out a little better. I'm just trying to figure out the high reported accuracy and the subsequent poor single-image prediction performance. I will re-read his paper.
You are correct: adjusting the learning rate up by his recommended sqrt(k) gives very different network performance (exploding training loss, woohoo :/ ). Should I close this, since the problem appears to only come up with pathologically un/under-trained networks that for some reason are reporting low validation loss and 100% accuracy?
Thank you both very much for your help.
No problem. I'll look into handling the learning rate vs. batch size adjustment automatically in the future.