
Comments (10)

erkil1452 commented on August 16, 2024

Hi, I originally validated the code with the PyTorch version as posted (1.1.0). There has previously been a case of broken predictions caused by a change to the image normalization filter in the PyTorch library, so I would first try running it with the original (somewhat ancient) library versions (see requirements.txt). Generally, the main thing I would check is the data loaders: are the pixel values scaled properly to a normalized range? Next, I would try to visualize the predictions (plot them in a 2D scatter plot) and correlate them with the ground truth. Is there a bias or scale error? If the correlation is there, it must be some data scaling issue; if there is no correlation, perhaps the checkpoint is not loaded properly. Hope it helps.
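A minimal sketch of that scatter-plot check (the names `preds` and `labels` are placeholders for N×2 arrays of (x, y) gaze points in cm collected from the test loop):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_pred_vs_truth(preds, labels):
    """Scatter predictions against ground truth, one panel per axis."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for i, name in enumerate(('x', 'y')):
        ax = axes[i]
        ax.scatter(labels[:, i], preds[:, i], s=2, alpha=0.3)
        lo, hi = labels[:, i].min(), labels[:, i].max()
        ax.plot([lo, hi], [lo, hi], 'r--')  # identity line = perfect prediction
        # A bias or scale error still correlates; no correlation at all
        # points to a checkpoint-loading problem instead.
        r = np.corrcoef(labels[:, i], preds[:, i])[0, 1]
        ax.set_title(f'{name}: r = {r:.3f}')
        ax.set_xlabel(f'truth {name} (cm)')
        ax.set_ylabel(f'predicted {name} (cm)')
    plt.tight_layout()
    plt.show()
```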


appleRtsan commented on August 16, 2024

I changed the versions as requirements.txt shows and got the same results. I also plotted the predicted gaze points together with the ground truth, where the 'x' marks are ground truth and the '*' marks are predictions:
[screenshot: scatter plot of predictions vs. ground truth]

I am sure the checkpoint loads properly; it is just torch.load('checkpoint.pth.tar'), and there is no other method, is there? I think it might be a data normalization issue. I know I should use the SubtractMean function in ITrackerData.py, but I don't know how to visualize it to check whether it works correctly or not.
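One quick way to check the normalization without plotting is to print the value ranges coming out of the loader (a sketch; the batch field order below is an assumption about ITrackerData, so adjust the unpacking to match):

```python
# Pull one batch from the training DataLoader and inspect value ranges.
# `train_loader` is a placeholder; the tuple layout is assumed.
row, imFace, imEyeL, imEyeR, faceGrid, gaze = next(iter(train_loader))
for name, t in (('face', imFace), ('eyeL', imEyeL), ('eyeR', imEyeR)):
    print(f'{name}: min={t.min().item():.3f} '
          f'max={t.max().item():.3f} mean={t.mean().item():.3f}')
# Correctly mean-subtracted inputs should sit roughly in [-0.5, 0.5];
# values in the hundreds suggest the 0-255 and 0-1 scales got mixed.
```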


erkil1452 commented on August 16, 2024

The means that we subtract (saved in the provided .mat files) are in the 0-255 range. Therefore, depending on the image loader, one needs to either load the images as 0-255, subtract the means, and then divide by 255, OR load the images as 0-1 and then subtract (means/255). If the ranges get mixed, things will go badly. The same happens if the input to the network is not normalized. In practice, the inputs should end up in the somewhat unusual range [-0.5, 0.5]. In any case, you can always retrain the network (you can start from our checkpoint) and see if the error goes down.
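A sketch of the two equivalent paths with stand-in data (the mean image here is a dummy; the real ones come from the provided .mat files):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)  # 0-255 image
mean = np.full((224, 224, 3), 112.0, dtype=np.float32)             # dummy 0-255 mean

norm_a = (img - mean) / 255.0        # path A: subtract in 0-255 space, then scale
norm_b = img / 255.0 - mean / 255.0  # path B: scale both first, then subtract

assert np.allclose(norm_a, norm_b)   # the two paths agree
print(norm_a.min(), norm_a.max())    # roughly within [-0.5, 0.5]
# Mixing the scales (a 0-1 image minus a 0-255 mean, or vice versa) is the
# failure mode described above.
```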


appleRtsan commented on August 16, 2024

I retrained from the checkpoint and the mean validation distance loss is still high at 11 cm, while the training distance loss is down to 0.4 cm. It seems there is no benefit from this training; it even overfits. Is there another way to refine it? Thanks a lot for helping!

By the way, I used only about 210,000 records for training and 30,000 records for testing. Could that explain the bad result?


erkil1452 commented on August 16, 2024

OK, that is a useful observation. So you can decrease the prediction error for the training samples, yet the prediction error for the test samples does not go down. What I would do next is debug, step by step, the differences between the training and test passes. Do the input data (shapes, value ranges, ...) look the same? How come there is such a difference in the outcome? Can you feed the training data to your test script and get a low prediction error? In theory, the network could of course overfit and learn labels for the training data without any generalization capability; however, most likely there is some simple technical explanation for such weird behavior.
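A sketch of that input comparison (`train_loader` and `val_loader` are placeholders for the two DataLoaders):

```python
def describe(tag, batch):
    """Print shape and value range of every tensor in a collated batch."""
    for i, t in enumerate(batch):
        if hasattr(t, 'shape'):
            print(f'{tag}[{i}]: shape={tuple(t.shape)} '
                  f'min={t.min().item():.3f} max={t.max().item():.3f}')

describe('train', next(iter(train_loader)))
describe('test', next(iter(val_loader)))
```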


appleRtsan commented on August 16, 2024

I tried splitting off some of the training data for testing, but it didn't work. I have no idea how to inspect the train and test input value ranges. Should I do something in the train loader loop?

Would transfer learning work for this issue? If so, which layers should I freeze first? All of the CNN layers?
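In case it helps, freezing the convolutional layers while fine-tuning the rest is a short generic pattern in PyTorch (a sketch, not code from this repo; `model` is a placeholder):

```python
import torch
import torch.nn as nn

# Freeze every Conv2d layer; only the remaining (e.g. fully connected)
# parameters keep receiving gradients.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        for p in m.parameters():
            p.requires_grad = False

# Give the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4, momentum=0.9)
```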


erkil1452 commented on August 16, 2024

I think the issue is that there is some difference between the training and test code. Try training and testing on exactly the same data. Do the train and test errors become the same?


appleRtsan commented on August 16, 2024

I tried your method and got a weird result: the train and test MSE losses are 0.84 and 0.7, and the distance losses are 1.6 cm and 0.951 cm, so the test error is lower than the train error! What's worse, when I feed the same data as validation, the MSE loss goes up to 30 and the distance loss up to 7 cm. I now think it is a checkpoint-saving problem. How do I debug that part? Thanks a lot!


erkil1452 commented on August 16, 2024

It is hard to give any concrete advice in such a case. Try to rewrite the code, or write a minimum working example that reproduces the behavior; it may be just a tiny network that takes single floats as input. You can also easily inspect the weights during training, before saving and after loading. Look into the PyTorch documentation to see how to access the weights of any layer. You can then keep printing, e.g., the mean and std, and see whether they stay the same before save and after load.
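A sketch of that save/load sanity check (`model` and the file name are placeholders):

```python
import torch

def weight_stats(model):
    """Per-layer (mean, std) of all parameters."""
    return {n: (p.mean().item(), p.std().item())
            for n, p in model.named_parameters()}

before = weight_stats(model)
torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')

state = torch.load('checkpoint.pth.tar')
model.load_state_dict(state['state_dict'])
after = weight_stats(model)

# Any layer whose statistics changed was not saved/restored correctly.
for n in before:
    (bm, bs), (am, as_) = before[n], after[n]
    if abs(bm - am) > 1e-6 or abs(bs - as_) > 1e-6:
        print(f'MISMATCH {n}: before=({bm}, {bs}) after=({am}, {as_})')
```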


appleRtsan commented on August 16, 2024

I examined the code carefully and found that when the code starts validating (--sink), there is no command to load the checkpoint. So my validation parameters came from the default initialization, not from the checkpoint. I added doLoad = args.sink in main to solve the whole problem. The solution is so simple that I feel really awful.
Thanks so much for your help!
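For reference, a minimal sketch of that fix (the argparse setup is paraphrased; only the doLoad = args.sink line comes from the comment above):

```python
import argparse

parser = argparse.ArgumentParser(description='iTracker trainer')
parser.add_argument('--sink', action='store_true',
                    help='run validation only, then terminate')
args = parser.parse_args()

# The fix: a validation-only run (--sink) must load the trained checkpoint;
# otherwise the model is evaluated with freshly initialized weights.
doLoad = args.sink
```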

