
Comments (10)

erkil1452 commented on August 16, 2024

Hi, I originally validated the code with the PyTorch version as posted (1.1.0). There has previously been a case of broken predictions caused by a change to the image normalization filter in the PyTorch library, so I would first try running it with the original (somewhat ancient) library versions (see requirements.txt). Generally, the main thing I would check is the data loaders: are the pixel values scaled properly to a normalized range? Next, I would try to visualize the predictions (plot them in a 2D scatter plot) and correlate them with the ground truth. Is there a bias or scale error? If the correlation is there, it must be some data scaling issue; if there is no correlation, perhaps the checkpoint is not loaded properly. Hope it helps.
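A minimal sketch of that scatter-plot check (the names `preds` and `labels` are placeholders for N×2 arrays of (x, y) gaze points in cm collected from the test loop):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_pred_vs_truth(preds, labels):
    """Scatter predictions against ground truth, one panel per axis."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for i, name in enumerate(('x', 'y')):
        ax = axes[i]
        ax.scatter(labels[:, i], preds[:, i], s=2, alpha=0.3)
        lo, hi = labels[:, i].min(), labels[:, i].max()
        ax.plot([lo, hi], [lo, hi], 'r--')  # identity line = perfect prediction
        # A bias or scale error still correlates; no correlation at all
        # points to a checkpoint-loading problem instead.
        r = np.corrcoef(labels[:, i], preds[:, i])[0, 1]
        ax.set_title(f'{name}: r = {r:.3f}')
        ax.set_xlabel(f'truth {name} (cm)')
        ax.set_ylabel(f'predicted {name} (cm)')
    plt.tight_layout()
    plt.show()
```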


appleRtsan commented on August 16, 2024

I changed the versions as requirements.txt shows and got the same results. I also plotted the predicted gaze points together with the ground truth, where the 'x' marks are ground truth and the '*' marks are predictions:
[screenshot: scatter plot of predictions vs. ground truth]

I am sure the checkpoint loads properly; it is just torch.load('checkpoint.pth.tar'), and there is no other method, is there? I think it might be a data normalization issue. I know I should use the SubtractMean function in ITrackerData.py, but I don't know how to visualize it to check whether it works correctly or not.
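One quick way to check the normalization without plotting is to print the value ranges coming out of the loader (a sketch; the batch field order below is an assumption about ITrackerData, so adjust the unpacking to match):

```python
# Pull one batch from the training DataLoader and inspect value ranges.
# `train_loader` is a placeholder; the tuple layout is assumed.
row, imFace, imEyeL, imEyeR, faceGrid, gaze = next(iter(train_loader))
for name, t in (('face', imFace), ('eyeL', imEyeL), ('eyeR', imEyeR)):
    print(f'{name}: min={t.min().item():.3f} '
          f'max={t.max().item():.3f} mean={t.mean().item():.3f}')
# Correctly mean-subtracted inputs should sit roughly in [-0.5, 0.5];
# values in the hundreds suggest the 0-255 and 0-1 scales got mixed.
```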


erkil1452 commented on August 16, 2024

The means that we subtract (saved in the provided .mat files) are in the 0-255 range. Therefore, depending on the image loader, one needs to either load the images as 0-255, subtract the means, and then divide by 255, OR load the images as 0-1 and then subtract (means/255). If the ranges get mixed, things will go badly. The same happens if the input to the network is not normalized. In practice, the inputs should end up in the somewhat unusual range [-0.5, 0.5]. In any case, you can always retrain the network (you can start from our checkpoint) and see if the error goes down.
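A sketch of the two equivalent paths with stand-in data (the mean image here is a dummy; the real ones come from the provided .mat files):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)  # 0-255 image
mean = np.full((224, 224, 3), 112.0, dtype=np.float32)             # dummy 0-255 mean

norm_a = (img - mean) / 255.0        # path A: subtract in 0-255 space, then scale
norm_b = img / 255.0 - mean / 255.0  # path B: scale both first, then subtract

assert np.allclose(norm_a, norm_b)   # the two paths agree
print(norm_a.min(), norm_a.max())    # roughly within [-0.5, 0.5]
# Mixing the scales (a 0-1 image minus a 0-255 mean, or vice versa) is the
# failure mode described above.
```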


appleRtsan commented on August 16, 2024

I retrained from the checkpoint and the mean validation distance loss is still high at 11 cm, while the training distance loss is down to 0.4 cm. It seems there is no benefit from this training; it even overfits. Is there another way to refine it? Thanks a lot for helping!

By the way, I used only about 210,000 records for training and 30,000 records for testing. Could that explain the bad result?


erkil1452 commented on August 16, 2024

OK, that is a useful observation. So you can decrease the prediction error for the training samples, yet the prediction error for the test samples does not go down. What I would do next is debug, step by step, the differences between the training and test passes. Do the input data (shapes, value ranges, ...) look the same? How come there is such a difference in the outcome? Can you feed the training data to your test script and get a low prediction error? In theory, the network could of course overfit and learn labels for the training data without any generalization capability; however, most likely there is some simple technical explanation for such weird behavior.
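A sketch of that input comparison (`train_loader` and `val_loader` are placeholders for the two DataLoaders):

```python
def describe(tag, batch):
    """Print shape and value range of every tensor in a collated batch."""
    for i, t in enumerate(batch):
        if hasattr(t, 'shape'):
            print(f'{tag}[{i}]: shape={tuple(t.shape)} '
                  f'min={t.min().item():.3f} max={t.max().item():.3f}')

describe('train', next(iter(train_loader)))
describe('test', next(iter(val_loader)))
```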


appleRtsan commented on August 16, 2024

I tried splitting off some of the training data for testing, but it didn't work. I have no idea how to inspect the train and test input value ranges. Should I do something in the train loader loop?

Would transfer learning work for this issue? If so, which layers should I freeze first? All of the CNN layers?
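In case it helps, freezing the convolutional layers while fine-tuning the rest is a short generic pattern in PyTorch (a sketch, not code from this repo; `model` is a placeholder):

```python
import torch
import torch.nn as nn

# Freeze every Conv2d layer; only the remaining (e.g. fully connected)
# parameters keep receiving gradients.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        for p in m.parameters():
            p.requires_grad = False

# Give the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4, momentum=0.9)
```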


erkil1452 commented on August 16, 2024

I think the issue is that there is some difference between the training and test code. Try training and testing on exactly the same data. Do the train and test errors become the same?


appleRtsan commented on August 16, 2024

I tried your method and got a weird result: the train and test MSE losses are 0.84 and 0.7, and the distance losses are 1.6 cm and 0.951 cm, so the test error is lower than the train error! What's worse, when I feed the same data as validation, the MSE loss goes up to 30 and the distance loss up to 7 cm. I now think it is a checkpoint-saving problem. How do I debug that part? Thanks a lot!


erkil1452 commented on August 16, 2024

It is hard to give any concrete advice in such a case. Try to rewrite the code, or write a minimum working example that reproduces the behavior; it may be just a tiny network that takes single floats as input. You can also easily inspect the weights during training, before saving and after loading. Look into the PyTorch documentation to see how to access the weights of any layer. You can then keep printing, e.g., the mean and std, and see whether they stay the same before save and after load.
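A sketch of that save/load sanity check (`model` and the file name are placeholders):

```python
import torch

def weight_stats(model):
    """Per-layer (mean, std) of all parameters."""
    return {n: (p.mean().item(), p.std().item())
            for n, p in model.named_parameters()}

before = weight_stats(model)
torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')

state = torch.load('checkpoint.pth.tar')
model.load_state_dict(state['state_dict'])
after = weight_stats(model)

# Any layer whose statistics changed was not saved/restored correctly.
for n in before:
    (bm, bs), (am, as_) = before[n], after[n]
    if abs(bm - am) > 1e-6 or abs(bs - as_) > 1e-6:
        print(f'MISMATCH {n}: before=({bm}, {bs}) after=({am}, {as_})')
```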


appleRtsan commented on August 16, 2024

I examined the code carefully and found that when the code starts validating (--sink), there is no command to load the checkpoint. So my validation parameters came from the default initialization, not from the checkpoint. I added doLoad = args.sink in main to solve the whole problem. The solution is so simple that I feel really awful.
Thanks so much for your help!
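For reference, a minimal sketch of that fix (the argparse setup is paraphrased; only the doLoad = args.sink line comes from the comment above):

```python
import argparse

parser = argparse.ArgumentParser(description='iTracker trainer')
parser.add_argument('--sink', action='store_true',
                    help='run validation only, then terminate')
args = parser.parse_args()

# The fix: a validation-only run (--sink) must load the trained checkpoint;
# otherwise the model is evaluated with freshly initialized weights.
doLoad = args.sink
```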

