gazecapture's People

Contributors

adikhosla, dependabot[bot], enabledisplay, erkil1452, jaybeavers, kjkjava

gazecapture's Issues

What is train_y and val_y in dataset?

I expected both train_y and val_y to be the 2D coordinates of the eye gaze.

But the result is quite strange: most of the coordinates are very small or negative, which would mean the gaze almost always points toward the top-left. The images don't look like that, so most of the values appear to be wrong.

Please look at the circle and the center of the circle: the center is the predicted point of eye gaze.

Somebody told me that the data was normalized, but how? How do I find the true gaze point?
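For what it's worth, the labels appear to be the XCam/YCam values from dotInfo.json, i.e. the dot position in centimeters relative to the front camera, which would explain why many values are small or negative (most of the screen sits below and to the side of the camera). A minimal sketch of mapping such camera-relative values back to screen pixels is below; the camera offset and pixels-per-centimeter numbers are hypothetical placeholders and would have to come from the real device parameters.

def cam_cm_to_screen_px(x_cam_cm, y_cam_cm,
                        cam_x_cm=3.0,     # camera position in screen coords, cm from top-left (placeholder)
                        cam_y_cm=-0.8,    # negative: camera sits above the screen's top edge (placeholder)
                        px_per_cm=52.0):  # screen pixel density (placeholder)
    # Screen x grows right and y grows down; the dataset's YCam grows up,
    # hence the sign flip on y.
    x_px = (cam_x_cm + x_cam_cm) * px_per_cm
    y_px = (cam_y_cm - y_cam_cm) * px_per_cm
    return x_px, y_px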

About train and test augmentation mentioned in the paper

Hello, could you please share the parameters of the train/test augmentation mentioned in the iTracker paper?
The only description in the paper is "shifting the eyes and the face, changing the face grid appropriately." Could you please tell us the values/ranges? We just can't reproduce your accuracy with test augmentations...
Thanks!
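In case it is useful for discussion, here is a minimal sketch of one plausible shift augmentation; the jitter range is an assumption, not the value used in the paper, and the face grid would need to be recomputed from the shifted face box.

import random

def jitter_bbox(x, y, w, h, max_shift=0.05):
    # Shift the crop box by up to max_shift * its size in each direction
    # before cropping; 5% is an illustrative value only.
    dx = random.uniform(-max_shift, max_shift) * w
    dy = random.uniform(-max_shift, max_shift) * h
    return x + dx, y + dy, w, h

# face_bbox  = jitter_bbox(*face_bbox)    # then crop the face from the frame
# left_bbox  = jitter_bbox(*left_bbox)    # then crop the left eye
# right_bbox = jitter_bbox(*right_bbox)   # then crop the right eye
# ...and rebuild the 25x25 face grid from the shifted face_bbox.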

How to use the caffe model and the relation about pytorch code and the trained model?

Hi, my nice friend! I have to disturb you again.
Now I want to use itracker_iter_92000.caffemodel directly for inference, but I have run into some problems and want to be clearer about a few things.

  1. Looking at the code, the training and testing code is PyTorch; was the caffemodel trained with PyTorch or with Caffe code?
  2. If I want to run inference on an image, should I do it the same way the PyTorch code does?
    (1) the image is read in RGB format, then divided by 255 into the 0–1 range
    (2) I think the mean image is also in RGB order, and it is likewise divided by 255 into the 0–1 range
    (3) each input image (left eye, right eye, face) separately subtracts its mean image, so some values may drop below zero; call these left_eye_sub_mean_img, right_eye_sub_mean_img, face_sub_mean_img
    (4) then left_eye_sub_mean_img, right_eye_sub_mean_img, face_sub_mean_img, and face_mask are used as the input images to infer the two output values

Is what I describe correct, or am I missing something?
Please check for me! Thank you very much!
Best Regards
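A minimal sketch of the preprocessing described in the steps above, following the PyTorch ITrackerData.py pipeline (read as RGB, scale to 0–1, subtract a per-stream mean image). Whether the Caffe model expects exactly the same preprocessing is the open question here, so treat this as an assumption; mean_left_eye, mean_right_eye, and mean_face stand for the dataset mean images already scaled to 0–1.

import numpy as np
from PIL import Image

def preprocess(img_path, mean_img, size=224):
    # Read as RGB, resize, scale to [0, 1], then subtract this stream's
    # mean image; some values may drop below zero, as noted above.
    img = Image.open(img_path).convert('RGB').resize((size, size))
    img = np.asarray(img, dtype=np.float32) / 255.0
    return img - mean_img

# left_eye  = preprocess('left_eye.jpg',  mean_left_eye)
# right_eye = preprocess('right_eye.jpg', mean_right_eye)
# face      = preprocess('face.jpg',      mean_face)
# network inputs: left_eye, right_eye, face, plus the 25x25 face_mask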

about SubtractMean

When I test the model with a new picture, SubtractMean is applied after the picture is converted to a Tensor.
Does MeanImg refer to the mean of the training set or the mean of this picture?

Data split

Hi,

You mentioned that the data is split by patient. Where can I get the patient IDs used in the train/validation/test sets?

Model for real-time inference

Hi,
First of all thank you for making your dataset and code available to the public!
We would like to replicate your model for real-time inference (Section 4.2 in the paper). Is the precise network layout / pre-trained model available somewhere?

Thanks in advance,
Tobias

Regarding Face Grid

Hey,
May I please know what the FrameW and FrameH arguments are: are they the original frame width and height (480×640) or the resized value (224×224)?

Also, why are the faceGrid width and height values the same in the JSON, e.g. 13×13?

Thanks,
Madan

Matlab Compatibility Code in prepareDataset.py

In prepareDataset.py, I encounter the following code section -

faceBbox = bboxFromJson(appleFace) + [-1,-1,1,1] # for compatibility with matlab code

I understand that after reading the values from appleFace.json as int, the X & Y pixel coordinates are treated as 1-indexed values (which is Matlab compatible). So, for converting it to 0-indexed in Python, we should add [-1,-1,0,0] to [X,Y,W,H]. But in the code, [-1,-1,1,1] is added (which will increase the width & height of face crops by 1 pixel).
Can you please clarify why 1 is added to W & H? I know that an extra pixel wouldn't matter much, but I'd like to understand it.
Also, it seems to me that [-1,-1,0,0] should be added for leftEyeBbox & rightEyeBbox instead of [0,-1,0,0].

Thanks.
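For reference, a tiny illustration of the two conversions being compared (the numbers are just for illustration):

# Matlab-style 1-indexed bbox [X, Y, W, H]
bbox_matlab = [38, 230, 343, 343]

# Plain 1-indexed -> 0-indexed conversion suggested in this issue:
bbox_zero = [bbox_matlab[0] - 1, bbox_matlab[1] - 1, bbox_matlab[2], bbox_matlab[3]]
# -> [37, 229, 343, 343]

# What prepareDataset.py does with [-1, -1, 1, 1]:
bbox_repo = [bbox_matlab[0] - 1, bbox_matlab[1] - 1, bbox_matlab[2] + 1, bbox_matlab[3] + 1]
# -> [37, 229, 344, 344]  (the crop becomes one pixel wider and taller)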

I can not download the data

I have registered an account on the website, but I cannot sign in, so I cannot get the GazeCapture dataset.

Cannot download the data

I registered on the website and verified my institutional email, yet I cannot log in to download the data. Please advise if I am missing something.

Inconsistency between described and actual model?

I am currently going through the PyTorch eye-model code and stumbled across an inconsistency that I suspect was left out of the article on purpose, which just needs confirming: the described model does not mention max pooling, yet it is used between layers in the code.

The described model is as follows:

The output is the distance, in centimeters, from the camera. CONV represents convolutional layers (with filter size/number of kernels: CONV-E1, CONV-F1: 11 × 11/96, CONV-E2, CONV-F2: 5 × 5/256, CONV-E3, CONV-F3: 3 × 3/384, CONV-E4, CONV-F4: 1 × 1/64)

but a max-pool layer can be seen between the layers in their code:

class ItrackerImageModel(nn.Module):
    # Used for both eyes (with shared weights) and the face (with unique weights)
    def __init__(self):
        super(ItrackerImageModel, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.CrossMapLRN2d(size=5, alpha=0.0001, beta=0.75, k=1.0),
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2, groups=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.CrossMapLRN2d(size=5, alpha=0.0001, beta=0.75, k=1.0),
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 64, kernel_size=1, stride=1, padding=0),
            nn.ReLU(inplace=True),
        )


Compatibility with latest iPhones

Hi and thank you for the awesome work you have done here.

One question regarding inference on new-generation iPhones. I want to run the model on an iPhone XR. The main differences between this model and the previous generation (used for training) are the position of the camera (the camera is now almost part of the screen) and the size of the screen (6.06 inches diagonally for the XR vs. 4.6 inches diagonally for the 6). As a result, the output of the model is always underestimated.

Is calibration the only answer to this issue, or can we apply some kind of transformation to the output based on the iPhone's dimensions? How would you tackle this problem?

Many thanks
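In case it helps the discussion, here is a minimal sketch of a lightweight per-device calibration: show a handful of dots at known positions, record the model's camera-relative predictions, and fit a simple affine correction. This is a simplified stand-in, not the SVR-based calibration described in the paper.

import numpy as np

def fit_affine_calibration(preds_cm, targets_cm):
    # preds_cm, targets_cm: (N, 2) arrays of model outputs and true dot
    # positions (both in cm relative to the camera) from a short calibration run.
    n = preds_cm.shape[0]
    A = np.hstack([preds_cm, np.ones((n, 1))])           # rows are [x, y, 1]
    W, *_ = np.linalg.lstsq(A, targets_cm, rcond=None)   # least-squares fit, W is 3x2
    return W

def apply_calibration(W, pred_cm):
    return np.hstack([pred_cm, 1.0]) @ W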

How to run inference with your model on my own video or image?

hi,
firstly, thanks for your great work!
But when I use your PyTorch model I have some questions:
(1) which pre-trained model is usable?
(2) if I want to run inference on my own image or video, how can I do it?

thanks very much!

data processing

Hi, I would like to know whether you applied any preprocessing to the face and eye images. Thank you for your answer!

Unable to reproduce the result

I am using the given pre-trained Caffe model but am getting a Euclidean loss much higher than mentioned in the paper. Please look at my code and tell me where I am making a mistake.
I am loading the Caffe model and doing a forward pass to get the output.
CaffeModel.zip

I/O Error(Errno 5) when running prepareDataset.py

  • Solution: see my last comment below. It was caused by a fault in my hardware.

  • I am facing an I/O error "OSError: [Errno 5] Input/output error" when running
    prepareDataset.py

  • I found on the internet that I can redirect the output using >/dev/null 2>&1 after
    the command, but it doesn't create all the subdirectories

python prepareDataset.py --dataset_path [A = where extracted] --output_path [B = where to save new data]
  • Is this command running fine at your system? I am using Ubuntu 16.04 LTS

  • The error backtrace is below:
    Traceback (most recent call last):
    File "prepareDataset.py", line 273, in
    main()
    File "prepareDataset.py", line 125, in main
    img = np.array(img.convert('RGB'))
    File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/Image.py", line 934, in convert
    self.load()
    File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/ImageFile.py", line 234, in load
    s = read(self.decodermaxblock)
    File "/home/user/anaconda3/envs/myenv/lib/python3.6/site-packages/PIL/JpegImagePlugin.py", line 398, in load_read
    s = self.fp.read(read_bytes)

training stuck problem

Hi, thanks for sharing code and dataset.
I have a problem in training.
I just run prepareDataset.py and then run main.py.
When running main.py, it gets stuck at line 161: enumerate(train_loader).
At that point the program spawns dozens of Python processes and uses up my RAM, then hangs without showing any warning or error message, so I am fairly sure it is not a plain memory error.
My OS is Windows and I use the Anaconda PowerShell; the CUDA version is 10.1.
How can I fix this problem?
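Not an answer from the authors, but a common cause of this symptom on Windows is the DataLoader's worker processes: without a __main__ guard each worker re-imports the script, and too many workers can exhaust RAM. A minimal sketch of the usual workaround (the worker count is just an example; dataTrain and batch_size stand for the objects already built in main.py):

import torch.utils.data

def run():
    train_loader = torch.utils.data.DataLoader(
        dataTrain, batch_size=batch_size,
        num_workers=2,            # try a small number of workers on Windows
        shuffle=True, pin_memory=True)
    for i, batch in enumerate(train_loader):
        ...                       # training step

if __name__ == '__main__':       # required for multiprocessing on Windows
    run()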

Relative change in the results does not follow relative change in eye movement

Hello! Thanks for your hard work!

Recently I've been trying to use the Caffe model, but since I want to run it fully on CPU and don't want to install Caffe as a first option, I opted to load the Caffe model through OpenCV's DNN module instead.

I can't see how the Caffe model is used in the repo, so I tried to implement the same pipeline as the PyTorch one. Unfortunately, I have met with strange results.

I stared straight at the camera, but the prediction is way off (6 cm horizontal and 2 cm vertical). I tried looking left and right, but seemingly this has no effect: the values do not vary with the general direction of my eyes (i.e. the relative changes), so I want to ask if my pipeline is correct? (A rough code sketch of the pipeline follows at the end of this issue.)

  1. Get the face
  2. Get the eyes
  3. Crop the face and eyes and create the grid (the grid is all 1's where the face lies within the grid)
  4. Resize the face and eyes (so they are warped)
  5. Divide the face and eye images by 255
  6. Load the means and divide them by 255
  7. Subtract the means from their respective images
  8. Resize them and feed them into the model
  9. Get the results

Also, I used some assumptions that I observed and confirmed from the PyTorch code:

  1. RGB image
  2. The right eye is the eye detected on the left side of the image, not the right side

As background, I'm running this on my laptop; I've seen this repo used successfully outside of mobile devices, so what I'm asking is: what is the expected face distance from the camera?

Thanks!
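For reference, here is the rough sketch mentioned above: loading the Caffe model with OpenCV's DNN module and feeding the four inputs. The file names and input blob names (image_left, image_right, image_face, facegrid) are assumptions; check the deploy prototxt for the actual names. left_eye, right_eye, face, face_grid and the mean images are assumed to come from your own detection/cropping code.

import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe('itracker_deploy.prototxt',
                               'itracker_iter_92000.caffemodel')

def to_blob(img_bgr, mean_img):
    # img_bgr: 224x224 crop from the webcam frame; mean_img: 224x224x3 in [0, 1]
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img -= mean_img
    return cv2.dnn.blobFromImage(img, size=(224, 224))    # 1x3x224x224

# Blob names and the face-grid shape below are assumptions; inspect the prototxt.
net.setInput(to_blob(left_eye, mean_left), 'image_left')
net.setInput(to_blob(right_eye, mean_right), 'image_right')
net.setInput(to_blob(face, mean_face), 'image_face')
net.setInput(face_grid.reshape(1, 625, 1, 1).astype(np.float32), 'facegrid')
gaze_xy_cm = net.forward()   # two values, in cm relative to the camera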

Is there any visualization code or APP?

This work is great and interesting!
I have run the PyTorch code on Linux, but I didn't find any visualization. Is there any code or app for Windows/Linux/iOS/Android so that I can get an intuitive feel for the results?
Thanks!

Dataset is access denied

I am not able to open the URL http://gazecapture.csail.mit.edu/download.php to download the dataset.

Anyone had luck with pytorch inference?

Has anyone achieved accurate predictions using their own data with the pytorch implementation? I am getting inaccurate predictions after just changing the data loading paths in ITrackerData.py and using the checkpoint.

Face Grid Arguments

Hey,
I wanted to know what you are passing as arguments to get the face grid values. Is FrameW/H the original image size, or what values are you passing? Is GridW/H a fixed grid size of 25×25? Are labelFaceX/Y/W/H the face detection values? Please let me know; I am stuck and perplexed by this. Thank you.

nan while evaluating

I am getting NaN while evaluating it on the MPIIGaze dataset. I am using PyTorch for the implementation.

Performance problems in later pytorch versions

Hi, is there a known reason why the Pytorch version chosen is 0.4.1?

It seems that later versions of pytorch take ~20x longer in the computation of gradients (back-propagation). I wonder if this is a known issue and the main reason why this version of torch was chosen. I encountered this behavior because I need a later version of pytorch to get some extra features.

Is it likely to run with the latest PyTorch in an NGC cloud PyTorch Docker container?

Steps to reproduce the issue:

mkdir database && cd database
wget -O gazecapture.tar "https://gazecapture.csail.mit.edu/dataset.php?

tar -xvf gazecapture.tar
cat *.tar.gz | tar zxvf - -i

  • start the container
    docker run --gpus all -it --rm -v /home/user/GazeCapture:/mount nvcr.io/nvidia/pytorch:20.02-py3
    enter the folder and execute the following steps
    cd /mount
    python prepareDataset.py --dataset_path base/ --output_path output
    python main.py --data_path output --reset

Eye Bounding Box

Hi,
As far as I know the iOS face/eye detection service does not provide bounding boxes for eyes, only eye and eyebrow control points (landmarks). What was the procedure/algorithm that you used to generate bounding boxes for eyes from the provided information?
Thanks, Botond

Size of FaceGrid's Content

The input image is a rectangle and the input face is square, and the face grid is calculated from these two images. However, when I view the faceGrid data, I find that the height and width are the same, for example 14 × 14. How is a face grid with square-shaped content generated?

checkpoint file doesn't work

I tried to load the GazeCapture dataset and use the checkpoint file for testing, which is said to reach an L2 error of 2.46 cm. But the checkpoint file does not work at all and reaches an L2 error of about 25 cm. I don't know where my problem is; can anybody help? Thanks!!!

Inference on webcam

Hello. Thank you for sharing your code.

I'm currently trying to run your PyTorch code on a webcam. As I understand it, I first need to detect the face and both eyes in the frame and then run the model on that data, and I can put anything as y-data since I only want to evaluate, not train. But one question remains: how do I get faceGrid? What does this array contain, and is it possible to compute it somehow?
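For what it's worth, the face grid appears to be a 25×25 binary mask marking where the face bounding box falls within the full camera frame. A minimal sketch of computing it from a face bounding box is below; the exact rounding used in the official Matlab code may differ slightly.

import numpy as np

def make_face_grid(face_x, face_y, face_w, face_h, frame_w, frame_h, grid=25):
    # Scale the face bbox from frame coordinates into grid cells and set
    # those cells to 1; everything else stays 0.
    g = np.zeros((grid, grid), dtype=np.float32)
    x0 = int(round(face_x * grid / frame_w))
    y0 = int(round(face_y * grid / frame_h))
    x1 = int(round((face_x + face_w) * grid / frame_w))
    y1 = int(round((face_y + face_h) * grid / frame_h))
    g[max(y0, 0):min(y1, grid), max(x0, 0):min(x1, grid)] = 1.0
    return g.flatten()   # the model consumes it as a 625-dimensional vector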

Division by 255 in SubtractMean

Hi...
Sorry to bother you with a small issue: ITrackerData.py was revised on 26 Jan 2019, introducing a division by 255 in the SubtractMean class. Can you please explain why this change was made? The code worked fine earlier without the division by 255; what changed in the meantime that made it necessary?

Thanks

Laptop Gaze Inference

Can I use the models for predicting the gaze point on a laptop screen?

I am working on a project that tracks gaze to move the mouse pointer around the screen, and I want to know whether I can use your models to predict gaze from a laptop's built-in webcam.

Thank you.

Calculating faceGrid.json from appleFace.json

Three issues have already been raised regarding faceGrid, but none of them resolves my question, which is the following:
How is [xLo, yLo, w, h] in faceGrid.json calculated from the face bounding box [X,Y,W,H] given in appleFace.json?
E.g. - For recording 00002 & frame 00000.jpg -
[frameW, frameH] = [480, 640]
scaleX = 25/480 = 0.052
scaleY = 25/640 = 0.039
face bounding box [X,Y,W,H] = [38.15, 230.04, 343.68, 343.67]

Now, according to the following code snippet -

% Use one-based image coordinates.
xLo = round(labelFaceX(i) * scaleX) + 1;
yLo = round(labelFaceY(i) * scaleY) + 1;
w = round(labelFaceW(i) * scaleX);
h = round(labelFaceH(i) * scaleY);
if parameterized
labelFaceGrid(i, :) = [xLo yLo w h];
else

xLo = round(38.15 x 0.052) + 1 = 3
yLo = round(230.04 x 0.039) + 1 = 10
w = round(343.68 x 0.052) = 18
h = round(343.67 x 0.039) = 13
i.e. [xLo, yLo, w, h] = [3, 10, 18, 13]
but in faceGrid.json, the corresponding value given is [6, 10, 13, 13].

Why is there this significant difference from the faceGrid.json values? Are these values calculated using the formulae above, or some other formulae? I'm also beginning to suspect that faceGrid.json might have been generated independently and not derived from appleFace.json. Please clarify.

Thanks

pytorch pre-trained model

Hi,
I have some questions. Is checkpoint.pth.tar the pre-trained model used in the published work?
You state that a Caffe pre-trained model is provided; is it possible for you to also provide a PyTorch version? Is there a way to convert the Caffe model to a PyTorch model? (We can't find a reliable tool.)
Thanks >v<

Strategy to Crop Face and Eyes

Hello, thank you for your public source code and dataset. I want to use the model on an Android phone to control an application by eye movement, so I need to know how the face and eye crops were produced in the dataset, so that I can feed the model new samples cropped the same way on Android.

Unable to Login

I am unable to login and download the dataset. After creating an account, I tried logging in and access was denied.

how to get labels

Sorry to bother you; could you please describe in more detail how to obtain the labels? Thank you so much!

Run time Error in Reading Kernel Image

After getting the metadata I ran main.py, but initially I get the warning "Found GPU0 Quadro K1100M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5." and then the runtime error "CUDA Cannot read kernel image". I am not able to make sense of this problem. I am running through Anaconda on Windows 10, with CUDA 9.0 and PyTorch 0.4.1. Thank you.
