
pose-hg-train's Introduction

Stacked Hourglass Networks for Human Pose Estimation (Training Code)

This is the training pipeline used for:

Alejandro Newell, Kaiyu Yang, and Jia Deng, Stacked Hourglass Networks for Human Pose Estimation, arXiv:1603.06937, 2016.

A pretrained model is available on the project site. You can use the option -loadModel path/to/model to try fine-tuning.

To run this code, make sure the following are installed:

  • Torch7
  • hdf5
  • cudnn

Getting Started

Download the full MPII Human Pose dataset, and place the images directory in data/mpii. From there, it is as simple as running th main.lua -expID test-run (the experiment ID is arbitrary). To run on FLIC, again place the images in a directory data/flic/images, then call th main.lua -dataset flic -expID test-run.

Most of the command line options are pretty self-explanatory, and can be found in src/opts.lua. The -expID option will be used to save important information in a directory like pose-hg-train/exp/mpii/test-run. This directory will include snapshots of the trained model, training/validation logs with loss and accuracy information, and details of the options set for that particular experiment.

Running experiments

There are a couple of features to make running experiments a bit easier:

  • An experiment can be continued with th main.lua -expID example-exp -continue; it will pick up where the experiment left off with all of the same options set. If you want to change an option like the learning rate, make the same call as above but add, for example, -LR 1e-5, and it will preserve all of the old options except for the new learning rate.

  • In addition, the -branch option initializes a new experiment directory while leaving the original experiment intact. For example, if you have trained for a while and want to drop the learning rate but don't know what to change it to, you can do something like the following: th main.lua -branch old-exp -expID new-exp-01 -LR 1e-5, and then compare against a separate experiment: th main.lua -branch old-exp -expID new-exp-02 -LR 5e-5.

In src/misc there's a simple script for monitoring a set of experiments to visualize and compare training curves.

Getting final predictions

To generate final test set predictions for MPII, you can call:

th main.lua -branch your-exp -expID final-preds -finalPredictions -nEpochs 0

This assumes there is an experiment that has already been run. If you just want to provide a pre-trained model, that's fine too, and you can call:

th main.lua -expID final-preds -finalPredictions -nEpochs 0 -loadModel /path/to/model

Training accuracy metric

For convenience during training, the accuracy function evaluates PCK by comparing the output heatmap of the network to the ground truth heatmap. The normalization in this case will be slightly different from the normalization done when officially evaluating on FLIC or MPII. So there will be some discrepancy between the numbers, but the heatmap-based accuracy still provides a good picture of how well the network is learning during training.
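To make this concrete, here is a rough sketch of what heatmap-based PCK looks like (illustrative only; the real logic lives in src/util/eval.lua and differs in its details):

require 'torch'

-- Illustrative sketch, not the exact code in src/util/eval.lua.
-- A joint counts as "correct" if the predicted heatmap peak lands
-- within some fraction of the heatmap size of the ground-truth peak.
local function argmax2D(hm)
    local colMax, rowIdx = torch.max(hm, 1)   -- max over rows, per column
    local _, colIdx = torch.max(colMax, 2)    -- max over those column maxima
    local x = colIdx[1][1]
    local y = rowIdx[1][x]
    return x, y
end

local function heatmapPCK(predHM, gtHM, thr)
    local px, py = argmax2D(predHM)
    local gx, gy = argmax2D(gtHM)
    local dist = math.sqrt((px - gx)^2 + (py - gy)^2)
    return dist / predHM:size(1) <= thr       -- e.g. thr = 0.5 on a 64x64 map
end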

Final notes

In the paper, the training time reported was with an older version of cuDNN; after switching to cuDNN 4, training time was cut in half. Now, with an NVIDIA Titan X GPU, training time from scratch is under 3 days for MPII, and about 1 day for FLIC.

pypose/

Included in this repository is a folder with some old Python code that I used. It hasn't been updated in a while, and might not be totally functional at the moment. There are a number of useful functions for doing evaluation and analysis on pose predictions, so it is worth digging into. It will be updated and cleaned up soon.

Questions?

I am sure there is a lot not covered in the README at the moment, so please get in touch if you run into any issues or have any questions!

Acknowledgements

Thanks to Soumith Chintala, this pipeline is largely built on his example ImageNet training code available at: https://github.com/soumith/imagenet-multiGPU.torch


pose-hg-train's Issues

Would you please also share your train.log and valid.log for reference?

Hi, would you share the train.log and valid.log for reference? They would be a great help for monitoring the training process.

P.S. Did you train the model for only 100 epochs? The paper mentions that you drop the learning rate several times; if so, at which epochs did you drop it? Thank you.

Ask for TensorFlow Version

Hi, I recently read your publications "Hourglass" and "Associative Embedding", and noticed that you used TensorFlow to implement "Associative Embedding".

Is it possible for you to release the TensorFlow version in the future, including both works? After all, Lua and Torch are not as popular as TF, and it is quite inconvenient for other people to reproduce the results.

Head and Shoulder are not included in computing the accuracy

Hi @anewell ,

I am reading your code and found that the accuracy on the MPII dataset is computed based only on these predefined joints: self.accIdxs = {1,2,3,4,5,6,11,12,15,16}. In your paper, results for the Head, Shoulder, Elbow, Wrist, Hip, Knee, and Ankle joints are reported. It seems Head and Shoulder are missing from self.accIdxs. Could you explain why?
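For readers cross-referencing this, the indices are consistent with the standard MPII joint ordering (this annotation is mine, not from the repo):

-- Standard MPII joint ordering (1-indexed), for reference:
-- 1 r-ankle,    2 r-knee,      3 r-hip,    4 l-hip,
-- 5 l-knee,     6 l-ankle,     7 pelvis,   8 thorax,
-- 9 upper neck, 10 head top,   11 r-wrist, 12 r-elbow,
-- 13 r-shoulder, 14 l-shoulder, 15 l-elbow, 16 l-wrist
-- So accIdxs = {1,2,3,4,5,6,11,12,15,16} covers the ankles, knees,
-- hips, wrists, and elbows only.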

Difference between *.h5 annotations and official .mat annotation

Hi, I noticed that for some of the images there is a difference between the annotations in the h5 files and the official mpii_human_pose_v1_u12_1.mat file.

For the first entry, for example, the order of the parts is different (not really that big of a deal), and the values for scale (3.7763 vs 3.02) and center ((594,257) vs (594,302)) are different. I made sure that both refer to the same image and the same person ID within the image.
Is there any particular reason for this? Does it make a difference which one I use?

Final prediction doesn't seem to work with a trained model

Hi!

I have trained a model, and during training I got 0.84/0.80 accuracy. However, when I use that trained model to produce final predictions, it reports 0.02 accuracy. Did I use the final prediction incorrectly?

Thanks a lot!

th main.lua  -expID thanksgiving  -trainBatch 3 -continue -LR 1e-8 -LRdecay 1e-2 -nEpochs 10
Saving everything to: /home/ming/pose-hg-train/exp/mpii/thanksgiving
Input is a tensor with dimensions: 3 x 256 x 256
Output is a table
         Entry 1 is a tensor with dimensions: 16 x 64 x 64
         Entry 2 is a tensor with dimensions: 16 x 64 x 64
         Entry 3 is a tensor with dimensions: 16 x 64 x 64
         Entry 4 is a tensor with dimensions: 16 x 64 x 64
         Entry 5 is a tensor with dimensions: 16 x 64 x 64
         Entry 6 is a tensor with dimensions: 16 x 64 x 64
         Entry 7 is a tensor with dimensions: 16 x 64 x 64
         Entry 8 is a tensor with dimensions: 16 x 64 x 64
==> Loading model from: /home/ming/pose-hg-train/exp/mpii/thanksgiving/final_model.t7
==> Converting model to CUDA
==> Starting epoch: 181/190
 [======================================== 8000/8000 ==================================>]  Tot: 30m53s | Step: 232ms    
      train : Loss: 0.0048804 Acc: 0.8402
 [======================================== 1000/1000 ==================================>]  Tot: 1m13s | Step: 72ms      
      valid : Loss: 0.0057696 Acc: 0.8002

th main.lua -expID finalp -finalPredictions -nEpochs 0 -loadModel ../exp/mpii/thanksgiving/final_model.t7 
Saving everything to: /home/ming/pose-hg-train/exp/mpii/finalp
Input is a tensor with dimensions: 3 x 256 x 256
Output is a table
         Entry 1 is a tensor with dimensions: 16 x 64 x 64
         Entry 2 is a tensor with dimensions: 16 x 64 x 64
         Entry 3 is a tensor with dimensions: 16 x 64 x 64
         Entry 4 is a tensor with dimensions: 16 x 64 x 64
         Entry 5 is a tensor with dimensions: 16 x 64 x 64
         Entry 6 is a tensor with dimensions: 16 x 64 x 64
         Entry 7 is a tensor with dimensions: 16 x 64 x 64
         Entry 8 is a tensor with dimensions: 16 x 64 x 64
==> Loading model from: ../exp/mpii/thanksgiving/final_model.t7
==> Converting model to CUDA
==> Generating predictions...
 [======================================== 11731/11731 ================================>]  Tot: 14m9s | Step: 71ms      
      test : Loss: 0.0122433 Acc: 0.0002

Meaning of imgname in annot.h5

Hi,

great work!

I don't fully understand what you store as imgname in annot.h5. I am training with my own dataset and am not quite sure what should be specified there.

Right now, in your code, it looks like:
DATASET "imgname" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 40614, 16 ) / ( 40614, 16 ) }
DATA {
(0,0): 48, 51, 55, 52, 53, 52, 48, 49, 50, 46, 106, 112, 103, 0, 0, 0,

which does not look like the names of the images. How should these values be interpreted?

thanks in advance
Javier
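The values shown are consistent with ASCII character codes padded with zeros. Under that assumption, a row can be decoded with a sketch like this (decodeImgname is a hypothetical helper, not part of the repo):

-- Hypothetical helper, assuming each row of 'imgname' stores the
-- ASCII codes of the filename, zero-padded to 16 entries.
local function decodeImgname(row)
    local chars = {}
    for i = 1, row:size(1) do
        local c = row[i]
        if c == 0 then break end           -- stop at the zero padding
        table.insert(chars, string.char(c))
    end
    return table.concat(chars)
end
-- The row shown above, {48,51,55,52,53,52,48,49,50,46,106,112,103,0,0,0},
-- decodes to "037454012.jpg".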

Official PCKh on the validation set?

Hi Alejandro,

Thanks for sharing your codes!

Since you have noted that the PCKh in the log files is not the official one, could you tell me how to obtain the official PCKh on the MPII validation set using your code, just like the PCKh results reported in your paper?

Thanks,
Wei

Question on the transform function in img.lua

function transform(pt, center, scale, rot, res, invert)
    local pt_ = torch.ones(3)
    pt_[1],pt_[2] = pt[1]-1,pt[2]-1

    local t = getTransform(center, scale, rot, res)
    if invert then
        t = torch.inverse(t)
    end
    local new_point = (t*pt_):sub(1,2):add(1e-4)

    return new_point:int():add(1)
end
  1. Is there any special reason why you add 1e-4 to new_point at the end?
  2. Why do you subtract 1 from pt at the start and then add 1 back at the end (instead of just using the original pt)?

Structure of .h5 file

Hi, anewell,

Thank you for sharing the code! It's a great help for understanding the network structure.

Under '/data/mpii/annot/', the 'Datasets' in test.h5, valid.h5 and train.h5 are different. What they have in common are:
1-center
2-imgname
3-index
4-normalize
5-part
6-person
7-scale
8-torsoangle
9-visible

I'm trying to fine-tune with my own data. Which datasets should I provide in the .h5 file, and what do they mean?

bad argument #6 to 'sub'

Hi, anewell! I've hit this issue during training from scratch, at different epochs (4, 5, 12), after the latest commits.
==> Starting epoch: 12/100 torch/install/bin/luajit: /opt/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 3 callback] pose-hg-train/src/util/img.lua:115: bad argument #6 to 'sub' (out of range at torch/pkg/torch/generic/Tensor.c:330)

Any ideas about what caused it and how it could be fixed?
Thank you in advance.

The amplitude of predicted maps

Since only ReLU is used in the network, how do you ensure that the amplitude of the predicted maps is in [0, 1], the same as the Gaussian maps?

What does the 'modelArgs' mean?

modelArgs appears in createModel(modelArgs) in the file model.lua. However, I cannot find its definition.
By the way, how can I create a model and run it on fake input data from scratch? I cannot figure it out.
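As a rough sketch of probing the network with fake input (whether createModel needs explicit arguments once the options in src/opts.lua are loaded is an assumption here):

-- Sketch only: probe the network with a fake input to inspect shapes.
-- The argument-free createModel() call is an assumption, not repo fact.
local model = createModel()
local fakeInput = torch.randn(1, 3, 256, 256):cuda()  -- batch of one RGB 256x256 image
local output = model:forward(fakeInput)
print(#output)  -- with nStack = 8, expect a table of eight 16x64x64 heatmaps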

Image crop out of range during MPII training

Thanks for releasing the code! It is really helpful.

I tried the training on MPII and after 7 epochs it gave an error for the image cropping being out of range:

==> Starting epoch: 8/100
... pose-hg-train/src/util/img.lua:107: bad argument #6 to 'sub' (out of range...

Do you have any guess why this would happen?

Some questions about the intermediate loss

Reading the paper, it seems the network is trained end-to-end. I wonder how the intermediate loss is used: is all of the loss added together at the last loss layer before backpropagation, or is every single hourglass trained independently? I cannot find the code for the intermediate loss; can you help me? Thanks!
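For context, this kind of intermediate supervision can be expressed with nn.ParallelCriterion (which does appear in this repo's stack traces elsewhere on this page); a minimal sketch, not the repo's exact code:

require 'nn'

-- Minimal sketch: one MSE term per hourglass output, summed into a
-- single loss, so the whole stack trains end-to-end with one backward
-- pass rather than each hourglass training independently.
local nStack = 8
local criterion = nn.ParallelCriterion()
for i = 1, nStack do
    criterion:add(nn.MSECriterion())
end
-- outputs and labels are tables of nStack heatmap tensors:
-- local loss = criterion:forward(outputs, labels)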

When will you update the src/dataset/flic.lua file?

Hello

I want to test on the FLIC dataset. Can you upload the file?

Another question: is "Associative Embedding: End-to-end Learning for Joint Detection and Grouping" a CVPR paper or not? I think the idea is as novel as Part Affinity Fields, but the results are a bit weak. Anyway, very interesting work.

Gaussian kernel size

The sigma of the Gaussian used as the target of the MSE criterion might be 0.25, despite the 1 px described in the article.

See src/util/img.lua, line 214:

local g = image.gaussian(size)

In the image package, the default values of the horizontal and vertical standard deviations sigma_horz and sigma_vert of the Gaussian kernel are sigma, and sigma itself defaults to 0.25 (relative to the kernel size).

Heatmap gaussian standard deviation

We're trying to implement a model similar to yours, but are running into some confusion when generating the heatmap targets.

In your paper, you say that you use 2D gaussians with a standard deviation of 1 px for targets. However, looking at your drawGaussian function: https://github.com/anewell/pose-hg-train/blob/c25da4b48c74a7b314384b07e8c09a15b0343e7f/src/util/img.lua#L204-L224, you set size to 6 * sigma + 1.

That hmGauss value appears to be 1 per https://github.com/anewell/pose-hg-train/blob/2fef6915fbd836a5d218a5d2f0c87c463532f1a6/src/opts.lua#L64.

This appears to lead to a size of 7, which per the Torch image.gaussian constructor should give a sigma of 1.75.

Am I misunderstanding something here?
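For reference, this relative-sigma behavior is easy to check directly against the Torch image package (a sketch; the 1/7 value just illustrates how to request a true 1 px sigma for a size-7 kernel):

require 'image'

-- image.gaussian's sigma argument is relative to the kernel size and
-- defaults to 0.25, so a size-7 kernel gets an effective standard
-- deviation of 0.25 * 7 = 1.75 px:
local g_default = image.gaussian(7)
-- Passing sigma = 1/7 yields a true 1 px standard deviation instead:
local g_1px = image.gaussian(7, 1/7)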

Clarification of the scores variable in the postprocess function

Hi,

I am replicating your work and have one short question about the variable "scores" in the postprocess function in pose.lua.

What is the meaning of this variable? I cannot fully tell from the code what is being stored here. I have seen that it is a 20x1 tensor with values between 0 and 1; could you please explain a bit what is being stored?

thanks in advance

How to train this network on new dataset?

Hi @anewell

Thank you so much for sharing this training code. I am wondering if there is a tutorial for training on a new dataset, for example the CUB-200-2011 birds dataset. It provides keypoint locations and bounding box annotations. Are these enough to make the network run? Thanks!

A question about eval

Hey, I read your paper and implemented it using TensorFlow, and after training I have a question. During testing, if I don't know the scale and center for an image (the person may or may not be in the center), what if I just resize the input image to 256x256 directly, without scale and center information, and then run the prediction? Would the result be bad? Have you tried this? And if so, is there any way to get a good result without scale and center?
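For what it's worth, a common workaround when only a person bounding box is available is to derive center and scale from it, following the MPII convention that scale is person height relative to 200 px. A hedged sketch (boxToCenterScale is a hypothetical helper, not a repo function):

-- Hypothetical helper: approximate MPII-style center/scale from a
-- person bounding box (x1, y1, x2, y2). MPII defines scale as the
-- person height relative to 200 px.
local function boxToCenterScale(x1, y1, x2, y2)
    local center = torch.Tensor{(x1 + x2) / 2, (y1 + y2) / 2}
    local scale = (y2 - y1) / 200
    return center, scale
end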

Using nn.gModule

I have another question; I'm new to machine learning, and especially to Torch. Why did you wrap your model with nn.gModule rather than nn.Module? What is the difference?
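For context, nn.gModule comes from the nngraph package and wraps an arbitrary directed graph of modules, which is what lets a network express branching and merging paths (like the hourglass skip connections); a plain container such as nn.Sequential can only chain layers. A tiny standalone example:

require 'nngraph'

-- A graph with one input that branches into two layers and merges
-- back; nn.gModule handles the routing automatically.
local input = nn.Identity()()
local branch1 = nn.Linear(10, 10)(input)
local branch2 = nn.Linear(10, 10)(input)
local merged = nn.CAddTable()({branch1, branch2})
local model = nn.gModule({input}, {merged})
local out = model:forward(torch.randn(10))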

Evaluation on MPII Human Pose Dataset

Hi, @anewell, I want to use the evaluation script evaluatePCKh.m downloaded from the MPII Human Pose Dataset site to measure my model's performance, but I could not find the annolist_dataset_v12.mat file loaded in this line of the script:

load([p.gtDir '/annolist_dataset_v12'],'annolist');

Do you know how to get this file? Thanks.

idxs, preds, hms, inp = loadPreds('mpii/test-run/preds', true, false) doesn't work.

The following error pops up:

/home/yxchng/torch/install/share/lua/5.1/hdf5/group.lua:312: HDF5Group:read() - no such child 'heatmaps' for [HDF5Group 33554432 /]
stack traceback:
[C]: in function 'error'
/home/yxchng/torch/install/share/lua/5.1/hdf5/group.lua:312: in function 'read'
/home/yxchng/pose-hg-train/src/util/eval.lua:10: in function 'loadPreds'
[string "idxs, preds, hms, inp = loadPreds('mpii/test-..."]:1: in main chunk
[C]: in function 'xpcall'
/home/yxchng/torch/install/share/lua/5.1/itorch/main.lua:210: in function </home/yxchng/torch/install/share/lua/5.1/itorch/main.lua:174>
/home/yxchng/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
/home/yxchng/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
/home/yxchng/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
/home/yxchng/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
/home/yxchng/torch/install/share/lua/5.1/itorch/main.lua:389: in main chunk
[C]: in function 'require'
(command line):1: in main chunk
[C]: at 0x00405d50

-task 'pose-int' vs 'pose'

Hi anewell,

Thanks again for the update!

When -task is switched from 'pose-int' to 'pose' (i.e., not using intermediate supervision), the training goes wrong.
The situation is similar when -nStack is set to 1.

I guess that somewhere the program anticipates a sequence of length 1 but gets a bare tensor instead.

Is there any quick fix to this?

Thanks!
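One hedged guess at a workaround (not a confirmed fix): if some part of the pipeline expects a table of outputs but a single-stack model returns a bare tensor, wrapping the output before the loss and accuracy code may help:

-- Hedged sketch, not a confirmed fix: normalize the model output to
-- always be a table so downstream code that iterates over stacks works.
local output = model:forward(input)
if torch.type(output) ~= 'table' then
    output = {output}
end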

about HDF5 errors

When I run th main.lua -expID test-run, I get the following errors:
HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 140244906858368:
#000: H5F.c line 604 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: H5Fint.c line 992 in H5F_open(): unable to open file: time = Mon Apr 23 15:39:05 2018
, name = '/home/master-grade2-1/linli/pose/pose-hg-train-master/data/mpii/train.h5', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#2: H5FD.c line 993 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#3: H5FDsec2.c line 339 in H5FD_sec2_open(): unable to open file: name = '/home/master-grade2-1/linli/pose/pose-hg-train-master/data/mpii/train.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
major: File accessibilty
minor: Unable to open file
/home/master-grade2-1/linli/torch/torch/install/bin/luajit: ...-1/linli/torch/torch/install/share/lua/5.1/hdf5/file.lua:12: HDF5File: fileID -1 is not valid
stack traceback:
[C]: in function 'error'
...-1/linli/torch/torch/install/share/lua/5.1/hdf5/file.lua:12: in function '__init'
...1/linli/torch/torch/install/share/lua/5.1/torch/init.lua:91: in function <...1/linli/torch/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'open'
...inli/pose/pose-hg-train-master/src/util/dataset/mpii.lua:19: in function '__init'
...1/linli/torch/torch/install/share/lua/5.1/torch/init.lua:91: in function <...1/linli/torch/torch/install/share/lua/5.1/torch/init.lua:87>
[C]: in function 'Dataset'
...ter-grade2-1/linli/pose/pose-hg-train-master/src/ref.lua:29: in main chunk
[C]: in function 'dofile'
main.lua:2: in main chunk
[C]: in function 'dofile'
...orch/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

How can I solve this problem? Thanks :)

Memory consumption of the model

Hi, I am trying to implement your code in Theano and I am bumping into memory problems. Could you tell me how much memory your model takes on the GPU? I am using a Titan X as well. Thanks.

Usage of visualize_results

Hi,

I have a short question regarding the usage of the Lua file for visualizing results, misc/visualize_results.lua.

This file expects you to store your predictions in exp/mpii/best.

is this "best" file and "best_preds" automatically generated when validating your net? When is the group "pred_heatmap" generated in the preds.h5? I have not been able to find this on the code

Thank you very much in advance

Error running with "-continue"

Hi anewell, thanks for sharing this project!
I ran the command th main.lua -expID test-run successfully, and I had to stop after epoch 1. Then I wanted to resume it with th main.lua -expID test-run -continue, but I hit the error below.

Saving everything to: /home/ubuntu/pose-hg-train/exp/mpii/test-run  
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/pose-hg-train/src/opts.lua:111: attempt to perform arithmetic on field 'lastEpoch' (a nil value)
stack traceback:
    /home/ubuntu/pose-hg-train/src/opts.lua:111: in main chunk
    [C]: in function 'dofile'
    /home/ubuntu/pose-hg-train/src/ref.lua:17: in main chunk
    [C]: in function 'dofile'
    main.lua:2: in main chunk
    [C]: in function 'dofile'
    ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Is it that lastEpoch was not saved correctly? Have you seen similar errors before, and do you have any idea how to solve this?
Thank you in advance.
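A hedged guess at a local workaround (opt.lastEpoch here is inferred from the error message, and this is not a confirmed fix): defaulting the field before the arithmetic in src/opts.lua would at least avoid the nil error:

-- Hedged sketch around src/opts.lua:111, not the actual repo code:
-- if the previous run never recorded an epoch, treat it as epoch 0.
opt.lastEpoch = opt.lastEpoch or 0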

A PyTorch version

Thanks for sharing your code!

I wrote a PyTorch version of the hourglass network. Hopefully this is helpful for those who are not familiar with Torch. Much of the data-processing code is brought over from your code (src/pypose). Thanks again to the author!

However, the code cannot reproduce the results perfectly (an 83.58 PCKh@0.5 score for the simplified 4-stack hourglass). Some details might be missing, especially in the post-processing (e.g., coordinate mapping). Anyone interested in this project is welcome to contribute!

Unable to run final prediction

Hi,

I've tried to run the final prediction with the model file downloaded from your website, but I get an error. Could you help? Thanks!

th main.lua -expID finalp -finalPredictions -nEpochs 0 -loadModel umich-stacked-hourglass.t7

Input is a tensor with dimensions: 3 x 256 x 256
Output is a table
         Entry 1 is a tensor with dimensions: 16 x 64 x 64
         Entry 2 is a tensor with dimensions: 16 x 64 x 64
         Entry 3 is a tensor with dimensions: 16 x 64 x 64
         Entry 4 is a tensor with dimensions: 16 x 64 x 64
         Entry 5 is a tensor with dimensions: 16 x 64 x 64
         Entry 6 is a tensor with dimensions: 16 x 64 x 64
         Entry 7 is a tensor with dimensions: 16 x 64 x 64
         Entry 8 is a tensor with dimensions: 16 x 64 x 64
==> Loading model from: ../../pose-hg-demo/umich-stacked-hourglass.t7
==> Converting model to CUDA
==> Generating predictions...
/home/ming/torch/install/bin/luajit: /home/ming/torch/install/share/lua/5.1/nn/MSECriterion.lua:13: attempt to index local 'input' (a nil value)
stack traceback:
        /home/ming/torch/install/share/lua/5.1/nn/MSECriterion.lua:13: in function 'updateOutput'
        ...ing/torch/install/share/lua/5.1/nn/ParallelCriterion.lua:23: in function 'forward'
        /home/ming/pose-hg-train/src/train.lua:47: in function 'step'
        /home/ming/pose-hg-train/src/train.lua:106: in function 'predict'
        main.lua:38: in main chunk
        [C]: in function 'dofile'
        ...ming/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00405d50

Inaccuracy caused by transform

Hi Newell,

I noticed that the transform function causes some inaccuracy:

function transform(pt, center, scale, rot, res, invert)
    local pt_ = torch.ones(3)
    pt_[1],pt_[2] = pt[1]-1,pt[2]-1

    local t = getTransform(center, scale, rot, res)
    if invert then
        t = torch.inverse(t)
    end
    local new_point = (t*pt_):sub(1,2)

    return new_point:int():add(1)
end

The return value is cast to int, which introduces the inaccuracy.

You can see that the norm of the variable diff is very large in the following test code.
That's a big problem.

t_pt = transform(pt, center, scale, rot, res, false)
recovered_pt = transform(t_pt, center, scale, rot, res, true)
diff = pt - recovered_pt

In train.lua, you use the following code to work around this problem when validating or testing.
Am I right?

 -- Validation: Get flipped output
output = applyFn(function (x) return x:clone() end, output)
local flippedOut = model:forward(flip(input))
flippedOut = applyFn(function (x) return flip(shuffleLR(x)) end, flippedOut)
output = applyFn(function (x,y) return x:add(y):div(2) end, output, flippedOut)

Multi GPU training

Hi anewell,

I'm new to Torch. Is there a way to train the model using multiple GPUs, like Caffe does? Thank you!
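For reference, Torch does support multi-GPU data parallelism via nn.DataParallelTable in the cunn package; a minimal sketch (not something this repo wires up for you):

require 'cunn'

-- Minimal sketch: replicate the model on GPUs 1 and 2 and split each
-- minibatch along dimension 1 (the batch dimension).
local dpt = nn.DataParallelTable(1)
dpt:add(model, {1, 2})
model = dpt:cuda()
-- use this model wherever the single-GPU model was used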

Training on MSCOCO keypoint dataset

Hi @anewell ,

First of all thank you for sharing your code.

I'm preparing COCO keypoint annotations and a dataset-specific interface file to train on COCO. I've mostly finished, except for one issue: the MPII dataset provides a head bbox for each person, which COCO doesn't have. To overcome this I found a workaround:

  • If the shoulders are visible, use the distance between the shoulders to define the head bbox; otherwise, use the person bbox and an ideal human body ratio to define the head bbox.

But this approach is error-prone when the image doesn't contain the whole body.

Is there any advice to properly define the head size?

The code snippet below belongs to the src/misc/convert_annot.py file:

import math
import numpy as np

# Find shoulder coordinates
left_shoulder = (ann['keypoints'][0::3][6], ann['keypoints'][1::3][6])
right_shoulder = (ann['keypoints'][0::3][5], ann['keypoints'][1::3][5])

# If shoulders are not visible, approximate the head bbox with person bbox values
if left_shoulder == (0, 0) or right_shoulder == (0, 0):
    diff = np.array([ann['bbox'][3] / 7.5, ann['bbox'][2] / 7.5], float)
    normalization = np.linalg.norm(diff) * .6

# If shoulders are visible, define the head bbox from the distance between the shoulders
else:
    dist = math.sqrt((right_shoulder[0] - left_shoulder[0])**2 + (right_shoulder[1] - left_shoulder[1])**2)
    diff = np.array([dist / 2, dist / 1.5], float)
    normalization = np.linalg.norm(diff) * .6

annot['normalize'] += [normalization]

Missing labels in data/mpii/annot.h5

There are lots of missing entries in the annot.h5 files.
One direct result is that main.lua -eval returns nan for accuracy.
I also printed out several part labels directly from the dataset.

After creating the dataset object, run the following script:

for i = 1, 100 do   -- print the part coordinates for the first 100 samples
    local pts = dataset:getPartInfo(i)
    print('the joint centers of test ', i, '\n coordinates are', pts)
end

Part of the results:

the joint centers of test 62
coordinates are 1033 538
1044 463
1051 344
1099 352
1116 458
1117 543
1075 348
1077 255
1078 252
1116 195
1125 253
1105 293
1048 259
1105 250
1108 307
1110 357
[torch.DoubleTensor of size 16x2]

the joint centers of test 63
coordinates are 0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
[torch.DoubleTensor of size 16x2]

The output for tests 64, 65, and 66 is the same all-zero 16x2 tensor.

Actually, from test 63 to test 97, the labels are totally empty.

Could you provide corrected annotation files? Thanks.

Validation loss/accuracy inconsistency problem

Hi @anewell ,

I found that during training, I get the following log:

==> Starting epoch: 40/100
 [======================================== 8000/8000 ==================================>]  Tot: 1h10m | Step: 526ms     
      train : Loss: 0.0056964 Acc: 0.8067
 [======================================== 1000/1000 ==================================>]  Tot: 1m54s | Step: 116ms     
      valid : Loss: 0.0060296 Acc: 0.8033

Then I use the following command to run testing on the validation set:

th main.lua -dataDir ../data/ -expDir ../exp/ -trainIters 0 -loadModel ../exp/mpii/default/model_40.t7

and get the following result. The validation accuracy and loss differ from those reported during training.

Saving everything to: /home/wyang/code/pose/iccv17/pose-hg-train/exp/mpii/default	
Input is a tensor with dimensions: 3 x 256 x 256	
Output is a table	
	 Entry 1 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 2 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 3 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 4 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 5 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 6 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 7 is a tensor with dimensions: 16 x 64 x 64	
	 Entry 8 is a tensor with dimensions: 16 x 64 x 64	
==> Loading model from: ../exp/mpii/default/model_40.t7	
==> Converting model to CUDA	
==> Starting epoch: 1/100	
 [======================================== 1000/1000 ==================================>]  Tot: 2m42s | Step: 115ms     
      valid : Loss: 0.0050312 Acc: 0.8510	

I have checked your code many times but still have no idea. Can you reproduce my problem? How can it be solved? Thank you.
