Is 2D Heatmap Even Necessary for Human Pose Estimation?

PyTorch training code and pretrained models for SimDR (Simple yet effective Disentangled Representation for keypoint coordinate)

The 2D heatmap representation has dominated human pose estimation for years due to its high performance. However, heatmap-based approaches suffer from several shortcomings:

    1. Performance drops dramatically on low-resolution images, which are frequently encountered in real-world scenarios.
    2. To improve localization precision, multiple upsampling layers may be needed to recover the feature map resolution from low to high, and these layers are computationally expensive.
    3. Extra coordinate refinement is usually necessary to reduce the quantization error of downscaled heatmaps.
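
To make the quantization error in the last point concrete, here is a minimal sketch of an idealized decoder. It assumes the heatmap peak always falls in the cell that contains the keypoint and that the cell index is mapped back to input pixels without any refinement. The function names and the example strides are illustrative and are not taken from this repo.

```python
def decode_from_downscaled_heatmap(x_true, stride):
    """Decode an x-coordinate from an idealized heatmap downscaled by `stride`."""
    # Assumption: the heatmap peak lands in the cell that contains the keypoint.
    cell = x_true // stride
    # Without refinement, the cell index is simply scaled back to input pixels.
    return cell * stride


def mean_quantization_error(stride, width=256):
    """Average absolute x-error over every integer pixel position in [0, width)."""
    errors = [
        abs(decode_from_downscaled_heatmap(x, stride) - x) for x in range(width)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    for stride in (1, 2, 4):
        print(stride, mean_quantization_error(stride))
```

Under these assumptions the average error grows with the downscaling factor, which is why heatmap pipelines usually add a refinement step after the argmax.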

Given the shortcomings above, we do not think the 2D heatmap is the final solution for keypoint coordinate representation in this field. By contrast, SimDR is a simple yet effective scheme that needs no extra post-processing and reduces the quantization error through the design of the coordinate representation itself. For the first time, SimDR brings heatmap-free methods to the competitive performance level of heatmap-based methods, outperforming the latter by a large margin at low input resolutions. Additionally, SimDR allows one to directly remove the time-consuming upsampling module of some methods, which may inspire new research on lightweight models for human pose estimation.
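
As a rough illustration of the idea, the sketch below treats the x and y coordinates as two separate 1D classification targets, with a splitting factor `k` that makes the bins finer than one pixel. It follows the general description of SimDR rather than this repo's code: `simdr_encode`, `simdr_decode`, `one_hot` and the choice `k=2.0` are hypothetical names and values used only for the example.

```python
def simdr_encode(x, y, width, height, k=2.0):
    """Map a keypoint (x, y) to one class index per axis."""
    n_x, n_y = int(width * k), int(height * k)  # number of bins per axis
    ix = min(max(round(x * k), 0), n_x - 1)     # clamp to the valid bin range
    iy = min(max(round(y * k), 0), n_y - 1)
    return ix, iy


def simdr_decode(x_scores, y_scores, k=2.0):
    """Recover (x, y) from two 1D score vectors by argmax, then undo the scaling."""
    ix = max(range(len(x_scores)), key=x_scores.__getitem__)
    iy = max(range(len(y_scores)), key=y_scores.__getitem__)
    return ix / k, iy / k


def one_hot(index, length):
    """Stand-in for a perfectly confident network output."""
    return [1.0 if i == index else 0.0 for i in range(length)]


if __name__ == "__main__":
    width, height, k = 192, 256, 2.0
    ix, iy = simdr_encode(100.3, 50.7, width, height, k)
    x, y = simdr_decode(one_hot(ix, int(width * k)), one_hot(iy, int(height * k)), k)
    print(ix, iy, x, y)
```

With a perfect prediction, the decoded coordinate differs from the true one by at most half a bin, that is `1 / (2 * k)` pixels, and no separate refinement step is involved.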

We hope the proposed SimDR will motivate the community to rethink the design of coordinate representation for 2D human pose estimation.

For details see Is 2D Heatmap Even Necessary for Human Pose Estimation by Yanjie Li, Sen Yang, Shoukui Zhang, Zhicheng Wang, Wankou Yang, Shu-Tao Xia, Erjin Zhou.

News!

  • [2021.08.17] The pretrained models are released on Google Drive!
  • [2021.07.09] The code for SimDR and SimDR* (space-aware SimDR) is released!

Experiments

Results on COCO test-dev set

| Method | Representation | Input size | GFLOPs | AP | AR |
|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 384x288 | 20.0 | 71.5 | 76.9 |
| SimBa-Res50 | SimDR* | 384x288 | 20.2 | 72.7 | 78.0 |
| HRNet-W48 | heatmap | 256x192 | 14.6 | 74.2 | 79.5 |
| HRNet-W48 | SimDR* | 256x192 | 14.6 | 75.4 | 80.5 |
| HRNet-W48 | heatmap | 384x288 | 32.9 | 75.5 | 80.5 |
| HRNet-W48 | SimDR* | 384x288 | 32.9 | 76.0 | 81.1 |

Note:

  • Flip test is used.
  • Person detector has person AP of 60.9 on COCO test-dev2017 dataset.
  • GFLOPs is for convolution and linear layers only.
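
For readers unfamiliar with the flip test mentioned in the notes, the sketch below shows the general technique applied to per-joint 1D x-axis score vectors: predictions from the mirrored image are un-mirrored, left/right joints are swapped back, and the two sets of scores are averaged. This is a generic illustration, not this repo's procedure. `flip_test_average` and the two-joint example are hypothetical.

```python
def flip_test_average(scores, flipped_scores, flip_pairs):
    """Average x-axis scores from the original and the mirrored image.

    scores[j] and flipped_scores[j] are 1D score lists for joint j.
    flip_pairs lists (left, right) joint index pairs.
    """
    # Undo the horizontal mirroring by reversing each vector along x.
    restored = [list(reversed(vec)) for vec in flipped_scores]
    # In the mirrored image, left and right joints trade roles, so swap them back.
    for left, right in flip_pairs:
        restored[left], restored[right] = restored[right], restored[left]
    # Element-wise mean of the two predictions.
    return [
        [(a + b) / 2 for a, b in zip(orig, rest)]
        for orig, rest in zip(scores, restored)
    ]


if __name__ == "__main__":
    original = [[0, 1, 0, 0], [0, 0, 0, 1]]   # joint 0 = left, joint 1 = right
    mirrored = [[1, 0, 0, 0], [0, 0, 1, 0]]   # ideal outputs on the flipped image
    print(flip_test_average(original, mirrored, [(0, 1)]))
```

For y-axis vectors only the left/right swap would apply, since a horizontal flip leaves vertical positions unchanged.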

Results on COCO validation set

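| Method | Representation | Input size | #Params | GFLOPs | Extra post. | AP | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 64x64 | 34.0M | 0.7 | Y | 34.4 | 43.7 |
| SimBa-Res50 | heatmap | 64x64 | 34.0M | 0.7 | N | 25.8 | 36.0 |
| SimBa-Res50 | SimDR (ours) | 64x64 | 34.1M | 0.7 | N | 40.8 | 49.6 |
| SimBa-Res50 | heatmap | 128x128 | 34.0M | 3.0 | Y | 60.3 | 67.6 |
| SimBa-Res50 | heatmap | 128x128 | 34.0M | 3.0 | N | 55.4 | 63.6 |
| SimBa-Res50 | SimDR (ours) | 128x128 | 34.8M | 3.0 | N | 62.6 | 69.5 |
| SimBa-Res50 | heatmap | 256x192 | 34.0M | 8.9 | Y | 70.4 | 76.3 |
| SimBa-Res50 | heatmap | 256x192 | 34.0M | 8.9 | N | 68.5 | 74.8 |
| SimBa-Res50 | SimDR (ours) | 256x192 | 36.8M | 9.0 | N | 71.4 | 77.4 |
| TokenPose-S | heatmap | 64x64 | 4.9M | 1.4 | Y | 57.1 | 64.8 |
| TokenPose-S | heatmap | 64x64 | 4.9M | 1.4 | N | 35.9 | 47.0 |
| TokenPose-S | SimDR (ours) | 64x64 | 4.9M | 1.4 | N | 62.8 | 70.1 |
| TokenPose-S | heatmap | 128x128 | 5.2M | 1.6 | Y | 65.4 | 71.6 |
| TokenPose-S | heatmap | 128x128 | 5.2M | 1.6 | N | 57.6 | 64.9 |
| TokenPose-S | SimDR (ours) | 128x128 | 5.1M | 1.6 | N | 71.4 | 76.4 |
| TokenPose-S | heatmap | 256x192 | 6.6M | 2.2 | Y | 72.5 | 78.0 |
| TokenPose-S | heatmap | 256x192 | 6.6M | 2.2 | N | 69.9 | 75.8 |
| TokenPose-S | SimDR (ours) | 256x192 | 5.5M | 2.2 | N | 73.6 | 78.9 |
| SimBa-Res101 | heatmap | 64x64 | 53.0M | 1.0 | Y | 34.1 | 43.5 |
| SimBa-Res101 | heatmap | 64x64 | 53.0M | 1.0 | N | 25.7 | 36.1 |
| SimBa-Res101 | SimDR (ours) | 64x64 | 53.1M | 1.0 | N | 39.6 | 48.9 |
| SimBa-Res101 | heatmap | 128x128 | 53.0M | 4.1 | Y | 59.2 | 66.7 |
| SimBa-Res101 | heatmap | 128x128 | 53.0M | 4.1 | N | 54.4 | 62.5 |
| SimBa-Res101 | SimDR (ours) | 128x128 | 53.5M | 4.1 | N | 63.1 | 70.1 |
| SimBa-Res101 | heatmap | 256x192 | 53.0M | 12.4 | Y | 71.4 | 77.1 |
| SimBa-Res101 | heatmap | 256x192 | 53.0M | 12.4 | N | 69.5 | 75.6 |
| SimBa-Res101 | SimDR (ours) | 256x192 | 53.7M | 12.4 | N | 72.3 | 78.0 |
| HRNet-W32 | heatmap | 64x64 | 28.5M | 0.6 | Y | 45.8 | 55.3 |
| HRNet-W32 | heatmap | 64x64 | 28.5M | 0.6 | N | 34.6 | 45.6 |
| HRNet-W32 | SimDR (ours) | 64x64 | 28.6M | 0.6 | N | 56.4 | 64.9 |
| HRNet-W32 | heatmap | 128x128 | 28.5M | 2.4 | Y | 67.2 | 74.1 |
| HRNet-W32 | heatmap | 128x128 | 28.5M | 2.4 | N | 61.9 | 69.4 |
| HRNet-W32 | SimDR (ours) | 128x128 | 29.1M | 2.4 | N | 70.7 | 76.7 |
| HRNet-W32 | heatmap | 256x192 | 28.5M | 7.1 | Y | 74.4 | 79.8 |
| HRNet-W32 | heatmap | 256x192 | 28.5M | 7.1 | N | 72.3 | 78.2 |
| HRNet-W32 | SimDR | 256x192 | 31.3M | 7.1 | N | 75.3 | 80.8 |
| HRNet-W48 | heatmap | 64x64 | 63.6M | 1.2 | Y | 48.5 | 57.8 |
| HRNet-W48 | heatmap | 64x64 | 63.6M | 1.2 | N | 36.9 | 47.8 |
| HRNet-W48 | SimDR (ours) | 64x64 | 63.7M | 1.2 | N | 59.7 | 67.5 |
| HRNet-W48 | heatmap | 128x128 | 63.6M | 4.9 | Y | 68.9 | 75.3 |
| HRNet-W48 | heatmap | 128x128 | 63.6M | 4.9 | N | 63.3 | 70.5 |
| HRNet-W48 | SimDR (ours) | 128x128 | 64.1M | 4.9 | N | 72.0 | 77.9 |
| HRNet-W48 | heatmap | 256x192 | 63.6M | 14.6 | Y | 75.1 | 80.4 |
| HRNet-W48 | heatmap | 256x192 | 63.6M | 14.6 | N | 73.1 | 78.7 |
| HRNet-W48 | SimDR (ours) | 256x192 | 66.3M | 14.6 | N | 75.9 | 81.2 |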

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • GFLOPs is for convolution and linear layers only.
  • Extra post. = extra post-processing towards refining the predicted keypoint coordinate.

Results on higher input resolution

Results on the COCO validation set with an input size of 384×288.

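| Method | Representation | AP | AP_50 | AP_75 | AP_M | AP_L | AR |
|---|---|---|---|---|---|---|---|
| SimBa-Res50 | heatmap | 72.2 | 89.3 | 78.9 | 68.1 | 79.7 | 77.6 |
| SimBa-Res50 | SimDR (ours) | 73.0 | 89.3 | 79.7 | 69.5 | 79.9 | 78.6 |
| SimBa-Res50 | SimDR* (ours) | 73.4 | 89.2 | 80.0 | 69.7 | 80.6 | 78.8 |
| SimBa-Res101 | heatmap | 73.6 | 89.6 | 80.3 | 69.9 | 81.1 | 79.1 |
| SimBa-Res101 | SimDR (ours) | 74.2 | 89.6 | 80.9 | 70.7 | 80.9 | 79.8 |
| SimBa-Res152 | heatmap | 74.3 | 89.6 | 81.1 | 70.5 | 81.6 | 79.7 |
| SimBa-Res152 | SimDR (ours) | 74.9 | 89.9 | 81.5 | 71.4 | 81.7 | 80.4 |
| HRNet-W48 | heatmap | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 |
| HRNet-W48 | SimDR* (ours) | 76.9 | 90.9 | 83.2 | 73.2 | 83.8 | 82.0 |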

Note:

  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.

Results on MPII val set

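| Metric | Method | Representation | Input size | Hea | Sho | Elb | Wri | Hip | Kne | Ank | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PCKh@0.5 | HRNet-W32 | heatmap | 64x64 | 89.7 | 86.6 | 75.1 | 65.7 | 77.2 | 69.2 | 63.6 | 76.4 |
| PCKh@0.5 | HRNet-W32 | SimDR (ours) | 64x64 | 96.5 | 89.5 | 77.5 | 67.6 | 79.8 | 71.5 | 65.0 | 78.7 |
| PCKh@0.5 | HRNet-W32 | heatmap | 256x256 | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 |
| PCKh@0.5 | HRNet-W32 | SimDR (ours) | 256x256 | 96.8 | 95.9 | 90.0 | 85.0 | 89.1 | 85.4 | 81.3 | 89.6 |
| PCKh@0.5 | HRNet-W32 | SimDR* (ours) | 256x256 | 97.2 | 96.0 | 90.4 | 85.6 | 89.5 | 85.8 | 81.8 | 90.0 |
| PCKh@0.1 | HRNet-W32 | heatmap | 64x64 | 12.9 | 11.7 | 9.7 | 7.1 | 7.2 | 7.2 | 6.6 | 9.2 |
| PCKh@0.1 | HRNet-W32 | SimDR (ours) | 64x64 | 30.9 | 23.3 | 18.1 | 15.0 | 10.5 | 13.1 | 12.8 | 18.5 |
| PCKh@0.1 | HRNet-W32 | heatmap | 256x256 | 44.5 | 37.3 | 37.5 | 36.9 | 15.1 | 25.9 | 27.2 | 33.1 |
| PCKh@0.1 | HRNet-W32 | SimDR (ours) | 256x256 | 50.1 | 41.0 | 45.3 | 42.4 | 16.6 | 29.7 | 30.3 | 37.8 |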

Note:

  • Flip test is used.
  • There appears to be a bug in the computation of PCKh@0.1 in the original code; it is fixed in this repo.
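
As background for the MPII numbers, PCKh at a given threshold is the percentage of predicted keypoints whose distance to the ground truth is within that fraction of the head size. The sketch below is a simplified version that assumes a per-sample `head_sizes` value is already available and ignores joint visibility; it is not the evaluation code used by this repo, and `pckh` is a hypothetical name.

```python
import math


def pckh(pred, gt, head_sizes, threshold=0.5):
    """Percentage of keypoints within `threshold * head_size` of the ground truth.

    pred, gt: one list of (x, y) tuples per sample.
    head_sizes: one normalizing length per sample (assumed precomputed).
    """
    correct = 0
    total = 0
    for pred_joints, gt_joints, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(pred_joints, gt_joints):
            # Euclidean distance between prediction and ground truth.
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= threshold * head
            total += 1
    return 100.0 * correct / total


if __name__ == "__main__":
    gt = [[(10, 10), (20, 20)]]
    pred = [[(10, 13), (20, 30)]]
    print(pckh(pred, gt, [10], threshold=0.5))
    print(pckh(pred, gt, [10], threshold=0.1))
```

A smaller threshold demands much tighter localization, which is why scores at a strict threshold are far lower than at 0.5 for the same model.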

Results on CrowdPose

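| Method | Representation | Input size | AP | AP_50 | AP_75 | AP_E | AP_M | AP_H |
|---|---|---|---|---|---|---|---|---|
| HRNet-W32 | heatmap | 64x64 | 42.4 | 69.6 | 45.5 | 51.2 | 43.1 | 31.8 |
| HRNet-W32 | SimDR (ours) | 64x64 | 46.5 | 70.9 | 50.0 | 56.0 | 47.5 | 34.7 |
| HRNet-W32 | heatmap | 256x192 | 66.4 | 81.1 | 71.5 | 74.0 | 67.4 | 55.6 |
| HRNet-W32 | SimDR (ours) | 256x192 | 66.7 | 82.1 | 72.0 | 74.1 | 67.8 | 56.2 |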

Start to use

1. Dependencies installation & data preparation

Please refer to THIS to prepare the environment step by step.

2. Model Zoo

Pretrained models are provided in our model zoo.

3. Training

Training on COCO train2017 dataset

To train with SimDR as keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml

To train with SimDR* as keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml

*Note: When using SimDR, the deconvolution layers of SimpleBaseline can be either kept or removed.

Training on MPII dataset

To train with SimDR as keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml

To train with SimDR* as keypoint coordinate representation:

python tools/train.py \
    --cfg experiments/mpii/hrnet/sa_simdr/w32_256x256_adam_lr1e-3_split2_sigma6.yaml

4. Testing

Testing on COCO val2017 dataset using model zoo's models

python tools/test.py \
    --cfg experiments/coco/hrnet/simdr/nmt_w48_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
    TEST.USE_GT_BBOX False

python tools/test.py \
    --cfg experiments/coco/hrnet/sa_simdr/w48_256x192_adam_lr1e-3_split2_sigma4.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ \
    TEST.USE_GT_BBOX False

Testing on MPII dataset using model zoo's models

python tools/test.py \
    --cfg experiments/mpii/hrnet/simdr/norm_w32_256x256_adam_lr1e-3_ls2e1.yaml \
    TEST.MODEL_FILE _PATH_TO_CHECKPOINT_ TEST.PCKH_THRE 0.5
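
The trailing `KEY VALUE` pairs in the test commands (for example `TEST.USE_GT_BBOX False`) read as overrides applied on top of the YAML config. The sketch below shows one way such pairs could be merged into a nested dictionary. It is an assumption about the general pattern, not this repo's parser, and `apply_overrides` is a hypothetical helper.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict config."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:          # walk down, e.g. cfg["TEST"]
            node = node[name]
        if leaf not in node:
            raise KeyError(key)       # refuse keys the config does not define
        try:
            value = ast.literal_eval(raw)   # "False" -> False, "0.5" -> 0.5
        except (ValueError, SyntaxError):
            value = raw                     # anything else stays a string
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    cfg = {"TEST": {"MODEL_FILE": "", "USE_GT_BBOX": True, "PCKH_THRE": 0.5}}
    apply_overrides(cfg, ["TEST.MODEL_FILE", "model.pth", "TEST.USE_GT_BBOX", "False"])
    print(cfg)
```

Rejecting unknown keys, as this sketch does, helps catch typos in an override before a long evaluation run starts.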

Citations

If you use our code or models in your research, please cite with:

@misc{li20212d,
      title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?}, 
      author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
      year={2021},
      eprint={2107.03332},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

Thanks for the open-source HRNet.
