vitae-transformer / vitpose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"

License: Apache License 2.0

Python 99.77% Dockerfile 0.05% Shell 0.19%

deep-learning distillation mae pose-estimation pytorch self-supervised-learning vision-transformer

vitpose's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond


Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2021

The pretrained models for ViTAE on matting and remote sensing are released! Please try and have fun!

24/03/2021

The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

The code is released!

19/10/2021

The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE uses several reduction cells (RC) and normal cells (NC) to introduce scale-invariance and locality into vision transformers. In ViTAEv2, we explore the use of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RC and NC in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}


vitpose's Issues

configure the environment

(Windows 10) When I follow the instructions to configure the environment, I get an error:

The model and loaded state dict do not match exactly

Hi there,

First of all, thank you for reading this issue.

I am testing the following model and get the error below; it seems the config file does not match the pre-trained model. I am not sure what mistake I have made. Many thanks if anyone could offer any hints.

Results from this repo on MS COCO val set (single-task training)

ViTPose-B | MAE | 256x192 | 75.8 | 81.1 | config | log | Onedrive (here is where I download the pth file)

I used the following command:
bash tools/dist_train.sh /home/zee/ViTPose/ViTPose/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py 1 --cfg-options model.pretrained=/home/zee/ViTPose/vitpose-b.pth --seed 0

WARNING:root:The model and loaded state dict do not match exactly

unexpected key in source state_dict: backbone.pos_embed, backbone.patch_embed.proj.weight, backbone.patch_embed.proj.bias, backbone.blocks.0.norm1.weight, backbone.blocks.0.norm1.bias, backbone.blocks.0.attn.qkv.weight, backbone.blocks.0.attn.qkv.bias, backbone.blocks.0.attn.proj.weight, backbone.blocks.0.attn.proj.bias, backbone.blocks.0.norm2.weight, backbone.blocks.0.norm2.bias, backbone.blocks.0.mlp.fc1.weight, backbone.blocks.0.mlp.fc1.bias, backbone.blocks.0.mlp.fc2.weight, backbone.blocks.0.mlp.fc2.bias, backbone.blocks.1.norm1.weight, backbone.blocks.1.norm1.bias, backbone.blocks.1.attn.qkv.weight, backbone.blocks.1.attn.qkv.bias, backbone.blocks.1.attn.proj.weight, backbone.blocks.1.attn.proj.bias, backbone.blocks.1.norm2.weight, backbone.blocks.1.norm2.bias, backbone.blocks.1.mlp.fc1.weight, backbone.blocks.1.mlp.fc1.bias, backbone.blocks.1.mlp.fc2.weight, backbone.blocks.1.mlp.fc2.bias, backbone.blocks.2.norm1.weight, backbone.blocks.2.norm1.bias, backbone.blocks.2.attn.qkv.weight, backbone.blocks.2.attn.qkv.bias, backbone.blocks.2.attn.proj.weight, backbone.blocks.2.attn.proj.bias, backbone.blocks.2.norm2.weight, backbone.blocks.2.norm2.bias, backbone.blocks.2.mlp.fc1.weight, backbone.blocks.2.mlp.fc1.bias, backbone.blocks.2.mlp.fc2.weight, backbone.blocks.2.mlp.fc2.bias, backbone.blocks.3.norm1.weight, backbone.blocks.3.norm1.bias, backbone.blocks.3.attn.qkv.weight, backbone.blocks.3.attn.qkv.bias, backbone.blocks.3.attn.proj.weight, backbone.blocks.3.attn.proj.bias, backbone.blocks.3.norm2.weight, backbone.blocks.3.norm2.bias, backbone.blocks.3.mlp.fc1.weight, backbone.blocks.3.mlp.fc1.bias, backbone.blocks.3.mlp.fc2.weight, backbone.blocks.3.mlp.fc2.bias, backbone.blocks.4.norm1.weight, backbone.blocks.4.norm1.bias, backbone.blocks.4.attn.qkv.weight, backbone.blocks.4.attn.qkv.bias, backbone.blocks.4.attn.proj.weight, backbone.blocks.4.attn.proj.bias, backbone.blocks.4.norm2.weight, backbone.blocks.4.norm2.bias, backbone.blocks.4.mlp.fc1.weight, 
backbone.blocks.4.mlp.fc1.bias, backbone.blocks.4.mlp.fc2.weight, backbone.blocks.4.mlp.fc2.bias, backbone.blocks.5.norm1.weight, backbone.blocks.5.norm1.bias, backbone.blocks.5.attn.qkv.weight, backbone.blocks.5.attn.qkv.bias, backbone.blocks.5.attn.proj.weight, backbone.blocks.5.attn.proj.bias, backbone.blocks.5.norm2.weight, backbone.blocks.5.norm2.bias, backbone.blocks.5.mlp.fc1.weight, backbone.blocks.5.mlp.fc1.bias, backbone.blocks.5.mlp.fc2.weight, backbone.blocks.5.mlp.fc2.bias, backbone.blocks.6.norm1.weight, backbone.blocks.6.norm1.bias, backbone.blocks.6.attn.qkv.weight, backbone.blocks.6.attn.qkv.bias, backbone.blocks.6.attn.proj.weight, backbone.blocks.6.attn.proj.bias, backbone.blocks.6.norm2.weight, backbone.blocks.6.norm2.bias, backbone.blocks.6.mlp.fc1.weight, backbone.blocks.6.mlp.fc1.bias, backbone.blocks.6.mlp.fc2.weight, backbone.blocks.6.mlp.fc2.bias, backbone.blocks.7.norm1.weight, backbone.blocks.7.norm1.bias, backbone.blocks.7.attn.qkv.weight, backbone.blocks.7.attn.qkv.bias, backbone.blocks.7.attn.proj.weight, backbone.blocks.7.attn.proj.bias, backbone.blocks.7.norm2.weight, backbone.blocks.7.norm2.bias, backbone.blocks.7.mlp.fc1.weight, backbone.blocks.7.mlp.fc1.bias, backbone.blocks.7.mlp.fc2.weight, backbone.blocks.7.mlp.fc2.bias, backbone.blocks.8.norm1.weight, backbone.blocks.8.norm1.bias, backbone.blocks.8.attn.qkv.weight, backbone.blocks.8.attn.qkv.bias, backbone.blocks.8.attn.proj.weight, backbone.blocks.8.attn.proj.bias, backbone.blocks.8.norm2.weight, backbone.blocks.8.norm2.bias, backbone.blocks.8.mlp.fc1.weight, backbone.blocks.8.mlp.fc1.bias, backbone.blocks.8.mlp.fc2.weight, backbone.blocks.8.mlp.fc2.bias, backbone.blocks.9.norm1.weight, backbone.blocks.9.norm1.bias, backbone.blocks.9.attn.qkv.weight, backbone.blocks.9.attn.qkv.bias, backbone.blocks.9.attn.proj.weight, backbone.blocks.9.attn.proj.bias, backbone.blocks.9.norm2.weight, backbone.blocks.9.norm2.bias, backbone.blocks.9.mlp.fc1.weight, 
backbone.blocks.9.mlp.fc1.bias, backbone.blocks.9.mlp.fc2.weight, backbone.blocks.9.mlp.fc2.bias, backbone.blocks.10.norm1.weight, backbone.blocks.10.norm1.bias, backbone.blocks.10.attn.qkv.weight, backbone.blocks.10.attn.qkv.bias, backbone.blocks.10.attn.proj.weight, backbone.blocks.10.attn.proj.bias, backbone.blocks.10.norm2.weight, backbone.blocks.10.norm2.bias, backbone.blocks.10.mlp.fc1.weight, backbone.blocks.10.mlp.fc1.bias, backbone.blocks.10.mlp.fc2.weight, backbone.blocks.10.mlp.fc2.bias, backbone.blocks.11.norm1.weight, backbone.blocks.11.norm1.bias, backbone.blocks.11.attn.qkv.weight, backbone.blocks.11.attn.qkv.bias, backbone.blocks.11.attn.proj.weight, backbone.blocks.11.attn.proj.bias, backbone.blocks.11.norm2.weight, backbone.blocks.11.norm2.bias, backbone.blocks.11.mlp.fc1.weight, backbone.blocks.11.mlp.fc1.bias, backbone.blocks.11.mlp.fc2.weight, backbone.blocks.11.mlp.fc2.bias, backbone.blocks.12.norm1.weight, backbone.blocks.12.norm1.bias, backbone.blocks.12.attn.qkv.weight, backbone.blocks.12.attn.qkv.bias, backbone.blocks.12.attn.proj.weight, backbone.blocks.12.attn.proj.bias, backbone.blocks.12.norm2.weight, backbone.blocks.12.norm2.bias, backbone.blocks.12.mlp.fc1.weight, backbone.blocks.12.mlp.fc1.bias, backbone.blocks.12.mlp.fc2.weight, backbone.blocks.12.mlp.fc2.bias, backbone.blocks.13.norm1.weight, backbone.blocks.13.norm1.bias, backbone.blocks.13.attn.qkv.weight, backbone.blocks.13.attn.qkv.bias, backbone.blocks.13.attn.proj.weight, backbone.blocks.13.attn.proj.bias, backbone.blocks.13.norm2.weight, backbone.blocks.13.norm2.bias, backbone.blocks.13.mlp.fc1.weight, backbone.blocks.13.mlp.fc1.bias, backbone.blocks.13.mlp.fc2.weight, backbone.blocks.13.mlp.fc2.bias, backbone.blocks.14.norm1.weight, backbone.blocks.14.norm1.bias, backbone.blocks.14.attn.qkv.weight, backbone.blocks.14.attn.qkv.bias, backbone.blocks.14.attn.proj.weight, backbone.blocks.14.attn.proj.bias, backbone.blocks.14.norm2.weight, backbone.blocks.14.norm2.bias, 
backbone.blocks.14.mlp.fc1.weight, backbone.blocks.14.mlp.fc1.bias, backbone.blocks.14.mlp.fc2.weight, backbone.blocks.14.mlp.fc2.bias, backbone.blocks.15.norm1.weight, backbone.blocks.15.norm1.bias, backbone.blocks.15.attn.qkv.weight, backbone.blocks.15.attn.qkv.bias, backbone.blocks.15.attn.proj.weight, backbone.blocks.15.attn.proj.bias, backbone.blocks.15.norm2.weight, backbone.blocks.15.norm2.bias, backbone.blocks.15.mlp.fc1.weight, backbone.blocks.15.mlp.fc1.bias, backbone.blocks.15.mlp.fc2.weight, backbone.blocks.15.mlp.fc2.bias, backbone.blocks.16.norm1.weight, backbone.blocks.16.norm1.bias, backbone.blocks.16.attn.qkv.weight, backbone.blocks.16.attn.qkv.bias, backbone.blocks.16.attn.proj.weight, backbone.blocks.16.attn.proj.bias, backbone.blocks.16.norm2.weight, backbone.blocks.16.norm2.bias, backbone.blocks.16.mlp.fc1.weight, backbone.blocks.16.mlp.fc1.bias, backbone.blocks.16.mlp.fc2.weight, backbone.blocks.16.mlp.fc2.bias, backbone.blocks.17.norm1.weight, backbone.blocks.17.norm1.bias, backbone.blocks.17.attn.qkv.weight, backbone.blocks.17.attn.qkv.bias, backbone.blocks.17.attn.proj.weight, backbone.blocks.17.attn.proj.bias, backbone.blocks.17.norm2.weight, backbone.blocks.17.norm2.bias, backbone.blocks.17.mlp.fc1.weight, backbone.blocks.17.mlp.fc1.bias, backbone.blocks.17.mlp.fc2.weight, backbone.blocks.17.mlp.fc2.bias, backbone.blocks.18.norm1.weight, backbone.blocks.18.norm1.bias, backbone.blocks.18.attn.qkv.weight, backbone.blocks.18.attn.qkv.bias, backbone.blocks.18.attn.proj.weight, backbone.blocks.18.attn.proj.bias, backbone.blocks.18.norm2.weight, backbone.blocks.18.norm2.bias, backbone.blocks.18.mlp.fc1.weight, backbone.blocks.18.mlp.fc1.bias, backbone.blocks.18.mlp.fc2.weight, backbone.blocks.18.mlp.fc2.bias, backbone.blocks.19.norm1.weight, backbone.blocks.19.norm1.bias, backbone.blocks.19.attn.qkv.weight, backbone.blocks.19.attn.qkv.bias, backbone.blocks.19.attn.proj.weight, backbone.blocks.19.attn.proj.bias, backbone.blocks.19.norm2.weight, 
backbone.blocks.19.norm2.bias, backbone.blocks.19.mlp.fc1.weight, backbone.blocks.19.mlp.fc1.bias, backbone.blocks.19.mlp.fc2.weight, backbone.blocks.19.mlp.fc2.bias, backbone.blocks.20.norm1.weight, backbone.blocks.20.norm1.bias, backbone.blocks.20.attn.qkv.weight, backbone.blocks.20.attn.qkv.bias, backbone.blocks.20.attn.proj.weight, backbone.blocks.20.attn.proj.bias, backbone.blocks.20.norm2.weight, backbone.blocks.20.norm2.bias, backbone.blocks.20.mlp.fc1.weight, backbone.blocks.20.mlp.fc1.bias, backbone.blocks.20.mlp.fc2.weight, backbone.blocks.20.mlp.fc2.bias, backbone.blocks.21.norm1.weight, backbone.blocks.21.norm1.bias, backbone.blocks.21.attn.qkv.weight, backbone.blocks.21.attn.qkv.bias, backbone.blocks.21.attn.proj.weight, backbone.blocks.21.attn.proj.bias, backbone.blocks.21.norm2.weight, backbone.blocks.21.norm2.bias, backbone.blocks.21.mlp.fc1.weight, backbone.blocks.21.mlp.fc1.bias, backbone.blocks.21.mlp.fc2.weight, backbone.blocks.21.mlp.fc2.bias, backbone.blocks.22.norm1.weight, backbone.blocks.22.norm1.bias, backbone.blocks.22.attn.qkv.weight, backbone.blocks.22.attn.qkv.bias, backbone.blocks.22.attn.proj.weight, backbone.blocks.22.attn.proj.bias, backbone.blocks.22.norm2.weight, backbone.blocks.22.norm2.bias, backbone.blocks.22.mlp.fc1.weight, backbone.blocks.22.mlp.fc1.bias, backbone.blocks.22.mlp.fc2.weight, backbone.blocks.22.mlp.fc2.bias, backbone.blocks.23.norm1.weight, backbone.blocks.23.norm1.bias, backbone.blocks.23.attn.qkv.weight, backbone.blocks.23.attn.qkv.bias, backbone.blocks.23.attn.proj.weight, backbone.blocks.23.attn.proj.bias, backbone.blocks.23.norm2.weight, backbone.blocks.23.norm2.bias, backbone.blocks.23.mlp.fc1.weight, backbone.blocks.23.mlp.fc1.bias, backbone.blocks.23.mlp.fc2.weight, backbone.blocks.23.mlp.fc2.bias, backbone.last_norm.weight, backbone.last_norm.bias, keypoint_head.deconv_layers.0.weight, keypoint_head.deconv_layers.1.weight, keypoint_head.deconv_layers.1.bias, keypoint_head.deconv_layers.1.running_mean, 
keypoint_head.deconv_layers.1.running_var, keypoint_head.deconv_layers.1.num_batches_tracked, keypoint_head.deconv_layers.3.weight, keypoint_head.deconv_layers.4.weight, keypoint_head.deconv_layers.4.bias, keypoint_head.deconv_layers.4.running_mean, keypoint_head.deconv_layers.4.running_var, keypoint_head.deconv_layers.4.num_batches_tracked, keypoint_head.final_layer.weight, keypoint_head.final_layer.bias

missing keys in source state_dict: pos_embed, patch_embed.proj.weight, patch_embed.proj.bias, blocks.0.norm1.weight, blocks.0.norm1.bias, blocks.0.attn.qkv.weight, blocks.0.attn.qkv.bias, blocks.0.attn.proj.weight, blocks.0.attn.proj.bias, blocks.0.norm2.weight, blocks.0.norm2.bias, blocks.0.mlp.fc1.weight, blocks.0.mlp.fc1.bias, blocks.0.mlp.fc2.weight, blocks.0.mlp.fc2.bias, blocks.1.norm1.weight, blocks.1.norm1.bias, blocks.1.attn.qkv.weight, blocks.1.attn.qkv.bias, blocks.1.attn.proj.weight, blocks.1.attn.proj.bias, blocks.1.norm2.weight, blocks.1.norm2.bias, blocks.1.mlp.fc1.weight, blocks.1.mlp.fc1.bias, blocks.1.mlp.fc2.weight, blocks.1.mlp.fc2.bias, blocks.2.norm1.weight, blocks.2.norm1.bias, blocks.2.attn.qkv.weight, blocks.2.attn.qkv.bias, blocks.2.attn.proj.weight, blocks.2.attn.proj.bias, blocks.2.norm2.weight, blocks.2.norm2.bias, blocks.2.mlp.fc1.weight, blocks.2.mlp.fc1.bias, blocks.2.mlp.fc2.weight, blocks.2.mlp.fc2.bias, blocks.3.norm1.weight, blocks.3.norm1.bias, blocks.3.attn.qkv.weight, blocks.3.attn.qkv.bias, blocks.3.attn.proj.weight, blocks.3.attn.proj.bias, blocks.3.norm2.weight, blocks.3.norm2.bias, blocks.3.mlp.fc1.weight, blocks.3.mlp.fc1.bias, blocks.3.mlp.fc2.weight, blocks.3.mlp.fc2.bias, blocks.4.norm1.weight, blocks.4.norm1.bias, blocks.4.attn.qkv.weight, blocks.4.attn.qkv.bias, blocks.4.attn.proj.weight, blocks.4.attn.proj.bias, blocks.4.norm2.weight, blocks.4.norm2.bias, blocks.4.mlp.fc1.weight, blocks.4.mlp.fc1.bias, blocks.4.mlp.fc2.weight, blocks.4.mlp.fc2.bias, blocks.5.norm1.weight, blocks.5.norm1.bias, blocks.5.attn.qkv.weight, blocks.5.attn.qkv.bias, blocks.5.attn.proj.weight, blocks.5.attn.proj.bias, blocks.5.norm2.weight, blocks.5.norm2.bias, blocks.5.mlp.fc1.weight, blocks.5.mlp.fc1.bias, blocks.5.mlp.fc2.weight, blocks.5.mlp.fc2.bias, blocks.6.norm1.weight, blocks.6.norm1.bias, blocks.6.attn.qkv.weight, blocks.6.attn.qkv.bias, blocks.6.attn.proj.weight, blocks.6.attn.proj.bias, blocks.6.norm2.weight, blocks.6.norm2.bias, 
blocks.6.mlp.fc1.weight, blocks.6.mlp.fc1.bias, blocks.6.mlp.fc2.weight, blocks.6.mlp.fc2.bias, blocks.7.norm1.weight, blocks.7.norm1.bias, blocks.7.attn.qkv.weight, blocks.7.attn.qkv.bias, blocks.7.attn.proj.weight, blocks.7.attn.proj.bias, blocks.7.norm2.weight, blocks.7.norm2.bias, blocks.7.mlp.fc1.weight, blocks.7.mlp.fc1.bias, blocks.7.mlp.fc2.weight, blocks.7.mlp.fc2.bias, blocks.8.norm1.weight, blocks.8.norm1.bias, blocks.8.attn.qkv.weight, blocks.8.attn.qkv.bias, blocks.8.attn.proj.weight, blocks.8.attn.proj.bias, blocks.8.norm2.weight, blocks.8.norm2.bias, blocks.8.mlp.fc1.weight, blocks.8.mlp.fc1.bias, blocks.8.mlp.fc2.weight, blocks.8.mlp.fc2.bias, blocks.9.norm1.weight, blocks.9.norm1.bias, blocks.9.attn.qkv.weight, blocks.9.attn.qkv.bias, blocks.9.attn.proj.weight, blocks.9.attn.proj.bias, blocks.9.norm2.weight, blocks.9.norm2.bias, blocks.9.mlp.fc1.weight, blocks.9.mlp.fc1.bias, blocks.9.mlp.fc2.weight, blocks.9.mlp.fc2.bias, blocks.10.norm1.weight, blocks.10.norm1.bias, blocks.10.attn.qkv.weight, blocks.10.attn.qkv.bias, blocks.10.attn.proj.weight, blocks.10.attn.proj.bias, blocks.10.norm2.weight, blocks.10.norm2.bias, blocks.10.mlp.fc1.weight, blocks.10.mlp.fc1.bias, blocks.10.mlp.fc2.weight, blocks.10.mlp.fc2.bias, blocks.11.norm1.weight, blocks.11.norm1.bias, blocks.11.attn.qkv.weight, blocks.11.attn.qkv.bias, blocks.11.attn.proj.weight, blocks.11.attn.proj.bias, blocks.11.norm2.weight, blocks.11.norm2.bias, blocks.11.mlp.fc1.weight, blocks.11.mlp.fc1.bias, blocks.11.mlp.fc2.weight, blocks.11.mlp.fc2.bias, blocks.12.norm1.weight, blocks.12.norm1.bias, blocks.12.attn.qkv.weight, blocks.12.attn.qkv.bias, blocks.12.attn.proj.weight, blocks.12.attn.proj.bias, blocks.12.norm2.weight, blocks.12.norm2.bias, blocks.12.mlp.fc1.weight, blocks.12.mlp.fc1.bias, blocks.12.mlp.fc2.weight, blocks.12.mlp.fc2.bias, blocks.13.norm1.weight, blocks.13.norm1.bias, blocks.13.attn.qkv.weight, blocks.13.attn.qkv.bias, blocks.13.attn.proj.weight, blocks.13.attn.proj.bias, 
blocks.13.norm2.weight, blocks.13.norm2.bias, blocks.13.mlp.fc1.weight, blocks.13.mlp.fc1.bias, blocks.13.mlp.fc2.weight, blocks.13.mlp.fc2.bias, blocks.14.norm1.weight, blocks.14.norm1.bias, blocks.14.attn.qkv.weight, blocks.14.attn.qkv.bias, blocks.14.attn.proj.weight, blocks.14.attn.proj.bias, blocks.14.norm2.weight, blocks.14.norm2.bias, blocks.14.mlp.fc1.weight, blocks.14.mlp.fc1.bias, blocks.14.mlp.fc2.weight, blocks.14.mlp.fc2.bias, blocks.15.norm1.weight, blocks.15.norm1.bias, blocks.15.attn.qkv.weight, blocks.15.attn.qkv.bias, blocks.15.attn.proj.weight, blocks.15.attn.proj.bias, blocks.15.norm2.weight, blocks.15.norm2.bias, blocks.15.mlp.fc1.weight, blocks.15.mlp.fc1.bias, blocks.15.mlp.fc2.weight, blocks.15.mlp.fc2.bias, blocks.16.norm1.weight, blocks.16.norm1.bias, blocks.16.attn.qkv.weight, blocks.16.attn.qkv.bias, blocks.16.attn.proj.weight, blocks.16.attn.proj.bias, blocks.16.norm2.weight, blocks.16.norm2.bias, blocks.16.mlp.fc1.weight, blocks.16.mlp.fc1.bias, blocks.16.mlp.fc2.weight, blocks.16.mlp.fc2.bias, blocks.17.norm1.weight, blocks.17.norm1.bias, blocks.17.attn.qkv.weight, blocks.17.attn.qkv.bias, blocks.17.attn.proj.weight, blocks.17.attn.proj.bias, blocks.17.norm2.weight, blocks.17.norm2.bias, blocks.17.mlp.fc1.weight, blocks.17.mlp.fc1.bias, blocks.17.mlp.fc2.weight, blocks.17.mlp.fc2.bias, blocks.18.norm1.weight, blocks.18.norm1.bias, blocks.18.attn.qkv.weight, blocks.18.attn.qkv.bias, blocks.18.attn.proj.weight, blocks.18.attn.proj.bias, blocks.18.norm2.weight, blocks.18.norm2.bias, blocks.18.mlp.fc1.weight, blocks.18.mlp.fc1.bias, blocks.18.mlp.fc2.weight, blocks.18.mlp.fc2.bias, blocks.19.norm1.weight, blocks.19.norm1.bias, blocks.19.attn.qkv.weight, blocks.19.attn.qkv.bias, blocks.19.attn.proj.weight, blocks.19.attn.proj.bias, blocks.19.norm2.weight, blocks.19.norm2.bias, blocks.19.mlp.fc1.weight, blocks.19.mlp.fc1.bias, blocks.19.mlp.fc2.weight, blocks.19.mlp.fc2.bias, blocks.20.norm1.weight, blocks.20.norm1.bias, 
blocks.20.attn.qkv.weight, blocks.20.attn.qkv.bias, blocks.20.attn.proj.weight, blocks.20.attn.proj.bias, blocks.20.norm2.weight, blocks.20.norm2.bias, blocks.20.mlp.fc1.weight, blocks.20.mlp.fc1.bias, blocks.20.mlp.fc2.weight, blocks.20.mlp.fc2.bias, blocks.21.norm1.weight, blocks.21.norm1.bias, blocks.21.attn.qkv.weight, blocks.21.attn.qkv.bias, blocks.21.attn.proj.weight, blocks.21.attn.proj.bias, blocks.21.norm2.weight, blocks.21.norm2.bias, blocks.21.mlp.fc1.weight, blocks.21.mlp.fc1.bias, blocks.21.mlp.fc2.weight, blocks.21.mlp.fc2.bias, blocks.22.norm1.weight, blocks.22.norm1.bias, blocks.22.attn.qkv.weight, blocks.22.attn.qkv.bias, blocks.22.attn.proj.weight, blocks.22.attn.proj.bias, blocks.22.norm2.weight, blocks.22.norm2.bias, blocks.22.mlp.fc1.weight, blocks.22.mlp.fc1.bias, blocks.22.mlp.fc2.weight, blocks.22.mlp.fc2.bias, blocks.23.norm1.weight, blocks.23.norm1.bias, blocks.23.attn.qkv.weight, blocks.23.attn.qkv.bias, blocks.23.attn.proj.weight, blocks.23.attn.proj.bias, blocks.23.norm2.weight, blocks.23.norm2.bias, blocks.23.mlp.fc1.weight, blocks.23.mlp.fc1.bias, blocks.23.mlp.fc2.weight, blocks.23.mlp.fc2.bias, last_norm.weight, last_norm.bias
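In the log above, every unexpected key carries a `backbone.` prefix (plus some `keypoint_head.` entries), while the missing keys are the same names without that prefix. That pattern is consistent with a full pose-model checkpoint being passed where backbone-only weights are expected. A minimal sketch of extracting the backbone weights follows, assuming the checkpoint is a mapping from parameter names to tensors, possibly nested under a `state_dict` entry. `extract_backbone` is a hypothetical helper, not something this repo provides.

```python
def extract_backbone(checkpoint, prefix="backbone."):
    """Return only the entries under `prefix`, with the prefix removed.

    `checkpoint` is assumed to be either a flat name -> tensor mapping or a
    dict that holds such a mapping under the "state_dict" key.
    """
    state = checkpoint.get("state_dict", checkpoint)
    return {
        name[len(prefix):]: value
        for name, value in state.items()
        if name.startswith(prefix)
    }


# Toy stand-in for a loaded checkpoint; real values would be tensors.
ckpt = {
    "state_dict": {
        "backbone.pos_embed": 1,
        "backbone.blocks.0.norm1.weight": 2,
        "keypoint_head.final_layer.weight": 3,
    }
}
backbone_only = extract_backbone(ckpt)
print(sorted(backbone_only))  # ['blocks.0.norm1.weight', 'pos_embed']
```

With PyTorch, the input mapping would typically come from `torch.load(path, map_location="cpu")` and the result could be written back with `torch.save`.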

Bottom up vs top down model

Hi, can someone explain how the bottom-up ViTPose model works? Can you give an example with ViTPose-B? I am interested in the smallest, fastest single-person pose model that still preserves decent accuracy on COCO. Would that be ViTPose-B used in a bottom-up or a top-down manner?

Reproduce the video results in README

Which config and weight file have you used?

What is the full-window attention structure?

The paper mentions that a full-window attention structure is used to reduce the memory load, but I did not find a description of this structure. How is it implemented?

about the code of the transformer block

Thank you for opening this great repo.
Hello, where is the code of the transformer block? I didn't find the corresponding code.

I would greatly appreciate it if you could spare some time to reply.

About model size

Hi, I used the pre-trained model you provided for fine-tuning. Performance and speed are competitive, but my model file is about three times larger than yours. For example, my vitpose-b is 1.xGB, whereas yours is 343MB. How can I get a model of the same size?
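One possible explanation, stated here as an assumption rather than a confirmed diagnosis: training checkpoints often store the optimizer state next to the weights, and an Adam-type optimizer keeps two extra buffers per parameter, which would account for a file roughly three times the size. The sketch below keeps only the weights and metadata. The key names `state_dict`, `meta`, and `optimizer` are assumed, and `strip_training_state` is a hypothetical helper.

```python
def strip_training_state(checkpoint, keep=("state_dict", "meta")):
    """Return a shallow copy of `checkpoint` holding only the `keep` entries.

    Assumes the checkpoint is a dict whose weights live under "state_dict"
    and whose optimizer buffers live under a separate key such as "optimizer".
    """
    return {key: value for key, value in checkpoint.items() if key in keep}


# Toy stand-in for a loaded training checkpoint.
ckpt = {
    "meta": {"epoch": 1},
    "state_dict": {"backbone.pos_embed": 0.0},
    "optimizer": {"state": {}, "param_groups": []},
}
slim = strip_training_state(ckpt)
print(sorted(slim))  # ['meta', 'state_dict']
```

Inspecting the top-level keys of the large file first would show whether this assumption holds before anything is rewritten.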

Can you share the models on GoogleDrive or BaiduDrive?

Thanks for your excellent work. I cannot access OneDrive. Would it be possible for you to share the trained models on Google Drive or Baidu Drive?

About the object detection method

Hi, can you please mention which object detector you used? I could not find it mentioned anywhere.

LibTorch version of ViTPose

I tried to follow the instructions for converting a PyTorch model to LibTorch (C++) using the tracing instructions found here:
https://pytorch.org/tutorials/advanced/cpp_export.html
but I ran into some difficulties.

Has anyone else managed to generate a LibTorch version of ViTPose and is it thought to be possible?

Thanks!

Does this model track hand keypoints?

about OmniPose-Lite

When will OmniPose-Lite be available? I am very much looking forward to it.

model mismatch

Hi, I encountered a mismatch issue when training ViT-Base from the pretrained MAE weights:
The model and loaded state dict do not match exactly
unexpected key in source state_dict: cls_token, norm.weight, norm.bias
missing keys in source state_dict: last_norm.weight, last_norm.bias
I also got:
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git [...] -- [...]'
However, the training did not stop. What should I do beyond simply loading the pretrained MAE weights?

config question

Can you explain what nms_thr and oks_thr mean and what they do?
Thank you very much!
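For context on the second name: OKS (object keypoint similarity) is the similarity measure used in COCO keypoint evaluation, and a threshold on it is commonly used to decide when two predicted poses count as duplicates. Below is a minimal sketch of the OKS computation following the COCO definition. It is not code from this repo: the function name, the toy coordinates, and the sigma values are illustrative assumptions, and visibility flags are ignored for brevity.

```python
import math


def oks(pred, gt, area, sigmas):
    """Object keypoint similarity between two poses (COCO-style).

    pred, gt: lists of (x, y) keypoints in pixels, same order and length.
    area: area of the person region in squared pixels (must be > 0).
    sigmas: per-keypoint constants that set how much error is tolerated.
    """
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        var = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * var))
    return total / len(gt)


# Illustrative values only.
pose = [(10.0, 10.0), (20.0, 20.0)]
shifted = [(12.0, 10.0), (20.0, 23.0)]
print(oks(pose, pose, area=100.0, sigmas=[0.1, 0.1]))     # 1.0
print(oks(shifted, pose, area=100.0, sigmas=[0.1, 0.1]))  # strictly between 0 and 1
```

A higher threshold on this score means two poses must agree more closely before one of them is suppressed.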

Config path

What is the config path in train.sh?

config.py and weight file to integrate with mmpose framework

Hi,
Could you elaborate on where to find the corresponding config and weight files for integration with the mmpose framework?

Looking forward to the code

How long until the code is released?
Thanks

KeyError: 'ViT is not in the models registry'

I am trying to run top_down_video_demo_with_mmdet.py with the command:

python demo/top_down_video_demo_with_mmdet.py \
demo/mmdetection_cfg/yolov3_d53_320_273e_coco.py  \
https://download.openmmlab.com/mmdetection/v2.0/yolo/yolov3_d53_320_273e_coco/yolov3_d53_320_273e_coco-421362b6.pth \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_huge_coco_256x192.py \
../pretrained/ViTPose-H.pth \
--video-path ../UCF_Videos/Fighting/Fighting018_x264.mp4 \
--out-video-root ../output/test1

However, I am getting the following error:

Traceback (most recent call last):
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
    return obj_cls(**args)
  File "/home/s2435462/HRC/ViTPose/mmpose/mmpose/models/detectors/top_down.py", line 48, in __init__
    self.backbone = builder.build_backbone(backbone)
  File "/home/s2435462/HRC/ViTPose/mmpose/mmpose/models/builder.py", line 19, in build_backbone
    return BACKBONES.build(cfg)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/utils/registry.py", line 237, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'ViT is not in the models registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/s2435462/HRC/ViTPose/demo/top_down_video_demo_with_mmdet.py", line 165, in <module>
    main()
  File "/home/s2435462/HRC/ViTPose/demo/top_down_video_demo_with_mmdet.py", line 76, in main
    pose_model = init_pose_model(
  File "/home/s2435462/HRC/ViTPose/mmpose/mmpose/apis/inference.py", line 43, in init_pose_model
    model = build_posenet(config.model)
  File "/home/s2435462/HRC/ViTPose/mmpose/mmpose/models/builder.py", line 39, in build_posenet
    return POSENETS.build(cfg)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/utils/registry.py", line 237, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/s2435462/.conda/envs/open-mmlab/lib/python3.9/site-packages/mmcv/utils/registry.py", line 72, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
KeyError: "TopDown: 'ViT is not in the models registry'"

I installed everything with these commands:

conda create -n open-mmlab python=3.9 -y
conda activate open-mmlab

conda install pytorch torchvision cudatoolkit=11.3 -c pytorch

git clone https://github.com/ViTAE-Transformer/ViTPose.git
cd ViTPose
pip install -v -e .

pip install mmcv-full
pip install mmdet

rm -rf mmpose
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip install -r requirements.txt
pip install -e .

Can someone guide me on how to solve this?
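The install steps above delete the `mmpose` directory that ships with ViTPose and install upstream mmpose in its place, and the traceback paths point into that replacement copy. If the `ViT` backbone is registered only in the bundled copy (an assumption, though the error is consistent with it), the registry lookup would fail in exactly this way. A small sketch for checking which copy of a package Python resolves; `package_origin` is a hypothetical helper.

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Demonstrated with a standard-library package; in the environment from the
# issue, the interesting call would be package_origin("mmpose").
print(package_origin("json"))
```

If the printed path for `mmpose` points at a separately cloned upstream checkout rather than the ViTPose tree, that would explain why `ViT` is unknown to the registry.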

Where is the optimized attention block?

Thank you for opening this great repo.
Table 4 of the paper compares attention tricks such as window MSA or shift MSA, but I can't find them in this repo.

Testing on CPU

How can I test on CPU?

Setting the number of GPUs to 0 doesn't work.

bash tools/dist_test.sh configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res101_coco_256x192.py  ../weights/mae_pretrain_vit_base.pth 0

Error:

FutureWarning,
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
    )(*cmd_args)
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 225, in launch_agent
    master_port=master_port,
  File "<string>", line 15, in __init__
  File "/home/ahmad/Desktop/RedBuffer/BowlingAngle/venv/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 87, in __post_init__
    assert self.local_world_size > 0
AssertionError

Running video demo

Hello,

I tried to run the video demo using mmdet:

python demo/top_down_pose_tracking_demo_with_mmdet.py ./demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py ./faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py ./vitpose-b.pt --video-path ./test.MOV --out-video-root ./output_video/
but I get errors caused by version incompatibilities between mmcv, mmdet, and the current ViTPose (or mmpose) version.

Here is what I did: I installed mmcv from source (version 1.3.9, as recommended in the README of this repo) and mmdet from source as well. I tried the latest mmdet, mmdet==2.14.0 (as recommended in mminstall.txt for mmpose 0.24.0: ['mmcv-full>=1.3.8', 'mmdet>=2.14.0', 'mmtrack>=0.6.0']), and mmdet==2.23.0.

Here is what I get with the following versions, for example (pip list):
mmcv 1.3.9
mmdet 2.14.0
mmpose 0.24.0

Note that I use:
torch 1.11.0+cu113
torchvision 0.12.0+cu113

I got this error:
/home/ubuntu/venv/lib/python3.8/site-packages/mmcv/cnn/bricks/transformer.py:27: UserWarning: Fail to import ``MultiScaleDeformableAttention`` from ``mmcv.ops.multi_scale_deform_attn``, You should install ``mmcv-full`` if you need this module.
  warnings.warn('Fail to import ``MultiScaleDeformableAttention`` from '
Traceback (most recent call last):
  File "demo/top_down_pose_tracking_demo_with_mmdet.py", line 190, in <module>
    main()
  File "demo/top_down_pose_tracking_demo_with_mmdet.py", line 74, in main
    assert has_mmdet, 'Please install mmdet to run the demo.'
AssertionError: Please install mmdet to run the demo.

When I set mmdet to 2.23.0, I got this error:

AssertionError: MMCV==1.3.9 is used but incompatible. Please install mmcv>=1.3.17, <=1.6.0.

I tried setting mmcv>=1.3.17, but that did not resolve the problem.

Can you please tell us which versions of mmcv and mmdet are recommended for running ViTPose on videos?
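For readers hitting the same assertion, here is a minimal sketch of the range check that the message above describes. It only illustrates why 1.3.9 is rejected by the quoted bounds (mmcv>=1.3.17, <=1.6.0); it does not say which combination of packages actually works with this repo. The helper names parse_version and in_range are made up for this example and are not part of mmcv or mmdet.

```python
import re


def parse_version(text):
    """Turn '1.3.17' into (1, 3, 17); a local suffix such as '+cu113' is dropped."""
    public_part = text.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", public_part)[:3])


def in_range(installed, minimum, maximum):
    """Return True when minimum <= installed <= maximum (hypothetical helper)."""
    return parse_version(minimum) <= parse_version(installed) <= parse_version(maximum)


# Bounds copied from the assertion message quoted in the post.
print(in_range("1.3.9", "1.3.17", "1.6.0"))   # False
print(in_range("1.3.17", "1.3.17", "1.6.0"))  # True
```

Comparing tuples of integers avoids the trap of comparing version strings, where "1.3.9" would sort after "1.3.17".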

Config / weights for ViTPose-G

Is it possible to release the config and weights for ViTPose-G?

How to do inference on video by scripts?

Hey! I used the web demo in #20 to run inference on a video file, but it's super slow! I'm wondering if there are any scripts to do so?

Another question: let's say I have an input image of size 1080x640x3 that contains 10 people. The detector detects all of them, so after cropping and resizing, the actual data flowing into ViTPose is 10x3x256x192. And your speed in #4 (900 fps) is measured on each 256x192x3 crop. Am I correct?

Thanks in advance!
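The arithmetic behind that second question can be sketched as follows. This assumes the reading proposed in the post, namely that the quoted 900 fps counts individual 256x192 crops; that is the point being asked and is not confirmed here. It also ignores detector time and any batching effects. Both function names are invented for the example.

```python
def batch_shape(people, channels=3, height=256, width=192):
    """Shape of the tensor fed to the pose model when each person is one crop."""
    return (people, channels, height, width)


def pose_stage_frames_per_second(crops_per_second, people_per_frame):
    """Video frames per second for the pose stage alone (detector excluded)."""
    if people_per_frame <= 0:
        raise ValueError("people_per_frame must be positive")
    return crops_per_second / people_per_frame


print(batch_shape(10))                        # (10, 3, 256, 192)
print(pose_stage_frames_per_second(900, 10))  # 90.0
```

Under that assumption, a frame with 10 people would pass through the pose stage at roughly a tenth of the per-crop rate.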

Training on test images when using CrowdPose?

Dear authors, thanks for the exciting work and I'd like to apologize in advance if I misunderstood.

As you may already know, CrowdPose dataset itself is constituted by cherry-picked crowd samples selected from MSCOCO, MPII and AIC, but CrowdPose did not specify if they treated train/val/test images from MSCOCO/MPII/AIC differently. They also re-annotated (presumably more accurately) these samples.

What we have noticed is that many of the test images in the "MS COCO val set" are also present in the "CrowdPose train" and "CrowdPose train/val" splits. Although CrowdPose has renamed all its images, we have identified at least 181 images in "CrowdPose train/val" that have the same md5 hash as images in the "MS COCO val set".

For example, "108951.jpg" in "CrowdPose train" and "000000147740.jpg" in "MS COCO val set" are the same image with md5: f9fc120dc085166b30c08da3de333b69

We did not identify any image overlap between CrowdPose and MPII/AIC at the md5 level for either train or test images, possibly because CrowdPose preprocessed the selected MPII/AIC images. Based on the finding on COCO, however, such train-test overlap with MPII/AIC remains a notable possibility. We have not yet checked whether "CrowdPose test" images are also present in the "COCO train set".

So if I did not miss anything, the model jointly trained on COCO+AIC+MPII+CrowdPose would have seen many of the test images (with labels, at least for COCO) during the training process, making the results untrustworthy.
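A check of the kind described above could look roughly like this. It is a sketch, not the script the poster used: it hashes every .jpg in two folders and reports files whose md5 digests coincide. The folder names in the usage comment are placeholders. As the post itself notes, md5 only catches byte-identical files, so re-encoded or resized copies would be missed.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(folder, pattern="*.jpg"):
    """Map md5 digest -> file name for every matching file in the folder."""
    return {md5_of_file(p): p.name for p in sorted(Path(folder).glob(pattern))}


def find_overlap(folder_a, folder_b):
    """Return sorted (name_in_a, name_in_b, md5) for byte-identical files."""
    index_a = index_by_md5(folder_a)
    index_b = index_by_md5(folder_b)
    shared = index_a.keys() & index_b.keys()
    return sorted((index_a[d], index_b[d], d) for d in shared)


# Usage with placeholder paths:
# for name_a, name_b, digest in find_overlap("crowdpose/images", "coco/val2017"):
#     print(name_a, name_b, digest)
```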

About Inference speed

Are you sure that this method is faster than HRNet?
I have tried both with YOLOv5 as the detector, running inference in TensorRT (trt).
HRNet achieves around 30-35 fps, while ViTPose reaches only 7 fps on the same video with trt.
The inference tests I have conducted show that HRNet is 6-7 times faster when using larger batch sizes for some reason (around 220 fps per target for fp16 and 450 fps for int8), while ViTPose achieves around 60 fps per target in trt.

Model in video demo

Hello, I was wondering which model is used in the video Web Demo on HuggingFace. I would like to test it using scripts. Are weights provided for that particular model? Thanks

pretrain model

Can you provide us with the pretrained MAE or ViTAE models?

Training device?

Hi, I'd like to retrain this model on my own data; however, an out-of-memory error occurs even when samples_per_gpu is set to 1. I'm using a GTX 2080 Ti.

Inference speed

Thank you for the nice work! May I know if you all have done any analysis and comparison for the model's inference speed?

Code for multi task training.

Hi, thanks for this solid work. I'd like to know when you plan to release the code for multi-task training.

ViTAE-G config

Would you be willing to share the ViTAE-G config? Thanks!!!

Will this model work with unseen data?

Will this model work with unseen data (in the wild pose estimation) or does it require further training outside the COCO/AIC/MPII/CrowdPose datasets?

What is the pretrain model of MAE huge?

I found that in the MAE project, the huge model has patch_size = 14, but in the config here the patch size is set to 16. How do you load the MAE pretrained weights?

Running the project

Hi, can anyone summarize the installation setup and the quick-start process (for instance, using the demo and running inference)? The instructions in the README.md are confusing for beginners. Thank you!!!

Use ViTPose with Jetson AGX Orin

Hi, thanks for the great work you have done on pose estimation. I used the deployment script pytorch2onnx.py to convert the model to ONNX and then used trtexec to convert it to an engine file, but the output heatmap is different when running TensorRT inference.

where the model code?

I cannot find the ViTPose model in mmpose. Does anybody know where it is?

Keypoints absent from model output?

I've been trying to use the demo scripts but keep getting the following error:

Traceback (most recent call last):
  File "demo/top_down_video_demo_with_mmdet.py", line 165, in <module>
    main()
  File "demo/top_down_video_demo_with_mmdet.py", line 125, in main
    pose_results, returned_outputs = inference_top_down_pose_model(
  File "/home/nshah/work/packages/vitpose/mmpose/apis/inference.py", line 415, in inference_top_down_pose_model
    poses, heatmap = _inference_single_pose_model(
  File "/home/nshah/work/packages/vitpose/mmpose/apis/inference.py", line 307, in _inference_single_pose_model
    return result['preds'], result['output_heatmap']
KeyError: 'preds'

The model seemingly outputs only the heatmap and not the actual keypoint predictions. However, I noticed in some of the closed issues that people were able to get some of the demo scripts to work. I'm just wondering whether I'm missing something very obvious.

I'm using this config which does appear to have a keypoint head.

Question about the file vitpose-l-simple.pth.

The model file vitpose-l-simple.pth that I downloaded cannot be loaded. I would like to confirm whether the problem is with my download or with the uploaded model itself.

And below is a screenshot of my error.

Looking forward to your reply!

How to load pre-trained model?

Hi, when loading the pre-trained model, I passed "--resume-from" followed by the path to the pre-trained model, and I got an error message like this:

2022-06-10 10:51:38,626 - mmpose - INFO - load checkpoint from local path: models/epoch_1.pth
Traceback (most recent call last):
  File "/home/pose/codes/ViTPose/tools/train.py", line 195, in <module>
    main()
  File "/home/pose/codes/ViTPose/tools/train.py", line 184, in main
    train_model(
  File "/home/pose/codes/ViTPose/mmpose/apis/train.py", line 197, in train_model
    runner.resume(cfg.resume_from)
  File "/home/pose/codes/ViTPose/ViT_venv/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 364, in resume
    self._iter = checkpoint['meta']['iter']
KeyError: 'iter'

So what's the right way to load a pre-trained model? Thank you for your patience and time!
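The traceback shows that resuming reads checkpoint['meta']['iter'], which a weights-only file does not carry. A rough sketch of that distinction is below, using plain dictionaries in place of real checkpoint files. The 'epoch' key and the load_from config field are assumptions based on upstream mmcv/mmpose conventions and are not confirmed by this thread; the helper names are made up.

```python
def can_resume(checkpoint):
    """True when the checkpoint carries the bookkeeping that a resume reads."""
    meta = checkpoint.get("meta") or {}
    # 'iter' appears in the traceback; 'epoch' is an assumption.
    return "epoch" in meta and "iter" in meta


def choose_option(checkpoint):
    """Suggest which config field fits the checkpoint (field names assumed)."""
    return "resume_from" if can_resume(checkpoint) else "load_from"


# Stand-ins for the dictionaries that loading a checkpoint file would return.
weights_only = {"state_dict": {}, "meta": {}}
training_snapshot = {
    "state_dict": {},
    "optimizer": {},
    "meta": {"epoch": 3, "iter": 1200},
}

print(choose_option(weights_only))       # load_from
print(choose_option(training_snapshot))  # resume_from
```

In other words, resuming is meant for continuing an interrupted run, while a released pre-trained model is usually loaded as initial weights only.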

About multitask log

Thx for the great work! It helps me a lot in my own study. Could you please release the training log of the multi-task training? A .log file may be better than .json. Thx again.

Test with a resolution different from 256x192

Hi, I want to test with an image at a size of 224x224. Could you please tell me how to modify the position embedding?
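To make the question concrete: with 16x16 patches, a 256x192 input gives a 16x12 token grid while 224x224 gives 14x14, so a position embedding learned for the former has to be resampled. The sketch below shows one common way to do that, bilinear interpolation with aligned corners, on a single embedding channel stored as a nested list. It assumes a patch size of 16 with no padding and is not taken from this repo, which may handle resizing differently (for example with another interpolation mode or special handling of a class token).

```python
def token_grid(height, width, patch_size=16):
    """Rows and columns of patch tokens for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch_size, width // patch_size


def _source_positions(new_size, old_size):
    """Positions in the old grid that each new cell samples (corners aligned)."""
    if new_size == 1:
        return [0.0]
    step = (old_size - 1) / (new_size - 1)
    return [i * step for i in range(new_size)]


def resize_grid(grid, new_rows, new_cols):
    """Bilinearly resample one embedding channel (2-D list) to a new grid."""
    old_rows, old_cols = len(grid), len(grid[0])
    out = []
    for y in _source_positions(new_rows, old_rows):
        y0 = min(int(y), old_rows - 1)
        y1 = min(y0 + 1, old_rows - 1)
        wy = y - y0
        row = []
        for x in _source_positions(new_cols, old_cols):
            x0 = min(int(x), old_cols - 1)
            x1 = min(x0 + 1, old_cols - 1)
            wx = x - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


print(token_grid(256, 192))  # (16, 12)
print(token_grid(224, 224))  # (14, 14)
```

A real embedding has many channels; the same resampling would be applied to each one independently.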

Video demo with VitPose Base

Hello, how do I run the video demo with ViT? Are the demo scripts using any ViTPose models? The demo page says: "Using mmdet for human bounding box detection. We provide a demo script to run mmdet for human detection, and mmpose for pose estimation." How do I use ViTPose for video pose estimation?

How to edit number of transformer blocks?

Hi, can you please point me to the part of the config or scripts where I can change the number of transformer blocks in the network for training?
Thanks

Onnx version of the model

Hi, thank you author for the great work.

I'm really impressed with your work. By any chance, could you release an ONNX version of the ViTPose model?
I tried to run ViTPose-B* but failed many times.

What's the batchsize used in the configuration?

Hi, I can only see 'samples_per_gpu' in the config, but I can't find the number of GPUs used in the experiments. I wonder what the actual batch size is in each experiment.
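The relationship being asked about is simple to state, even though the thread does not give the GPU count. As a sketch under the usual data-parallel assumption, the total batch size is samples_per_gpu multiplied by the number of GPUs. The numbers below are placeholders and not the authors' settings, and accumulation_steps is an extra parameter added for illustration.

```python
def effective_batch_size(samples_per_gpu, num_gpus, accumulation_steps=1):
    """Total samples contributing to one optimizer step in data-parallel training."""
    return samples_per_gpu * num_gpus * accumulation_steps


# Placeholder values, not the settings used in the paper.
print(effective_batch_size(samples_per_gpu=64, num_gpus=8))  # 512
```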

Demo codes.

Hi, I am very interested in your excellent work, and I would like to ask where I can find the code for the web demo. By the way, where can I get the quantitative intermediate outputs of this app (https://huggingface.co/spaces/Gradio-Blocks/ViTPose), such as detection boxes and keypoints? Looking forward to your reply!

Speed of Detection

Hi, I have only managed to get around 5 fps with the top-down model for 2D pose estimation on a GTX 1660 GPU, testing on video via demo/webcam.py. How can I speed up inference when using synchronous mode? Thank you!! :)

Top-down or bottom-up?

Hey! Was reading the paper, impressive stuff.

I was uncertain about what you actually predict, however. Do you first crop the humans and then do keypoint estimation (top-down, I guess)? Or do you predict all humans at once (bottom-up), and then predict a part-affinity map (or the like) along with the keypoints?

If it's the latter, what exactly does the model output?

Thank you in advance 🙏
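For context, the config paths and demo script names quoted elsewhere on this page (topdown_heatmap, top_down_video_demo_with_mmdet.py) point to the top-down flow: a detector provides person boxes, and the pose model outputs one heatmap per keypoint for each crop. The sketch below shows only the last step of such a flow, picking the heatmap peak and mapping it back into the person box. It is deliberately simplified and is not the decoding used in mmpose, which applies further refinement; both function names are invented.

```python
def heatmap_peak(heatmap):
    """Return (row, col, score) of the largest value in a 2-D list."""
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


def to_image_coords(row, col, heatmap_size, box):
    """Map the centre of a heatmap cell into a person box given as (x, y, w, h)."""
    rows, cols = heatmap_size
    x, y, w, h = box
    return (x + (col + 0.5) / cols * w, y + (row + 0.5) / rows * h)


# Toy 4x3 heatmap for one keypoint of one detected person.
toy_heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.3, 0.1],
    [0.2, 0.9, 0.2],
    [0.0, 0.2, 0.0],
]
row, col, score = heatmap_peak(toy_heatmap)
print(row, col, score)                                      # 2 1 0.9
print(to_image_coords(row, col, (4, 3), (100, 50, 30, 40)))  # (115.0, 75.0)
```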

Would you provide bottom-up-based pretrained weights of ViTPose?

Thanks for your research contribution and publishing code!

I will be running this model as a bottom-up keypoint estimation process for research.

Looking at the code, I found bottom-up inference code, but I cannot find bottom-up pretrained weights for ViTPose.

Would you provide the bottom-up pretrained weights?

AttributeError: 'ConfigDict' object has no attribute 'data'

When I try to run the code below in notebook ->

!bash tools/dist_test.sh /content/ViTPose/configs/body/2d_kpt_sview_rgb_vid/posewarper/posetrack18/hrnet_posetrack18_posewarper.yml /content/mask_rcnn_swin_tiny_patch4_window7_1x.pth 1

as you have mentioned in README.md ->

bash tools/dist_test.sh <Config PATH> <Checkpoint PATH> <NUM GPUs>

I get the error below ->

apex is not installed
apex is not installed
apex is not installed
/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/transformer.py:33: UserWarning: Fail to import MultiScaleDeformableAttention from mmcv.ops.multi_scale_deform_attn, You should install mmcv-full if you need this module.
warnings.warn('Fail to import MultiScaleDeformableAttention from '
Traceback (most recent call last):
  File "tools/test.py", line 184, in <module>
    main()
  File "tools/test.py", line 96, in main
    setup_multi_processes(cfg)
  File "/content/ViTPose/mmpose/utils/setup_env.py", line 30, in setup_multi_processes
    if 'OMP_NUM_THREADS' not in os.environ and cfg.data.workers_per_gpu > 1:
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/config.py", line 513, in __getattr__
    return getattr(self._cfg_dict, name)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/config.py", line 49, in __getattr__
    raise ex
AttributeError: 'ConfigDict' object has no attribute 'data'
Killing subprocess 743
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/test.py', '--local_rank=0', '/content/ViTPose/configs/body/2d_kpt_sview_rgb_vid/posewarper/posetrack18/hrnet_posetrack18_posewarper.yml', '/content/mask_rcnn_swin_tiny_patch4_window7_1x.pth', '--launcher', 'pytorch']' returned non-zero exit status 1.
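The failing line in the traceback reads cfg.data.workers_per_gpu, so the file passed as the config evidently has no top-level data section. A quick pre-flight check of that kind can be sketched on a plain dictionary, as below. The required names other than data are assumptions, and the example dictionary is a hypothetical stand-in for what a non-config .yml file might parse to, not the actual contents of the file in the command.

```python
def missing_sections(config, required=("model", "data")):
    """List the top-level sections a test script would need but cannot find."""
    return [name for name in required if name not in config]


# Hypothetical parse result of a .yml file that is not a training/test config.
parsed_yml = {"Collections": [], "Models": []}
# Hypothetical parse result of a usable config.
parsed_config = {"model": {}, "data": {"workers_per_gpu": 2}}

print(missing_sections(parsed_yml))     # ['model', 'data']
print(missing_sections(parsed_config))  # []
```

Running such a check before launching the distributed job would surface the problem without the second, less informative traceback from the launcher.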