
dekr's People

Contributors

gengzigang, jenhaoyang, wolfeeglick


dekr's Issues

ONNX file generation fails

I am using the script tools/valid.py to export an ONNX file. Here is the error stack.

Total Parameters: 29,561,182
----------------------------------------------------------------------------------------------------------------------------------
Total Multiply Adds (For Convolution and Linear Layers only): 45,385,056,256
----------------------------------------------------------------------------------------------------------------------------------
Number of Layers
Conv2d : 395 layers   BatchNorm2d : 343 layers   ReLU : 306 layers   Bottleneck : 4 layers   BasicBlock : 105 layers   Upsample : 31 layers   HighResolutionModule : 8 layers   DeformConv2d : 34 layers   AdaptBlock : 34 layers   
=> loading model from pose_dekr_hrnetw32_coco.pth
/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/symbolic_helper.py:347: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator. 
  "" + str(_export_onnx_opset_version) + ". "
Traceback (most recent call last):
  File "tools/valid_hpe.py", line 122, in <module>
    main()
  File "tools/valid_hpe.py", line 119, in main
    torch.onnx.export(model, dump_input, "/home/vinay/Downloads/dekr.onnx", verbose=True)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 276, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 94, in export
    use_external_data_format=use_external_data_format)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 698, in _export
    dynamic_axes=dynamic_axes)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 465, in _model_to_graph
    module=module)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 206, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/__init__.py", line 309, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/utils.py", line 994, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "/home/vinay/.local/lib/python3.6/site-packages/torch/onnx/symbolic_opset9.py", line 1777, in slice
    raise RuntimeError("step!=1 is currently not supported")
RuntimeError: step!=1 is currently not supported

I added the following line to valid.py to export the model.

torch.onnx.export(model, dump_input, "/home/dekr.onnx", verbose=True)

The same input with the above line works fine in other repositories such as HRNet Image Classification and Human Pose Estimation.
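
One hedged workaround: ONNX opset 9 has no step support in Slice (steps were added to Slice in opset 10), and the warning above already recommends opset 11 or higher for Upsample/Resize, so exporting with a newer opset may get past both problems:

    # opset_version is a standard torch.onnx.export argument
    torch.onnx.export(model, dump_input, "/home/dekr.onnx",
                      verbose=True, opset_version=11)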

Questions about the salient regions in figure 1

Hi, thanks for sharing your code. Could you tell me how to generate the salient regions in figure 1 of the paper? I have noticed that the toolbox you cite in the paper includes many methods, such as Grad-CAM and Score-CAM. However, these methods focus on the classification task. Is there any modification needed to make them compatible with your code?

Train custom dataset

Hello, thanks for your great work and for open-sourcing it so selflessly.

I replaced the COCO dataset with my own dataset, which has no segmentation annotations, and got the following error.

  File "/fastdata/computervision/liuxingyu/shared/projects/pose_estimation/DEKR/tools/../lib/dataset/COCOKeypoints.py", line 47, in __getitem__
    mask = self.get_mask(anno, image_info)
  File "/fastdata/computervision/liuxingyu/shared/projects/pose_estimation/DEKR/tools/../lib/dataset/COCOKeypoints.py", line 110, in get_mask
    obj['segmentation'], img_info['height'], img_info['width'])
KeyError: 'segmentation'

Could I replace the segmentation with the bbox annotation for training? Will it hurt the model badly?
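
For what it's worth, a minimal sketch of a bbox fallback for get_mask, assuming COCO-style obj['bbox'] = [x, y, w, h] in pixels (whether the coarser mask hurts training is exactly the open question here):

    import numpy as np

    def bbox_to_mask(bbox, height, width):
        """Rasterize a COCO [x, y, w, h] bbox into a binary mask."""
        mask = np.zeros((height, width), dtype=np.float32)
        x, y, w, h = (int(round(v)) for v in bbox)
        mask[y:y + h, x:x + w] = 1.0
        return mask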

How to run DEKR in real time

By default, DEKR only plays a video clip for pose estimation. I want to modify the code to detect poses in real time, but the program stops responding.
Any way to fix it?
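
A minimal webcam-loop sketch; run_pose below is a hypothetical wrapper around the repo's per-frame inference, not an existing function:

    import cv2

    cap = cv2.VideoCapture(0)              # 0 = default camera
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        vis = run_pose(frame)              # hypothetical per-frame inference
        cv2.imshow('DEKR', vis)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()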

Unable to train on my own data

When training on my own data, I modified joint_nums but training still fails, and I'm curious why.
The error says: "the number of joints should be 22" (my data has 21 keypoints).
While debugging, joints_list in COCOKeypoints.py is sometimes 18 and sometimes 22.

set_epoch for DistributedSampler

Describe the bug
The PyTorch example suggests calling the set_epoch function of the DistributedSampler class before each epoch starts. I could not find it anywhere in your code.

https://github.com/pytorch/examples/blob/master/imagenet/main.py
Line 232-234

As can be seen from the DistributedSampler class code (https://github.com/pytorch/pytorch/blob/master/torch/utils/data/distributed.py), the set_epoch function is required to set the seed for each __iter__ call.

Can you confirm if this function has been called on the DistributedSampler (for the training dataset) at some point in your code?

Copyright Claim: I ask the same question as @ananyahjha93 did. Hence I copied and slightly modified his post here: Lightning-AI/pytorch-lightning#224 (comment)
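
For reference, a minimal sketch of the pattern used in the linked PyTorch example (names here are illustrative, not taken from this repo):

    # DistributedSampler shuffles deterministically from a seed derived from
    # the epoch, so set_epoch must be called before iterating each epoch;
    # otherwise every epoch replays the same shuffling order.
    for epoch in range(start_epoch, num_epochs):
        train_sampler.set_epoch(epoch)
        for images, targets in train_loader:
            ...  # usual forward/backward step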

How can I visualize heatmap_avg?

I can't understand the shape of heatmap_avg.
For example, when I feed a 320x240x3 image into hrnetw32_coco.pth, the size of heatmap_avg is 6488064.
How can I visualize heatmap_avg?
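
A minimal sketch for inspecting one channel, assuming heatmap_avg can be reshaped to (C, H, W); C, H, and W must be taken from the model config, so treat them as placeholders:

    import matplotlib.pyplot as plt

    # the flat length of heatmap_avg must equal C * H * W for this reshape
    hm = heatmap_avg.detach().cpu().numpy().reshape(C, H, W)
    plt.imshow(hm[0], cmap='jet')  # channel 0, e.g. one keypoint/center map
    plt.colorbar()
    plt.show()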

RuntimeError: CUDA out of memory when training on the COCO train2017 dataset

When I run python tools/train.py --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml, I get:
RuntimeError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.91 GiB total capacity; 9.39 GiB already allocated; 125.50 MiB free; 9.48 GiB reserved in total by PyTorch)
I didn't find a batch_size setting in w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml. Is there anything else that needs to be modified?

About training loss

The training loss is very small and decreases slowly when I fine-tune the model on the COCO dataset. Is this normal?
And how many epochs did you set?

About AdaptBlock

I notice that the implementation of AdaptBlock at line 126 shows:

        offset = torch.matmul(transform_matrix, self.regular_matrix)
        offset = offset - self.regular_matrix
        offset = offset.transpose(1,2).reshape((N,H,W,18)).permute(0,3,1,2)

Why does the offset need to subtract self.regular_matrix?

About offset loss: difficult to converge?

Hi, I am reproducing your work. I found that the offset loss is very difficult to converge; the offset value is about 1000~2000, which seems very abnormal. I don't know why this happens. Could you help me?

Thank you very much.

JOINT_COCO_LINK_1 and JOINT_COCO_LINK_2

What do these mean? In lib/utils/rescore.py:
JOINT_COCO_LINK_1 = [0, 0, 1, 1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 8, 11, 11, 12, 13, 14]
JOINT_COCO_LINK_2 = [1, 2, 2, 3, 4, 5, 6, 6, 7, 11, 8, 12, 9, 10, 12, 13, 14, 15, 16]
How did you get these parameters?
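
For context, pairing the two lists element-wise yields index pairs that correspond to the standard COCO 17-keypoint skeleton edges (this is my reading, not confirmed by the repo):

    # each (a, b) pair links two COCO keypoint indices, e.g. (5, 6) is
    # left_shoulder-right_shoulder and (11, 13) is left_hip-left_knee
    edges = list(zip(JOINT_COCO_LINK_1, JOINT_COCO_LINK_2))
    print(edges[:3])  # [(0, 1), (0, 2), (1, 2)] -- nose-eye and eye-eye links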

Visualization

Can you upload the code for visualizing the attention map? I don't know how to use the reference tool you mentioned.

Queries regarding inference

Thank you for the amazing work! I wanted to know if there is any instance tracking implemented in the codebase, such as the SORT tracker used with HRNet. Thanks!
@Gengzigang

testing on a custom dataset

Hi!
First, thanks for your work, which shows magnificent results.

I would like to run your model on a custom dataset to see whether it performs well on my challenging pictures. Could you tell me what the different steps are?

Thanks a lot!

KeyError: "There is no item named 'val2017/000000397133.jpg' in the archive"

I have prepared the coco2017 dataset in .zip format.
But when I run the test command
python tools/valid.py --cfg experiments/coco/w32/w32_4x_reg03_bs10_512_adam_lr1e-3_coco_x140.yaml TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32_coco.pth ,
I get the error:
Traceback (most recent call last):
  File "tools/valid.py", line 212, in <module>
    main()
  File "tools/valid.py", line 134, in main
    for i, images in enumerate(data_loader):
  File "/root/anaconda3/envs/dekr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/root/anaconda3/envs/dekr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/anaconda3/envs/dekr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/anaconda3/envs/dekr/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/lzj/DEKR/tools/../lib/dataset/COCODataset.py", line 104, in __getitem__
    cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION
  File "/root/lzj/DEKR/tools/../lib/utils/zipreader.py", line 53, in imread
    data = _im_zfile[-1]['zipfile'].read(path_img)
  File "/root/anaconda3/envs/dekr/lib/python3.7/zipfile.py", line 1465, in read
    with self.open(name, "r", pwd) as fp:
  File "/root/anaconda3/envs/dekr/lib/python3.7/zipfile.py", line 1504, in open
    zinfo = self.getinfo(name)
  File "/root/anaconda3/envs/dekr/lib/python3.7/zipfile.py", line 1431, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'val2017/000000397133.jpg' in the archive"

Do you have any idea to solve it?
Thanks a lot.
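
One quick check worth doing: list the entry names the archive actually contains, since the loader looks images up under a 'val2017/<filename>' prefix. A sketch (the zip path below is a guess; adjust it to your layout):

    import zipfile

    # if entries print as e.g. 'images/val2017/...' or bare '000000397133.jpg',
    # the archive layout doesn't match the path the loader is requesting
    with zipfile.ZipFile('data/coco/images/val2017.zip') as zf:
        print(zf.namelist()[:5])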

Replicate experiments for W48

Thanks for sharing your work.

I was able to replicate your results using the w32 weights for COCO, but with the w48 model I can't get above 40% mAP, either with the pre-trained weights or when re-training. Do you have other, correct weights for it, or should I use a different training protocol?

Thank you

Can the code generate 19-channel heatmaps?

Can the code generate 19-channel heatmaps? What parts need to be modified? I trained with the COCO dataset and only modified the DATASET part of the yaml:
DATASET:
  DATASET: coco_kpt
  DATASET_TEST: coco
  DATA_FORMAT: zip
  FLIP: 0.5
  INPUT_SIZE: 512
  OUTPUT_SIZE: 64
  MAX_NUM_PEOPLE: 30
  MAX_ROTATION: 30
  MAX_SCALE: 1.5
  SCALE_TYPE: 'short'
  MAX_TRANSLATE: 40
  MIN_SCALE: 0.75
  NUM_JOINTS: 18
  ROOT: 'data/coco'
  TEST: val2017
  TRAIN: train2017
  OFFSET_RADIUS: 4
  SIGMA: 2.0
  CENTER_SIGMA: 4.0
  BG_WEIGHT: 0.1
The error:
INFO:root:Dataset CocoKeypoints
Number of datapoints: 64115
Root Location: data/coco
Dataset CocoKeypoints
Number of datapoints: 64115
Root Location: data/coco
Traceback (most recent call last):
  File "tools/train.py", line 295, in <module>
    main()
  File "tools/train.py", line 108, in main
    mp.spawn(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/ubuntu/DEKR-main1/tools/train.py", line 258, in main_worker
    do_train(cfg, model, train_loader, loss_factory, optimizer, epoch,
  File "/home/ubuntu/DEKR-main1/tools/../lib/core/trainer.py", line 32, in do_train
    for i, (image, heatmap, mask, offset, offset_w) in enumerate(data_loader):
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/DEKR-main1/tools/../lib/dataset/COCOKeypoints.py", line 54, in __getitem__
    joints, area = self.get_joints(anno)
  File "/home/ubuntu/DEKR-main1/tools/../lib/dataset/COCOKeypoints.py", line 82, in get_joints
    joints[i, :self.num_joints, :3] =
ValueError: could not broadcast input array from shape (17,3) into shape (18,3)

installation issue: ncclSystemError: System call (socket, malloc, munmap, etc) failed.

Hi,

thank you for sharing your work.

I'm trying to test DEKR but am facing an NCCL issue. When I run train.py, it returns this error:

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:825, unhandled system error, NCCL version 2.7.8

ncclSystemError: System call (socket, malloc, munmap, etc) failed.

Could you give me some tips to overcome this?

Environment:

  CUDA:
    • version: 10.2

  GPU:
    • NVIDIA GTX 1080 Ti PCIE-11GB
    • NVIDIA GTX 1080 Ti PCIE-11GB
    • NVIDIA GTX Titan PCIE-12GB
    • NVIDIA GTX Titan PCIE-12GB

  System:
    • OS: Ubuntu 18.04
    • architecture: 64bit
    • processor: x86_64
    • python: 3.6.9

Missing rescore_dataset_train_coco_kpt when running the rescore validation

When I execute the first command, python tools/valid.py --cfg experiments/coco/rescore_coco.yaml TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32.pth, I can't find '../data/rescore_data/rescore_dataset_train_coco_kpt', and an error is reported after more than 110,000 pictures have been processed. I think '../data/rescore_data/rescore_dataset_train_coco_kpt' needs to be created by myself, so I created such a file and modified the path in 'score/data/rescore_data/recore_dataset_train_coco_kpt', but an error is still reported. I hope you can help me solve it.

In addition, I have another question. I have changed it three times, and each time I have to run through the 110,000 pictures first, which takes a lot of time. Is there any way to avoid re-running these 110,000 pictures?

command: [screenshot]

In rescore_coco.yaml, the path before modification is in the box and the path after modification is in the ellipse: [screenshot]

The file I created myself: [screenshot]

About backbone

Why don't you use Lite-HRNet as your backbone? Is it because of limited performance, or is Lite-HRNet not suitable for this paper's approach?

ValueError: desired inference fps is 10 but video fps is 0.0

When I run
python tools/inference_demo.py --cfg experiments/coco/inference_demo_coco.yaml --videoFile ../multi_people.mp4 --outputDir output --visthre 0.3 TEST.MODEL_FILE model/pose_coco/pose_dekr_hrnetw32.pth
I get the error:
Traceback (most recent call last):
  File "tools/inference_demo.py", line 286, in <module>
    main()
  File "tools/inference_demo.py", line 208, in main
    str(args.inferenceFps)+' but video fps is '+str(fps))
ValueError: desired inference fps is 10 but video fps is 0.0
Can you tell me why this happens and how to solve the problem?
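
For debugging: fps == 0.0 usually means OpenCV could not open or decode the video at all (wrong path, or an OpenCV build without FFMPEG support). A quick sanity check, independent of this repo:

    import cv2

    cap = cv2.VideoCapture('../multi_people.mp4')
    print('opened:', cap.isOpened())            # False -> path/codec problem
    print('fps:', cap.get(cv2.CAP_PROP_FPS))    # 0.0 -> metadata not readable
    cap.release()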

The evaluation result of my run is not good

Hello, thanks for your great work.

I ran the evaluation code with your public trained model following your readme, but the result is different from your paper:

Arch        AP     AP .5  AP .75  AP (M)  AP (L)  AR     AR .5  AR .75  AR (M)  AR (L)
hrnet_dekr  0.365  0.484  0.400   0.335   0.463   0.684  0.844  0.732   0.619   0.778

Could you tell me whether there is any error in the code?
The code is https://github.com/asahiruyoru/dekr_eval

inference_demo.py csv_header little bug

Hi, in inference_demo.py line 268, when DATASET_TEST == 'crowdpose', the csv_header should use CROWDPOSE_KEYPOINT_INDEXES rather than COCO_KEYPOINT_INDEXES. It's a little bug, though the headers don't really matter ;)
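
A minimal sketch of the fix (the surrounding variable names are assumptions; only the two *_KEYPOINT_INDEXES dicts are from the repo):

    # pick the header table that matches the dataset under test
    if cfg.DATASET.DATASET_TEST == 'crowdpose':
        keypoint_indexes = CROWDPOSE_KEYPOINT_INDEXES
    else:
        keypoint_indexes = COCO_KEYPOINT_INDEXES
    csv_headers = ['frame'] + [keypoint_indexes[i] for i in sorted(keypoint_indexes)]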

How to make the GT offset maps?

Hello, could you tell me how to compute the GT offset maps? I couldn't understand this: when you form the regression loss, the prediction is a local offset while the GT seems to be a global offset. Thank you very much.
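
To make the question concrete, here is a minimal sketch of how dense GT offsets are commonly formed for this kind of model (my reading of the paper, not the repo's exact code): for each pixel p within a small radius of a person's center, the target for keypoint k is kpt_k - p, a displacement relative to that pixel rather than a global coordinate.

    import numpy as np

    def make_offset_map(keypoints, center, radius, h, w):
        """GT offsets: for pixels near `center`, store keypoint minus pixel."""
        offset = np.zeros((len(keypoints) * 2, h, w), dtype=np.float32)
        cx, cy = int(center[0]), int(center[1])
        for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
                for k, (kx, ky) in enumerate(keypoints):
                    offset[2 * k, y, x] = kx - x
                    offset[2 * k + 1, y, x] = ky - y
        return offset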

Is there any comparison between adaptconv and deformable-conv?

As far as I know, deformable conv also learns the shape or appearance of an object and has been proven effective in various vision tasks. I'm not sure about the major difference between adapt conv and deformable conv in terms of design and performance. Looking forward to your reply.

Training custom dataset

Hello, thanks for your excellent work! I have a dataset containing keypoints of a stable structure, and I would like to detect the structure category via the keypoints. Should I format my dataset directly following the COCO format, or must I change the dataloader code to implement the experiment? Thank you!
