kyungminjin / hanet

Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos (WACV 2023)

Python 100.00%

deep-learning pytorch transformer pose-estimation video wacv2023 2d-human-pose 3d-body-recovery 3d-human-pose 3d-pose-estimation

hanet's Introduction

Hello, I'm Kyungmin Jin

Profile

🔭 I currently work at LG Electronics, researching artificial intelligence and computer vision. I received my master's degree in artificial intelligence from Korea University (Pattern Recognition and Machine Learning Lab, [PRML]).

Research Interests

I design novel frameworks in the computer vision domain. In particular, I have conducted research on pose estimation architectures that combine transformers with convolutional neural networks. My research interests are summarized as follows.

Pose estimation
Body mesh recovery
Transformer
Video understanding

More information

Curriculum Vitae

hanet's Issues

Online mutual learning

Thanks for your great work and for sharing it!

I couldn't find the online mutual learning part in the code.
I also have a question:
1. Where is the refined input pose p'?

Best,
yuning

How can I make the following graph?

Hello, @KyungMinJin, thank you very much for your great open-source project. How did you make the curve graph that follows the movement of the bone points (joints) shown in your project? Looking forward to your reply.

RuntimeError: "baddbmm_cuda" not implemented for 'Int'

Namespace(body_representation='2D', cfg='configs/config_jhmdb_simplebaseline_2D.yaml', dataset_name='jhmdb', estimator='simplebaseline')

Seed value for the experiment is 4321
GPU name -> NVIDIA GeForce RTX 3060 Laptop GPU
GPU feat -> _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060 Laptop GPU', major=8, minor=6, total_memory=6143MB, multi_processor_count=30)
{'BODY_REPRESENTATION': '2D',
'CUDNN': CfgNode({'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}),
'DATASET': {'AIST': {'DETECTED_PATH': './data\detected_poses/aist',
'GROUND_TRUTH_PATH': './data\groundtruth_poses/aist',
'KEYPOINT_NUM': 14,
'KEYPOINT_ROOT': [2, 3]},
'H36M': {'DETECTED_PATH': './data\detected_poses/h36m',
'GROUND_TRUTH_PATH': './data\groundtruth_poses/h36m',
'KEYPOINT_NUM': 17,
'KEYPOINT_ROOT': [0]},
'JHMDB': {'DETECTED_PATH': './data\detected_poses/jhmdb',
'GROUND_TRUTH_PATH': './data\groundtruth_poses/jhmdb',
'KEYPOINT_NUM': 15,
'KEYPOINT_ROOT': [2]},
'PW3D': {'DETECTED_PATH': './data\detected_poses/pw3d',
'GROUND_TRUTH_PATH': './data\groundtruth_poses/pw3d',
'KEYPOINT_NUM': 14,
'KEYPOINT_ROOT': [2, 3]}},
'DATASET_NAME': 'jhmdb',
'DEBUG': True,
'DEVICE': 'cuda',
'ESTIMATOR': 'simplebaseline',
'EVALUATE': {'DECODER': False,
'INTERP': 'linear',
'PRETRAINED': 'results/30-08-2022_16-06-59_jhmdb_simplebaseline_N10_10/[email protected]_0.89_checkpoint.pth.tar',
'RELATIVE_IMPROVEMENT': False,
'ROOT_RELATIVE': True,
'SLIDE_WINDOW_STEP_Q': 1,
'SLIDE_WINDOW_STEP_SIZE': 10},
'EXP_NAME': 'jhmdb_simplebaseline_N10_1_256',
'GPUS': ['0'],
'LOG': CfgNode({'NAME': ''}),
'LOGDIR': 'results\29-10-2022_17-15-29_jhmdb_simplebaseline_N10_1_256',
'LOSS': CfgNode({'LAMADA': 5.0, 'W_DECODER': 1.0}),
'MODEL': {'DECODER': 'transformer',
'DECODER_EMBEDDING_DIMENSION': 256,
'DECODER_HEAD': 4,
'DECODER_INTERP': 'linear',
'DECODER_RESIDUAL': True,
'DECODER_TOKEN_WINDOW': 5,
'DECODER_TRANSFORMER_BLOCK': 5,
'DROPOUT': 0.1,
'ENCODER_EMBEDDING_DIMENSION': 256,
'ENCODER_HEAD': 4,
'ENCODER_RESIDUAL': True,
'ENCODER_TRANSFORMER_BLOCK': 5,
'INTERVAL_N': 10,
'NAME': '',
'SAMPLE_TYPE': 'uniform',
'SLIDE_WINDOW': True,
'SLIDE_WINDOW_Q': 1,
'SLIDE_WINDOW_SIZE': 11,
'TYPE': 'network'},
'OUTPUT_DIR': 'results',
'SAMPLE_INTERVAL': 10,
'SEED_VALUE': 4321,
'SMPL_MODEL_DIR': 'data/smpl/',
'TRAIN': {'BATCH_SIZE': 16,
'EPOCH': 70,
'LR': 0.001,
'LRDECAY': 0.95,
'PRE_NORM': False,
'RESUME': None,
'USE_6D_SMPL': False,
'USE_SMPL_LOSS': False,
'VALIDATE': True,
'WORKERS_NUM': 0},
'VIS': {'END': 100,
'INPUT_VIDEO_NUMBER': 160,
'INPUT_VIDEO_PATH': 'data/videos/',
'OUTPUT_VIDEO_PATH': 'demo/',
'START': 0}}
#############################################################
You are loading the [training set] of dataset [jhmdb]
You are using pose esimator [simplebaseline]
The type of the data is [2D]
The frame number is [24372]
The sequence number is [687]
#############################################################
#############################################################
You are loading the [testing set] of dataset [jhmdb]
You are using pose esimator [simplebaseline]
The type of the data is [2D]
The frame number is [9228]
The sequence number is [261]
#############################################################
Slide window: 11
Sample interval: 10

Traceback (most recent call last):
File "train.py", line 109, in <module>
main(cfg)
File "train.py", line 96, in main
Trainer(train_dataloader=train_loader,
File "D:\GitLoadWareHouse\HANet\lib\core\trainer.py", line 71, in run
self.train()
File "D:\GitLoadWareHouse\HANet\lib\core\trainer.py", line 124, in train
predicted_3d_pos, decoderd_3d_pos = self.model(
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 175, in forward
self.hierarchical_encoder, self.decoder = self.transformer.forward(
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 335, in forward
output = self.decode(mem, encoder_mask, encoder_pos_embed[0], trans_tgt,
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 373, in decode
hs = self.decoder(tgt,
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 429, in forward
output = layer(output,
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 633, in forward
return self.forward_post(tgt, memory, tgt_mask, memory_mask,
File "D:\GitLoadWareHouse\HANet\lib\models\HANet.py", line 570, in forward_post
tgt2 = self.self_attn(q,
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\activation.py", line 1153, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 5179, in multi_head_attention_forward
attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
File "D:\Environment\anaconda3\envs\pytorch\lib\site-packages\torch\nn\functional.py", line 4852, in _scaled_dot_product_attention
attn = torch.baddbmm(attn_mask, q, k.transpose(-2, -1))
RuntimeError: "baddbmm_cuda" not implemented for 'Int'

Hello, I ran into this error during training. How can I solve it?
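One plausible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the last frame calls `torch.baddbmm(attn_mask, q, k.transpose(-2, -1))`, and the 'Int' in the message suggests that the mask reaching self-attention is a 32-bit integer tensor. In that PyTorch code path, a boolean mask is converted to a float additive mask before `baddbmm`, while an integer mask may be passed through unchanged. The sketch below casts an integer mask to `bool` before calling `nn.MultiheadAttention`. The helper `to_bool_mask` and the tensor names are hypothetical and are not taken from `HANet.py`; which variable in the repository actually holds the mask is not known from the log.

```python
import torch
import torch.nn as nn


def to_bool_mask(mask):
    """Cast an integer mask to bool (True = ignore this position).

    Hypothetical helper, not part of the HANet code base.
    None and floating-point (additive) masks are returned unchanged.
    """
    if mask is None or mask.is_floating_point():
        return mask
    return mask.to(torch.bool)


torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2)

# Input laid out as (sequence, batch, embedding), the default for this module.
x = torch.randn(5, 3, 8)

# An int32 key-padding mask, e.g. one built from a NumPy array on Windows.
int_padding_mask = torch.zeros(3, 5, dtype=torch.int32)
int_padding_mask[:, -1] = 1  # mark the last frame of each sequence as padding

# Casting first keeps the mask in a dtype that attention accepts.
out, weights = attn(x, x, x, key_padding_mask=to_bool_mask(int_padding_mask))
```

If this is indeed the cause, the equivalent change in the repository would be to cast the mask to `torch.bool` at the point where it is created or just before it is passed to the decoder.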

When will you update the code for 3D human pose estimation?

Thanks for your great work. I am very interested in your research.

Config file for Human3.6M
