cure-lab / deciwatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
License: Apache License 2.0
I have an idea: could we first analyze every frame and select the frames with large motion as keyframes? Would that further optimize the network?
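The idea above can be sketched as follows. This is only an illustration, not part of DeciWatch: `select_keyframes`, `poses`, and `num_keyframes` are hypothetical names, and the motion score (mean joint displacement from the previous frame) is one assumed choice among many.

```python
from math import dist


def select_keyframes(poses, num_keyframes):
    """Pick frame indices with the largest motion (hypothetical helper).

    poses: list of frames, each a list of (x, y) joint coordinates.
    Returns a sorted list of at most num_keyframes frame indices.
    """
    # Motion score of frame t = mean joint displacement from frame t-1.
    scores = [0.0]
    for prev, cur in zip(poses, poses[1:]):
        scores.append(sum(dist(a, b) for a, b in zip(prev, cur)) / len(cur))

    # Always keep the first and last frame so the frames in between
    # can still be recovered by interpolation.
    keep = {0, len(poses) - 1}

    # Add the remaining frames in order of decreasing motion score.
    ranked = sorted(range(1, len(poses) - 1),
                    key=lambda t: scores[t], reverse=True)
    for t in ranked:
        if len(keep) >= num_keyframes:
            break
        keep.add(t)
    return sorted(keep)


# One joint that jumps at frame 2 and moves slightly at frame 4.
frames = [[(0, 0)], [(0, 0)], [(5, 0)], [(5, 0)], [(5, 1)]]
keyframes = select_keyframes(frames, num_keyframes=3)
```

One caveat with this sketch: scoring motion from poses requires a pose for every frame, so the pose estimator would still run on all frames. To keep the efficiency gain, the score would have to come from a cheaper signal, which this sketch does not cover.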
Is there a way for me to train on custom data? What format would it need to be in?
Hello,
I'm trying to implement a dataset that does not have any action_name. In the documentation provided (https://github.com/cure-lab/DeciWatch/blob/main/doc/data.md) it states that the data should be given in the following format of [action_name]/[sequence_name]/[image_id]. Can the action_name be omitted or must it be provided for my custom dataset?
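One way to keep the documented three-part layout when a dataset has no action labels is to fill the first field with a placeholder. The sketch below assumes that the loader only needs the `[action_name]/[sequence_name]/[image_id]` shape and does not depend on specific action names; that is not confirmed in this thread. `make_image_name` and the default `"unknown_action"` are hypothetical.

```python
def make_image_name(sequence_name, image_id, action_name="unknown_action"):
    """Build a name in the [action_name]/[sequence_name]/[image_id] layout.

    action_name defaults to a placeholder (an assumption, not a value
    defined by DeciWatch) for datasets without action labels.
    """
    return f"{action_name}/{sequence_name}/{image_id}"


name = make_image_name("seq01", "000001")
```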
(base) sujia@cupt-System-Product-Name:~/hf/DeciWatch$ cd /home/sujia/hf/DeciWatch ; /usr/bin/env /home/sujia/anaconda3/bin/python /home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 43147 -- /home/sujia/hf/DeciWatch/train.py
Namespace(cfg='/home/sujia/hf/DeciWatch/configs/config_h36m_fcn_3D.yaml', dataset_name='h36m', estimator='fcn', body_representation='3D', sample_interval=10)
Seed value for the experiment is 4321
GPU name -> NVIDIA GeForce RTX 3090
GPU feat -> _CudaDeviceProperties(name='NVIDIA GeForce RTX 3090', major=8, minor=6, total_memory=24268MB, multi_processor_count=82)
{'BODY_REPRESENTATION': '3D',
'CUDNN': CfgNode({'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}),
'DATASET': {'AIST': {'DETECTED_PATH': 'data/detected_poses/aist',
'GROUND_TRUTH_PATH': 'data/groundtruth_poses/aist',
'KEYPOINT_NUM': 14,
'KEYPOINT_ROOT': [2, 3]},
'H36M': {'DETECTED_PATH': 'data/detected_poses/h36m',
'GROUND_TRUTH_PATH': 'data/groundtruth_poses/h36m',
'KEYPOINT_NUM': 17,
'KEYPOINT_ROOT': [0]},
'JHMDB': {'DETECTED_PATH': 'data/detected_poses/jhmdb',
'GROUND_TRUTH_PATH': 'data/groundtruth_poses/jhmdb',
'KEYPOINT_NUM': 15,
'KEYPOINT_ROOT': [2]},
'PW3D': {'DETECTED_PATH': 'data/detected_poses/pw3d',
'GROUND_TRUTH_PATH': 'data/groundtruth_poses/pw3d',
'KEYPOINT_NUM': 14,
'KEYPOINT_ROOT': [2, 3]}},
'DATASET_NAME': 'h36m',
'DEBUG': True,
'DEVICE': 'cuda',
'ESTIMATOR': 'fcn',
'EVALUATE': {'DENOISE': False,
'INTERP': 'linear',
'PRETRAINED': 'data/checkpoints/h36m_fcn_3d/checkpoint.pth.tar',
'RELATIVE_IMPROVEMENT': False,
'ROOT_RELATIVE': True,
'SLIDE_WINDOW_STEP_Q': 1,
'SLIDE_WINDOW_STEP_SIZE': 10},
'EXP_NAME': 'h36m_fcn',
'LOG': CfgNode({'NAME': ''}),
'LOGDIR': 'results/08-08-2022_18-44-22_h36m_fcn',
'LOSS': CfgNode({'LAMADA': 1.0, 'W_DENOISE': 1.0}),
'MODEL': {'DECODER': 'transformer',
'DECODER_EMBEDDING_DIMENSION': 128,
'DECODER_HEAD': 4,
'DECODER_INTERP': 'linear',
'DECODER_RESIDUAL': True,
'DECODER_TOKEN_WINDOW': 5,
'DECODER_TRANSFORMER_BLOCK': 5,
'DROPOUT': 0.1,
'ENCODER_EMBEDDING_DIMENSION': 128,
'ENCODER_HEAD': 4,
'ENCODER_RESIDUAL': True,
'ENCODER_TRANSFORMER_BLOCK': 5,
'INTERVAL_N': 10,
'NAME': '',
'SAMPLE_TYPE': 'uniform',
'SLIDE_WINDOW': True,
'SLIDE_WINDOW_Q': 10,
'SLIDE_WINDOW_SIZE': 101,
'TYPE': 'network'},
'OUTPUT_DIR': 'results',
'SAMPLE_INTERVAL': 10,
'SEED_VALUE': 4321,
'SMPL_MODEL_DIR': 'data/smpl/',
'TRAIN': {'BATCH_SIZE': 1024,
'EPOCH': 20,
'LR': 0.001,
'LRDECAY': 0.95,
'PRE_NORM': True,
'RESUME': None,
'USE_6D_SMPL': False,
'USE_SMPL_LOSS': False,
'VALIDATE': True,
'WORKERS_NUM': 0},
'VIS': {'END': 1000,
'INPUT_VIDEO_NUMBER': 143,
'INPUT_VIDEO_PATH': 'data/videos/',
'OUTPUT_VIDEO_PATH': 'demo/',
'START': 0}}
#############################################################
You are loading the [training set] of dataset [h36m]
You are using pose esimator [fcn]
The type of the data is [3D]
The frame number is [1559752]
The sequence number is [600]
#############################################################
#############################################################
You are loading the [testing set] of dataset [h36m]
You are using pose esimator [fcn]
The type of the data is [3D]
The frame number is [543344]
The sequence number is [236]
#############################################################
Traceback (most recent call last):
File "/home/sujia/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/sujia/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/sujia/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/home/sujia/hf/DeciWatch/train.py", line 108, in <module>
main(cfg)
File "/home/sujia/hf/DeciWatch/train.py", line 95, in main
Trainer(train_dataloader=train_loader,
File "/home/sujia/hf/DeciWatch/lib/core/trainer.py", line 67, in run
self.train()
File "/home/sujia/hf/DeciWatch/lib/core/trainer.py", line 113, in train
predicted_3d_pos, denoised_3d_pos = self.model(
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 158, in forward
self.recover, self.denoise = self.transformer.forward(
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 267, in forward
output = self.decode(mem, encoder_mask, encoder_pos_embed, trans_tgt,
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 287, in decode
hs = self.decoder(tgt,
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 343, in forward
output = layer(output,
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 536, in forward
return self.forward_pre(tgt, memory, tgt_mask, memory_mask,
File "/home/sujia/hf/DeciWatch/lib/models/deciwatch.py", line 507, in forward_pre
tgt2 = self.self_attn(q,
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1153, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 5179, in multi_head_attention_forward
attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
File "/home/sujia/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 4852, in _scaled_dot_product_attention
attn = torch.baddbmm(attn_mask, q, k.transpose(-2, -1))
RuntimeError: "baddbmm_cuda" not implemented for 'Int'
First of all, thank you very much for your work. The good documentation makes it very nice to explore. I think the above-mentioned scripts are missing. It would be very nice to see them in demo.py.
Greetings Gustav
or we can use both?
Dear authors,
Thanks for your amazing work and releasing the code!
In your other work, SmoothNet, you showed that a temporal-only network is superior to a transformer. Here, however, you use a vanilla transformer module as the denoise and recover nets. In theory, these two networks could simply be replaced by two SmoothNets. Have you run these experiments? If so, what is your insight into the results?
Thanks!
Best,
Xianghui
According to the setting in the paper, for 3D HPE we can only input 3D poses and get smoothed 3D poses out of the DeciWatch network. Can we input 2D poses directly and use the network both to lift them to 3D and to reduce the noise? Looking forward to your reply :)
Great work!
Just wanted to ask if it's still planned to integrate DeciWatch in mmpose? There is a stale PR there for some months now.
Hi, are there any comparison results for root joint transition optimization?
When I do it now, 16 are printed.
Which one should I take out to make it 15?
Both SmoothNet and DeciWatch are offline pose estimation methods, and their design means they cannot run in real time.
Hi, thanks for your great work. How can I run inference on my own custom video? Also, does it support multi-person tracking?
i/p - [(x1, y1), (x2, y2), ...... , (xn, yn)]
I want output as -
o/p - [(x1_corrected, y1_corrected), (x2_corrected, y2_corrected), ...... , (xn_corrected, yn_corrected)]