
motionbert's Introduction

MotionBERT: A Unified Perspective on Learning Human Motion Representations

[PyTorch] [arXiv] [Project] [Demo] [Hugging Face Models]


This is the official PyTorch implementation of the paper "MotionBERT: A Unified Perspective on Learning Human Motion Representations" (ICCV 2023).

Installation

conda create -n motionbert python=3.7 anaconda
conda activate motionbert
# Please install PyTorch according to your CUDA version.
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
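
A quick, optional sanity check after installation (a generic PyTorch check, not specific to this repo) to confirm the CUDA build is visible:

import torch
print(torch.__version__)           # the version you just installed
print(torch.cuda.is_available())   # should print True if the CUDA build matches your driver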

Getting Started

Task | Document
Pretrain | docs/pretrain.md
3D human pose estimation | docs/pose3d.md
Skeleton-based action recognition | docs/action.md
Mesh recovery | docs/mesh.md

Applications

In-the-wild inference (for custom videos)

Please refer to docs/inference.md.

Using MotionBERT for human-centric video representations

'''
  x: 2D skeletons 
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(3)]
    
  MotionBERT: pretrained human motion encoder
    type = <class 'lib.model.DSTformer.DSTformer'>
    
  E: encoded motion representation
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(512)]
'''
E = MotionBERT.get_representation(x)
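
A minimal loading sketch based on the config and checkpoint utilities used in infer_wild.py; the checkpoint path and the 'model_pos' key below are assumptions, so adjust them to the files you actually downloaded:

import torch
from lib.utils.tools import get_config
from lib.utils.learning import load_backbone

args = get_config('configs/pretrain/MB_pretrain.yaml')   # config from the Model Zoo table
model = load_backbone(args)                              # builds the DSTformer backbone
# Assumed local path; strip a leading 'module.' from keys if the checkpoint was saved with DataParallel.
checkpoint = torch.load('checkpoint/pretrain/MB_release/best_epoch.bin', map_location='cpu')
model.load_state_dict(checkpoint['model_pos'], strict=True)
model.eval()

x = torch.randn(1, 243, 17, 3)                           # dummy 2D skeletons: [batch, frames, joints, (x, y, conf)]
with torch.no_grad():
    E = model.get_representation(x)                      # [batch, frames, 17, 512]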

Hints

  1. The model can handle variable input lengths (up to 243 frames). There is no need to explicitly specify the input length elsewhere.
  2. The model uses 17 body keypoints (H36M format). If you are using other formats, please convert them before feeding them to MotionBERT (see the conversion sketch after this list).
  3. Please refer to model_action.py and model_mesh.py for examples of how to (easily) adapt MotionBERT to different downstream tasks.
  4. For RGB videos, you need to extract 2D poses (inference.md), convert the keypoint format (dataset_wild.py), and then feed the result to MotionBERT (infer_wild.py).
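
For illustration, below is a minimal sketch of one commonly used COCO-17 to H36M-17 conversion; the joint mapping is an assumption for this example, so verify it against the conversion actually used in dataset_wild.py before relying on it:

import numpy as np

def coco2h36m(coco):
    # coco: array of shape [frames, 17, C] in COCO keypoint order
    h36m = np.zeros_like(coco)
    h36m[:, 0] = (coco[:, 11] + coco[:, 12]) * 0.5   # pelvis = midpoint of hips
    h36m[:, 1] = coco[:, 12]                         # right hip
    h36m[:, 2] = coco[:, 14]                         # right knee
    h36m[:, 3] = coco[:, 16]                         # right ankle
    h36m[:, 4] = coco[:, 11]                         # left hip
    h36m[:, 5] = coco[:, 13]                         # left knee
    h36m[:, 6] = coco[:, 15]                         # left ankle
    h36m[:, 8] = (coco[:, 5] + coco[:, 6]) * 0.5     # thorax = midpoint of shoulders
    h36m[:, 7] = (h36m[:, 0] + h36m[:, 8]) * 0.5     # spine = midpoint of pelvis and thorax
    h36m[:, 9] = coco[:, 0]                          # nose
    h36m[:, 10] = (coco[:, 1] + coco[:, 2]) * 0.5    # head (approximated by midpoint of eyes)
    h36m[:, 11] = coco[:, 5]                         # left shoulder
    h36m[:, 12] = coco[:, 7]                         # left elbow
    h36m[:, 13] = coco[:, 9]                         # left wrist
    h36m[:, 14] = coco[:, 6]                         # right shoulder
    h36m[:, 15] = coco[:, 8]                         # right elbow
    h36m[:, 16] = coco[:, 10]                        # right wrist
    return h36m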

Model Zoo

Model | Download Link | Config | Performance
MotionBERT (162MB) | OneDrive | pretrain/MB_pretrain.yaml | -
MotionBERT-Lite (61MB) | OneDrive | pretrain/MB_lite.yaml | -
3D Pose (H36M-SH, scratch) | OneDrive | pose3d/MB_train_h36m.yaml | 39.2mm (MPJPE)
3D Pose (H36M-SH, ft) | OneDrive | pose3d/MB_ft_h36m.yaml | 37.2mm (MPJPE)
Action Recognition (x-sub, ft) | OneDrive | action/MB_ft_NTU60_xsub.yaml | 97.2% (Top-1 Acc)
Action Recognition (x-view, ft) | OneDrive | action/MB_ft_NTU60_xview.yaml | 93.0% (Top-1 Acc)
Mesh (with 3DPW, ft) | OneDrive | mesh/MB_ft_pw3d.yaml | 88.1mm (MPVE)

In most use cases (especially with fine-tuning), MotionBERT-Lite gives similar performance with lower computational overhead.

TODO

  • Scripts and docs for pretraining

  • Demo for custom videos

Citation

If you find our work useful for your project, please consider citing the paper:

@inproceedings{motionbert2022,
  title     =   {MotionBERT: A Unified Perspective on Learning Human Motion Representations}, 
  author    =   {Zhu, Wentao and Ma, Xiaoxuan and Liu, Zhaoyang and Liu, Libin and Wu, Wayne and Wang, Yizhou},
  booktitle =   {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      =   {2023},
}

motionbert's People

Contributors

baitian752, shirleymaxx, viewsetting, walter0807


motionbert's Issues

Finetune data format

Hello @Walter0807, I want to fine-tune the pose3d task on my own dataset. What data format should I prepare, i.e., what goes inside the .pkl file? I currently have a 2D skeleton video and the JSON file from AlphaPose; what should I do next?
Sorry to keep bothering you.

An error in infer_wild.py

Hello, I have run into an error and do not know how to solve it. The error is shown below. Could you tell me what happened and how to fix it? I am using a Windows 10 machine. Thank you!

(motionbert) F:\DeepLearning\MotionBERT\MotionBERT-main> python infer_wild.py --vid_path video/me.mp4 --json_path video_json/vis_me.mp4.json --out_path output

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\DeepLearning\MotionBERT\MotionBERT-main\infer_wild.py", line 70, in <module>
    for batch_input in tqdm(test_loader):
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    self._iterator = self._get_iterator()
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
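
The RuntimeError above is the standard Windows spawn-mode issue: DataLoader worker processes re-import the script, so the top-level inference code runs again. A hedged sketch of the usual workaround (the main() wrapper name is hypothetical; alternatively, set num_workers=0 in the DataLoader settings):

# Guard the entry point of the script so spawned DataLoader workers
# can import the module without re-running the inference code.
if __name__ == '__main__':
    main()  # hypothetical wrapper around the top-level code of infer_wild.py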

Pre-Training Time

Hi,

Thank you for your great work! How long does the pretraining take on 8 V100 GPUs? Thanks!

about the root trans in 3d pose and mesh

Hello, this is really impressive work, clean and fantastic. However, I have some questions:

  1. In the demo videos, the translation in the 3D keypoint task is relatively stable, but the mesh looks bumpy in the vertical (height) direction. Is there a way to combine the strengths of both, e.g. a multi-modality model, to make the mesh translation more accurate?
  2. The 3D pose results look really good, but in most situations we need body rotations (like SMPL). Is there a way to obtain SMPL-like rotations directly from the 3D pose?

about the speed

Even though the model is not big, the inference speed is quite slow, about 1 second per frame. Is that normal?

The formula notation in the paper

Thank you for your great work. I have a question about one of the formulas in the paper: ◦ is said to denote the element-wise product, but I am not familiar with that symbol. Could you explain what it means?

About Comparison of Model Architectural Designs

Hello, thank you for your work.
[image: comparison of architectural designs (a)-(f)]
In my opinion, it is inappropriate to compare methods (a) and (f). From what I understand, method (a) contains one S-T block per module, whereas method (f) contains both S-T and T-S blocks.
That is, method (f) has roughly twice as many parameters as method (a), so I think it would be fairer to compare against method (a) with its depth set to 10.
Thank you!

About half body mesh regression issue

Hello, I recently found that MotionBERT does not seem very good at half-body mesh regression. Have you tested this before, and what could be the cause?

Image-based models like PARE and SPIN can hallucinate the occluded body parts, but MotionBERT fails completely in this scenario.

about 2d projection.

Thanks for your great work.
[image]

In-the-wild RGB videos have no 3D ground truth, depth information, or camera intrinsics K, so how do you perform the reprojection?

About Amass trainset

Hello, may I ask whether the GRAB and SOMA subsets of AMASS were used in training the pretrained model? If they were, tools/compress_amass.py seems to be incorrect.

mesh prediction error

I ran this command:
!python3 infer_wild_mesh.py --vid_path ./4.mp4 --json_path ./alphapose-results.json --out_path /content/MotionBERT

I have saved best_epoch.bin here:
/content/MotionBERT/checkpoint/mesh/FT_MB_release_MB_ft_pw3d/best_epoch.bin

Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild_mesh.py", line 64, in <module>
    smpl = SMPL(args.data_root, batch_size=1).cuda()
  File "/content/MotionBERT/lib/utils/utils_smpl.py", line 62, in __init__
    super(SMPL, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/smplx/body_models.py", line 133, in __init__
    assert osp.exists(smpl_path), 'Path {} does not exist!'.format(
AssertionError: Path data/mesh does not exist!

Code release

Thanks for your great work! When will you release the code?

Prediction Scale

Hi,
I have questions about the dimensions of the predicted poses both in inference and evaluation code.
I noticed that the predictions of the network in the evaluation function in train.py are multiplied by a factor, which I traced back to data['test']['2.5d_factor'] in h36m_sh_conf_cam_source_final.pkl. Could you please help me understand how these factors are calculated?
Does this mean that the outputs of the network are not expected to have the correct scale of a human (in meters), and that only the relative pose is the goal? In particular, when I run the inference code and plot the outputs, I notice the dimensions of the person changing over time (which I guess comes from this), even when using the MB_ft_h36m model with rootrel set to True.

In general, it would be really appreciated if you could help me understand the scale of the output and how I can convert it to meters.

Thanks in advance for your help.

wild video infer too slow

Hello, may I ask why inference on my own video is so slow? A 10-second video takes more than ten minutes, and there are some warning messages:

IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (923, 924) to (928, 928) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).

  0%|▋         | 1/296 [00:00<04:16, 1.15it/s]
[swscaler @ 000001df1adc4300] Warning: data is not aligned! This can lead to a speed loss

My test loader settings are as follows:

testloader_params = {
    'batch_size': 1,
    'shuffle': False,
    'num_workers': 0,
    'pin_memory': True,
    'prefetch_factor': 2,
    'persistent_workers': False,
    'drop_last': False
}
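
A hedged note on the IMAGEIO warning above: as the message itself suggests, the resize can be avoided by making the frames divisible by 16 or by creating the video writer with macro_block_size=1. Where exactly the writer is created depends on the rendering utilities, so the call below is only an illustration:

import imageio

# Illustrative only: write frames without resizing to multiples of 16
# (macro_block_size=1 risks incompatibility with some codecs/players).
writer = imageio.get_writer('output/result.mp4', fps=30, macro_block_size=1)
# writer.append_data(frame)  # frame: H x W x 3 uint8 array
# writer.close()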

Fine-tuning part layers

Hi,
You mentioned fine-tuning partial layers in your paper, but the released code fine-tunes the entire model, which is computationally costly. May I ask what "partial layers" refers to?

some train questions

Thanks for your great work! I have some questions about how the Dual-stream Spatio-temporal Transformer (DSTformer) can accelerate training in parallel. Besides, is T=243 too computationally intensive for T-MHSA?
[image]
Thank you very much!

Broken link pyskl

The link to pyskl in action.md is broken. Could you provide a new link, or a similar reference?

Usage

Hi, thanks for the release, looks very cool!
Can you please give me a hint on how to use your pretrained model to run inference on my own video?
Given a video, do I need to run 2D pose estimation first before I can use MotionBERT? Or do you already provide that?
How should I generate 3D points on my video?
How can I get the motion embedding? I tried:

E = MotionBERT.get_representation(x)

but get_representation does not exist!
Thank you!

If you could just give me high-level hints, I would appreciate it! Thanks!

Missing key(s) in state_dict: "temp_embed", "pos_embed",

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild.py", line 45, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
        Missing key(s) in state_dict: "temp_embed", "pos_embed", ...
        Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", ...

I tried to solve this problem following this blog:
https://blog.csdn.net/yangwangnndd/article/details/100207686
However, it led to further problems.

I followed this guide
https://github.com/Walter0807/MotionBERT/blob/main/docs/inference.md
I use this 3dpose model
https://onedrive.live.com/?authkey=%21ALuKCr9wihi87bI&id=A5438CD242871DF0%21190&cid=A5438CD242871DF0

License Information

Hi! Thank you so much for sharing this code. Can you please include the license information so that we know the restrictions/limitations if there are any?

About the hips coordinates to world

Hi, I got some real-time 3D pose results and visualized them in Open3D; they look good:

[gif attachment]

However, I am wondering how to map the hip coordinates to the real world. I currently add +0.65 to the z axis, but it is not well aligned; it looks like it should be some value related to the normalized hip height relative to body height. Do you know what the exact value is?

Some excellent demo

Hi, just posting some FBX demos here (real 3D, not renderings); the results are impressive:

Clip_len 24

[video attachment]

Clip_len 48

[video attachment]

The video I tested is a very challenging one, and I still got some nice results!

I still have one issue: the poses may flicker in the middle of frames. Do you have any thoughts on this? Also, what is the best clip length for real-time applications? (We cannot use a very large clip length in real time.)

how to get the action recognition result for custom videos

Hey, thanks for this wonderful work; the performance of the 2D-to-3D reconstruction is eye-opening. I am wondering whether the action recognition inference code for custom videos has been released yet; I can only find the evaluation code for action recognition, which is meant for the NTU RGB+D dataset.

Velocity loss in the paper

Thank you for your great work. I would like to ask about the loss function mentioned in the paper, specifically the velocity loss. What is the purpose of adding the velocity loss?

Keypoint format

I can get COCO 17-keypoint (or other) 2D poses for my own custom data, and I know I should convert the COCO format to Human3.6M, but how? The joint definitions of COCO and Human3.6M differ, especially for the torso. Is there a way to convert between these dataset formats?

About the conf input

Hello, I notice that the input can be given with or without confidence scores, but I did not see any ablation on this. If confidence is used, the model becomes tightly coupled with the 2D pose estimator itself (some models may not produce relatively high scores). Is there any comparison of the final metrics with and without confidence?

Something about train.

Thank you for your great work. I have a question: I see that there are three training settings in the docs folder, namely pretrain, scratch, and finetune. How are these three related? If I only care about lifting 2D keypoints to 3D keypoints, which one should I focus on? Thank you very much; I look forward to your answer.

How to accelerate model infer speed

Hi, I use the script infer_wild.sh to infer 3D poses. I have a GPU; can I use it, or some other method, to accelerate the rendering speed? I found that the GPU utilization is very low.
[image]

Mesh with HybrIK

Will you provide the code for mesh recovery using HybrIK (reported in Table 3)? I would appreciate it if you could release the code related to this part.

PyTorch version for reproducing the result

I want to reproduce the result of "3D Pose (H36M-SH, scratch), 39.1mm", but I can only get 40.0mm. Which PyTorch version did you use to train the model?

Cannot load checkpoint in docs/inference.MD

Hi,

Thanks for the great work!

I tried to follow the instructions in docs/inference.MD and got the following error while loading the checkpoint:

Error logs
(motionbert) H4dr1en@H4dr1en MotionBERT % /opt/miniconda3/envs/motionbert/bin/python /Users/H4dr1en/projects/MotionBERT/infer_wild_test.py
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/Users/H4dr1en/projects/MotionBERT/infer_wild_test.py", line 37, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/opt/miniconda3/envs/motionbert/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
        Missing key(s) in state_dict: "temp_embed", "pos_embed", "joints_embed.weight", "joints_embed.bias", "blocks_st.0.norm1_s.weight", "blocks_st.0.norm1_s.bias", "blocks_st.0.norm1_t.weight", "blocks_st.0.norm1_t.bias", "blocks_st.0.attn_s.proj.weight", "blocks_st.0.attn_s.proj.bias", "blocks_st.0.attn_s.qkv.weight", "blocks_st.0.attn_s.qkv.bias", "blocks_st.0.attn_t.proj.weight", "blocks_st.0.attn_t.proj.bias", "blocks_st.0.attn_t.qkv.weight", "blocks_st.0.attn_t.qkv.bias", "blocks_st.0.norm2_s.weight", "blocks_st.0.norm2_s.bias", "blocks_st.0.norm2_t.weight", "blocks_st.0.norm2_t.bias", "blocks_st.0.mlp_s.fc1.weight", "blocks_st.0.mlp_s.fc1.bias", "blocks_st.0.mlp_s.fc2.weight", "blocks_st.0.mlp_s.fc2.bias", "blocks_st.0.mlp_t.fc1.weight", "blocks_st.0.mlp_t.fc1.bias", "blocks_st.0.mlp_t.fc2.weight", "blocks_st.0.mlp_t.fc2.bias", "blocks_st.1.norm1_s.weight", "blocks_st.1.norm1_s.bias", "blocks_st.1.norm1_t.weight", "blocks_st.1.norm1_t.bias", "blocks_st.1.attn_s.proj.weight", "blocks_st.1.attn_s.proj.bias", "blocks_st.1.attn_s.qkv.weight", "blocks_st.1.attn_s.qkv.bias", "blocks_st.1.attn_t.proj.weight", "blocks_st.1.attn_t.proj.bias", "blocks_st.1.attn_t.qkv.weight", "blocks_st.1.attn_t.qkv.bias", "blocks_st.1.norm2_s.weight", "blocks_st.1.norm2_s.bias", "blocks_st.1.norm2_t.weight", "blocks_st.1.norm2_t.bias", "blocks_st.1.mlp_s.fc1.weight", "blocks_st.1.mlp_s.fc1.bias", "blocks_st.1.mlp_s.fc2.weight", "blocks_st.1.mlp_s.fc2.bias", "blocks_st.1.mlp_t.fc1.weight", "blocks_st.1.mlp_t.fc1.bias", "blocks_st.1.mlp_t.fc2.weight", "blocks_st.1.mlp_t.fc2.bias", "blocks_st.2.norm1_s.weight", "blocks_st.2.norm1_s.bias", "blocks_st.2.norm1_t.weight", "blocks_st.2.norm1_t.bias", "blocks_st.2.attn_s.proj.weight", "blocks_st.2.attn_s.proj.bias", "blocks_st.2.attn_s.qkv.weight", "blocks_st.2.attn_s.qkv.bias", "blocks_st.2.attn_t.proj.weight", "blocks_st.2.attn_t.proj.bias", "blocks_st.2.attn_t.qkv.weight", "blocks_st.2.attn_t.qkv.bias", "blocks_st.2.norm2_s.weight", "blocks_st.2.norm2_s.bias", "blocks_st.2.norm2_t.weight", "blocks_st.2.norm2_t.bias", "blocks_st.2.mlp_s.fc1.weight", "blocks_st.2.mlp_s.fc1.bias", "blocks_st.2.mlp_s.fc2.weight", "blocks_st.2.mlp_s.fc2.bias", "blocks_st.2.mlp_t.fc1.weight", "blocks_st.2.mlp_t.fc1.bias", "blocks_st.2.mlp_t.fc2.weight", "blocks_st.2.mlp_t.fc2.bias", "blocks_st.3.norm1_s.weight", "blocks_st.3.norm1_s.bias", "blocks_st.3.norm1_t.weight", "blocks_st.3.norm1_t.bias", "blocks_st.3.attn_s.proj.weight", "blocks_st.3.attn_s.proj.bias", "blocks_st.3.attn_s.qkv.weight", "blocks_st.3.attn_s.qkv.bias", "blocks_st.3.attn_t.proj.weight", "blocks_st.3.attn_t.proj.bias", "blocks_st.3.attn_t.qkv.weight", "blocks_st.3.attn_t.qkv.bias", "blocks_st.3.norm2_s.weight", "blocks_st.3.norm2_s.bias", "blocks_st.3.norm2_t.weight", "blocks_st.3.norm2_t.bias", "blocks_st.3.mlp_s.fc1.weight", "blocks_st.3.mlp_s.fc1.bias", "blocks_st.3.mlp_s.fc2.weight", "blocks_st.3.mlp_s.fc2.bias", "blocks_st.3.mlp_t.fc1.weight", "blocks_st.3.mlp_t.fc1.bias", "blocks_st.3.mlp_t.fc2.weight", "blocks_st.3.mlp_t.fc2.bias", "blocks_st.4.norm1_s.weight", "blocks_st.4.norm1_s.bias", "blocks_st.4.norm1_t.weight", "blocks_st.4.norm1_t.bias", "blocks_st.4.attn_s.proj.weight", "blocks_st.4.attn_s.proj.bias", "blocks_st.4.attn_s.qkv.weight", "blocks_st.4.attn_s.qkv.bias", "blocks_st.4.attn_t.proj.weight", "blocks_st.4.attn_t.proj.bias", "blocks_st.4.attn_t.qkv.weight", "blocks_st.4.attn_t.qkv.bias", "blocks_st.4.norm2_s.weight", "blocks_st.4.norm2_s.bias", "blocks_st.4.norm2_t.weight", 
"blocks_st.4.norm2_t.bias", "blocks_st.4.mlp_s.fc1.weight", "blocks_st.4.mlp_s.fc1.bias", "blocks_st.4.mlp_s.fc2.weight", "blocks_st.4.mlp_s.fc2.bias", "blocks_st.4.mlp_t.fc1.weight", "blocks_st.4.mlp_t.fc1.bias", "blocks_st.4.mlp_t.fc2.weight", "blocks_st.4.mlp_t.fc2.bias", "blocks_ts.0.norm1_s.weight", "blocks_ts.0.norm1_s.bias", "blocks_ts.0.norm1_t.weight", "blocks_ts.0.norm1_t.bias", "blocks_ts.0.attn_s.proj.weight", "blocks_ts.0.attn_s.proj.bias", "blocks_ts.0.attn_s.qkv.weight", "blocks_ts.0.attn_s.qkv.bias", "blocks_ts.0.attn_t.proj.weight", "blocks_ts.0.attn_t.proj.bias", "blocks_ts.0.attn_t.qkv.weight", "blocks_ts.0.attn_t.qkv.bias", "blocks_ts.0.norm2_s.weight", "blocks_ts.0.norm2_s.bias", "blocks_ts.0.norm2_t.weight", "blocks_ts.0.norm2_t.bias", "blocks_ts.0.mlp_s.fc1.weight", "blocks_ts.0.mlp_s.fc1.bias", "blocks_ts.0.mlp_s.fc2.weight", "blocks_ts.0.mlp_s.fc2.bias", "blocks_ts.0.mlp_t.fc1.weight", "blocks_ts.0.mlp_t.fc1.bias", "blocks_ts.0.mlp_t.fc2.weight", "blocks_ts.0.mlp_t.fc2.bias", "blocks_ts.1.norm1_s.weight", "blocks_ts.1.norm1_s.bias", "blocks_ts.1.norm1_t.weight", "blocks_ts.1.norm1_t.bias", "blocks_ts.1.attn_s.proj.weight", "blocks_ts.1.attn_s.proj.bias", "blocks_ts.1.attn_s.qkv.weight", "blocks_ts.1.attn_s.qkv.bias", "blocks_ts.1.attn_t.proj.weight", "blocks_ts.1.attn_t.proj.bias", "blocks_ts.1.attn_t.qkv.weight", "blocks_ts.1.attn_t.qkv.bias", "blocks_ts.1.norm2_s.weight", "blocks_ts.1.norm2_s.bias", "blocks_ts.1.norm2_t.weight", "blocks_ts.1.norm2_t.bias", "blocks_ts.1.mlp_s.fc1.weight", "blocks_ts.1.mlp_s.fc1.bias", "blocks_ts.1.mlp_s.fc2.weight", "blocks_ts.1.mlp_s.fc2.bias", "blocks_ts.1.mlp_t.fc1.weight", "blocks_ts.1.mlp_t.fc1.bias", "blocks_ts.1.mlp_t.fc2.weight", "blocks_ts.1.mlp_t.fc2.bias", "blocks_ts.2.norm1_s.weight", "blocks_ts.2.norm1_s.bias", "blocks_ts.2.norm1_t.weight", "blocks_ts.2.norm1_t.bias", "blocks_ts.2.attn_s.proj.weight", "blocks_ts.2.attn_s.proj.bias", "blocks_ts.2.attn_s.qkv.weight", "blocks_ts.2.attn_s.qkv.bias", "blocks_ts.2.attn_t.proj.weight", "blocks_ts.2.attn_t.proj.bias", "blocks_ts.2.attn_t.qkv.weight", "blocks_ts.2.attn_t.qkv.bias", "blocks_ts.2.norm2_s.weight", "blocks_ts.2.norm2_s.bias", "blocks_ts.2.norm2_t.weight", "blocks_ts.2.norm2_t.bias", "blocks_ts.2.mlp_s.fc1.weight", "blocks_ts.2.mlp_s.fc1.bias", "blocks_ts.2.mlp_s.fc2.weight", "blocks_ts.2.mlp_s.fc2.bias", "blocks_ts.2.mlp_t.fc1.weight", "blocks_ts.2.mlp_t.fc1.bias", "blocks_ts.2.mlp_t.fc2.weight", "blocks_ts.2.mlp_t.fc2.bias", "blocks_ts.3.norm1_s.weight", "blocks_ts.3.norm1_s.bias", "blocks_ts.3.norm1_t.weight", "blocks_ts.3.norm1_t.bias", "blocks_ts.3.attn_s.proj.weight", "blocks_ts.3.attn_s.proj.bias", "blocks_ts.3.attn_s.qkv.weight", "blocks_ts.3.attn_s.qkv.bias", "blocks_ts.3.attn_t.proj.weight", "blocks_ts.3.attn_t.proj.bias", "blocks_ts.3.attn_t.qkv.weight", "blocks_ts.3.attn_t.qkv.bias", "blocks_ts.3.norm2_s.weight", "blocks_ts.3.norm2_s.bias", "blocks_ts.3.norm2_t.weight", "blocks_ts.3.norm2_t.bias", "blocks_ts.3.mlp_s.fc1.weight", "blocks_ts.3.mlp_s.fc1.bias", "blocks_ts.3.mlp_s.fc2.weight", "blocks_ts.3.mlp_s.fc2.bias", "blocks_ts.3.mlp_t.fc1.weight", "blocks_ts.3.mlp_t.fc1.bias", "blocks_ts.3.mlp_t.fc2.weight", "blocks_ts.3.mlp_t.fc2.bias", "blocks_ts.4.norm1_s.weight", "blocks_ts.4.norm1_s.bias", "blocks_ts.4.norm1_t.weight", "blocks_ts.4.norm1_t.bias", "blocks_ts.4.attn_s.proj.weight", "blocks_ts.4.attn_s.proj.bias", "blocks_ts.4.attn_s.qkv.weight", "blocks_ts.4.attn_s.qkv.bias", "blocks_ts.4.attn_t.proj.weight", "blocks_ts.4.attn_t.proj.bias", 
"blocks_ts.4.attn_t.qkv.weight", "blocks_ts.4.attn_t.qkv.bias", "blocks_ts.4.norm2_s.weight", "blocks_ts.4.norm2_s.bias", "blocks_ts.4.norm2_t.weight", "blocks_ts.4.norm2_t.bias", "blocks_ts.4.mlp_s.fc1.weight", "blocks_ts.4.mlp_s.fc1.bias", "blocks_ts.4.mlp_s.fc2.weight", "blocks_ts.4.mlp_s.fc2.bias", "blocks_ts.4.mlp_t.fc1.weight", "blocks_ts.4.mlp_t.fc1.bias", "blocks_ts.4.mlp_t.fc2.weight", "blocks_ts.4.mlp_t.fc2.bias", "norm.weight", "norm.bias", "pre_logits.fc.weight", "pre_logits.fc.bias", "head.weight", "head.bias", "ts_attn.0.weight", "ts_attn.0.bias", "ts_attn.1.weight", "ts_attn.1.bias", "ts_attn.2.weight", "ts_attn.2.bias", "ts_attn.3.weight", "ts_attn.3.bias", "ts_attn.4.weight", "ts_attn.4.bias". 
        Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", "module.joints_embed.weight", "module.joints_embed.bias", "module.blocks_st.0.norm1_s.weight", "module.blocks_st.0.norm1_s.bias", "module.blocks_st.0.norm1_t.weight", "module.blocks_st.0.norm1_t.bias", "module.blocks_st.0.attn_s.proj.weight", "module.blocks_st.0.attn_s.proj.bias", "module.blocks_st.0.attn_s.qkv.weight", "module.blocks_st.0.attn_s.qkv.bias", "module.blocks_st.0.attn_t.proj.weight", "module.blocks_st.0.attn_t.proj.bias", "module.blocks_st.0.attn_t.qkv.weight", "module.blocks_st.0.attn_t.qkv.bias", "module.blocks_st.0.norm2_s.weight", "module.blocks_st.0.norm2_s.bias", "module.blocks_st.0.norm2_t.weight", "module.blocks_st.0.norm2_t.bias", "module.blocks_st.0.mlp_s.fc1.weight", "module.blocks_st.0.mlp_s.fc1.bias", "module.blocks_st.0.mlp_s.fc2.weight", "module.blocks_st.0.mlp_s.fc2.bias", "module.blocks_st.0.mlp_t.fc1.weight", "module.blocks_st.0.mlp_t.fc1.bias", "module.blocks_st.0.mlp_t.fc2.weight", "module.blocks_st.0.mlp_t.fc2.bias", "module.blocks_st.1.norm1_s.weight", "module.blocks_st.1.norm1_s.bias", "module.blocks_st.1.norm1_t.weight", "module.blocks_st.1.norm1_t.bias", "module.blocks_st.1.attn_s.proj.weight", "module.blocks_st.1.attn_s.proj.bias", "module.blocks_st.1.attn_s.qkv.weight", "module.blocks_st.1.attn_s.qkv.bias", "module.blocks_st.1.attn_t.proj.weight", "module.blocks_st.1.attn_t.proj.bias", "module.blocks_st.1.attn_t.qkv.weight", "module.blocks_st.1.attn_t.qkv.bias", "module.blocks_st.1.norm2_s.weight", "module.blocks_st.1.norm2_s.bias", "module.blocks_st.1.norm2_t.weight", "module.blocks_st.1.norm2_t.bias", "module.blocks_st.1.mlp_s.fc1.weight", "module.blocks_st.1.mlp_s.fc1.bias", "module.blocks_st.1.mlp_s.fc2.weight", "module.blocks_st.1.mlp_s.fc2.bias", "module.blocks_st.1.mlp_t.fc1.weight", "module.blocks_st.1.mlp_t.fc1.bias", "module.blocks_st.1.mlp_t.fc2.weight", "module.blocks_st.1.mlp_t.fc2.bias", "module.blocks_st.2.norm1_s.weight", "module.blocks_st.2.norm1_s.bias", "module.blocks_st.2.norm1_t.weight", "module.blocks_st.2.norm1_t.bias", "module.blocks_st.2.attn_s.proj.weight", "module.blocks_st.2.attn_s.proj.bias", "module.blocks_st.2.attn_s.qkv.weight", "module.blocks_st.2.attn_s.qkv.bias", "module.blocks_st.2.attn_t.proj.weight", "module.blocks_st.2.attn_t.proj.bias", "module.blocks_st.2.attn_t.qkv.weight", "module.blocks_st.2.attn_t.qkv.bias", "module.blocks_st.2.norm2_s.weight", "module.blocks_st.2.norm2_s.bias", "module.blocks_st.2.norm2_t.weight", "module.blocks_st.2.norm2_t.bias", "module.blocks_st.2.mlp_s.fc1.weight", "module.blocks_st.2.mlp_s.fc1.bias", "module.blocks_st.2.mlp_s.fc2.weight", "module.blocks_st.2.mlp_s.fc2.bias", "module.blocks_st.2.mlp_t.fc1.weight", "module.blocks_st.2.mlp_t.fc1.bias", "module.blocks_st.2.mlp_t.fc2.weight", "module.blocks_st.2.mlp_t.fc2.bias", "module.blocks_st.3.norm1_s.weight", "module.blocks_st.3.norm1_s.bias", "module.blocks_st.3.norm1_t.weight", "module.blocks_st.3.norm1_t.bias", "module.blocks_st.3.attn_s.proj.weight", "module.blocks_st.3.attn_s.proj.bias", "module.blocks_st.3.attn_s.qkv.weight", "module.blocks_st.3.attn_s.qkv.bias", "module.blocks_st.3.attn_t.proj.weight", "module.blocks_st.3.attn_t.proj.bias", "module.blocks_st.3.attn_t.qkv.weight", "module.blocks_st.3.attn_t.qkv.bias", "module.blocks_st.3.norm2_s.weight", "module.blocks_st.3.norm2_s.bias", "module.blocks_st.3.norm2_t.weight", "module.blocks_st.3.norm2_t.bias", "module.blocks_st.3.mlp_s.fc1.weight", "module.blocks_st.3.mlp_s.fc1.bias", 
"module.blocks_st.3.mlp_s.fc2.weight", "module.blocks_st.3.mlp_s.fc2.bias", "module.blocks_st.3.mlp_t.fc1.weight", "module.blocks_st.3.mlp_t.fc1.bias", "module.blocks_st.3.mlp_t.fc2.weight", "module.blocks_st.3.mlp_t.fc2.bias", "module.blocks_st.4.norm1_s.weight", "module.blocks_st.4.norm1_s.bias", "module.blocks_st.4.norm1_t.weight", "module.blocks_st.4.norm1_t.bias", "module.blocks_st.4.attn_s.proj.weight", "module.blocks_st.4.attn_s.proj.bias", "module.blocks_st.4.attn_s.qkv.weight", "module.blocks_st.4.attn_s.qkv.bias", "module.blocks_st.4.attn_t.proj.weight", "module.blocks_st.4.attn_t.proj.bias", "module.blocks_st.4.attn_t.qkv.weight", "module.blocks_st.4.attn_t.qkv.bias", "module.blocks_st.4.norm2_s.weight", "module.blocks_st.4.norm2_s.bias", "module.blocks_st.4.norm2_t.weight", "module.blocks_st.4.norm2_t.bias", "module.blocks_st.4.mlp_s.fc1.weight", "module.blocks_st.4.mlp_s.fc1.bias", "module.blocks_st.4.mlp_s.fc2.weight", "module.blocks_st.4.mlp_s.fc2.bias", "module.blocks_st.4.mlp_t.fc1.weight", "module.blocks_st.4.mlp_t.fc1.bias", "module.blocks_st.4.mlp_t.fc2.weight", "module.blocks_st.4.mlp_t.fc2.bias", "module.blocks_ts.0.norm1_s.weight", "module.blocks_ts.0.norm1_s.bias", "module.blocks_ts.0.norm1_t.weight", "module.blocks_ts.0.norm1_t.bias", "module.blocks_ts.0.attn_s.proj.weight", "module.blocks_ts.0.attn_s.proj.bias", "module.blocks_ts.0.attn_s.qkv.weight", "module.blocks_ts.0.attn_s.qkv.bias", "module.blocks_ts.0.attn_t.proj.weight", "module.blocks_ts.0.attn_t.proj.bias", "module.blocks_ts.0.attn_t.qkv.weight", "module.blocks_ts.0.attn_t.qkv.bias", "module.blocks_ts.0.norm2_s.weight", "module.blocks_ts.0.norm2_s.bias", "module.blocks_ts.0.norm2_t.weight", "module.blocks_ts.0.norm2_t.bias", "module.blocks_ts.0.mlp_s.fc1.weight", "module.blocks_ts.0.mlp_s.fc1.bias", "module.blocks_ts.0.mlp_s.fc2.weight", "module.blocks_ts.0.mlp_s.fc2.bias", "module.blocks_ts.0.mlp_t.fc1.weight", "module.blocks_ts.0.mlp_t.fc1.bias", "module.blocks_ts.0.mlp_t.fc2.weight", "module.blocks_ts.0.mlp_t.fc2.bias", "module.blocks_ts.1.norm1_s.weight", "module.blocks_ts.1.norm1_s.bias", "module.blocks_ts.1.norm1_t.weight", "module.blocks_ts.1.norm1_t.bias", "module.blocks_ts.1.attn_s.proj.weight", "module.blocks_ts.1.attn_s.proj.bias", "module.blocks_ts.1.attn_s.qkv.weight", "module.blocks_ts.1.attn_s.qkv.bias", "module.blocks_ts.1.attn_t.proj.weight", "module.blocks_ts.1.attn_t.proj.bias", "module.blocks_ts.1.attn_t.qkv.weight", "module.blocks_ts.1.attn_t.qkv.bias", "module.blocks_ts.1.norm2_s.weight", "module.blocks_ts.1.norm2_s.bias", "module.blocks_ts.1.norm2_t.weight", "module.blocks_ts.1.norm2_t.bias", "module.blocks_ts.1.mlp_s.fc1.weight", "module.blocks_ts.1.mlp_s.fc1.bias", "module.blocks_ts.1.mlp_s.fc2.weight", "module.blocks_ts.1.mlp_s.fc2.bias", "module.blocks_ts.1.mlp_t.fc1.weight", "module.blocks_ts.1.mlp_t.fc1.bias", "module.blocks_ts.1.mlp_t.fc2.weight", "module.blocks_ts.1.mlp_t.fc2.bias", "module.blocks_ts.2.norm1_s.weight", "module.blocks_ts.2.norm1_s.bias", "module.blocks_ts.2.norm1_t.weight", "module.blocks_ts.2.norm1_t.bias", "module.blocks_ts.2.attn_s.proj.weight", "module.blocks_ts.2.attn_s.proj.bias", "module.blocks_ts.2.attn_s.qkv.weight", "module.blocks_ts.2.attn_s.qkv.bias", "module.blocks_ts.2.attn_t.proj.weight", "module.blocks_ts.2.attn_t.proj.bias", "module.blocks_ts.2.attn_t.qkv.weight", "module.blocks_ts.2.attn_t.qkv.bias", "module.blocks_ts.2.norm2_s.weight", "module.blocks_ts.2.norm2_s.bias", "module.blocks_ts.2.norm2_t.weight", 
"module.blocks_ts.2.norm2_t.bias", "module.blocks_ts.2.mlp_s.fc1.weight", "module.blocks_ts.2.mlp_s.fc1.bias", "module.blocks_ts.2.mlp_s.fc2.weight", "module.blocks_ts.2.mlp_s.fc2.bias", "module.blocks_ts.2.mlp_t.fc1.weight", "module.blocks_ts.2.mlp_t.fc1.bias", "module.blocks_ts.2.mlp_t.fc2.weight", "module.blocks_ts.2.mlp_t.fc2.bias", "module.blocks_ts.3.norm1_s.weight", "module.blocks_ts.3.norm1_s.bias", "module.blocks_ts.3.norm1_t.weight", "module.blocks_ts.3.norm1_t.bias", "module.blocks_ts.3.attn_s.proj.weight", "module.blocks_ts.3.attn_s.proj.bias", "module.blocks_ts.3.attn_s.qkv.weight", "module.blocks_ts.3.attn_s.qkv.bias", "module.blocks_ts.3.attn_t.proj.weight", "module.blocks_ts.3.attn_t.proj.bias", "module.blocks_ts.3.attn_t.qkv.weight", "module.blocks_ts.3.attn_t.qkv.bias", "module.blocks_ts.3.norm2_s.weight", "module.blocks_ts.3.norm2_s.bias", "module.blocks_ts.3.norm2_t.weight", "module.blocks_ts.3.norm2_t.bias", "module.blocks_ts.3.mlp_s.fc1.weight", "module.blocks_ts.3.mlp_s.fc1.bias", "module.blocks_ts.3.mlp_s.fc2.weight", "module.blocks_ts.3.mlp_s.fc2.bias", "module.blocks_ts.3.mlp_t.fc1.weight", "module.blocks_ts.3.mlp_t.fc1.bias", "module.blocks_ts.3.mlp_t.fc2.weight", "module.blocks_ts.3.mlp_t.fc2.bias", "module.blocks_ts.4.norm1_s.weight", "module.blocks_ts.4.norm1_s.bias", "module.blocks_ts.4.norm1_t.weight", "module.blocks_ts.4.norm1_t.bias", "module.blocks_ts.4.attn_s.proj.weight", "module.blocks_ts.4.attn_s.proj.bias", "module.blocks_ts.4.attn_s.qkv.weight", "module.blocks_ts.4.attn_s.qkv.bias", "module.blocks_ts.4.attn_t.proj.weight", "module.blocks_ts.4.attn_t.proj.bias", "module.blocks_ts.4.attn_t.qkv.weight", "module.blocks_ts.4.attn_t.qkv.bias", "module.blocks_ts.4.norm2_s.weight", "module.blocks_ts.4.norm2_s.bias", "module.blocks_ts.4.norm2_t.weight", "module.blocks_ts.4.norm2_t.bias", "module.blocks_ts.4.mlp_s.fc1.weight", "module.blocks_ts.4.mlp_s.fc1.bias", "module.blocks_ts.4.mlp_s.fc2.weight", "module.blocks_ts.4.mlp_s.fc2.bias", "module.blocks_ts.4.mlp_t.fc1.weight", "module.blocks_ts.4.mlp_t.fc1.bias", "module.blocks_ts.4.mlp_t.fc2.weight", "module.blocks_ts.4.mlp_t.fc2.bias", "module.norm.weight", "module.norm.bias", "module.pre_logits.fc.weight", "module.pre_logits.fc.bias", "module.head.weight", "module.head.bias", "module.ts_attn.0.weight", "module.ts_attn.0.bias", "module.ts_attn.1.weight", "module.ts_attn.1.bias", "module.ts_attn.2.weight", "module.ts_attn.2.bias", "module.ts_attn.3.weight", "module.ts_attn.3.bias", "module.ts_attn.4.weight", "module.ts_attn.4.bias". 

I guess there is a mismatch between the checkpoint and the configuration file (and maybe the code?). I am sure that I downloaded the checkpoint from the link in inference.MD. Could you please double-check?

Note: I tried to load the checkpoint with all the other configuration files in configs/pose3d; none worked.

Here is the code I am running
import os
import argparse
import torch
import torch.nn as nn
import os, sys
sys.path.append(os.getcwd())
from lib.utils.tools import *
from lib.utils.learning import *


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="configs/pose3d/MB_ft_h36m_global_lite.yaml",
                        help="Path to the config file.")
    parser.add_argument('-e', '--evaluate', default='checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin',
                        type=str, metavar='FILENAME', help='checkpoint to evaluate (file name)')
    # parser.add_argument('-j', '--json_path', type=str, help='alphapose detection result json path')
    # parser.add_argument('-v', '--vid_path', type=str, help='video path')
    parser.add_argument('-o', '--out_path', type=str, help='output path')
    parser.add_argument('--pixel', action='store_true', help='align with pixle coordinates')
    parser.add_argument('--focus', type=int, default=None, help='target person id')
    parser.add_argument('--clip_len', type=int, default=243, help='clip length for network input')
    opts = parser.parse_args()
    return opts


opts = parse_args()
args = get_config(opts.config)

model_backbone = load_backbone(args)

print('Loading checkpoint', opts.evaluate)
checkpoint = torch.load(opts.evaluate, map_location="cpu")
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
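
The "module." prefix on every unexpected key suggests the checkpoint was saved from a DataParallel-wrapped model. A hedged workaround sketch (not an official fix; variable names follow the script above):

# Strip the DataParallel 'module.' prefix from the checkpoint keys before loading.
state_dict = checkpoint['model_pos']
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in state_dict.items()}
model_backbone.load_state_dict(state_dict, strict=True)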

In-the-wild Inference input format

Hello and thanks for sharing your code.
May I ask about the structure of the .json file needed for the in-the-wild 3D pose estimation inference? I want to use 2D estimates from a network other than AlphaPose and am not sure how to structure my 2D poses so they are compatible with your code.
Thanks in advance for your help.

demonstration of pose estimation

Hello, thanks for your wonderful work. I recently tried to use MotionBERT, but it seems it only outputs metrics such as MPJPE. If I want to demo real-time video pose estimation, like the animation on the cover, what should I do? Thank you.

Input keypoint structure

Hi, the documentation says to use the H36M keypoint format or the Halpe 26-keypoint format. Since these two formats differ and I am trying to use YOLOv7 to extract the 2D poses, which keypoints and ordering does MotionBERT expect? Is there an example JSON available? Thank you 😊

Real time application

Hi!

I was just wondering whether you have any results on speed, and whether this model (in the Lite variant) would be suitable for real-time 3D pose estimation?

Thanks

how to preprocess NTU dataset?

The 3D coordinates I received are pixel values; can you help me convert them into values in 3D space?
