
motionbert's Introduction

MotionBERT: A Unified Perspective on Learning Human Motion Representations

[PyTorch] [arXiv] [Project] [Demo] [Hugging Face Models]


This is the official PyTorch implementation of the paper "MotionBERT: A Unified Perspective on Learning Human Motion Representations" (ICCV 2023).

Installation

conda create -n motionbert python=3.7 anaconda
conda activate motionbert
# Please install PyTorch according to your CUDA version.
conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
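
A quick, optional sanity check after installation (a generic PyTorch check, not specific to this repo) to confirm the CUDA build is visible:

import torch
print(torch.__version__)           # the version you just installed
print(torch.cuda.is_available())   # should print True if the CUDA build matches your driver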

Getting Started

Task | Document
Pretrain | docs/pretrain.md
3D human pose estimation | docs/pose3d.md
Skeleton-based action recognition | docs/action.md
Mesh recovery | docs/mesh.md

Applications

In-the-wild inference (for custom videos)

Please refer to docs/inference.md.

Using MotionBERT for human-centric video representations

'''
  x: 2D skeletons 
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(3)]
    
  MotionBERT: pretrained human motion encoder
    type = <class 'lib.model.DSTformer.DSTformer'>
    
  E: encoded motion representation
    type = <class 'torch.Tensor'>
    shape = [batch size * frames * joints(17) * channels(512)]
'''
E = MotionBERT.get_representation(x)
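
A minimal loading sketch based on the config and checkpoint utilities used in infer_wild.py; the checkpoint path and the 'model_pos' key below are assumptions, so adjust them to the files you actually downloaded:

import torch
from lib.utils.tools import get_config
from lib.utils.learning import load_backbone

args = get_config('configs/pretrain/MB_pretrain.yaml')   # config from the Model Zoo table
model = load_backbone(args)                              # builds the DSTformer backbone
# Assumed local path; strip a leading 'module.' from keys if the checkpoint was saved with DataParallel.
checkpoint = torch.load('checkpoint/pretrain/MB_release/best_epoch.bin', map_location='cpu')
model.load_state_dict(checkpoint['model_pos'], strict=True)
model.eval()

x = torch.randn(1, 243, 17, 3)                           # dummy 2D skeletons: [batch, frames, joints, (x, y, conf)]
with torch.no_grad():
    E = model.get_representation(x)                      # [batch, frames, 17, 512]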

Hints

  1. The model can handle variable input lengths (up to 243 frames). There is no need to explicitly specify the input length elsewhere.
  2. The model uses 17 body keypoints (H36M format). If you are using other formats, please convert them before feeding them to MotionBERT (see the conversion sketch after this list).
  3. Please refer to model_action.py and model_mesh.py for examples of how to (easily) adapt MotionBERT to different downstream tasks.
  4. For RGB videos, you need to extract 2D poses (inference.md), convert the keypoint format (dataset_wild.py), and then feed the result to MotionBERT (infer_wild.py).
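
For illustration, below is a minimal sketch of one commonly used COCO-17 to H36M-17 conversion; the joint mapping is an assumption for this example, so verify it against the conversion actually used in dataset_wild.py before relying on it:

import numpy as np

def coco2h36m(coco):
    # coco: array of shape [frames, 17, C] in COCO keypoint order
    h36m = np.zeros_like(coco)
    h36m[:, 0] = (coco[:, 11] + coco[:, 12]) * 0.5   # pelvis = midpoint of hips
    h36m[:, 1] = coco[:, 12]                         # right hip
    h36m[:, 2] = coco[:, 14]                         # right knee
    h36m[:, 3] = coco[:, 16]                         # right ankle
    h36m[:, 4] = coco[:, 11]                         # left hip
    h36m[:, 5] = coco[:, 13]                         # left knee
    h36m[:, 6] = coco[:, 15]                         # left ankle
    h36m[:, 8] = (coco[:, 5] + coco[:, 6]) * 0.5     # thorax = midpoint of shoulders
    h36m[:, 7] = (h36m[:, 0] + h36m[:, 8]) * 0.5     # spine = midpoint of pelvis and thorax
    h36m[:, 9] = coco[:, 0]                          # nose
    h36m[:, 10] = (coco[:, 1] + coco[:, 2]) * 0.5    # head (approximated by midpoint of eyes)
    h36m[:, 11] = coco[:, 5]                         # left shoulder
    h36m[:, 12] = coco[:, 7]                         # left elbow
    h36m[:, 13] = coco[:, 9]                         # left wrist
    h36m[:, 14] = coco[:, 6]                         # right shoulder
    h36m[:, 15] = coco[:, 8]                         # right elbow
    h36m[:, 16] = coco[:, 10]                        # right wrist
    return h36m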

Model Zoo

Model | Download Link | Config | Performance
MotionBERT (162MB) | OneDrive | pretrain/MB_pretrain.yaml | -
MotionBERT-Lite (61MB) | OneDrive | pretrain/MB_lite.yaml | -
3D Pose (H36M-SH, scratch) | OneDrive | pose3d/MB_train_h36m.yaml | 39.2mm (MPJPE)
3D Pose (H36M-SH, ft) | OneDrive | pose3d/MB_ft_h36m.yaml | 37.2mm (MPJPE)
Action Recognition (x-sub, ft) | OneDrive | action/MB_ft_NTU60_xsub.yaml | 97.2% (Top-1 Acc)
Action Recognition (x-view, ft) | OneDrive | action/MB_ft_NTU60_xview.yaml | 93.0% (Top-1 Acc)
Mesh (with 3DPW, ft) | OneDrive | mesh/MB_ft_pw3d.yaml | 88.1mm (MPVE)

In most use cases (especially with fine-tuning), MotionBERT-Lite gives similar performance with lower computational overhead.

TODO

  • Scripts and docs for pretraining

  • Demo for custom videos

Citation

If you find our work useful for your project, please consider citing the paper:

@inproceedings{motionbert2022,
  title     =   {MotionBERT: A Unified Perspective on Learning Human Motion Representations}, 
  author    =   {Zhu, Wentao and Ma, Xiaoxuan and Liu, Zhaoyang and Liu, Libin and Wu, Wayne and Wang, Yizhou},
  booktitle =   {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      =   {2023},
}

motionbert's People

Contributors

baitian752, shirleymaxx, viewsetting, walter0807


motionbert's Issues

Finetune data format

Hello @Walter0807, I want to fine-tune the pose3d task on my own dataset. What data format should I prepare, i.e., what goes inside the .pkl file? I currently have a 2D skeleton video and the JSON file from AlphaPose; what should I do next?
Sorry to keep bothering you.

An error in infer_wild.py

Hello, I have run into an error and do not know how to solve it. The error is shown below. Could you tell me what happened and how to fix it? I am using a Windows 10 machine. Thank you!

(motionbert) F:\DeepLearning\MotionBERT\MotionBERT-main> python infer_wild.py --vid_path video/me.mp4 --json_path video_json/vis_me.mp4.json --out_path output

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "F:\Software\Anaconda\envs\motionbert\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\DeepLearning\MotionBERT\MotionBERT-main\infer_wild.py", line 70, in <module>
    for batch_input in tqdm(test_loader):
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 439, in __iter__
    self._iterator = self._get_iterator()
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\site-packages\torch\utils\data\dataloader.py", line 1077, in __init__
    w.start()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "F:\Software\Anaconda\envs\motionbert\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
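
The RuntimeError above is the standard Windows spawn-mode issue: DataLoader worker processes re-import the script, so the top-level inference code runs again. A hedged sketch of the usual workaround (the main() wrapper name is hypothetical; alternatively, set num_workers=0 in the DataLoader settings):

# Guard the entry point of the script so spawned DataLoader workers
# can import the module without re-running the inference code.
if __name__ == '__main__':
    main()  # hypothetical wrapper around the top-level code of infer_wild.py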

Pre-Training Time

Hi,

Thank you for your great work! How long does the pretraining take on 8 V100 GPUs? Thanks!

about the root trans in 3d pose and mesh

Hello, this is really impressive work, clean and fantastic. However, I have some questions:

  1. In the demo videos, the translation in the 3D keypoint task is relatively stable, but the mesh looks bumpy in the vertical (height) direction. Is there a way to combine the strengths of both, e.g. a multi-modality model, to make the mesh translation more accurate?
  2. The 3D pose results look really good, but in most situations we need body rotations (like SMPL). Is there a way to obtain SMPL-like rotations directly from the 3D pose?

about the speed

Even though the model is not big, the inference speed is quite slow, about 1 second per frame. Is that normal?

The formula notation in the paper

Thank you for your great work. I have a question about one of the formulas in the paper: ◦ is said to denote the element-wise product, but I am not familiar with that symbol. Could you explain what it means?

About Comparison of Model Architectural Designs

Hello, thank you for your work.
[image: comparison of architectural designs (a)-(f)]
In my opinion, it is inappropriate to compare methods (a) and (f). From what I understand, method (a) contains one S-T block per module, whereas method (f) contains both S-T and T-S blocks.
That is, method (f) has roughly twice as many parameters as method (a), so I think it would be fairer to compare against method (a) with its depth set to 10.
Thank you!

About half body mesh regression issue

Hello, I recently found that MotionBERT does not seem very good at half-body mesh regression. Have you tested this before, and what could be the cause?

Image-based models like PARE and SPIN can hallucinate the occluded body parts, but MotionBERT fails completely in this scenario.

about 2d projection.

Thanks for your great work.
[image]

In-the-wild RGB videos have no 3D ground truth, depth information, or camera intrinsics K, so how do you perform the reprojection?

About Amass trainset

Hello, may I ask whether the GRAB and SOMA subsets of AMASS were used in training the pretrained model? If they were, tools/compress_amass.py seems to be incorrect.

mesh prediction error

I ran this command:
!python3 infer_wild_mesh.py --vid_path ./4.mp4 --json_path ./alphapose-results.json --out_path /content/MotionBERT

I have saved best_epoch.bin here:
/content/MotionBERT/checkpoint/mesh/FT_MB_release_MB_ft_pw3d/best_epoch.bin

Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild_mesh.py", line 64, in <module>
    smpl = SMPL(args.data_root, batch_size=1).cuda()
  File "/content/MotionBERT/lib/utils/utils_smpl.py", line 62, in __init__
    super(SMPL, self).__init__(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/smplx/body_models.py", line 133, in __init__
    assert osp.exists(smpl_path), 'Path {} does not exist!'.format(
AssertionError: Path data/mesh does not exist!

Code release

Thanks for your great work! When will you release the code?

Prediction Scale

Hi,
I have questions about the dimensions of the predicted poses both in inference and evaluation code.
I noticed that the predictions of the network in the evaluation function in train.py are multiplied by a factor, which I traced back to data['test']['2.5d_factor'] in h36m_sh_conf_cam_source_final.pkl. Could you please help me understand how these factors are calculated?
Does this mean that the outputs of the network are not expected to have the correct scale of a human (in meters), and that only the relative pose is the goal? In particular, when I run the inference code and plot the outputs, I notice the dimensions of the person changing over time (which I guess comes from this), even when using the MB_ft_h36m model with rootrel set to True.

In general, it would be really appreciated if you could help me understand the scale of the output and how I can convert it to meters.

Thanks in advance for your help.

wild video infer too slow

Hello, may I ask why inference on my own video is so slow? A 10-second video takes more than ten minutes, and there are some warning messages:

IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (923, 924) to (928, 928) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).

  0%|▋         | 1/296 [00:00<04:16, 1.15it/s]
[swscaler @ 000001df1adc4300] Warning: data is not aligned! This can lead to a speed loss

My test loader settings are as follows:

testloader_params = {
    'batch_size': 1,
    'shuffle': False,
    'num_workers': 0,
    'pin_memory': True,
    'prefetch_factor': 2,
    'persistent_workers': False,
    'drop_last': False
}
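
A hedged note on the IMAGEIO warning above: as the message itself suggests, the resize can be avoided by making the frames divisible by 16 or by creating the video writer with macro_block_size=1. Where exactly the writer is created depends on the rendering utilities, so the call below is only an illustration:

import imageio

# Illustrative only: write frames without resizing to multiples of 16
# (macro_block_size=1 risks incompatibility with some codecs/players).
writer = imageio.get_writer('output/result.mp4', fps=30, macro_block_size=1)
# writer.append_data(frame)  # frame: H x W x 3 uint8 array
# writer.close()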

Fine-tuning part layers

Hi,
You mentioned fine-tuning partial layers in your paper, but the released code fine-tunes the entire model, which is computationally costly. May I ask what "partial layers" refers to?

some train questions

Thanks for your great work! I have some questions about how the Dual-stream Spatio-temporal Transformer (DSTformer) can accelerate training in parallel. Besides, is T=243 too computationally intensive for T-MHSA?
[image]
Thank you very much!

Broken link pyskl

The link to pyskl in action.md is broken. Could you provide a new link, or a similar reference?

Usage

Hi, thanks for the release, looks very cool!
Can you please give me a hint on how to use your pretrained model to run inference on my own video?
Given a video, do I need to run 2D pose estimation first before I can use MotionBERT? Or do you already provide that?
How should I generate 3D points on my video?
How can I get the motion embedding? I tried:

E = MotionBERT.get_representation(x)

but get_representation does not exist!
Thank you!

If you could just give me high-level hints, I would appreciate it! Thanks!

Missing key(s) in state_dict: "temp_embed", "pos_embed",

Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/content/MotionBERT/infer_wild.py", line 45, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
        Missing key(s) in state_dict: "temp_embed", "pos_embed", ...
        Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", ...

I tried to solve this problem following this blog:
https://blog.csdn.net/yangwangnndd/article/details/100207686
However, it led to further problems.

I followed this guide
https://github.com/Walter0807/MotionBERT/blob/main/docs/inference.md
I use this 3dpose model
https://onedrive.live.com/?authkey=%21ALuKCr9wihi87bI&id=A5438CD242871DF0%21190&cid=A5438CD242871DF0

License Information

Hi! Thank you so much for sharing this code. Can you please include the license information so that we know the restrictions/limitations if there are any?

About the hips coordinates to world

Hi, I got some real-time 3D pose results and visualized them in Open3D; they look good:

[gif attachment]

However, I am wondering how to map the hip coordinates to the real world. I currently add +0.65 to the z axis, but it is not well aligned; it looks like it should be some value related to the normalized hip height relative to body height. Do you know what the exact value is?

Some excellent demo

Hi, just posting some FBX demos here (real 3D, not renderings); the results are impressive:

Clip_len 24

[video attachment]

Clip_len 48

[video attachment]

The video I tested is a very challenging one, and I still got some nice results!

I still have one issue: the poses may flicker in the middle of frames. Do you have any thoughts on this? Also, what is the best clip length for real-time applications? (We cannot use a very large clip length in real time.)

how to get the action recognition result for custom videos

Hey, thanks for this wonderful work; the performance of the 2D-to-3D reconstruction is eye-opening. I am wondering whether the action recognition inference code for custom videos has been released yet; I can only find the evaluation code for action recognition, which is meant for the NTU RGB+D dataset.

Velocity loss in the paper

Thank you for your great work. I would like to ask about the loss function mentioned in the paper, specifically the velocity loss. What is the purpose of adding the velocity loss?

Keypoint format

I can get COCO 17-keypoint (or other) 2D poses for my own custom data, and I know I should convert the COCO format to Human3.6M, but how? The joint definitions of COCO and Human3.6M differ, especially for the torso. Is there a way to convert between these dataset formats?

About the conf input

Hello, I notice that the input can be given with or without confidence scores, but I did not see any ablation on this. If confidence is used, the model becomes tightly coupled with the 2D pose estimator itself (some models may not produce relatively high scores). Is there any comparison of the final metrics with and without confidence?

Something about train.

Thank you for your great work. I have a question: I see that there are three training settings in the docs folder, namely pretrain, scratch, and finetune. How are these three related? If I only care about lifting 2D keypoints to 3D keypoints, which one should I focus on? Thank you very much; I look forward to your answer.

How to accelerate model infer speed

Hi, I use the script infer_wild.sh to infer 3D poses. I have a GPU; can I use it, or some other method, to accelerate the rendering speed? I found that the GPU utilization is very low.
[image]

Mesh with HybrIK

Will you provide the code for mesh recovery using HybrIK (reported in Table 3)? I would appreciate it if you could release the code related to this part.

PyTorch version for reproducing the result

I want to reproduce the result of "3D Pose (H36M-SH, scratch), 39.1mm", but I can only get 40.0mm. Which PyTorch version did you use to train the model?

Cannot load checkpoint in docs/inference.MD

Hi,

Thanks for the great work!

I tried to follow the instructions in docs/inference.MD and got the following error while loading the checkpoint:

Error logs
(motionbert) H4dr1en@H4dr1en MotionBERT % /opt/miniconda3/envs/motionbert/bin/python /Users/H4dr1en/projects/MotionBERT/infer_wild_test.py
Loading checkpoint checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin
Traceback (most recent call last):
  File "/Users/H4dr1en/projects/MotionBERT/infer_wild_test.py", line 37, in <module>
    model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
  File "/opt/miniconda3/envs/motionbert/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DSTformer:
        Missing key(s) in state_dict: "temp_embed", "pos_embed", "joints_embed.weight", "joints_embed.bias", "blocks_st.0.norm1_s.weight", "blocks_st.0.norm1_s.bias", "blocks_st.0.norm1_t.weight", "blocks_st.0.norm1_t.bias", "blocks_st.0.attn_s.proj.weight", "blocks_st.0.attn_s.proj.bias", "blocks_st.0.attn_s.qkv.weight", "blocks_st.0.attn_s.qkv.bias", "blocks_st.0.attn_t.proj.weight", "blocks_st.0.attn_t.proj.bias", "blocks_st.0.attn_t.qkv.weight", "blocks_st.0.attn_t.qkv.bias", "blocks_st.0.norm2_s.weight", "blocks_st.0.norm2_s.bias", "blocks_st.0.norm2_t.weight", "blocks_st.0.norm2_t.bias", "blocks_st.0.mlp_s.fc1.weight", "blocks_st.0.mlp_s.fc1.bias", "blocks_st.0.mlp_s.fc2.weight", "blocks_st.0.mlp_s.fc2.bias", "blocks_st.0.mlp_t.fc1.weight", "blocks_st.0.mlp_t.fc1.bias", "blocks_st.0.mlp_t.fc2.weight", "blocks_st.0.mlp_t.fc2.bias", "blocks_st.1.norm1_s.weight", "blocks_st.1.norm1_s.bias", "blocks_st.1.norm1_t.weight", "blocks_st.1.norm1_t.bias", "blocks_st.1.attn_s.proj.weight", "blocks_st.1.attn_s.proj.bias", "blocks_st.1.attn_s.qkv.weight", "blocks_st.1.attn_s.qkv.bias", "blocks_st.1.attn_t.proj.weight", "blocks_st.1.attn_t.proj.bias", "blocks_st.1.attn_t.qkv.weight", "blocks_st.1.attn_t.qkv.bias", "blocks_st.1.norm2_s.weight", "blocks_st.1.norm2_s.bias", "blocks_st.1.norm2_t.weight", "blocks_st.1.norm2_t.bias", "blocks_st.1.mlp_s.fc1.weight", "blocks_st.1.mlp_s.fc1.bias", "blocks_st.1.mlp_s.fc2.weight", "blocks_st.1.mlp_s.fc2.bias", "blocks_st.1.mlp_t.fc1.weight", "blocks_st.1.mlp_t.fc1.bias", "blocks_st.1.mlp_t.fc2.weight", "blocks_st.1.mlp_t.fc2.bias", "blocks_st.2.norm1_s.weight", "blocks_st.2.norm1_s.bias", "blocks_st.2.norm1_t.weight", "blocks_st.2.norm1_t.bias", "blocks_st.2.attn_s.proj.weight", "blocks_st.2.attn_s.proj.bias", "blocks_st.2.attn_s.qkv.weight", "blocks_st.2.attn_s.qkv.bias", "blocks_st.2.attn_t.proj.weight", "blocks_st.2.attn_t.proj.bias", "blocks_st.2.attn_t.qkv.weight", "blocks_st.2.attn_t.qkv.bias", "blocks_st.2.norm2_s.weight", "blocks_st.2.norm2_s.bias", "blocks_st.2.norm2_t.weight", "blocks_st.2.norm2_t.bias", "blocks_st.2.mlp_s.fc1.weight", "blocks_st.2.mlp_s.fc1.bias", "blocks_st.2.mlp_s.fc2.weight", "blocks_st.2.mlp_s.fc2.bias", "blocks_st.2.mlp_t.fc1.weight", "blocks_st.2.mlp_t.fc1.bias", "blocks_st.2.mlp_t.fc2.weight", "blocks_st.2.mlp_t.fc2.bias", "blocks_st.3.norm1_s.weight", "blocks_st.3.norm1_s.bias", "blocks_st.3.norm1_t.weight", "blocks_st.3.norm1_t.bias", "blocks_st.3.attn_s.proj.weight", "blocks_st.3.attn_s.proj.bias", "blocks_st.3.attn_s.qkv.weight", "blocks_st.3.attn_s.qkv.bias", "blocks_st.3.attn_t.proj.weight", "blocks_st.3.attn_t.proj.bias", "blocks_st.3.attn_t.qkv.weight", "blocks_st.3.attn_t.qkv.bias", "blocks_st.3.norm2_s.weight", "blocks_st.3.norm2_s.bias", "blocks_st.3.norm2_t.weight", "blocks_st.3.norm2_t.bias", "blocks_st.3.mlp_s.fc1.weight", "blocks_st.3.mlp_s.fc1.bias", "blocks_st.3.mlp_s.fc2.weight", "blocks_st.3.mlp_s.fc2.bias", "blocks_st.3.mlp_t.fc1.weight", "blocks_st.3.mlp_t.fc1.bias", "blocks_st.3.mlp_t.fc2.weight", "blocks_st.3.mlp_t.fc2.bias", "blocks_st.4.norm1_s.weight", "blocks_st.4.norm1_s.bias", "blocks_st.4.norm1_t.weight", "blocks_st.4.norm1_t.bias", "blocks_st.4.attn_s.proj.weight", "blocks_st.4.attn_s.proj.bias", "blocks_st.4.attn_s.qkv.weight", "blocks_st.4.attn_s.qkv.bias", "blocks_st.4.attn_t.proj.weight", "blocks_st.4.attn_t.proj.bias", "blocks_st.4.attn_t.qkv.weight", "blocks_st.4.attn_t.qkv.bias", "blocks_st.4.norm2_s.weight", "blocks_st.4.norm2_s.bias", "blocks_st.4.norm2_t.weight", 
"blocks_st.4.norm2_t.bias", "blocks_st.4.mlp_s.fc1.weight", "blocks_st.4.mlp_s.fc1.bias", "blocks_st.4.mlp_s.fc2.weight", "blocks_st.4.mlp_s.fc2.bias", "blocks_st.4.mlp_t.fc1.weight", "blocks_st.4.mlp_t.fc1.bias", "blocks_st.4.mlp_t.fc2.weight", "blocks_st.4.mlp_t.fc2.bias", "blocks_ts.0.norm1_s.weight", "blocks_ts.0.norm1_s.bias", "blocks_ts.0.norm1_t.weight", "blocks_ts.0.norm1_t.bias", "blocks_ts.0.attn_s.proj.weight", "blocks_ts.0.attn_s.proj.bias", "blocks_ts.0.attn_s.qkv.weight", "blocks_ts.0.attn_s.qkv.bias", "blocks_ts.0.attn_t.proj.weight", "blocks_ts.0.attn_t.proj.bias", "blocks_ts.0.attn_t.qkv.weight", "blocks_ts.0.attn_t.qkv.bias", "blocks_ts.0.norm2_s.weight", "blocks_ts.0.norm2_s.bias", "blocks_ts.0.norm2_t.weight", "blocks_ts.0.norm2_t.bias", "blocks_ts.0.mlp_s.fc1.weight", "blocks_ts.0.mlp_s.fc1.bias", "blocks_ts.0.mlp_s.fc2.weight", "blocks_ts.0.mlp_s.fc2.bias", "blocks_ts.0.mlp_t.fc1.weight", "blocks_ts.0.mlp_t.fc1.bias", "blocks_ts.0.mlp_t.fc2.weight", "blocks_ts.0.mlp_t.fc2.bias", "blocks_ts.1.norm1_s.weight", "blocks_ts.1.norm1_s.bias", "blocks_ts.1.norm1_t.weight", "blocks_ts.1.norm1_t.bias", "blocks_ts.1.attn_s.proj.weight", "blocks_ts.1.attn_s.proj.bias", "blocks_ts.1.attn_s.qkv.weight", "blocks_ts.1.attn_s.qkv.bias", "blocks_ts.1.attn_t.proj.weight", "blocks_ts.1.attn_t.proj.bias", "blocks_ts.1.attn_t.qkv.weight", "blocks_ts.1.attn_t.qkv.bias", "blocks_ts.1.norm2_s.weight", "blocks_ts.1.norm2_s.bias", "blocks_ts.1.norm2_t.weight", "blocks_ts.1.norm2_t.bias", "blocks_ts.1.mlp_s.fc1.weight", "blocks_ts.1.mlp_s.fc1.bias", "blocks_ts.1.mlp_s.fc2.weight", "blocks_ts.1.mlp_s.fc2.bias", "blocks_ts.1.mlp_t.fc1.weight", "blocks_ts.1.mlp_t.fc1.bias", "blocks_ts.1.mlp_t.fc2.weight", "blocks_ts.1.mlp_t.fc2.bias", "blocks_ts.2.norm1_s.weight", "blocks_ts.2.norm1_s.bias", "blocks_ts.2.norm1_t.weight", "blocks_ts.2.norm1_t.bias", "blocks_ts.2.attn_s.proj.weight", "blocks_ts.2.attn_s.proj.bias", "blocks_ts.2.attn_s.qkv.weight", "blocks_ts.2.attn_s.qkv.bias", "blocks_ts.2.attn_t.proj.weight", "blocks_ts.2.attn_t.proj.bias", "blocks_ts.2.attn_t.qkv.weight", "blocks_ts.2.attn_t.qkv.bias", "blocks_ts.2.norm2_s.weight", "blocks_ts.2.norm2_s.bias", "blocks_ts.2.norm2_t.weight", "blocks_ts.2.norm2_t.bias", "blocks_ts.2.mlp_s.fc1.weight", "blocks_ts.2.mlp_s.fc1.bias", "blocks_ts.2.mlp_s.fc2.weight", "blocks_ts.2.mlp_s.fc2.bias", "blocks_ts.2.mlp_t.fc1.weight", "blocks_ts.2.mlp_t.fc1.bias", "blocks_ts.2.mlp_t.fc2.weight", "blocks_ts.2.mlp_t.fc2.bias", "blocks_ts.3.norm1_s.weight", "blocks_ts.3.norm1_s.bias", "blocks_ts.3.norm1_t.weight", "blocks_ts.3.norm1_t.bias", "blocks_ts.3.attn_s.proj.weight", "blocks_ts.3.attn_s.proj.bias", "blocks_ts.3.attn_s.qkv.weight", "blocks_ts.3.attn_s.qkv.bias", "blocks_ts.3.attn_t.proj.weight", "blocks_ts.3.attn_t.proj.bias", "blocks_ts.3.attn_t.qkv.weight", "blocks_ts.3.attn_t.qkv.bias", "blocks_ts.3.norm2_s.weight", "blocks_ts.3.norm2_s.bias", "blocks_ts.3.norm2_t.weight", "blocks_ts.3.norm2_t.bias", "blocks_ts.3.mlp_s.fc1.weight", "blocks_ts.3.mlp_s.fc1.bias", "blocks_ts.3.mlp_s.fc2.weight", "blocks_ts.3.mlp_s.fc2.bias", "blocks_ts.3.mlp_t.fc1.weight", "blocks_ts.3.mlp_t.fc1.bias", "blocks_ts.3.mlp_t.fc2.weight", "blocks_ts.3.mlp_t.fc2.bias", "blocks_ts.4.norm1_s.weight", "blocks_ts.4.norm1_s.bias", "blocks_ts.4.norm1_t.weight", "blocks_ts.4.norm1_t.bias", "blocks_ts.4.attn_s.proj.weight", "blocks_ts.4.attn_s.proj.bias", "blocks_ts.4.attn_s.qkv.weight", "blocks_ts.4.attn_s.qkv.bias", "blocks_ts.4.attn_t.proj.weight", "blocks_ts.4.attn_t.proj.bias", 
"blocks_ts.4.attn_t.qkv.weight", "blocks_ts.4.attn_t.qkv.bias", "blocks_ts.4.norm2_s.weight", "blocks_ts.4.norm2_s.bias", "blocks_ts.4.norm2_t.weight", "blocks_ts.4.norm2_t.bias", "blocks_ts.4.mlp_s.fc1.weight", "blocks_ts.4.mlp_s.fc1.bias", "blocks_ts.4.mlp_s.fc2.weight", "blocks_ts.4.mlp_s.fc2.bias", "blocks_ts.4.mlp_t.fc1.weight", "blocks_ts.4.mlp_t.fc1.bias", "blocks_ts.4.mlp_t.fc2.weight", "blocks_ts.4.mlp_t.fc2.bias", "norm.weight", "norm.bias", "pre_logits.fc.weight", "pre_logits.fc.bias", "head.weight", "head.bias", "ts_attn.0.weight", "ts_attn.0.bias", "ts_attn.1.weight", "ts_attn.1.bias", "ts_attn.2.weight", "ts_attn.2.bias", "ts_attn.3.weight", "ts_attn.3.bias", "ts_attn.4.weight", "ts_attn.4.bias". 
        Unexpected key(s) in state_dict: "module.temp_embed", "module.pos_embed", "module.joints_embed.weight", "module.joints_embed.bias", "module.blocks_st.0.norm1_s.weight", "module.blocks_st.0.norm1_s.bias", "module.blocks_st.0.norm1_t.weight", "module.blocks_st.0.norm1_t.bias", "module.blocks_st.0.attn_s.proj.weight", "module.blocks_st.0.attn_s.proj.bias", "module.blocks_st.0.attn_s.qkv.weight", "module.blocks_st.0.attn_s.qkv.bias", "module.blocks_st.0.attn_t.proj.weight", "module.blocks_st.0.attn_t.proj.bias", "module.blocks_st.0.attn_t.qkv.weight", "module.blocks_st.0.attn_t.qkv.bias", "module.blocks_st.0.norm2_s.weight", "module.blocks_st.0.norm2_s.bias", "module.blocks_st.0.norm2_t.weight", "module.blocks_st.0.norm2_t.bias", "module.blocks_st.0.mlp_s.fc1.weight", "module.blocks_st.0.mlp_s.fc1.bias", "module.blocks_st.0.mlp_s.fc2.weight", "module.blocks_st.0.mlp_s.fc2.bias", "module.blocks_st.0.mlp_t.fc1.weight", "module.blocks_st.0.mlp_t.fc1.bias", "module.blocks_st.0.mlp_t.fc2.weight", "module.blocks_st.0.mlp_t.fc2.bias", "module.blocks_st.1.norm1_s.weight", "module.blocks_st.1.norm1_s.bias", "module.blocks_st.1.norm1_t.weight", "module.blocks_st.1.norm1_t.bias", "module.blocks_st.1.attn_s.proj.weight", "module.blocks_st.1.attn_s.proj.bias", "module.blocks_st.1.attn_s.qkv.weight", "module.blocks_st.1.attn_s.qkv.bias", "module.blocks_st.1.attn_t.proj.weight", "module.blocks_st.1.attn_t.proj.bias", "module.blocks_st.1.attn_t.qkv.weight", "module.blocks_st.1.attn_t.qkv.bias", "module.blocks_st.1.norm2_s.weight", "module.blocks_st.1.norm2_s.bias", "module.blocks_st.1.norm2_t.weight", "module.blocks_st.1.norm2_t.bias", "module.blocks_st.1.mlp_s.fc1.weight", "module.blocks_st.1.mlp_s.fc1.bias", "module.blocks_st.1.mlp_s.fc2.weight", "module.blocks_st.1.mlp_s.fc2.bias", "module.blocks_st.1.mlp_t.fc1.weight", "module.blocks_st.1.mlp_t.fc1.bias", "module.blocks_st.1.mlp_t.fc2.weight", "module.blocks_st.1.mlp_t.fc2.bias", "module.blocks_st.2.norm1_s.weight", "module.blocks_st.2.norm1_s.bias", "module.blocks_st.2.norm1_t.weight", "module.blocks_st.2.norm1_t.bias", "module.blocks_st.2.attn_s.proj.weight", "module.blocks_st.2.attn_s.proj.bias", "module.blocks_st.2.attn_s.qkv.weight", "module.blocks_st.2.attn_s.qkv.bias", "module.blocks_st.2.attn_t.proj.weight", "module.blocks_st.2.attn_t.proj.bias", "module.blocks_st.2.attn_t.qkv.weight", "module.blocks_st.2.attn_t.qkv.bias", "module.blocks_st.2.norm2_s.weight", "module.blocks_st.2.norm2_s.bias", "module.blocks_st.2.norm2_t.weight", "module.blocks_st.2.norm2_t.bias", "module.blocks_st.2.mlp_s.fc1.weight", "module.blocks_st.2.mlp_s.fc1.bias", "module.blocks_st.2.mlp_s.fc2.weight", "module.blocks_st.2.mlp_s.fc2.bias", "module.blocks_st.2.mlp_t.fc1.weight", "module.blocks_st.2.mlp_t.fc1.bias", "module.blocks_st.2.mlp_t.fc2.weight", "module.blocks_st.2.mlp_t.fc2.bias", "module.blocks_st.3.norm1_s.weight", "module.blocks_st.3.norm1_s.bias", "module.blocks_st.3.norm1_t.weight", "module.blocks_st.3.norm1_t.bias", "module.blocks_st.3.attn_s.proj.weight", "module.blocks_st.3.attn_s.proj.bias", "module.blocks_st.3.attn_s.qkv.weight", "module.blocks_st.3.attn_s.qkv.bias", "module.blocks_st.3.attn_t.proj.weight", "module.blocks_st.3.attn_t.proj.bias", "module.blocks_st.3.attn_t.qkv.weight", "module.blocks_st.3.attn_t.qkv.bias", "module.blocks_st.3.norm2_s.weight", "module.blocks_st.3.norm2_s.bias", "module.blocks_st.3.norm2_t.weight", "module.blocks_st.3.norm2_t.bias", "module.blocks_st.3.mlp_s.fc1.weight", "module.blocks_st.3.mlp_s.fc1.bias", 
"module.blocks_st.3.mlp_s.fc2.weight", "module.blocks_st.3.mlp_s.fc2.bias", "module.blocks_st.3.mlp_t.fc1.weight", "module.blocks_st.3.mlp_t.fc1.bias", "module.blocks_st.3.mlp_t.fc2.weight", "module.blocks_st.3.mlp_t.fc2.bias", "module.blocks_st.4.norm1_s.weight", "module.blocks_st.4.norm1_s.bias", "module.blocks_st.4.norm1_t.weight", "module.blocks_st.4.norm1_t.bias", "module.blocks_st.4.attn_s.proj.weight", "module.blocks_st.4.attn_s.proj.bias", "module.blocks_st.4.attn_s.qkv.weight", "module.blocks_st.4.attn_s.qkv.bias", "module.blocks_st.4.attn_t.proj.weight", "module.blocks_st.4.attn_t.proj.bias", "module.blocks_st.4.attn_t.qkv.weight", "module.blocks_st.4.attn_t.qkv.bias", "module.blocks_st.4.norm2_s.weight", "module.blocks_st.4.norm2_s.bias", "module.blocks_st.4.norm2_t.weight", "module.blocks_st.4.norm2_t.bias", "module.blocks_st.4.mlp_s.fc1.weight", "module.blocks_st.4.mlp_s.fc1.bias", "module.blocks_st.4.mlp_s.fc2.weight", "module.blocks_st.4.mlp_s.fc2.bias", "module.blocks_st.4.mlp_t.fc1.weight", "module.blocks_st.4.mlp_t.fc1.bias", "module.blocks_st.4.mlp_t.fc2.weight", "module.blocks_st.4.mlp_t.fc2.bias", "module.blocks_ts.0.norm1_s.weight", "module.blocks_ts.0.norm1_s.bias", "module.blocks_ts.0.norm1_t.weight", "module.blocks_ts.0.norm1_t.bias", "module.blocks_ts.0.attn_s.proj.weight", "module.blocks_ts.0.attn_s.proj.bias", "module.blocks_ts.0.attn_s.qkv.weight", "module.blocks_ts.0.attn_s.qkv.bias", "module.blocks_ts.0.attn_t.proj.weight", "module.blocks_ts.0.attn_t.proj.bias", "module.blocks_ts.0.attn_t.qkv.weight", "module.blocks_ts.0.attn_t.qkv.bias", "module.blocks_ts.0.norm2_s.weight", "module.blocks_ts.0.norm2_s.bias", "module.blocks_ts.0.norm2_t.weight", "module.blocks_ts.0.norm2_t.bias", "module.blocks_ts.0.mlp_s.fc1.weight", "module.blocks_ts.0.mlp_s.fc1.bias", "module.blocks_ts.0.mlp_s.fc2.weight", "module.blocks_ts.0.mlp_s.fc2.bias", "module.blocks_ts.0.mlp_t.fc1.weight", "module.blocks_ts.0.mlp_t.fc1.bias", "module.blocks_ts.0.mlp_t.fc2.weight", "module.blocks_ts.0.mlp_t.fc2.bias", "module.blocks_ts.1.norm1_s.weight", "module.blocks_ts.1.norm1_s.bias", "module.blocks_ts.1.norm1_t.weight", "module.blocks_ts.1.norm1_t.bias", "module.blocks_ts.1.attn_s.proj.weight", "module.blocks_ts.1.attn_s.proj.bias", "module.blocks_ts.1.attn_s.qkv.weight", "module.blocks_ts.1.attn_s.qkv.bias", "module.blocks_ts.1.attn_t.proj.weight", "module.blocks_ts.1.attn_t.proj.bias", "module.blocks_ts.1.attn_t.qkv.weight", "module.blocks_ts.1.attn_t.qkv.bias", "module.blocks_ts.1.norm2_s.weight", "module.blocks_ts.1.norm2_s.bias", "module.blocks_ts.1.norm2_t.weight", "module.blocks_ts.1.norm2_t.bias", "module.blocks_ts.1.mlp_s.fc1.weight", "module.blocks_ts.1.mlp_s.fc1.bias", "module.blocks_ts.1.mlp_s.fc2.weight", "module.blocks_ts.1.mlp_s.fc2.bias", "module.blocks_ts.1.mlp_t.fc1.weight", "module.blocks_ts.1.mlp_t.fc1.bias", "module.blocks_ts.1.mlp_t.fc2.weight", "module.blocks_ts.1.mlp_t.fc2.bias", "module.blocks_ts.2.norm1_s.weight", "module.blocks_ts.2.norm1_s.bias", "module.blocks_ts.2.norm1_t.weight", "module.blocks_ts.2.norm1_t.bias", "module.blocks_ts.2.attn_s.proj.weight", "module.blocks_ts.2.attn_s.proj.bias", "module.blocks_ts.2.attn_s.qkv.weight", "module.blocks_ts.2.attn_s.qkv.bias", "module.blocks_ts.2.attn_t.proj.weight", "module.blocks_ts.2.attn_t.proj.bias", "module.blocks_ts.2.attn_t.qkv.weight", "module.blocks_ts.2.attn_t.qkv.bias", "module.blocks_ts.2.norm2_s.weight", "module.blocks_ts.2.norm2_s.bias", "module.blocks_ts.2.norm2_t.weight", 
"module.blocks_ts.2.norm2_t.bias", "module.blocks_ts.2.mlp_s.fc1.weight", "module.blocks_ts.2.mlp_s.fc1.bias", "module.blocks_ts.2.mlp_s.fc2.weight", "module.blocks_ts.2.mlp_s.fc2.bias", "module.blocks_ts.2.mlp_t.fc1.weight", "module.blocks_ts.2.mlp_t.fc1.bias", "module.blocks_ts.2.mlp_t.fc2.weight", "module.blocks_ts.2.mlp_t.fc2.bias", "module.blocks_ts.3.norm1_s.weight", "module.blocks_ts.3.norm1_s.bias", "module.blocks_ts.3.norm1_t.weight", "module.blocks_ts.3.norm1_t.bias", "module.blocks_ts.3.attn_s.proj.weight", "module.blocks_ts.3.attn_s.proj.bias", "module.blocks_ts.3.attn_s.qkv.weight", "module.blocks_ts.3.attn_s.qkv.bias", "module.blocks_ts.3.attn_t.proj.weight", "module.blocks_ts.3.attn_t.proj.bias", "module.blocks_ts.3.attn_t.qkv.weight", "module.blocks_ts.3.attn_t.qkv.bias", "module.blocks_ts.3.norm2_s.weight", "module.blocks_ts.3.norm2_s.bias", "module.blocks_ts.3.norm2_t.weight", "module.blocks_ts.3.norm2_t.bias", "module.blocks_ts.3.mlp_s.fc1.weight", "module.blocks_ts.3.mlp_s.fc1.bias", "module.blocks_ts.3.mlp_s.fc2.weight", "module.blocks_ts.3.mlp_s.fc2.bias", "module.blocks_ts.3.mlp_t.fc1.weight", "module.blocks_ts.3.mlp_t.fc1.bias", "module.blocks_ts.3.mlp_t.fc2.weight", "module.blocks_ts.3.mlp_t.fc2.bias", "module.blocks_ts.4.norm1_s.weight", "module.blocks_ts.4.norm1_s.bias", "module.blocks_ts.4.norm1_t.weight", "module.blocks_ts.4.norm1_t.bias", "module.blocks_ts.4.attn_s.proj.weight", "module.blocks_ts.4.attn_s.proj.bias", "module.blocks_ts.4.attn_s.qkv.weight", "module.blocks_ts.4.attn_s.qkv.bias", "module.blocks_ts.4.attn_t.proj.weight", "module.blocks_ts.4.attn_t.proj.bias", "module.blocks_ts.4.attn_t.qkv.weight", "module.blocks_ts.4.attn_t.qkv.bias", "module.blocks_ts.4.norm2_s.weight", "module.blocks_ts.4.norm2_s.bias", "module.blocks_ts.4.norm2_t.weight", "module.blocks_ts.4.norm2_t.bias", "module.blocks_ts.4.mlp_s.fc1.weight", "module.blocks_ts.4.mlp_s.fc1.bias", "module.blocks_ts.4.mlp_s.fc2.weight", "module.blocks_ts.4.mlp_s.fc2.bias", "module.blocks_ts.4.mlp_t.fc1.weight", "module.blocks_ts.4.mlp_t.fc1.bias", "module.blocks_ts.4.mlp_t.fc2.weight", "module.blocks_ts.4.mlp_t.fc2.bias", "module.norm.weight", "module.norm.bias", "module.pre_logits.fc.weight", "module.pre_logits.fc.bias", "module.head.weight", "module.head.bias", "module.ts_attn.0.weight", "module.ts_attn.0.bias", "module.ts_attn.1.weight", "module.ts_attn.1.bias", "module.ts_attn.2.weight", "module.ts_attn.2.bias", "module.ts_attn.3.weight", "module.ts_attn.3.bias", "module.ts_attn.4.weight", "module.ts_attn.4.bias". 

I guess there is a mismatch between the checkpoint and the configuration file (and maybe the code?). I am sure that I downloaded the checkpoint from the link in inference.MD. Could you please double-check?

Note: I tried to load the checkpoint with all the other configuration files in configs/pose3d; none worked.

Here is the code I am running
import os
import argparse
import torch
import torch.nn as nn
import os, sys
sys.path.append(os.getcwd())
from lib.utils.tools import *
from lib.utils.learning import *


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="configs/pose3d/MB_ft_h36m_global_lite.yaml",
                        help="Path to the config file.")
    parser.add_argument('-e', '--evaluate', default='checkpoint/pose3d/FT_MB_lite_MB_ft_h36m_global_lite/best_epoch.bin',
                        type=str, metavar='FILENAME', help='checkpoint to evaluate (file name)')
    # parser.add_argument('-j', '--json_path', type=str, help='alphapose detection result json path')
    # parser.add_argument('-v', '--vid_path', type=str, help='video path')
    parser.add_argument('-o', '--out_path', type=str, help='output path')
    parser.add_argument('--pixel', action='store_true', help='align with pixle coordinates')
    parser.add_argument('--focus', type=int, default=None, help='target person id')
    parser.add_argument('--clip_len', type=int, default=243, help='clip length for network input')
    opts = parser.parse_args()
    return opts


opts = parse_args()
args = get_config(opts.config)

model_backbone = load_backbone(args)

print('Loading checkpoint', opts.evaluate)
checkpoint = torch.load(opts.evaluate, map_location="cpu")
model_backbone.load_state_dict(checkpoint['model_pos'], strict=True)
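
The "module." prefix on every unexpected key suggests the checkpoint was saved from a DataParallel-wrapped model. A hedged workaround sketch (not an official fix; variable names follow the script above):

# Strip the DataParallel 'module.' prefix from the checkpoint keys before loading.
state_dict = checkpoint['model_pos']
state_dict = {k[len('module.'):] if k.startswith('module.') else k: v
              for k, v in state_dict.items()}
model_backbone.load_state_dict(state_dict, strict=True)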

In-the-wild Inference input format

Hello and thanks for sharing your code.
May I ask about the structure of the .json file needed for the in-the-wild 3D pose estimation inference? I want to use 2D estimates from a network other than AlphaPose and am not sure how to structure my 2D poses so they are compatible with your code.
Thanks in advance for your help.

demonstration of pose estimation

Hello, thanks for your wonderful work. I recently tried to use MotionBERT, but it seems it only outputs metrics such as MPJPE. If I want to demo real-time video pose estimation, like the animation on the cover, what should I do? Thank you.

Input keypoint structure

Hi, the documentation says to use the H36M keypoint format or the Halpe 26-keypoint format. Since these two formats differ and I am trying to use YOLOv7 to extract the 2D poses, which keypoints and ordering does MotionBERT expect? Is there an example JSON available? Thank you 😊

Real time application

Hi!

I was just wondering whether you have any results on speed, and whether this model (in the Lite variant) would be suitable for real-time 3D pose estimation?

Thanks

how to preprocess NTU dataset?

The 3D coordinates I received are pixel values; can you help me convert them into values in 3D space?
