
ray3d's Introduction


Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization

This repository contains the implementation of the approach described in the paper:

Yu Zhan, Fenghai Li, Renliang Weng, and Wongun Choi. Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

3D pose estimation by Ray3D in the world coordinate system

Dashed lines denote 3D ground-truth poses. Solid lines represent the poses estimated by Ray3D.

Quick start

Dependencies

Please make sure you have the following dependencies installed before running (a minimal install sketch follows the list):

  • Python 3
  • torch==1.4.0
  • other necessary dependencies listed in requirements.txt
  • (optional) screen, rsync
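
A minimal install sketch, assuming a fresh Python 3 environment (pick the torch 1.4.0 wheel matching your CUDA version):

# install the pinned torch version, then the remaining dependencies
pip install torch==1.4.0
pip install -r requirements.txt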

Dataset

Human3.6M

We use the data processed by VideoPose3D. You can generate the files yourself, or download them from Google Drive.
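
As a quick sanity check, you can inspect the generated or downloaded file in Python; the 'positions_3d' key below is an assumption based on the VideoPose3D data layout:

import numpy as np

# load the preprocessed Human3.6M poses; the archive stores a pickled dict
data = np.load('data_3d_h36m.npz', allow_pickle=True)
poses = data['positions_3d'].item()  # dict: subject -> action -> (frames, joints, 3)
print(list(poses.keys()))            # e.g. ['S1', 'S5', ...]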

MPI-INF-3DHP

We set up 3DHP with our own script. You can download the original dataset and generate the files with the following command:

# set up the 'data_root' parameter which stores the original 3DHP data
python3 prepare_data_3dhp.py

Alternatively, you can download the processed data directly from Google Drive.

HumanEva-I

We set up HumanEva-I by following the procedure provided by VideoPose3D. You can set it up yourself, or download the files from Google Drive.

Synthetic

The synthetic dataset is built on top of Human3.6M. Once the 'data_3d_h36m.npz' file has been generated, you can create the synthetic dataset with the following procedure:

# 1). generate synthetic data for camera intrinsic test
python3 camera_intrinsic.py

Then, run the following preprocessing script:

# 2). generate synthetic data for camera extrinsic test
python3 camera_augmentation.py

Finally, use the following preprocessing script to generate the training and evaluation files for synthetic training:

# 3). generate train and evaluation file for synthetic training
python3 aggregate_camera.py
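
For intuition, extrinsic augmentation can be sketched as re-projecting the ground-truth 3D joints through a perturbed camera; because the networks consume 2D keypoints rather than images, no new images are needed. The sketch below is conceptual only and does not reproduce the actual logic of camera_augmentation.py:

import numpy as np

def rotate_pitch(R, t, delta_deg):
    """Perturb a world-to-camera pose (R, t) by delta_deg degrees of pitch."""
    a = np.deg2rad(delta_deg)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), -np.sin(a)],
                   [0.0, np.sin(a), np.cos(a)]])
    return Rx @ R, Rx @ t

def project(joints_world, R, t, f, c):
    """Pinhole projection of world-space joints (N, 3) to pixels (N, 2)."""
    cam = joints_world @ R.T + t    # world -> camera coordinates
    uv = cam[:, :2] / cam[:, 2:3]   # perspective divide
    return uv * f + c               # focal length f, principal point c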

Description

We train and test five approaches on the above-mentioned datasets.

  • Ray3D: implemented in the main branch.
  • RIE: implemented in the main branch.
  • Videopose: implemented in the videopose branch.
  • Poseformer: implemented in the poseformer branch.
  • Poselifter: implemented in the poselifter branch.

We release the pretrained models for academic purposes. You can create a folder named checkpoint to store all the pretrained models.
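
A minimal layout sketch; the inner folder name follows one of the released checkpoints:

# create the folder and unpack the downloaded models inside, e.g.:
mkdir checkpoint
# checkpoint/RAY3DRIEX_h36m_3_RIE_FRAME9_LR0.0005_EPOCH20_BATCH1024_Oct_31_2021_05_43_36/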

Train

Please start visdom before you train a new model.
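
A minimal way to launch it, assuming visdom is installed (it serves on port 8097 by default):

# start the visdom server for training visualization
python3 -m visdom.server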

To train the above-mentioned methods, run the following command, specifying a different configuration file from the cfg folder:

python3 main.py --cfg cfg_ray3d_3dhp_stage1

To train Ray3D with synthetic data, please use the code from the synthetic branch, which includes some optimizations for large-scale training.

Evaluation

To evaluate the models on the public and synthetic datasets, run the following command, specifying the appropriate configuration file, timestamp and checkpoint:

python3 main.py \
    --cfg cfg_ray3d_h36m_stage3 \
    --timestamp Oct_31_2021_05_43_36 \
    --evaluate best_epoch.bin

To evaluate Ray3D on the synthetic dataset with the 14-joint setup, please use these scripts.

Visualization

We use the same visualization techniques provided by VideoPose3D. You can perform visualization with the following command:

python3 main.py \
    --cfg cfg_ray3d_h36m_stage3 \
    --timestamp Oct_31_2021_05_43_36 \
    --evaluate best_epoch.bin \
    --render

License

This work is licensed under Apache-2.0. See LICENSE for details.

Citation

If you find this repository useful, please cite our paper:

@InProceedings{Zhan_2022_CVPR,
    author    = {Zhan, Yu and Li, Fenghai and Weng, Renliang and Choi, Wongun},
    title     = {Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {13116-13125}
}

Acknowledgement

Our implementation took inspiration from the authors and repositories of the approaches listed above. We thank the authors for kindly releasing their code!


ray3d's Issues

Predictions won't visualize, only GTs do

Thanks for your great work! The code visualizes the ground truths but not the predictions or the input image. What am I missing? I am running the main branch with the HumanEva dataset. The logs show "MovieWriter imagemagick unavailable". I have also tried ffmpeg, but the resulting mp4 video has the same problem.

What is dapalib? I can't find any way to download a library named dapalib

Running
pip install -r requirements.txt
shows the following errors:
ERROR: Could not find a version that satisfies the requirement dapdalib (from versions: none)
ERROR: No matching distribution found for dapdalib

So I'd like to know which platform this code runs on: Windows or Linux? How did you install the dapalib library? I can't even find any useful information about it on Google.

Thanks so much if I can get help from you.

Question about running on a single video

Hello, could you provide a simple way to run the model, e.g. on a single video or image, something like python test.py xxx/xx.jpg (or xxx.avi), which then returns the absolute 3D positions of the keypoints? Thanks!

About aggregate_camera

Could you explain how the Train/json, training.json and pitch/json files used in aggregate_camera.py are generated? This is my first time working with this, and I downloaded the dataset you provided directly.

When used on in-the-wild videos, what affects its performance?

Great work, first of all! I made an h36m-like custom dataset and used the h36m pretrained model, but the results are not good. I have some questions:

  1. Would the h36m model trained with augmentation perform better? Or does a model pretrained on another dataset perform better on in-the-wild video?
  2. What procedure do you recommend for in-the-wild use? For example: train on a custom dataset, then run inference on video recorded with the same camera, whose intrinsic parameters are known.
3DHP data required

Thanks for your wonderful work! Where can we download the 3DHP data laid out like '/Ray3D/3dhp/S1/Seq1/imageSequence/video_0/img_000001.jpg'?

Incorrect results with the pretrained model

I have downloaded your pretrained model, but it produces incorrect results when I run the evaluation code. Can you tell me the reason? Thanks a lot!

Question about the camera parameters of the Human3.6M dataset

Hi, thanks for your excellent work and for open-sourcing it!
I am a little confused about the differences between the H36M camera parameters provided by this repo and by VideoPose3D. Details are as follows.

  1. tangential_distortion in h36m_cameras_intrinsic_params seems to have left and right reversed.
  2. R and translation in h36m_cameras_extrinsic_params are very different from VideoPose3D's. I was wondering how the rotation and translation of each camera and each subject are determined.

Looking forward to your reply.

Incorrect predictions from the pretrained model

Hello! Thank you very much for your outstanding work. After downloading the pretrained model and datasets linked in the README, I found during testing that the predictions are far off from the ground truth. What could be the cause? Thanks!
The configuration in checkpoint/RAY3DRIEX_h36m_3_RIE_FRAME9_LR0.0005_EPOCH20_BATCH1024_Oct_31_2021_05_43_36/configs/data_config.json is:

{
    "DATASET": "h36m",
    "WORLD_3D_GT_EVAL": true,
    "KEYPOINTS": "gt",
    "TRAIN_SUBJECTS": "S1,S5,S6,S7,S8",
    "TEST_SUBJECTS": "S9,S11",
    "GT_3D": "E:/HPE/Ray3D-main/Ray3D-main/dataset/data_3d_h36m.npz",
    "GT_2D": "E:/HPE/Ray3D-main/Ray3D-main/dataset/data_2d_h36m_gt.npz",
    "CAMERA_PARAM": "",
    "SUBSET": 1,
    "STRIDE": 1,
    "DOWNSAMPLE": 1,
    "ACTIONS": "*",
    "REMOVE_IRRELEVANT_KPTS": false,
    "FRAME_PATH": "/ssd/yzhan/data/benchmark/3D/showroom/20210702/frame/",
    "INTRINSIC_ENCODING": false,
    "RAY_ENCODING": true
}

I also adjusted the file paths in the cfg folder accordingly, but the problem remains.

Question about camera augmentation

Hello!

The appendix of the paper augments the Human3.6M camera intrinsics and extrinsics separately to generate multiple "virtual cameras". How exactly is this step done?

As far as I know, Human3.6M is not a synthetic dataset, so how do you generate images for the augmented extrinsics (for example, after rotating the camera)?

Testing on MPI-INF-3DHP

Hi, thanks for the great work. Could you please explain how I can train and test on MPI-INF-3DHP using the 14-joint skeleton structure? And do I need to separately prepare a 'data_2d_3dhp_gt.npz' that has only 14 joints, or will the one you shared work?

3dhp 2d keypoints

Firstly, this is really interesting work, kudos!
Can you please explain what 'WORLD_3D_GT_EVAL' represents in the config files?
Also, do you convert the 3D coordinates to meters? If so, can you point me to where this happens?

Lastly, if I set 'RAY_ENCODING' to False in the config file, will this return screen-normalized 2D keypoints and root-centered 3D keypoints in the camera reference frame? Is that correct?

2d keypoint npz for test videos

I have been overlaying the 2D keypoints on the test image sequences released with MPI-INF-3DHP, and I notice that the keypoints appear to be sampled at a different rate than the test videos. For example, TS1 agrees up to around frame 700, and then the 2D poses skip ahead and stay offset for the rest of the sequence. Do you have any idea why this might be occurring?
