
htt's Introduction

Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos

Original implementation of the paper: Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura and Wenping Wang, "Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos", CVPR, 2023. [Paper][Supplementary Video]

An extended abstract version was accepted by the Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Workshop at ECCV 2022. [Extended Abstract]

Requirements

Environment

The code is tested with the following environment:

Ubuntu 20.04
python 3.9
pytorch 1.10.0
torchvision 0.11.0

Other dependencies listed in requirements.txt can be installed with pip. Note that we also use utility functions from libyana. To install the libyana library, we follow LPC, CVPR 2020 and run:

pip install git+https://github.com/hassony2/libyana

Data Preprocessing

To facilitate computation, for the downloaded FPHA and H2O datasets we resize all images to 480x270 pixels and use lmdb to manage the training images. See preprocess_utils.py for the related functions.
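As a rough illustration of this step (a minimal sketch only: the directory layout, lmdb key format, and map size below are assumptions, not the conventions used in preprocess_utils.py), resizing frames and caching them in an lmdb database can look like this:

import os
import cv2
import lmdb

def build_lmdb(src_dir, lmdb_path, size=(480, 270)):
    """Resize every image under src_dir to `size` and store the JPEG-encoded
    bytes in an lmdb database, keyed by the image's relative path."""
    env = lmdb.open(lmdb_path, map_size=1 << 40)  # generous upper bound on database size
    with env.begin(write=True) as txn:
        for root, _, files in os.walk(src_dir):
            for name in sorted(files):
                if not name.lower().endswith((".jpg", ".jpeg", ".png")):
                    continue
                path = os.path.join(root, name)
                img = cv2.resize(cv2.imread(path), size)  # size is (width, height)
                ok, buf = cv2.imencode(".jpg", img)
                if ok:
                    key = os.path.relpath(path, src_dir).encode("utf-8")
                    txn.put(key, buf.tobytes())
    env.close()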

Pretrained Model

Our pretrained weights for FPHA and H2O, together with other data needed to run the demo code at the inference stage, can be downloaded via the following link: [Inference Data] (included in ws.zip),

which includes:

  1. ./ckpts/: The pretrained ckpt files for the FPHA and H2O datasets.
  2. ./curves/: .npz files for visualizing the 3D PCK(-RA) at different error thresholds on FPHA and H2O.

You may keep the downloaded ws folder under the root directory of this git repository.

Quick Start

Plot 3D PCK(-RA) Curves for Hand Pose Estimation

Run

python plot_pck_curves.py

to plot the 3D PCK(-RA) curves at different error thresholds on FPHA and H2O.
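The plotting script reads the .npz files under ws/curves/. As a minimal sketch of what such a plot involves (the file name and the array keys thresholds/pck below are assumptions, not the actual contents written by the repository), accuracy can be plotted against error threshold like this:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical .npz layout: an array of error thresholds (in mm) and the
# corresponding fraction of 3D keypoints whose error falls below each threshold.
data = np.load("ws/curves/example_curve.npz")
thresholds, pck = data["thresholds"], data["pck"]

plt.plot(thresholds, pck, label="3D PCK-RA")
plt.xlabel("Error threshold (mm)")
plt.ylabel("3D PCK")
plt.legend()
plt.show()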

Evaluation for Hand Pose Estimation and Action Recognition

Run

CUDA_VISIBLE_DEVICES=0 python eval.py --batch_size <batch_size> \
--val_split <val_split> --train_dataset <dataset> --val_dataset <dataset> \
--dataset_folder <path_to_dataset_root> \
--resume_path <path_to_pth>

for evaluation on the dataset and split given by <dataset> and <val_split>.

Note that for the test split of H2O, we report the hand MEPE and action recall rate by referring to our submitted results on the H2O challenge CodaLab.

Training

Run python train.py with the parsed arguments to train a network on your training data, as in the example below.
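For instance, a training run might be launched as follows (the argument names are copied from the evaluation command above and only assumed to be shared by train.py; run python train.py --help for the authoritative list):

CUDA_VISIBLE_DEVICES=0 python train.py --batch_size <batch_size> \
--train_dataset <dataset> --val_dataset <dataset> \
--dataset_folder <path_to_dataset_root>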

Acknowledgement

For the transformer architecture, we rely on the code of DETR, ECCV 2020 and Attention is All You Need, NeurIPS 2017.

For evaluation of 3D hand pose estimation, we follow the code of libyana and the original ColorHandPose3DNetwork, ICCV 2017.

For data processing and augmentation, the ResNet architecture, and other utility functions, our code relies heavily on the code of LPC, CVPR 2020 and libyana.

Citation

If you find this work helpful, please consider citing

@inproceedings{wen2023hierarchical,
  title={Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos},
  author={Wen, Yilin and Pan, Hao and Yang, Lei and Pan, Jia and Komura, Taku and Wang, Wenping},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

htt's People

Contributors

fylwen

htt's Issues

Custom dataset

Thank you so much for your work!

I'm wondering whether this model can be applied to custom data without GT labels. Could you provide code for reference? Thanks a lot for your reply!

Inference on in-the-wild video?

Hello, thanks for sharing!
I wonder whether the model only supports the first-person view.
And is there any code for inference on in-the-wild videos?

preprocess_utils.py has an incompatible function call

I think there is an incompatible function call at line 38 of preprocess_utils.py.

  • Original
    frame_path_dst = os.listdir(fhb_rgb_dst, subj, action, seq, "color", frame)

  • After
    frame_path_dst = os.path.join(fhb_rgb_dst, subj, action, seq, "color", frame)
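For context, a minimal sketch of the corrected call (the variable values below are illustrative stand-ins for those in preprocess_utils.py): os.path.join only builds the path string, so the destination directory still has to exist before the resized frame is written.

import os

# Illustrative stand-ins for the variables used in preprocess_utils.py.
fhb_rgb_dst, subj, action, seq, frame = "fpha_resized", "Subject_1", "open_juice_bottle", "1", "color_0000.jpeg"

# os.path.join assembles the destination path; os.listdir cannot take multiple path parts.
frame_path_dst = os.path.join(fhb_rgb_dst, subj, action, seq, "color", frame)
os.makedirs(os.path.dirname(frame_path_dst), exist_ok=True)  # ensure the folder exists before writing
print(frame_path_dst)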

RuntimeError: shape '[-1, 128, 512]' is invalid for input of size 16384

Hello, thanks for share !
I encountered an error while training:
Traceback (most recent call last):
File "/home/zhangjiabao/data/1/zhangjiabao/mesh/HTT/train.py", line 238, in
main(args)
File "/home/zhangjiabao/data/1/zhangjiabao/mesh/HTT/train.py", line 131, in main
epochpass.epoch_pass(
File "/home/zhangjiabao/data/1/zhangjiabao/mesh/HTT/netscripts/epochpass.py", line 50, in epoch_pass
loss, results, losses = model(batch)
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/zhangjiabao/data/1/zhangjiabao/miniconda3/envs/handmesh/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhangjiabao/data/1/zhangjiabao/mesh/HTT/models/htt.py", line 169, in forward
batch_seq_ain_feature=flatten_ain_feature.contiguous().view(-1,self.ntokens_action,flatten_ain_feature.shape[-1])
RuntimeError: shape '[-1, 128, 512]' is invalid for input of size 16384


The shapes don't match. I think the relevant parameter is self.ntokens_action; are its default settings customized for a particular sequence length?
How can I solve this shape-mismatch problem?
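For reference, the error comes from torch.Tensor.view: 16384 elements correspond to 32 feature vectors of dimension 512, and 32 is not a multiple of ntokens_action = 128, so the tensor cannot be regrouped into windows of 128 tokens. A minimal sketch of the failing condition (the tensor here is a dummy with sizes taken from the error message; the actual fix depends on how the dataloader windows the input sequence):

import torch

ntokens_action, feat_dim = 128, 512
flatten_ain_feature = torch.zeros(32, feat_dim)  # 32 * 512 = 16384 elements, as in the error

numel = flatten_ain_feature.numel()
if numel % (ntokens_action * feat_dim) != 0:
    # .view(-1, 128, 512) would raise exactly the RuntimeError above: the number of
    # feature vectors must be a multiple of ntokens_action for the reshape to work.
    print(f"{numel} elements cannot be reshaped into (-1, {ntokens_action}, {feat_dim})")
else:
    batch_seq = flatten_ain_feature.contiguous().view(-1, ntokens_action, feat_dim)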

About training time

Hello! For training, if I use a 3090 Ti, how many hours does one epoch take?

About H2O test

Thank you very much for your work!
I am currently facing some issues:

How can I report the hand MEPE and action recall rate by referring to the submitted results?
How can I generate the handposes.json file from eval.py?

About Experiments

Hey, thank you for your work.

In utils.py -> to25Dbranch, why is inp_res = [256, 256] here, while elsewhere inp_res = [480, 270]? And why are trans_factor = 100 and scale_factor = 0.0001?

Thank you, looking forward to your reply!

Training

Hello, may I ask whether the FPHA dataset used to train your HTT model is openly available?
