
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation


This is a TensorFlow implementation of the paper "Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation" by Remelli et al. [1]. The network is trained on the Human3.6M dataset [2]. I am not one of the paper's authors; this is an independent implementation.

Here you will find the training code for the canonical fusion model described in the paper. Because this implementation is based on the paper alone, some minor details may differ from the authors' intent. The code lets you train the model end to end. Testing code and a trained model are also included, but you will need to download the Human3.6M data yourself. I wrote this implementation for personal testing, but since the authors have not released an official implementation, I am sharing mine for anyone interested in trying it.

Requirements:
-TensorFlow 2.4.1
-cdflib
-opencv (opencv-python on pip)
-matplotlib
-scipy
The requirements can all be installed through pip.

How-to
-Clone the repository.
-Download the Human3.6M data.
-Place the video data in the Data folder and the label data in the Labels folder.
-Download the trained model if you want to run the test.
-Run the train_and_test.py script, choosing train or test inside the script (a sketch of the assumed switch is shown below).
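
For reference, here is a minimal sketch of what the train/test switch inside train_and_test.py might look like. The RUN_MODE flag and the train()/test() stubs are illustrative names of my own, not taken from the repository; the actual script may organize this differently.

```python
# Minimal sketch of the assumed train/test switch in train_and_test.py.
# All names here (RUN_MODE, train, test) are illustrative, not from the repo.

RUN_MODE = "train"  # set to "train" or "test" before running the script


def train():
    # Placeholder: the real script builds the canonical fusion model and
    # fits it end to end on the Human3.6M videos in the Data/Labels folders.
    print("Training the canonical fusion model...")


def test():
    # Placeholder: the real script loads the downloaded trained model and
    # evaluates it on held-out Human3.6M sequences.
    print("Testing with the trained model...")


if __name__ == "__main__":
    {"train": train, "test": test}[RUN_MODE]()
```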

Trained Model
A model trained on the Human3.6M data is available for download:
click here

Training
The network was trained on a GTX 1080 Ti; training takes roughly 9 to 12 hours. You should be able to bring the loss down to about 2.4~2.5 MSE. The differentiable DLT layer described in the paper is included but is not used to compute a loss, because I ran into serious instability when training with it. The code is still there if you want to try it: simply replace _dummy_loss with _DLT_loss in the model.compile() call (see the sketch below).
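
Below is a minimal sketch of that swap, assuming a Keras-style model with a separate output head for the DLT branch. Only the names _dummy_loss and _DLT_loss come from this README; the toy model, the loss bodies, and the optimizer choice are stand-ins and will differ from the actual code.

```python
# Sketch of swapping the dummy loss for the differentiable-DLT loss at
# compile time. _dummy_loss and _DLT_loss are the names used in this repo;
# everything else (the toy two-output model, the loss bodies) is illustrative.
import tensorflow as tf


def _dummy_loss(y_true, y_pred):
    # Stand-in: returns zero so the DLT output does not drive training.
    return 0.0 * tf.reduce_mean(y_pred)


def _DLT_loss(y_true, y_pred):
    # Stand-in: penalizes the triangulated 3D joints against ground truth.
    return tf.reduce_mean(tf.square(y_true - y_pred))


# Toy two-output model standing in for the canonical fusion network.
inputs = tf.keras.Input(shape=(17, 3))
pose_2d = tf.keras.layers.Dense(3, name="pose_2d")(inputs)
pose_3d = tf.keras.layers.Dense(3, name="pose_3d")(inputs)
model = tf.keras.Model(inputs, [pose_2d, pose_3d])

# Default setup: the DLT head is compiled with the dummy loss.
model.compile(optimizer="adam",
              loss=[tf.keras.losses.MeanSquaredError(), _dummy_loss])

# To train through the differentiable DLT layer instead (can be unstable),
# replace _dummy_loss with _DLT_loss:
model.compile(optimizer="adam",
              loss=[tf.keras.losses.MeanSquaredError(), _DLT_loss])
```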

Note:
Subject 9 has some incorrect labels that will significantly hinder training. Make sure to exclude this mislabeled data: S9_training_vids.txt lists the Subject 9 videos that can be used for training (see the filtering sketch after this note).
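
Here is a small sketch of how the whitelist could be applied. The assumption that S9_training_vids.txt holds one video filename per line, and the Data/S9 folder layout, are mine and may not match the repository.

```python
# Sketch of filtering Subject 9 videos against the whitelist in
# S9_training_vids.txt. File format and folder layout are assumptions.
from pathlib import Path

with open("S9_training_vids.txt") as f:
    allowed = {line.strip() for line in f if line.strip()}

s9_videos = sorted(Path("Data/S9").glob("*.mp4"))           # assumed layout
train_videos = [v for v in s9_videos if v.name in allowed]  # keep whitelisted only
skipped = [v for v in s9_videos if v.name not in allowed]

print(f"Using {len(train_videos)} Subject 9 videos, skipping {len(skipped)} mislabeled ones.")
```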

References:

  1. Edoardo Remelli, Shangchen Han, Sina Honari, Pascal Fua, Robert Wang. "Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation." arXiv:2004.02186, 2020.

  2. Catalin Ionescu, Dragos Papava, Vlad Olaru, Cristian Sminchisescu. "Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, July 2014.
