Evaluation of the Single-Image Camera-to-Robot Pose Estimation deep learning research by NVIDIA on the Jaco Gen 2 6DoF KG-3 Robot Arm from Kinova Robotics.


Evaluation of NVIDIA's Camera-to-Robot Pose Estimation Deep Learning Research

Introduction

Single-Image Camera-to-Robot Pose Estimation, as introduced by NVIDIA in their DREAM work, is evaluated on the Jaco Gen 2 6DoF KG-3 robot arm from Kinova Robotics. More information on the original NVIDIA work is available here. The majority of the code and data is made available by NVIDIA.

image

The image shows the network inference output on a synthetic image.

To achieve the target of this work (single-image pose estimation of the Jaco arm), the steps are:

  • Generation of a Synthetic Dataset with randomized lighting, positions, interfering objects, colours, and configurations for the Jaco Gen 2 robot arm.
    • Rigging of Jaco Arm Model in Blender
    • Generation of Dataset with randomization using Unreal Engine + NDDS
  • Train the VGG auto-encoder network (convolutional layers pretrained on ImageNet) on this synthetic dataset, and assess its performance using real images of the Jaco robot arm.

Setup

Install the DREAM package and its dependencies using pip:
pip install . -r requirements.txt

Downloading Models and Data

Comment out the unwanted models and data in the 'DOWNLOAD.sh' scripts, then run them to download the required data and trained models.
Example:
cd trained_models; ./DOWNLOAD.sh; cd ..
cd data; ./DOWNLOAD.sh; cd ..

Single-image inference

Inference on a single image using the DREAM-vgg-Q pretrained network (on the first image of the Panda-3Cam RealSense dataset):
python scripts/network_inference.py -i trained_models/panda_dream_vgg_q.pth -m data/real/panda-3cam_realsense/000000.rgb.jpg

Training

Training a DREAM-vgg-Q model for the Jaco robot:
python scripts/train_network.py -i data/synthetic/jaco_synth_train_dr/ -t 0.8 -m manip_configs/jaco2.yaml -ar arch_configs/dream_vgg_q.yaml -e 25 -lr 0.00015 -b 128 -w 16 -o <path/to/output_dir/>

The model configurations are defined in the architecture files in the 'arch_configs' directory. The generated synthetic dataset is not provided here due to its large size.

More information can be obtained from the official DREAM repo: link

Method Followed

The arm we wanted to train the network on is the Jaco2, and the arm model was obtained from Kinova Robotics. The target is to generate a synthetic dataset for training the network. The tools needed are Blender, Unreal Engine, and the NDDS plugin.
Blender is an open-source tool for 3D modelling and animation and is readily available for download. Unreal Engine is a freely available game engine, and NDDS is a domain-randomized data synthesizer plugin for Unreal Engine developed by NVIDIA.
To obtain randomized arm configurations, the arm model must be made movable (rigged), and Blender serves that purpose. The next step is to generate a randomized dataset, for which Unreal Engine and its NDDS plugin are used. Unreal Engine requires the rigged model in '.FBX' format.
The model was obtained in '.STEP' format, so it was first converted from '.STEP' to '.STL' using FreeCAD, and then Blender was used to convert '.STL' to '.FBX'. The FBX format can be opened in both Blender and Unreal Engine, the latter of which is used to generate the randomized dataset for Jaco.

image

The image shows the Jaco arm model loaded in Blender.

Blender has an 'Armature' feature that lets us set up a skeletal system for a 3D model so that it can be moved and animated as required. This was one of the targets of using Blender.

Here you can see the bones, assigned one per segment of the Jaco robot arm from the base to the fingertips. In graphics animation terms this is called "rigging" the Jaco arm model.

image

This skeleton is needed later for moving the Jaco arm to random positions when generating the synthetic dataset.
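The skeleton is a chain of bones whose joint angles determine where every segment ends up. As a rough illustration of the idea (not the actual Jaco geometry), here is a minimal planar forward-kinematics sketch with hypothetical link lengths:

```python
import numpy as np

def forward_kinematics(link_lengths, joint_angles):
    """Return the 2D position of each joint of a planar kinematic chain.

    Each angle is relative to the previous link, as in an armature where
    every bone pivots about the tip of its parent bone.
    """
    positions = [np.zeros(2)]
    total_angle = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        total_angle += angle
        step = length * np.array([np.cos(total_angle), np.sin(total_angle)])
        positions.append(positions[-1] + step)
    return np.array(positions)

# Three hypothetical links, pointing straight up (90 degrees at the base).
tips = forward_kinematics([1.0, 1.0, 0.5], [np.pi / 2, 0.0, 0.0])
```

Randomizing the pose then reduces to sampling a new set of joint angles (within the joint limits discussed below) and recomputing the chain.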

After building the skeletal system, we need to attach the mesh (the robot arm's body) to it. That is where weight painting comes in: it simply specifies which parts of the mesh are attached to which bone.
Steps:

  • Select bone
  • Paint the part to attach to that bone as Red
  • Paint the part to detach from that bone as Blue

A mesh being attached to a bone means that when the bone moves, the mesh moves with it. In this video you can observe how, when the bone joints are moved, the mesh follows along. Also, as each bone is selected, the part painted red changes, showing which parts are attached to which bone.

image

The Jaco arm with the skeleton formed and weight painted.
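Weight painting amounts to giving each mesh vertex a weight per bone; when a bone moves, each vertex follows the bone transforms in proportion to those weights. A minimal linear-blend-skinning sketch of that idea, with made-up vertices and weights (not Blender's internal implementation):

```python
import numpy as np

def rotation_2d(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def skin_vertices(vertices, weights, bone_rotations):
    """Blend each vertex through every bone's rotation, weighted per vertex.

    vertices:       (V, 2) rest positions
    weights:        (V, B) skinning weights; each row sums to 1 (the "paint")
    bone_rotations: list of B (2, 2) rotation matrices
    """
    deformed = np.zeros_like(vertices)
    for b, rot in enumerate(bone_rotations):
        deformed += weights[:, b:b + 1] * (vertices @ rot.T)
    return deformed

vertices = np.array([[1.0, 0.0], [2.0, 0.0]])
weights = np.array([[1.0, 0.0],    # fully "painted red" for bone 0
                    [0.0, 1.0]])   # fully attached to bone 1
bones = [rotation_2d(0.0), rotation_2d(np.pi / 2)]
moved = skin_vertices(vertices, weights, bones)
```

A vertex painted red for a bone (weight 1.0) follows that bone exactly; a vertex painted blue (weight 0.0) ignores it, which matches the paint steps listed above.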

Once the mesh moves as expected with the bones, we need to set constraints on the angular directions in which each joint can move (otherwise any joint could rotate in any direction, which is not the case in the real world). In this image, take note of the marked joint (red: x axis, blue: z axis, green: y axis). From the joint's shape alone we can tell it can only rotate about the x axis, so in the bone's rotation constraint settings in the right-hand pane, it is limited to rotation about x only. The process is repeated for every joint, including those of the fingers.

image

Setting of arm joint angle constraints.
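The rotation constraints can be thought of as per-axis angle limits applied to each bone before a pose is accepted. A small sketch of that clamping, with hypothetical limits in degrees (not the actual Jaco joint limits):

```python
def clamp_rotation(requested, limits):
    """Clamp per-axis Euler angles (degrees) to (lo, hi) limits per axis.

    A locked axis is expressed as the limit (0.0, 0.0).
    """
    return {axis: max(lo, min(hi, requested.get(axis, 0.0)))
            for axis, (lo, hi) in limits.items()}

# Hypothetical elbow-like joint: free about x within +/-120 deg, locked in y/z.
limits = {"x": (-120.0, 120.0), "y": (0.0, 0.0), "z": (0.0, 0.0)}
pose = clamp_rotation({"x": 150.0, "y": 30.0, "z": -10.0}, limits)
```

Any randomly sampled configuration passed through such limits stays physically plausible, which is exactly what the Blender constraint settings guarantee during animation.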

Once the arm movement is taken care of, we can focus on defining animations for the arm. Creating a number of different animations simulates random movement: at any point, any one of these animations could be playing, which approximates the effect of using truly random arm configurations.
Blender makes this step much easier because we only need to specify a few configurations of the arm, each at a specific frame, and Blender fills in the remaining frames, completing the animation for us. In other words, we pose the arm at 4 or 5 keyframes and the remaining (say) 300 frames are interpolated by Blender; that is one animation, and a number of them are created, as can be observed in this video.

image

Animating in Blender.
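Blender's in-betweening can be sketched as interpolating each joint angle between keyed frames. A minimal pure-Python version with hypothetical keyframes (Blender also supports smoother curves, but linear conveys the idea):

```python
def interpolate_keyframes(keyframes, frame):
    """Linearly interpolate a joint angle between keyed frames.

    keyframes: sorted list of (frame_number, angle) pairs.
    """
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    for (f0, a0), (f1, a1) in zip(keyframes, keyframes[1:]):
        if frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return a0 + t * (a1 - a0)
    return keyframes[-1][1]

# Key only frames 0, 100, and 300; every other frame is filled in.
keys = [(0, 0.0), (100, 90.0), (300, -45.0)]
angle_at_50 = interpolate_keyframes(keys, 50)   # halfway toward 90 degrees
```

This is why keying only a handful of poses is enough: the other ~300 frames are derived automatically.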

UE4 (Unreal Engine 4) is a game engine, but since its source is freely available it makes for a good research platform. NVIDIA developed a complete tool called the NVIDIA Dataset Synthesiser (NDDS) specifically for this platform, leveraging its capabilities to generate synthetic datasets for training neural networks.

image

Here you can see Unreal Engine 4 with NDDS loaded as a project. At the top middle is the scene window showing the 3D virtual world. At the top right are all the items placed in the 3D virtual world, and at the bottom is the Content Browser. The NDDS project contains two very valuable folders, “DomainRandomisationDNNContent” and “NVSceneCapturerContent”, which hold items for introducing domain randomisation and for data collection/synthesis, respectively, that can be added to the scene.

Here’s a feel for how it looks when data capture is happening in UE4 while the scene is domain randomised: video. (In this clip only a single arm animation is played back; in an actual data capture, multiple animations are played out randomly.)
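Domain randomisation boils down to re-sampling scene parameters (lighting, colours, distractor placement, which animation is playing) for each capture so the network never overfits to one appearance. A rough sketch of that sampling idea, with hypothetical parameter names and ranges rather than NDDS's actual settings:

```python
import random

def sample_scene(num_distractors=3, num_animations=5):
    """Sample one randomized scene configuration (illustrative only)."""
    return {
        "light_intensity": random.uniform(0.2, 2.0),
        "light_color": [random.random() for _ in range(3)],   # RGB in [0, 1]
        "animation_id": random.randrange(num_animations),     # which clip plays
        "animation_frame": random.randrange(300),             # pose within clip
        "distractors": [
            {"position": [random.uniform(-1.0, 1.0) for _ in range(3)],
             "hue": random.random()}
            for _ in range(num_distractors)
        ],
    }

scene = sample_scene()
```

In NDDS this resampling happens inside the engine every few frames while the scene capturer writes out images and annotations.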

Here, an encoder-decoder model is used for the neural network, as in the referenced work, with a VGG-based CNN as the encoder. The output of the network is a set of ‘belief maps’ (seen as black squares in the image, with a white spot where the corresponding joint is). They are simply representations of where the joints are in the 2D image. In the referenced work, the actual 3D locations of the robot are then computed from these 2D keypoints using a Perspective-n-Point (PnP) algorithm that also takes in the camera intrinsics.

image
Image Courtesy: Original work from NVIDIA
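A belief map is a per-keypoint 2D heat map: a bright bump centred on the joint's pixel, from which the keypoint is recovered as the peak. A minimal numpy sketch with a hypothetical map size and spread (the real network predicts these maps; here we just construct one):

```python
import numpy as np

def make_belief_map(height, width, center, sigma=2.0):
    """Gaussian belief map peaking at `center` = (row, col)."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def extract_keypoint(belief_map):
    """Recover the 2D keypoint as the argmax of the belief map."""
    return np.unravel_index(np.argmax(belief_map), belief_map.shape)

bm = make_belief_map(64, 64, center=(20, 37))
peak = extract_keypoint(bm)   # recovers (20, 37)
```

At training time such maps serve as targets for the decoder; at inference, peak extraction (the referenced work uses a weighted variant for sub-pixel accuracy) yields the 2D keypoints fed to the PnP step.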

The rigged and animated Jaco arm model is dragged and dropped into the scene. The moment we click the ‘Play’ button, the saved animation starts playing. The stutter seen is because playback requires a significant amount of processing power and memory. In this way the arm is imported into the NDDS project; the next steps are to set up the randomisations and the scene capturer (virtual camera), and then extract the data. image
This is how it looks.

Reference/Main Work

Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, and Stan Birchfield, "Camera-to-Robot Pose Estimation from a Single Image", International Conference on Robotics and Automation (ICRA), 2020. https://arxiv.org/abs/1911.09231

