Evaluation of the Single-Image Camera-to-Robot Pose Estimation deep learning research by NVIDIA on the Jaco Gen 2 6DoF KG-3 Robot Arm from Kinova Robotics.


Evaluation of NVIDIA's Camera-to-Robot Pose Estimation Deep Learning Research

Introduction

Single-Image Camera-to-Robot Pose Estimation, as introduced by NVIDIA in their DREAM work, is evaluated on the Jaco Gen 2 6DoF KG-3 robot arm from Kinova Robotics. More information on the original NVIDIA work is available here. The majority of the code and data is made available by NVIDIA.

image

The image shows the network inference output on a synthetic image.

To achieve the target of this work (single-image pose estimation of the Jaco arm), the steps are:

  • Generation of a Synthetic Dataset with randomized lighting, positions, interfering objects, colours, and configurations for the Jaco Gen 2 robot arm.
    • Rigging of Jaco Arm Model in Blender
    • Generation of Dataset with randomization using Unreal Engine + NDDS
  • Train the VGG auto-encoder network (convolutional layers pretrained on ImageNet) on this synthetic dataset, and assess its performance using real images of the Jaco robot arm.

Setup

Install the DREAM package and its dependencies using pip:
pip install . -r requirements.txt

Downloading Models and Data

Comment out the unwanted models and data in the 'DOWNLOAD.sh' scripts, then run them to download the required data and trained models.
Example:
cd trained_models; ./DOWNLOAD.sh; cd ..
cd data; ./DOWNLOAD.sh; cd ..

Single-image inference

Inference on a single image using the DREAM-vgg-Q pretrained network (on the first image of the Panda-3Cam RealSense dataset):
python scripts/network_inference.py -i trained_models/panda_dream_vgg_q.pth -m data/real/panda-3cam_realsense/000000.rgb.jpg

Training

Training a DREAM-vgg-Q model for the Jaco robot:
python scripts/train_network.py -i data/synthetic/jaco_synth_train_dr/ -t 0.8 -m manip_configs/jaco2.yaml -ar arch_configs/dream_vgg_q.yaml -e 25 -lr 0.00015 -b 128 -w 16 -o <path/to/output_dir/>

The model configurations are defined in the architecture files in the 'arch_configs' directory. The generated synthetic dataset is not provided here due to its large size.

More information can be obtained from the official DREAM repo: link

Method Followed

The arm we wanted to train the network on is the Jaco2, and the arm model was obtained from Kinova Robotics. The target is to generate a synthetic dataset for training the network. The tools needed are Blender, Unreal Engine, and the NDDS plugin.
Blender is an open-source tool for 3D modelling and animation and is readily available for download. Unreal Engine is a freely available game engine, and NDDS is a domain-randomized data synthesizer plugin for Unreal Engine developed by NVIDIA.
To obtain randomized arm configurations, the arm model must be made movable (rigged), and Blender serves that purpose. The next step is to generate a randomized dataset, for which Unreal Engine and its NDDS plugin are used. Unreal Engine requires the rigged model in '.FBX' format.
The model was obtained in '.STEP' format, so it was first converted from '.STEP' to '.STL' using FreeCAD, and then Blender was used to convert '.STL' to '.FBX'. The FBX format can be opened in both Blender and Unreal Engine, the latter of which is used to generate the randomized dataset for Jaco.

image

The image shows the Jaco arm model loaded in Blender.

Blender has an 'Armature' feature that lets us set up a skeletal system for a 3D model so that it can be moved and animated as required. This was one of the targets of using Blender.

Here you can see the bones, assigned one per segment of the Jaco robot arm from the base to the fingertips. In graphics animation terms this is called "rigging" the Jaco arm model.

image

This skeleton is needed later for moving the Jaco arm to random positions when generating the synthetic dataset.
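The skeleton is a chain of bones whose joint angles determine where every segment ends up. As a rough illustration of the idea (not the actual Jaco geometry), here is a minimal planar forward-kinematics sketch with hypothetical link lengths:

```python
import numpy as np

def forward_kinematics(link_lengths, joint_angles):
    """Return the 2D position of each joint of a planar kinematic chain.

    Each angle is relative to the previous link, as in an armature where
    every bone pivots about the tip of its parent bone.
    """
    positions = [np.zeros(2)]
    total_angle = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        total_angle += angle
        step = length * np.array([np.cos(total_angle), np.sin(total_angle)])
        positions.append(positions[-1] + step)
    return np.array(positions)

# Three hypothetical links, pointing straight up (90 degrees at the base).
tips = forward_kinematics([1.0, 1.0, 0.5], [np.pi / 2, 0.0, 0.0])
```

Randomizing the pose then reduces to sampling a new set of joint angles (within the joint limits discussed below) and recomputing the chain.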

After building the skeletal system, we need to attach the mesh (the robot arm's body) to it. That is where weight painting comes in: it simply specifies which parts of the mesh are attached to which bone.
Steps:

  • Select bone
  • Paint the part to attach to that bone as Red
  • Paint the part to detach from that bone as Blue

A mesh being attached to a bone means that when the bone moves, the mesh moves with it. In this video you can observe how, when the bone joints are moved, the mesh follows along. Also, as each bone is selected, the part painted red changes, showing which parts are attached to which bone.

image

The Jaco arm with the skeleton formed and weight painted.
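Weight painting amounts to giving each mesh vertex a weight per bone; when a bone moves, each vertex follows the bone transforms in proportion to those weights. A minimal linear-blend-skinning sketch of that idea, with made-up vertices and weights (not Blender's internal implementation):

```python
import numpy as np

def rotation_2d(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def skin_vertices(vertices, weights, bone_rotations):
    """Blend each vertex through every bone's rotation, weighted per vertex.

    vertices:       (V, 2) rest positions
    weights:        (V, B) skinning weights; each row sums to 1 (the "paint")
    bone_rotations: list of B (2, 2) rotation matrices
    """
    deformed = np.zeros_like(vertices)
    for b, rot in enumerate(bone_rotations):
        deformed += weights[:, b:b + 1] * (vertices @ rot.T)
    return deformed

vertices = np.array([[1.0, 0.0], [2.0, 0.0]])
weights = np.array([[1.0, 0.0],    # fully "painted red" for bone 0
                    [0.0, 1.0]])   # fully attached to bone 1
bones = [rotation_2d(0.0), rotation_2d(np.pi / 2)]
moved = skin_vertices(vertices, weights, bones)
```

A vertex painted red for a bone (weight 1.0) follows that bone exactly; a vertex painted blue (weight 0.0) ignores it, which matches the paint steps listed above.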

Once the mesh moves as expected with the bones, we need to set constraints on the angular directions in which each joint can move (otherwise any joint could rotate in any direction, which is not the case in the real world). In this image, take note of the marked joint (red: x axis, blue: z axis, green: y axis). From the joint's shape alone we can tell it can only rotate about the x axis, so in the bone's rotation constraint settings in the right-hand pane, it is limited to rotation about x only. The process is repeated for every joint, including those of the fingers.

image

Setting of arm joint angle constraints.
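The rotation constraints can be thought of as per-axis angle limits applied to each bone before a pose is accepted. A small sketch of that clamping, with hypothetical limits in degrees (not the actual Jaco joint limits):

```python
def clamp_rotation(requested, limits):
    """Clamp per-axis Euler angles (degrees) to (lo, hi) limits per axis.

    A locked axis is expressed as the limit (0.0, 0.0).
    """
    return {axis: max(lo, min(hi, requested.get(axis, 0.0)))
            for axis, (lo, hi) in limits.items()}

# Hypothetical elbow-like joint: free about x within +/-120 deg, locked in y/z.
limits = {"x": (-120.0, 120.0), "y": (0.0, 0.0), "z": (0.0, 0.0)}
pose = clamp_rotation({"x": 150.0, "y": 30.0, "z": -10.0}, limits)
```

Any randomly sampled configuration passed through such limits stays physically plausible, which is exactly what the Blender constraint settings guarantee during animation.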

Once the arm movement is taken care of, we can focus on defining animations for the arm. Creating a number of different animations simulates random movement: at any point, any one of these animations could be playing, which approximates the effect of using truly random arm configurations.
Blender makes this step much easier because we only need to specify a few configurations of the arm, each at a specific frame, and Blender fills in the remaining frames, completing the animation for us. In other words, we pose the arm at 4 or 5 keyframes and the remaining (say) 300 frames are interpolated by Blender; that is one animation, and a number of them are created, as can be observed in this video.

image

Animating in Blender.
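Blender's in-betweening can be sketched as interpolating each joint angle between keyed frames. A minimal pure-Python version with hypothetical keyframes (Blender also supports smoother curves, but linear conveys the idea):

```python
def interpolate_keyframes(keyframes, frame):
    """Linearly interpolate a joint angle between keyed frames.

    keyframes: sorted list of (frame_number, angle) pairs.
    """
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    for (f0, a0), (f1, a1) in zip(keyframes, keyframes[1:]):
        if frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return a0 + t * (a1 - a0)
    return keyframes[-1][1]

# Key only frames 0, 100, and 300; every other frame is filled in.
keys = [(0, 0.0), (100, 90.0), (300, -45.0)]
angle_at_50 = interpolate_keyframes(keys, 50)   # halfway toward 90 degrees
```

This is why keying only a handful of poses is enough: the other ~300 frames are derived automatically.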

UE4 (Unreal Engine 4) is a game engine, but since its source is freely available it makes for a good research platform. NVIDIA developed a complete tool called the NVIDIA Dataset Synthesiser (NDDS) specifically for this platform, leveraging its capabilities to generate synthetic datasets for training neural networks.

image

Here you can see Unreal Engine 4 with NDDS loaded as a project. At the top middle is the scene window showing the 3D virtual world. At the top right are all the items placed in the 3D virtual world, and at the bottom is the Content Browser. The NDDS project contains two very valuable folders, “DomainRandomisationDNNContent” and “NVSceneCapturerContent”, which hold items for introducing domain randomisation and for data collection/synthesis, respectively, that can be added to the scene.

Here’s a feel for how it looks when data capture is happening in UE4 while the scene is domain randomised: video. (In this clip only a single arm animation is played back; in an actual data capture, multiple animations are played out randomly.)
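Domain randomisation boils down to re-sampling scene parameters (lighting, colours, distractor placement, which animation is playing) for each capture so the network never overfits to one appearance. A rough sketch of that sampling idea, with hypothetical parameter names and ranges rather than NDDS's actual settings:

```python
import random

def sample_scene(num_distractors=3, num_animations=5):
    """Sample one randomized scene configuration (illustrative only)."""
    return {
        "light_intensity": random.uniform(0.2, 2.0),
        "light_color": [random.random() for _ in range(3)],   # RGB in [0, 1]
        "animation_id": random.randrange(num_animations),     # which clip plays
        "animation_frame": random.randrange(300),             # pose within clip
        "distractors": [
            {"position": [random.uniform(-1.0, 1.0) for _ in range(3)],
             "hue": random.random()}
            for _ in range(num_distractors)
        ],
    }

scene = sample_scene()
```

In NDDS this resampling happens inside the engine every few frames while the scene capturer writes out images and annotations.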

Here, an encoder-decoder model is used for the neural network, as in the referenced work, with a VGG-based CNN as the encoder. The output of the network is a set of ‘belief maps’ (seen as black squares in the image, with a white spot where the corresponding joint is). They are simply representations of where the joints are in the 2D image. In the referenced work, the actual 3D locations of the robot are then computed from these 2D keypoints using a Perspective-n-Point (PnP) algorithm that also takes in the camera intrinsics.

image
Image Courtesy: Original work from NVIDIA
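A belief map is a per-keypoint 2D heat map: a bright bump centred on the joint's pixel, from which the keypoint is recovered as the peak. A minimal numpy sketch with a hypothetical map size and spread (the real network predicts these maps; here we just construct one):

```python
import numpy as np

def make_belief_map(height, width, center, sigma=2.0):
    """Gaussian belief map peaking at `center` = (row, col)."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def extract_keypoint(belief_map):
    """Recover the 2D keypoint as the argmax of the belief map."""
    return np.unravel_index(np.argmax(belief_map), belief_map.shape)

bm = make_belief_map(64, 64, center=(20, 37))
peak = extract_keypoint(bm)   # recovers (20, 37)
```

At training time such maps serve as targets for the decoder; at inference, peak extraction (the referenced work uses a weighted variant for sub-pixel accuracy) yields the 2D keypoints fed to the PnP step.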

The rigged and animated Jaco arm model is dragged and dropped into the scene. The moment we click the ‘Play’ button, the saved animation starts playing. The stutter seen is because playback requires a significant amount of processing power and memory. In this way the arm is imported into the NDDS project; the next steps are to set up the randomisations and the scene capturer (virtual camera), and then extract the data. image
This is how it looks.

Reference/Main Work

Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, and Stan Birchfield, "Camera-to-Robot Pose Estimation from a Single Image", International Conference on Robotics and Automation (ICRA), 2020. https://arxiv.org/abs/1911.09231

