
agv-visual-localization

Core packages for the Produtech Project. The initial objective was to build an AGV capable of localizing itself based on Artificial Vision. (Under construction)


Overview

One of the branches of the Produtech II SIF 24541 project is T.6.3.3 - Development of a flexible and low-cost localization and navigation system for PPS6. The aim is to develop the core features of a vision-based navigation system. The prototype is composed of a small robot, controlled through a remote controller, that emulates an industrial AGV, together with a set of programs developed for this system. The localization system works by detecting a constellation of visual landmarks (Data Matrix markers), which encode their absolute position with respect to a reference frame. The system self-locates within this constellation of markers by applying triangulation and trilateration techniques.
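To make the trilateration step concrete, here is a minimal 2D sketch (illustrative only; the function and variable names are ours, not the project's actual code). Given the absolute positions decoded from at least three markers and the ranges estimated from the images, the robot's position follows from a linear least-squares solve:

```python
import numpy as np

def trilaterate(landmarks, ranges):
    """Least-squares 2D position from >= 3 landmarks at known absolute
    coordinates, linearizing the range equations against landmark 0."""
    (x0, y0), r0 = landmarks[0], ranges[0]
    A, b = [], []
    for (xi, yi), ri in zip(landmarks[1:], ranges[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2)
    position, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return position

# Three markers at known positions and the measured distances to each:
print(trilaterate([(0, 0), (4, 0), (0, 3)], [5.0, 3.0, 4.0]))  # -> [4. 3.]
```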

Setup

The setup of our robot is composed of two main parts: the hardware and the software. Regarding the hardware, the retrofitting of the Atlas MV robot consisted of disassembling the old robot and keeping only the parts relevant to the current project. The existing software handling the communication between the power board and the motor was also renewed. Finally, new core programming modules were built to perform real-time robot self-localization.

Hardware

Here, the hardware parts used and developed during this project are described.

Robot

In terms of hardware retrofitting, the entire set of old electronics was replaced by a simpler, more modern one (e.g., an Arduino now handles the communication between the joystick and the steering AC motor). One of the initial setups of the robot can be seen in the figure below:

It is not possible to show an image of the final setup, but some of the new hardware parts that compose the final robot version are described in the table below.

| Name | Description/Function |
| --- | --- |
| DC/AC inverter | Input: 48 V; Output: AC. Powers the Jetson. |
| DC/DC converter | Input: 12 V; Output: 5 V. Powers the Arduino. |
| Arduino | Handles the communication between the remote controller and the AC motor. |
| Jetson AGX Xavier | Performs DL computation and runs the ROS architecture. |
| Four cameras | Acquire image data. |

Cameras

The cameras used to acquire the images can be seen in the figure below:

The e-CAM130_CUXVR multiple-camera board was designed to acquire images with the NVIDIA® Jetson AGX Xavier™ board.

Jetson AGX Xavier

This board enables the creation of AI applications, mainly based on Deep Learning, by incorporating a 512-core Volta GPU with Tensor Cores and two NVDLA engines. NVIDIA JetPack 4.2 and the DeepStream SDK 4.0 were installed on this board to provide the software SDK required for this project.

Software

Now, the set of software modules developed in this project is presented, from the low-level components up to the entire high-level system.

maxon_des

This is a library of functions to communicate with the Maxon DES 70/10 power board. This set of functions includes:

  • Status functions - check the board status, list errors, clear those errors, or reset/enable the board.
  • Parameter functions - read and set some of the "static" parameters.
  • Setting functions - set the current, set the (motor) velocity, and stop the motor motion.

Resources: REPO

Collaborators:

ros-maxon-driver

This is the ROS driver that exposes those functions in the ROS Melodic framework. The driver translates joystick inputs into the corresponding function calls.
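A minimal sketch of such a driver node, assuming a hypothetical Python binding named maxon_des with a set_velocity function (the actual repo's API may differ):

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Joy

import maxon_des  # hypothetical Python binding of the maxon_des library

MAX_RPM = 3000  # assumed velocity limit for the motor


def joy_callback(msg):
    # Map the joystick's forward/backward axis to a motor velocity command.
    maxon_des.set_velocity(int(msg.axes[1] * MAX_RPM))  # hypothetical call


if __name__ == "__main__":
    rospy.init_node("maxon_driver")
    rospy.Subscriber("joy", Joy, joy_callback)
    rospy.spin()
```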

Resources: REPO

ros-panorama-package

Initially, the idea was to build a panorama image of the scenario from 3 input images. The panorama image was then processed by a deep neural network that returns the location of the Data Matrix markers in the image. The complete process is developed as a ROS package based on the OpenCV image stitching pipeline.

Resources: REPO
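For reference, a panorama of this kind can be produced with OpenCV's high-level stitching API; the snippet below is a simplified sketch (file names are placeholders), not the package's actual node:

```python
import cv2

# Three overlapping frames from the onboard cameras (placeholder paths).
images = [cv2.imread(p) for p in ("left.png", "center.png", "right.png")]

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.png", panorama)
else:
    print("Stitching failed with status", status)
```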

faster-rcnn-data-matrix

This is a proof-of-concept notebook containing the entire training/testing pipeline of the Faster R-CNN model on the Detectron2 platform. The dataset for it was manually annotated with the Labelbox application.
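The pipeline can be outlined with Detectron2's standard config/trainer API; in this sketch the dataset name, file paths, and hyperparameters are placeholders, not the notebook's actual values:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Placeholder paths: the Labelbox annotations exported to COCO format.
register_coco_instances("datamatrix_train", {}, "train.json", "train_images/")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("datamatrix_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single class: Data Matrix
cfg.SOLVER.MAX_ITER = 1000           # placeholder hyperparameter

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```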

Resources: REPO

deepstream-app

Finally, the board used in this project - the Jetson AGX Xavier - allowed the study of another type of architecture for processing the input images. The DeepStream framework delivers a complete streaming-analytics toolkit for AI-based video and image understanding, as well as multi-sensor processing. This SDK enables real-time DNN inference based on the ONNX and TensorRT libraries.

Thus, two DeepStream applications were also developed, based on those provided by NVIDIA. These applications are pipelines whose input images are passed through the YOLOv3 architecture (one of the applications also uses a classical tracker). This object detection model outputs the bounding boxes of the respective objects in the scene.
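Such a pipeline can be expressed as a GStreamer launch string using DeepStream's elements; the sketch below is an outline under assumptions (the input file and the nvinfer config path are placeholders), not the actual application code:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
# H.264 file -> decoder -> stream muxer -> YOLOv3 inference -> on-screen display.
pipeline = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=config_infer_primary_yoloV3.txt ! "
    "nvvideoconvert ! nvdsosd ! nveglglessink"
)
pipeline.set_state(Gst.State.PLAYING)
# Block until the stream ends or an error occurs, then shut down.
pipeline.get_bus().timed_pop_filtered(
    Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```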

Resources: REPO

Known problems ⚠️

In this section, the known problems affecting different parts of the work are presented.

High latency in the panorama image creation

The panorama-creation package applies its warp transformations sequentially, which lowers the publication rate of the ROS topic that carries the panorama image. A possible solution is to parallelize those transformations.
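A sketch of that idea, assuming each input frame has its own homography: warp the frames in a thread pool instead of one after the other (OpenCV releases the GIL during warpPerspective, so the threads can actually overlap):

```python
import cv2
from concurrent.futures import ThreadPoolExecutor

def warp_one(args):
    image, homography, size = args
    return cv2.warpPerspective(image, homography, size)

def warp_all(images, homographies, size):
    # Run the per-image warps concurrently rather than sequentially.
    with ThreadPoolExecutor(max_workers=len(images)) as pool:
        jobs = zip(images, homographies, [size] * len(images))
        return list(pool.map(warp_one, jobs))
```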

Lack of DeepStream-ROS interaction

At the time of writing, there is no bridge between ROS and DeepStream. Therefore, the construction of an entire architecture (e.g., an autonomous vehicle) is difficult to achieve, because an autonomous vehicle is not based solely on video analytics. However, jetson-inference is a library of deep-learning inference networks deployed with TensorRT on the NVIDIA Jetson platform. These models can be used as DL inference nodes.
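A sketch of that idea, wrapping jetson-inference's detectNet in a ROS node (assuming an rgb8-encoded image topic; the topic name is a placeholder):

```python
#!/usr/bin/env python
import numpy as np
import rospy
from sensor_msgs.msg import Image

import jetson.inference
import jetson.utils

# One of jetson-inference's pretrained detection networks.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

def callback(msg):
    # Assumes rgb8 encoding; copy the ROS image into a CUDA image for TensorRT.
    frame = np.frombuffer(msg.data, dtype=np.uint8).reshape(
        msg.height, msg.width, 3)
    detections = net.Detect(jetson.utils.cudaFromNumpy(frame))
    for det in detections:
        rospy.loginfo("class %d at (%.0f, %.0f)",
                      det.ClassID, det.Center[0], det.Center[1])

if __name__ == "__main__":
    rospy.init_node("dl_inference_node")
    rospy.Subscriber("camera/image_raw", Image, callback)
    rospy.spin()
```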

Current Work

The COVID-19 pandemic brought this work to a halt (physical access to the robot became impossible), but it enabled another line of research in the field of autonomous driving (still connected with the AGV system). This work will therefore be extended to autonomous driving applications. Visual perception of road agents/objects, based on DL ROS nodes, is going to be developed next. The two crucial objectives are to deploy a unified representation of road/lane segmentation and object detection.

Collaborators:

Road Object Detection

The object detection models deployed in this project were trained on the BDD100K dataset. Several developers have already implemented these models, so the approaches presented here build on their work.

Resources: REPO

  • Faster RCNN ✔️

    Faster R-CNN is one of the most widely used deep learning models for object detection. Despite its higher latency compared to single-shot methods, Faster R-CNN performs well at detecting both small and large objects. The authors of this DL architecture divide it into 2 modules; however, it is fairer to divide it into 3:

    • feature-map extractor;
    • RPN (Region Proposal Network);
    • Fast R-CNN detector.

    The first module is a traditional classification architecture responsible for producing the feature maps. In our approach, we chose MobileNetV2 for this task due to its low latency. Next, a small network slides over the feature maps, predicting multiple possible proposals for each of its cells. This small network returns a lower-dimensional feature, which is then fed to two 1 * 1 convolutional layers. These layers yield, respectively, the probability that a proposal bounds a target and the encoded coordinates of each proposal. Finally, the features that correspond to objects pass through an RoI pooling layer that crops and rescales each feature. During inference, the non-maximum suppression (NMS) algorithm filters out all but the best-located bounding boxes.

    The training and model-creation work developed here was based on the torchvision module of the PyTorch framework.
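    A sketch of this assembly, closely following torchvision's documented custom-backbone example (the anchor sizes and RoI-pooling settings are illustrative assumptions, not necessarily the values used in the repo):

    ```python
    import torchvision
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.rpn import AnchorGenerator

    # Module 1: MobileNetV2 feature extractor (chosen for its low latency).
    backbone = torchvision.models.mobilenet_v2(pretrained=True).features
    backbone.out_channels = 1280  # channels of MobileNetV2's last feature map

    # Module 2: the RPN's anchors, predicted by the small sliding network.
    anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))

    # Module 3: RoI pooling that crops/rescales features for the detector head.
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                    output_size=7,
                                                    sampling_ratio=2)

    model = FasterRCNN(backbone,
                       num_classes=11,  # 10 BDD100K classes + background
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)
    ```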

    Numeric results based on COCO metrics for object detection models on the BDD100K validation set:

    | Metric | IoU Thresholds | Scales | maxDets | AP/AR values |
    | --- | --- | --- | --- | --- |
    | Average Precision (AP) | 0.50:0.95 | all | 100 | 0.202 |
    | Average Precision (AP) | 0.50 | all | 100 | 0.409 |
    | Average Precision (AP) | 0.75 | all | 100 | 0.175 |
    | Average Precision (AP) | 0.50:0.95 | small | 100 | 0.050 |
    | Average Precision (AP) | 0.50:0.95 | medium | 100 | 0.243 |
    | Average Precision (AP) | 0.50:0.95 | large | 100 | 0.432 |
    | Average Recall (AR) | 0.50:0.95 | all | 1 | 0.158 |
    | Average Recall (AR) | 0.50:0.95 | all | 10 | 0.277 |
    | Average Recall (AR) | 0.50:0.95 | all | 100 | 0.290 |
    | Average Recall (AR) | 0.50:0.95 | small | 100 | 0.116 |
    | Average Recall (AR) | 0.50:0.95 | medium | 100 | 0.355 |
    | Average Recall (AR) | 0.50:0.95 | large | 100 | 0.519 |

    Visual results on our roads: video1 and video2

  • SSD ✔️

    Single-shot models can process the input faster because the two tasks - localization and classification - are done in a single forward pass. Here, SSD is presented together with its results on the validation set of the dataset used in this work. This architecture is characterized by its base network (or backbone), the use of multi-scale feature maps for the detection task, and the corresponding convolutional predictors. MobileNetV2, truncated before its classification layers, was used to extract the image features. Hence, some of the final MobileNet layers plus additional feature layers allow predictions of detections at multiple scales. Each of these extra layers can produce a fixed set of detection predictions using a set of convolutional filters. Finally, the output of the model is a score for each category and the location of the box that bounds the target object.
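    As a rough stand-in for this model, recent torchvision releases (>= 0.10) ship an SSDlite detector with a MobileNetV3 backbone; the repo's exact MobileNetV2-based implementation is not reproduced here:

    ```python
    import torch
    import torchvision

    # SSDlite with a MobileNet backbone: 10 BDD100K classes + background.
    model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(
        num_classes=11, pretrained_backbone=True)
    model.eval()

    # Each output dict holds the predicted boxes, labels, and scores.
    with torch.no_grad():
        detections = model([torch.rand(3, 320, 320)])
    ```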

    Numeric results on the BDD100K validation set:

    | Metric | IoU Thresholds | Scales | maxDets | AP/AR values |
    | --- | --- | --- | --- | --- |
    | Average Precision (AP) | 0.50:0.95 | all | 100 | 0.083 |
    | Average Precision (AP) | 0.50 | all | 100 | 0.131 |
    | Average Precision (AP) | 0.75 | all | 100 | 0.085 |
    | Average Precision (AP) | 0.50:0.95 | small | 100 | 0.002 |
    | Average Precision (AP) | 0.50:0.95 | medium | 100 | 0.044 |
    | Average Precision (AP) | 0.50:0.95 | large | 100 | 0.293 |
    | Average Recall (AR) | 0.50:0.95 | all | 1 | 0.068 |
    | Average Recall (AR) | 0.50:0.95 | all | 10 | 0.093 |
    | Average Recall (AR) | 0.50:0.95 | all | 100 | 0.093 |
    | Average Recall (AR) | 0.50:0.95 | small | 100 | 0.005 |
    | Average Recall (AR) | 0.50:0.95 | medium | 100 | 0.052 |
    | Average Recall (AR) | 0.50:0.95 | large | 100 | 0.334 |

    Despite the large gap between the two architectures' numerical results on the validation set, this model also performs well on our roads. Please check the videos below.

    Visual results on our roads: video1 and video2

    Problems: small-object detection (due to the low resolution of the feature maps). Possible solution: Feature-Fused SSD.

  • YOLOv3 ✔️

    All YOLO architectures are also single-shot methods, which is why they achieve high-speed predictions. The authors have presented several evolutions, reflected in the number of YOLO versions that exist - 4 as of the writing of this README (YOLO, YOLOv2, YOLOv3, and YOLOv4). The architecture has always shown low latency, so the focus across the various versions has been on localization performance.

    Contrary to the previously presented architectures, YOLO has a custom feature extractor - Darknet. This architecture can have different layouts, but the most common one is Darknet-53 (from the third version of YOLO), which achieves 93.8% Top-5 accuracy on the ImageNet test set. YOLOv3 makes detections at three different scales by applying 1 * 1 kernels to the feature maps at three different stages of the network:

    • the first detection is made by the 82nd layer (stride of 32). In our case, the input image has a size of 416 * 416, which means that this detection feature map has a size of 13 * 13 * 45 (the number of channels is given by B * (5 + C), where B is the number of bounding boxes that a cell of the feature map can predict - 3; 5 accounts for the object confidence plus the four values that determine the bounding-box location; and C is the number of classes - 10 for BDD100K);
    • the second detection is made by the 94th layer, yielding a detection feature map of 26 * 26 * 45 (stride of 16);
    • finally, the last detection is made by the 106th layer, giving rise to a feature map of 52 * 52 * 45 (stride of 8).

    This last detection layer helped to improve small-object detection thanks to its higher-resolution feature map (a major problem in previous YOLO versions); the short check below reproduces this grid arithmetic.
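    A quick sanity check of the sizes above; only the input size, anchor count, and BDD100K class count from the text go in:

    ```python
    # Detection feature-map sizes for YOLOv3: B * (5 + C) channels per cell.
    B, C, INPUT = 3, 10, 416       # anchors per cell, classes, input size
    channels = B * (5 + C)         # 3 * 15 = 45
    for layer, stride in ((82, 32), (94, 16), (106, 8)):
        grid = INPUT // stride     # 13, 26, 52
        print("layer %d: %d x %d x %d" % (layer, grid, grid, channels))
    ```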

  • YOLOv4 ✔️

    YOLOv4 is composed of a Cross Stage Partial (CSP) Darknet-53 with an SPP module, a path-aggregation network (PANet), and a YOLOv3 head. CSP networks have a basis and purpose similar to DenseNet's, but they enhance feature reuse by reducing the amount of repeated gradient information observed in a DenseNet. To do so, the base feature map is split: one part of the channels passes through a partial dense block, while the other part goes directly to the final partial transition layer. After the activation maps are produced, the only difference between YOLOv3 and YOLOv4 in terms of architectural layout is the global feature aggregation. Instead of the FPN technique, a custom PANet approach is used. PANet is simply an enhanced version of FPN: after the FPN block, composed of a top-down pathway with lateral connections, PANet also propagates low-level features through a bottom-up path-augmentation block. This block adds (concatenates, in YOLOv4) the FPN output features to the outputs of 3 * 3 convolutions over those feature maps, which yields an even better use of the low-level features.

    Numeric results on the BDD100K validation set:

    | Metric | IoU Thresholds | Scales | maxDets | AP/AR values |
    | --- | --- | --- | --- | --- |
    | Average Precision (AP) | 0.50:0.95 | all | 100 | 0.105 |
    | Average Precision (AP) | 0.50 | all | 100 | 0.209 |
    | Average Precision (AP) | 0.75 | all | 100 | 0.092 |
    | Average Precision (AP) | 0.50:0.95 | small | 100 | 0.053 |
    | Average Precision (AP) | 0.50:0.95 | medium | 100 | 0.223 |
    | Average Precision (AP) | 0.50:0.95 | large | 100 | 0.326 |
    | Average Recall (AR) | 0.50:0.95 | all | 1 | 0.107 |
    | Average Recall (AR) | 0.50:0.95 | all | 10 | 0.220 |
    | Average Recall (AR) | 0.50:0.95 | all | 100 | 0.257 |
    | Average Recall (AR) | 0.50:0.95 | small | 100 | 0.187 |
    | Average Recall (AR) | 0.50:0.95 | medium | 100 | 0.467 |
    | Average Recall (AR) | 0.50:0.95 | large | 100 | 0.511 |

    Visual results on our roads, running on the NVIDIA AGX Xavier device: video1

Data Matrix Detection

For this content, please follow this REPO.

Publications
