
Comments (3)

krrish94 commented on May 29, 2024

Thanks for your interest @paucarre!

In gradslam, at each time step we align the "entire map" with the current frame. This implicitly means that we're performing global optimization at each step (and wouldn't need explicit loop closure detection). However, this also means that currently gradslam can operate in relatively small environments (as global alignment at each timestep would become computationally expensive for large scenes).

from gradslam.

krrish94 commented on May 29, 2024

The differentiable map fusion module indeed uses camera poses as input (which may in turn be computed from odometry). This can be used for learning odometry estimation (but not loop closure detection, since we don't have an explicit loop closure step in gradslam).


paucarre commented on May 29, 2024

Thanks for your answer @krrish94 !

That clarifies a lot.

I was wondering whether you think the following setup is feasible for solving the memory/resources problem:

  1. Using a neural network, downsample the high-memory 3D color map to a low-memory 2D map, so that
    mapping becomes feasible with limited memory resources (a common constraint in robotics)

To be specific, say the original high-memory map, called Original_Map, is a pointcloud (as per the GradSLAM documentation).
Assume the Original_Map pointcloud has the following tensor-size structure:

Original_Map = [
  points = [ (K_1, 3), (K_2, 3), ... , (K_N, 3) ],
  normals = [ (K_1, 3), (K_2, 3), ... , (K_N, 3) ],
  colors = [ (K_1, 3), (K_2, 3), ... , (K_N, 3) ]
]

where K_i are very high (lots of points for each sample).
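For concreteness, this ragged tensor-list structure can be sketched in plain NumPy (a minimal illustration; the specific K_i values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample point counts K_i (in practice these are very large).
K = [100_000, 250_000, 180_000]

# Original_Map: for each of the N samples, a (K_i, 3) array of
# points, normals, and RGB colors.
original_map = {
    "points":  [rng.standard_normal((k, 3)) for k in K],
    "normals": [rng.standard_normal((k, 3)) for k in K],
    "colors":  [rng.random((k, 3)) for k in K],
}

# Every field holds N arrays, each of shape (K_i, 3).
for tensors in original_map.values():
    for t, k in zip(tensors, K):
        assert t.shape == (k, 3)
```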

We want to reduce K_i in Original_Map by flattening and down-sampling it into a 2D Compact_Map (a "bird's-eye map").

A neural network would transform Original_Map into a Compact_Map, where Compact_Map has the following structure:

Compact_Map = [
  points = [ (H_1, 2), (H_2, 2), ... , (H_N, 2) ],
  normals = [ (H_1, 2), (H_2, 2), ... , (H_N, 2) ],
  embeddings = [ (H_1, L), (H_2, L), ... , (H_N, L) ]
]

Where:

  • H_i is significantly smaller than K_i (H_i << K_i). This implies the number of samples is reduced, and the embeddings will contain whatever information is required for SLAM.
  • The embedding dimension L would be larger than the 3 color channels, but it can be kept relatively small (say 16 or 32)

Then we perform both differentiable mapping and map fusion on the Compact_Map with low memory, as the
tensor itself could be orders of magnitude smaller (because we reduce the tensor size).
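A non-learned stand-in makes the intended size reduction concrete: flatten points onto a 2D grid in the x-y plane and pool everything that lands in the same cell. This is only a minimal sketch; the grid resolution and the mean-pooled "embedding" (color plus height) are my assumptions, where a real downsampling network would learn a richer L-dimensional embedding:

```python
import numpy as np

def flatten_to_bev(points, colors, cell_size=0.5):
    """Pool a (K, 3) point cloud into a 2D bird's-eye map.

    Returns (H, 2) cell centers and (H, 4) pooled features
    (mean RGB plus mean height), with H << K for dense clouds.
    """
    # Assign each point to a grid cell in the x-y plane.
    cells = np.floor(points[:, :2] / cell_size).astype(np.int64)
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()

    # Mean-pool color and height per cell (a learned network would
    # produce the embeddings here instead).
    feats = np.concatenate([colors, points[:, 2:3]], axis=1)
    pooled = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(pooled, inverse, feats)
    pooled /= np.bincount(inverse).astype(float)[:, None]

    centers = (uniq + 0.5) * cell_size
    return centers, pooled

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(50_000, 3))   # K = 50_000 points
cols = rng.random((50_000, 3))
centers, emb = flatten_to_bev(pts, cols)
assert centers.shape[0] < pts.shape[0]       # H << K
```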

  2. Compact_Map to Original_Map Correspondence

Then the only thing left is to establish a correspondence between Compact_Map and Original_Map.
We call this network Local_Correspondence_Network; it could be something along the lines of an autoencoder,
which wouldn't require a labeled training set.

As the correspondences between Compact_Map and Original_Map are local (they are similar from a 2D perspective), one could simply use a sparse 3D convolutional neural network. The loss could just be a reconstruction loss on pointclouds, which I assume is feasible. The idea is that the reconstruction error ensures the colors and point-height information in Original_Map are preserved in Compact_Map.
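The reconstruction loss on pointclouds could, for instance, be a Chamfer distance between the original and reconstructed clouds, a common choice for point-cloud reconstruction. A minimal NumPy sketch, assuming clouds small enough that the pairwise distance matrix fits in memory:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    # Each point's distance to its nearest neighbor in the other set.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
cloud = rng.standard_normal((500, 3))
assert chamfer_distance(cloud, cloud) == 0.0        # identical clouds
assert chamfer_distance(cloud, cloud + 1.0) > 0.0   # shifted clouds
```

In practice one would use a batched GPU implementation (e.g. a KD-tree or chunked distance computation) rather than the full dense matrix.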

Furthermore, there could be a map-consistency loss. This loss can be generated by constructing the map with Original_Map and GradSLAM and, on a separate path, the map with Compact_Map and GradSLAM. Since during training we can potentially use a high-end computer, it might be fine to generate the whole map. The loss would be something like the difference between 2D correspondences (whether the 3D map from Original_Map can be seen as a "bird's-eye view" like Compact_Map).

I see the following advantages in Local_Correspondence_Network:

  • It can be reused, as it compacts local parts of the map irrespective of the global structure (due to its convolutional nature)
  • As it's convolutional, it will itself be reasonably small ("low memory")
  • It can downsample contiguous parts of the map, but also samples from sensors directly, which enables real-time on-robot SLAM.
  3. Production setup

Then the following steps would take place:

  • Train a Local_Correspondence_Network using samples from sensor data (e.g., an Intel RealSense D435)

  • Downsample all samples from the sensor using Local_Correspondence_Network and build Compact_Map using GradSLAM layers.

  • Load Compact_Map as well as Local_Correspondence_Network into the robot and do the following:

    • For each sample from the robot's sensor, downsample it using Local_Correspondence_Network
    • Use Compact_Map and GradSLAM on the robot for localization within the map.
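The on-robot loop in the last two bullets could look roughly like the following. This is a pure sketch: `local_correspondence_network` and `localize` are hypothetical stand-ins (strided subsampling and a centroid offset) for the trained network and the GradSLAM-based localization step:

```python
import numpy as np

def local_correspondence_network(frame):
    """Hypothetical stand-in: downsample one (K, 3) sensor frame to (H, 3)."""
    return frame[::100]  # placeholder: strided subsampling, H = K / 100

def localize(compact_map, compact_frame):
    """Hypothetical stand-in: pose as the offset between centroids."""
    return compact_frame.mean(axis=0) - compact_map.mean(axis=0)

rng = np.random.default_rng(0)
compact_map = rng.standard_normal((1_000, 3))     # prebuilt Compact_Map

# On-robot loop: downsample each incoming frame, then localize it
# against the compact map.
for _ in range(3):
    frame = rng.standard_normal((100_000, 3))     # raw sensor sample
    compact_frame = local_correspondence_network(frame)
    pose = localize(compact_map, compact_frame)
    assert compact_frame.shape[0] < frame.shape[0]
```

In the real setup, the downsampling step would run the trained network and the localization step would run GradSLAM's differentiable layers against Compact_Map.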

Overall, I was wondering if you could help me to understand:

  1. Whether the idea makes any sense
  2. Whether the framework, as it is now, is capable of such setup

