Segfault when running kitti seq 05 [CLOSED]

marineroboticsgroup commented on August 26, 2024

Segfault when running kitti seq 05

from dcsam.

Comments (7)

kurransingh commented on August 26, 2024

Just came across this while running run2.bag with a two-class AprilTag test. Same gdb backtrace as above; other parameters here:

# Data association algorithm (0: ML, 1: MM, 2: SM, 3: EM)
DA_type: 1

# Misclassification rate
misclassification_rate: 0

# X, Y, Z per frame. Roll, Pitch, Yaw per frame
odom_base_sigmas: [0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

# X, Y, Z per frame. Roll, Pitch, Yaw per frame
# Set as a tenth of modeled noise, but only adding X, Y, Yaw
odom_added_noise_sigmas: [0.001, 0.001, 0.001, 0.001, 0.001, 0.001]

# X, Y, Z, Roll, Pitch, Yaw
tag_base_sigmas: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

tag_added_noise_sigmas: [0.001, 0.001, 0.001, 0.001, 0.001, 0.001]

# Number of classifications
n_classes: 2

# Null hypothesis weight
nh_weight: .1

# Only consider landmarks within X distance (m) of current pose estimate
search_radius: 30.0

# Mahalanobis distance threshold
maha_dist_thresh: 6.0
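
For readers unfamiliar with the last two parameters, here is a minimal sketch of how a radius gate and a Mahalanobis gate could be combined when picking candidate landmarks. It is not the project's data association code: the `gate` and `mahalanobis` helpers, the diagonal covariance, and the choice to compare the threshold against the unsquared distance are all assumptions made for illustration.

```python
import math

def mahalanobis(residual, sigmas):
    """Mahalanobis distance for a diagonal covariance given per-axis sigmas."""
    return math.sqrt(sum((r / s) ** 2 for r, s in zip(residual, sigmas)))

def gate(pose_xyz, measured_xyz, landmarks, sigmas, search_radius, maha_dist_thresh):
    """Return ids of landmarks that pass both the radius and the Mahalanobis gate.

    Hypothetical helper: the real solver may gate differently.
    """
    accepted = []
    for lm_id, lm_xyz in landmarks.items():
        # Radius gate: ignore landmarks far from the current pose estimate.
        if math.dist(pose_xyz, lm_xyz) > search_radius:
            continue
        # Mahalanobis gate: compare the measurement against the landmark.
        residual = [m - l for m, l in zip(measured_xyz, lm_xyz)]
        if mahalanobis(residual, sigmas) <= maha_dist_thresh:
            accepted.append(lm_id)
    return accepted

# Made-up landmarks; sigmas and thresholds mirror the config values above.
landmarks = {0: (1.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0), 2: (50.0, 0.0, 0.0)}
accepted = gate((0.0, 0.0, 0.0), (1.05, 0.0, 0.0), landmarks,
                [0.1, 0.1, 0.1], search_radius=30.0, maha_dist_thresh=6.0)
```

In this toy setup, landmark 2 is dropped by the radius gate and landmark 1 by the Mahalanobis gate, so only landmark 0 remains a candidate.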

ProfFan commented on August 26, 2024

@keevindoherty Could you provide a small test case so I can help debug?

keevindoherty commented on August 26, 2024

@kurransingh Hmm, that is an interesting development... I'll try to reproduce this later today.

keevindoherty commented on August 26, 2024

It seems that specifying natural variable ordering (i.e. sort by keys) is functional on all tested datasets. It's not clear at the moment why COLAMD variable ordering for discrete graphs presents issues, but the central issue appears to be that a nullptr shows up in the discrete factor graph passed to the EliminateDiscrete function. Another workaround added in #18 is to do a nullptr check before doing sum-product elimination, which works (in that it does not crash) but we don't have a good test case to ensure that this isn't producing some odd behavior from an inference standpoint (i.e. producing incorrect marginals), since the original issue is hard to reproduce. Natural ordering, at least, should not impact the accuracy of inference at all. For now, if we do encounter a nullptr in this graph (which has not happened with natural ordering yet), we skip it and print an error, but we should revisit this.
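
To make the ordering claim concrete, here is a small pure-Python sketch of sum-product elimination on a toy binary factor graph. It does not use GTSAM; the factor representation and the `marginal` helper are hypothetical. It shows that two elimination orderings give the same marginal, and it mimics the nullptr guard by skipping `None` entries with a warning.

```python
import itertools
import warnings

# A factor is (variables, table): `variables` is a tuple of names and `table`
# maps a tuple of binary values (one per variable) to a nonnegative weight.

def multiply(factors):
    """Product of a list of factors, as one factor over the union of variables."""
    variables = tuple(sorted({v for vs, _ in factors for v in vs}))
    table = {}
    for values in itertools.product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for vs, t in factors:
            weight *= t[tuple(assignment[v] for v in vs)]
        table[values] = weight
    return variables, table

def sum_out(factor, var):
    """Marginalize `var` out of `factor`."""
    variables, table = factor
    idx = variables.index(var)
    kept = variables[:idx] + variables[idx + 1:]
    out = {}
    for values, weight in table.items():
        key = values[:idx] + values[idx + 1:]
        out[key] = out.get(key, 0.0) + weight
    return kept, out

def marginal(factors, ordering, query):
    """Eliminate every variable in `ordering`, leaving the marginal of `query`."""
    live = []
    for f in factors:
        if f is None:  # analogue of the nullptr guard: skip and report
            warnings.warn("skipping null factor")
            continue
        live.append(f)
    for var in ordering:
        touching = [f for f in live if var in f[0]]
        if not touching:
            continue
        rest = [f for f in live if var not in f[0]]
        live = rest + [sum_out(multiply(touching), var)]
    variables, table = multiply(live)
    assert variables == (query,)
    total = sum(table.values())
    return {values[0]: weight / total for values, weight in table.items()}

# Toy chain a - b - c with made-up weights.
factors = [
    (("a",), {(0,): 0.7, (1,): 0.3}),
    (("a", "b"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}),
    (("b", "c"), {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.5}),
]
natural = marginal(factors, ["a", "b"], "c")
reverse = marginal(factors, ["b", "a"], "c")
```

Both orderings agree up to floating-point rounding. Skipping a `None` is harmless in this toy only because the entry carries no information. Whether that also holds for the nullptr seen in the real graph is exactly the open question raised above.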

keevindoherty commented on August 26, 2024

Hey @ProfFan - thanks for the interest in helping out, we really appreciate it! We've been trying to come up with a small test case that reproduces this issue now for 3 weeks (since this issue opened!). It seems to (suspiciously) only turn up when we're running our semantic SLAM code somewhere in the middle of a dataset (e.g. halfway through the KITTI dataset). A few observations:

Thus far, we have only observed it to occur when we're using the DCMaxMixtureFactor which specifies a mixture of hybrid factors. We've considered the possibility that there is something weird going on between our factor implementation and the GTSAM discrete solver. I thought initially it might have something to do with trying to eliminate a variable involved with an inactive mixture component, but when I attempted to test this (see here [confusingly on a macOS branch, since I was testing on my other laptop]), GTSAM threw exceptions where appropriate (rather than segfaulting). I have considered that perhaps rather than having the mixture factor select a single mixture component for discrete variables, maybe we should have it also place uniform priors on the variables involved in the inactive factors.
If there is an issue with the mixture factors, it seems inconsistent: the problematic graphs have other mixture factors that seem to function properly (though this is purely based on examination of the navigation solutions).
It doesn't seem like there's anything numerically special about the data at the point of the segfault (no obvious floating point explosions or anything like that).
So far, everything has worked entirely as expected when forcing GTSAM to use Natural variable ordering. Manually examining the contents of the graph to be eliminated in our elimination code and removing the nullptr also allows the code to run and produces results that aren't obviously incorrect, but I am worried that information might be getting lost somewhere, and we only recently discovered this so we've not yet compared against the Natural ordering (the latter I trust is doing the right thing, since modifying the elimination ordering shouldn't change the resulting marginals at all).
We are running tests with the EMFactor here which performs expectation-maximization using a weighted combination of component factors. So far, it seems to work without any of the above trickery: if this continues to be the case, it does suggest a path to fixing the max-mixture, which is to simply set the weight of the active component to "1" and all inactive components to "0". This would be equivalent to placing a uniform prior on variables involved with inactive components, as I mentioned above.
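
The relationship between the two factor types in the last observation can be sketched in a few lines. This is an illustration only: it assumes an EM-style factor whose error is a weighted sum of component errors and uses 1D Gaussian components, neither of which is confirmed to match the DCMaxMixtureFactor or EMFactor implementations. All function names are hypothetical.

```python
import math

def component_error(x, mean, sigma):
    """Negative log-likelihood of x under a 1D Gaussian, including the normalizer."""
    return 0.5 * ((x - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))

def max_mixture_error(x, components):
    """Max-mixture: keep only the most likely (lowest-error) component."""
    return min(component_error(x, m, s) for m, s in components)

def weighted_error(x, components, weights):
    """EM-style factor (assumed form): weighted sum of the component errors."""
    return sum(w * component_error(x, m, s) for w, (m, s) in zip(weights, components))

def one_hot_weights(x, components):
    """Weight 1 on the most likely component, 0 on all others."""
    errors = [component_error(x, m, s) for m, s in components]
    best = errors.index(min(errors))
    return [1.0 if i == best else 0.0 for i in range(len(components))]

components = [(0.0, 1.0), (5.0, 0.5)]  # hypothetical (mean, sigma) pairs
x = 4.2
weights = one_hot_weights(x, components)
em_error = weighted_error(x, components, weights)
mm_error = max_mixture_error(x, components)
```

With one-hot weights the weighted sum collapses to the error of the selected component, so `em_error` and `mm_error` coincide. The difference is that every component stays present in the weighted form, just with zero weight.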

We will keep playing around with this to isolate a simple test case - if you have any ideas about what to try based on these observations, please let us know!

keevindoherty commented on August 26, 2024

Natural ordering turns out not to be enough to fix this, i.e. it does not seem to be a variable ordering issue. However, this appears to be fixed by the changes I added on the bugfix/maxmixture branch. The key, as alluded to above, was placing a uniform prior on the discrete keys involved with inactive component factors in the max mixture. The motivation for this solution was that DCEMFactors functioned properly (no segfault issue), and we can write a max-mixture in terms of an EM factor with a single weight of 1 on the max component and weights of 0 on all other components. We'll have to test this more rigorously, but it seems like the right direction and gives us a point of comparison.
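
As a rough illustration of why a uniform prior is a safe thing to add, the brute-force sketch below computes discrete marginals with and without one. It is not the code on the bugfix/maxmixture branch: the key names `d1` and `d2` and both helpers are hypothetical. The idea that keeping the key represented in the graph is what avoids the crash is only one plausible reading of the fix.

```python
import itertools

def marginals(factors, cardinality):
    """Brute-force marginals for every key that appears in at least one factor."""
    keys = sorted({k for ks, _ in factors for k in ks})
    sums = {k: [0.0] * cardinality for k in keys}
    total = 0.0
    for values in itertools.product(range(cardinality), repeat=len(keys)):
        assignment = dict(zip(keys, values))
        weight = 1.0
        for ks, fn in factors:
            weight *= fn(*(assignment[k] for k in ks))
        total += weight
        for k in keys:
            sums[k][assignment[k]] += weight
    return {k: [s / total for s in sums[k]] for k in keys}

def uniform_prior(key, cardinality):
    """Factor on `key` that assigns the same weight to every value."""
    return ((key,), lambda value: 1.0 / cardinality)

# "d1" belongs to the active component; "d2" only to an inactive one.
active = (("d1",), lambda v: [0.8, 0.2][v])
without_prior = marginals([active], 2)
with_prior = marginals([active, uniform_prior("d2", 2)], 2)
```

Without the prior, `d2` is simply absent from the result because no factor touches it. With the prior, `d2` gets a uniform marginal and the marginal on `d1` is unchanged.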

keevindoherty commented on August 26, 2024

Happy to resolve this now with PR #20
