Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::out_of_range with map::at exception with gtsam 4.1.1 (rtabmap) #1092

Closed
cdb0y511 opened this issue Feb 8, 2022 · 11 comments · Fixed by #1158
Closed

std::out_of_range with map::at exception with gtsam 4.1.1 (rtabmap) #1092

cdb0y511 opened this issue Feb 8, 2022 · 11 comments · Fixed by #1158
Assignees

Comments

@cdb0y511
Copy link

cdb0y511 commented Feb 8, 2022

Hi, I am using rtabmp with gtsam 4.1.1. It occasionally occurs terminate called after throwing an instance of 'std::out_of_range'
what(): map::at.
The details are similar to
introlab/rtabmap#823
I hope it could be fixed soon. Maybe you could work it out together @matlabbe.
I give a reprint here.

With GTSAM version built from source (revision: 4.1.1) during loop closure optimization:

[DEBUG] (2022-02-03 16:13:45.949) OptimizerGTSAM.cpp:101::optimize() Optimizing graph...
[DEBUG] (2022-02-03 16:13:45.949) OptimizerGTSAM.cpp:146::optimize() hasGPSPrior=false
[DEBUG] (2022-02-03 16:13:45.949) OptimizerGTSAM.cpp:164::optimize() fill poses to gtsam... rootId=1 (priorsIgnored=1 landmarksIgnored=0)
[DEBUG] (2022-02-03 16:13:45.949) OptimizerGTSAM.cpp:223::optimize() fill edges to gtsam...
[DEBUG] (2022-02-03 16:13:45.953) OptimizerGTSAM.cpp:479::optimize() create optimizer
[DEBUG] (2022-02-03 16:13:45.955) OptimizerGTSAM.cpp:504::optimize() GTSAM optimizing begin (max iterations=20, robust=0)
[DEBUG] (2022-02-03 16:13:45.965) OptimizerGTSAM.cpp:584::optimize() iteration 1 error =2850609649897726489254101293091235944356044801931291559347145600874420166397741503058701851780329932591817167121596483299665242156038849992125986899196008202587784806682300342868079360100003408894353952842967492154479751175590697972670210540378883501996026547213724436053723262923791400960.000000
[DEBUG] (2022-02-03 16:13:45.976) OptimizerGTSAM.cpp:584::optimize() iteration 2 error =5016100078967479096064413010768809019148602854242979119938905283661092705639338472477011700353588127562436548176456729989570331508193090845168052534407557742985955853652363314364564935033288790078552428268421593627697933614414369113679293174995444909290919633032349218707196942858868340438922690560.000000
[DEBUG] (2022-02-03 16:13:45.976) OptimizerGTSAM.cpp:590::optimize() Negative improvement?! Ignore and continue optimizing... (-5016100076116869425110442501786949926674733499886072217178394925254059757313327868568474126116375716917423374279035923904149843802861944430925552915508911836392252585395110302645738006052198719011840191255833450124487785933472970703237412918514014549742862808981949781291961016050608007111191298048.000000 < 0.000010)
[DEBUG] (2022-02-03 16:13:45.986) OptimizerGTSAM.cpp:584::optimize() iteration 3 error =4762437314404316677678978362215431360901508662362565105929317273411226380242999817166248601054457756356314417395982789444608445999853387397856784778582078705214311653018871259333382528033412543819451588388981733574931330715497184027876694693487667040715851249105737640310706930901554561024.000000
[DEBUG] (2022-02-03 16:13:45.996) OptimizerGTSAM.cpp:584::optimize() iteration 4 error =10375071687008403818694769841431843491355929407087161691691665919401692325307968802150380172489645670836245273758214944374368613158103528730104116996642719655751021310965017658033602275517442857849603218646021211197352441906817076859211969627919396267107659523192607703809860802465813757952000.000000
[DEBUG] (2022-02-03 16:13:45.996) OptimizerGTSAM.cpp:590::optimize() Negative improvement?! Ignore and continue optimizing... (-10370309249693999211514267725940937073799608250251315401149818182777489539027667437073554994776271686411532953788783224558210290659927700369449925412051392869421429210283080719775332002144501721930346975002849411698388125145667610495033111088655127153698549802735347982290158922839559774404608.000000 < 0.000010)
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

Thread 1 "rtabmap-reproce" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fff273fefb7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fff27400921 in __GI_abort () at abort.c:79
#2  0x00007fff27df3957 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fff27df9ae6 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fff27df9b21 in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fff27df9da9 in __cxa_rethrow () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fff1ecaac08 in gtsam::VectorValues gtsam::internal::linearAlgorithms::optimizeBayesTree<gtsam::GaussianBayesTree>(gtsam::GaussianBayesTree const&) ()
    at /home/mathieu/workspace/latest/lib/libgtsam.so.4
#7  0x00007fff1ec99d7d in gtsam::GaussianBayesTree::optimize() const ()
    at /home/mathieu/workspace/latest/lib/libgtsam.so.4
#8  0x00007fff1ecc1ddd in gtsam::GaussianFactorGraph::optimize(gtsam::Ordering const&, std::function<std::pair<boost::shared_ptr<gtsam::GaussianConditional>, boost::shared_ptr<gtsam::GaussianFactor> > (gtsam::GaussianFactorGraph const&, gtsam::Ordering const&)> const&) const ()
    at /home/mathieu/workspace/latest/lib/libgtsam.so.4
#9  0x00007fff1ed8be04 in gtsam::NonlinearOptimizer::solve(gtsam::GaussianFactorGraph const&, gtsam::NonlinearOptimizerParams const&) const () at /home/mathieu/workspace/latest/lib/libgtsam.so.4
#10 0x00007fff1ed48c27 in gtsam::GaussNewtonOptimizer::iterate() ()
    at /home/mathieu/workspace/latest/lib/libgtsam.so.4
#11 0x00007ffff74d593a in rtabmap::OptimizerGTSAM::optimize(int, std::map<int, rtabmap::Transform, std::less<int>, std::allocator<std::pair<int const, rtabmap::Transform> > > const&, std::multimap<int, rtabmap::Link, std::less<int>, std::allocator<std::pair<int const, rtabmap::Link> > > const&, cv::Mat&, std::__cxx11::list<std::map<int, rtabmap::Transform, std::less<int>, std::allocator<std::pair<int const, rtabmap::Transform> > >, std::allocator<std::map<int, rtabmap::Transform, std::less<int>, std::allocator<std::pair<int const, rtabmap::Transform> > > > >*, double*, int*) ()
    at /home/mathieu/workspace/latest/lib/librtabmap_core.so.0.20
#12 0x00007ffff71fb74d in rtabmap::Rtabmap::optimizeGraph(int, std::set<int, std::less<int>, std::allocator<int> > const&, std::map<int, rtabmap::Transform, std::less<int>, std::allocator<std::pair<int const, rtabmap::Transform> > > const&, bool, cv::Mat&, std::multimap<int, rtabmap::Link, std::less<int>, std::allocator<std::pair<int const, rtabmap::Link> > >*, double*, int*) const ()
    at /home/mathieu/workspace/latest/lib/librtabmap_core.so.0.20
#13 0x00007ffff71fbbed in rtabmap::Rtabmap::optimizeCurrentMap(int, bool, std::map<int, rtabmap::Transform, std::less<int>, std::allocator<std::pair<int const, rtabmap::Transform> > >&, cv::Mat&, std::multimap<int, rtabmap::Link, std::less<int>, std::allocator<std::pair<int const, rtabmap::Link> > >*, double*, int*) const () at /home/mathieu/workspace/latest/lib/librtabmap_core.so.0.20
#14 0x00007ffff720f2aa in rtabmap::Rtabmap::process(rtabmap::SensorData const&, rtabmap::Transform, cv::Mat const&, std::vector<float, std::allocator<float> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, float> > > const&) () at /home/mathieu/workspace/latest/lib/librtabmap_core.so.0.20
#15 0x000055555555e3ee in main ()

The bug doesn't appear with gtsam binary version (4.0.3-1) from PPA when processing the same dataset.

@DanMcGann
Copy link
Contributor

I have run into this problem as well, and have developed a minimal example that demonstrates it (see below).

I originally came across it when working with KimeraRPGO, but have also found it in a custom algorithm. It would appear that the common behavior of rtabmap, KimeraRPGO, and my use case, is re-solving with a batch optimizer after a new factor is added to the system.

However, I have also found that the bug is reproducible even when the incremental adding of factors is omitted, and rather any repeated call to optimize can product this error. I have provided minimal examples for both the incremental adding of factors case, and the bare-bones repeated optimize case.

In my testing I have noticed the following about the bug:

  • Non-Deterministic: can occur at seemingly any iteration of the examples
  • Non-Identical: bug manifests most often as std::out_of_range but also sometimes as segfaults in realloc_insert
  • Universal: Occurs with LM, GN and DogLeg optimizers

Technical Details:

  • GTSAM: 4.1.1 (build from source)
  • Tested with 2D data sets from Prof. Carlone's Website (M3500, Intel)
Incremental Factors Example

#include <gtsam/geometry/Pose2.h>
#include <gtsam/nonlinear/DoglegOptimizer.h>              // Bug occurs
#include <gtsam/nonlinear/GaussNewtonOptimizer.h>         // Bug occurs
#include <gtsam/nonlinear/LevenbergMarquardtOptimizer.h>  // Bug occurs
#include <gtsam/slam/PriorFactor.h>
#include <gtsam/slam/dataset.h>

int main(int argc, char *argv[]) {
  // Read g2o input file
  gtsam::NonlinearFactorGraph::shared_ptr graph;
  gtsam::Values::shared_ptr initial;
  boost::tie(graph, initial) = gtsam::readG2o(argv[1], false);

  // Separate Odometry and Loop closures
  gtsam::NonlinearFactorGraph odom_factors;
  gtsam::NonlinearFactorGraph loop_factors;
  for (auto &factor : *graph) {
    if (factor->keys().front() + 1 == factor->keys().back()) {
      odom_factors.push_back(factor);
    } else {
      loop_factors.push_back(factor);
    }
  }

  // Container for ever increasing graph
  gtsam::NonlinearFactorGraph incremental_graph;
  gtsam::Values incremental_values;

  // Add prior
  incremental_values.insert(0, initial->at(0));
  incremental_graph.emplace_shared<gtsam::PriorFactor<gtsam::Pose2>>(
      0, gtsam::Pose2(), gtsam::noiseModel::Isotropic::Sigma(3, 1e-4));

  // Iterate through odometry adding loop closures when able
  for (size_t i = 0; i < odom_factors.nrFactors(); i++) {
    incremental_graph.push_back(odom_factors.at(i));
    incremental_values.insert(i + 1, initial->at(i + 1));
    for (auto &factor : loop_factors) {
      auto f = factor->keys().front();
      auto b = factor->keys().back();
      if ((f == i + 1 && b < i + 1) || (b == i + 1 && f < i + 1)) {
        incremental_graph.push_back(factor);
      }
    }

    // Optimize the incremental subproblem
    gtsam::DoglegOptimizer optimizer(incremental_graph, incremental_values);
    auto result = optimizer.optimize();
  }
}

Bare-Bones Example

#include <gtsam/geometry/Pose2.h>
#include <gtsam/nonlinear/DoglegOptimizer.h>              // Bug occurs
#include <gtsam/nonlinear/GaussNewtonOptimizer.h>         // Bug occurs
#include <gtsam/nonlinear/LevenbergMarquardtOptimizer.h>  // Bug occurs
#include <gtsam/slam/PriorFactor.h>
#include <gtsam/slam/dataset.h>

int main(int argc, char *argv[]) {
  // Read g2o input file
  gtsam::NonlinearFactorGraph::shared_ptr graph;
  gtsam::Values::shared_ptr initial;
  boost::tie(graph, initial) = gtsam::readG2o(argv[1], false);

  // Add prior
  graph->emplace_shared<gtsam::PriorFactor<gtsam::Pose2>>(
      0, gtsam::Pose2(), gtsam::noiseModel::Isotropic::Sigma(3, 1e-4));

  // Run the optimizer a bunch
  for (size_t i = 0; i < 1000; i++) {
    gtsam::LevenbergMarquardtOptimizer optimizer(*graph, *initial);
    auto result = optimizer.optimize();
  }
}

@DanMcGann
Copy link
Contributor

Update: I have confirmed that running on v4.0.3 (built from source) the above examples do not appear to error out.

@ProfFan
Copy link
Collaborator

ProfFan commented Feb 11, 2022

Hi @DanMcGann, are you using TBB?

@DanMcGann
Copy link
Contributor

Yes, GTSAM_WITH_TBB is set. I have not tried with TBB disabled.

@ProfFan
Copy link
Collaborator

ProfFan commented Feb 11, 2022

It would be great if you can try

  1. without TBB, and
  2. with TBB, but with Revert @acxz's TBB revert #901 reverted?

@ProfFan
Copy link
Collaborator

ProfFan commented Feb 20, 2022

So a brief review:

  1. This is probably not memory corruption, otherwise Asan should have caught it
  2. This happens spuriously
  3. Related with TBB
  4. Works if Revert @acxz's TBB revert #901 is reverted (need confirmation from @cdb0y511 @DanMcGann @matlabbe)

So I am thinking about a race condition in the graph traversal task.

@acxz
Copy link
Contributor

acxz commented Feb 20, 2022

I'll take a closer look at this either this week or next week. Having a race condition is not really acceptable.

@DanMcGann
Copy link
Contributor

4. Works if [Revert @acxz's TBB revert #901](https://github.com/borglab/gtsam/pull/901) is reverted (need confirmation from @cdb0y511 @DanMcGann @matlabbe)

Confirmed: Tested this morning with these commits reverted and I cannot seem to re-recreate the bug.

@ProfFan
Copy link
Collaborator

ProfFan commented Mar 11, 2022

@acxz Do you have time to take a look? BTW, I suspect it could be a missing wait when you converted the old code.

@acxz
Copy link
Contributor

acxz commented Mar 16, 2022

Sorry about the delay, not yet sadly.

Yeah thats a good hunch you have.

@ProfFan
Copy link
Collaborator

ProfFan commented Apr 4, 2022

I have half a roadmap for fixing this issue. This is due to the old ClusterTree code somewhat relying on the TBB behavior. @acxz The generated BayesTree is incorrect and the issue here is a red herring...

I still have to debug where the problem exact is.

@ProfFan ProfFan mentioned this issue Apr 5, 2022
@ProfFan ProfFan linked a pull request Apr 5, 2022 that will close this issue
@ProfFan ProfFan self-assigned this Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants