
Add reset to batch opt #360

Merged

Conversation

Contributor

@jakemclaughlin6 jakemclaughlin6 commented Mar 15, 2024

Adding the same reset service from the fixed-lag smoother to the batch optimizer. This also adds a mutex around the optimization (which performs fine in my experiments) to ensure that, on reset, we have sole access to the graph.

@jakemclaughlin6
Contributor Author

@svwilliams This should be a relatively non-invasive addition; let me know if otherwise. If so, for my use case I can just extend the existing batch optimizer with a reset function.

Contributor

@svwilliams svwilliams left a comment

I believe there is a locking error here. See in-line comments for details.

Comment on lines 260 to 271
// DANGER: The optimizationLoop() function obtains the optimization_mutex_ lock and the
// pending_transactions_mutex_ lock at the same time. We perform a parallel locking scheme here to
// prevent the possibility of deadlocks.
{
  std::lock_guard<std::mutex> lock(optimization_mutex_);
  // Clear all pending transactions
  {
    std::lock_guard<std::mutex> lock(pending_transactions_mutex_);
    pending_transactions_.clear();
  }
  // Clear the graph and marginal tracking states
  graph_->clear();

Unlike the fixed lag smoother, the batch optimizer does not lock the pending_transactions_mutex_ during the optimization loop. So you should not need to worry about the locking order as much.

However, the optimization loop does consist of two steps:

  1. Make a local copy of the combined_transaction_
  2. Update the graph from the local transaction copy

It is possible for the reset() operation to interrupt the optimization loop in between Step 1 and Step 2. If that happens, then the local transaction copy would contain variables and constraints but the graph would have been cleared by the reset logic. And that will cause problems. We need to ensure that the state of the graph does not change after the local transaction copy has been created.

Something like:

In BatchOptimizer::optimizationLoop()

  {
    std::lock_guard<std::mutex> lock(optimization_mutex_);
    // Copy the combined transaction so it can be shared with all the plugins
    fuse_core::Transaction::ConstSharedPtr const_transaction;
    {
      std::lock_guard<std::mutex> lock(combined_transaction_mutex_);
      const_transaction = std::move(combined_transaction_);
      combined_transaction_ = fuse_core::Transaction::make_shared();
    }
    // Update the graph
    graph_->update(*const_transaction);
    // Optimize the entire graph
    graph_->optimize(params_.solver_options);
    // Make a copy of the graph to share
    fuse_core::Graph::ConstSharedPtr const_graph = graph_->clone();
    // Optimization is complete. Notify all the things about the graph changes.
    notify(const_transaction, const_graph);
    // Clear the request flag now that this optimization cycle is complete
    optimization_request_ = false;
  }

Now optimization_mutex_ and combined_transaction_mutex_ are locked at the same time, so we are in a similar situation as the fixed-lag smoother.

In BatchOptimizer::resetServiceCallback()

  // DANGER: The optimizationLoop() function obtains the optimization_mutex_ lock and the
  //         combined_transaction_mutex_ lock at the same time. We perform a parallel locking scheme here to
  //         prevent the possibility of deadlocks.
  {
    std::lock_guard<std::mutex> lock(optimization_mutex_);
    // Reset the combined transaction
    {
      std::lock_guard<std::mutex> lock(combined_transaction_mutex_);
      combined_transaction_ = fuse_core::Transaction::make_shared();
    }
    // Clear the graph and marginal tracking states
    graph_->clear();
  }
  // Clear all pending transactions
  // The transaction callback and the optimization timer callback are the only other locations where
  // the pending_transactions_ variable is modified. As long as the BatchOptimizer node handle is
  // single-threaded, then pending_transactions_ variable cannot be modified while the reset callback
  // is running. Therefore, there are no timing or sequence issues with exactly where inside the reset
  // service callback the pending_transactions_ are cleared.
  {
    std::lock_guard<std::mutex> lock(pending_transactions_mutex_);
    pending_transactions_.clear();
  }

@DavidLocus DavidLocus left a comment


LGTM

@jakemclaughlin6
Contributor Author

If this is good to merge, then someone else will have to merge it, as I don't have access. @svwilliams @DavidLocus

@svwilliams
Contributor

Yep, I'm working on a ROS 2 port right now. I'll get this merged in later today.

@svwilliams svwilliams merged commit 415971a into locusrobotics:devel Mar 21, 2024
1 check passed