-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reset to batch opt #360
Add reset to batch opt #360
Conversation
@svwilliams This should be a relatively non-invasive addition, let me know if otherwise and if so for my use case I can just extend the existing batch optimizer with a reset function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is a locking error here. See in-line comments for details.
// DANGER: The optimizationLoop() function obtains the lock optimization_mutex_ lock and the | ||
// pending_transactions_mutex_ lock at the same time. We perform a parallel locking scheme here to | ||
// prevent the possibility of deadlocks. | ||
{ | ||
std::lock_guard<std::mutex> lock(optimization_mutex_); | ||
// Clear all pending transactions | ||
{ | ||
std::lock_guard<std::mutex> lock(pending_transactions_mutex_); | ||
pending_transactions_.clear(); | ||
} | ||
// Clear the graph and marginal tracking states | ||
graph_->clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike the fixed lag smoother, the batch optimizer does not lock the pending_transactions_mutex_
during the optimization loop. So you should not need to worry about the locking order as much.
However, the optimization loop does consist of two steps:
- Make a local copy of the combined_transaction_
- Update the graph from the local transaction copy
It is possible for the reset() operation to interrupt the optimization loop in between Step 1 and Step 2. If that happens, then the local transaction copy would contain variables and constraints but the graph would have been cleared by the reset logic. And that will cause problems. We need to ensure that the state of the graph does not change after the local transaction copy has been created.
Something like:
In BatchOptimizer::optimizationLoop()
{
std::lock_guard<std::mutex> lock(optimization_mutex_);
// Copy the combined transaction so it can be shared with all the plugins
fuse_core::Transaction::ConstSharedPtr const_transaction;
{
std::lock_guard<std::mutex> lock(combined_transaction_mutex_);
const_transaction = std::move(combined_transaction_);
combined_transaction_ = fuse_core::Transaction::make_shared();
}
// Update the graph
graph_->update(*const_transaction);
// Optimize the entire graph
graph_->optimize(params_.solver_options);
// Make a copy of the graph to share
fuse_core::Graph::ConstSharedPtr const_graph = graph_->clone();
// Optimization is complete. Notify all the things about the graph changes.
notify(const_transaction, const_graph);
// Clear the request flag now that this optimization cycle is complete
optimization_request_ = false;
}
Now optimization_mutex_
and combined_transaction_mutex_
are locked at the same time, so we are in a similar situation as the fixed-lag smoother.
In BatchOptimizer::resetServiceCallback()
// DANGER: The optimizationLoop() function obtains the lock optimization_mutex_ lock and the
// combined_transaction_mutex_ lock at the same time. We perform a parallel locking scheme here to
// prevent the possibility of deadlocks.
{
std::lock_guard<std::mutex> lock(optimization_mutex_);
// Clear the combined transation
{
std::lock_guard<std::mutex> lock(combined_transaction_mutex_);
combined_transaction_.clear();
}
// Clear the graph and marginal tracking states
graph_->clear();
}
// Clear all pending transactions
// The transaction callback and the optimization timer callback are the only other locations where
// the pending_transactions_ variable is modified. As long as the BatchOptimizer node handle is
// single-threaded, then pending_transactions_ variable cannot be modified while the reset callback
// is running. Therefore, there are no timing or sequence issues with exactly where inside the reset
// service callback the pending_transactions_ are cleared.
{
std::lock_guard<std::mutex> lock(pending_transactions_mutex_);
pending_transactions_.clear();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If this is good to merge, then someone else will have to merge this I don't have access @svwilliams @DavidLocus |
Yep, I'm working on a ROS 2 port right now. I'll get this merged in later today. |
Adding the same reset service from the fixed lag smoother to the batch optimizer. This also adds a mutex for the optimization which is performing fine in my experiments, this is to make sure on reset we have sole access to the graph.