-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
c/balancer_backend: first initialize planner and then call plan #18091
c/balancer_backend: first initialize planner and then call plan #18091
Conversation
This change is a part of an effort to identify and fix rare segmentation fault in Redpanda that happens after it was suspended with `SIGSTOP` signal. According to the C++ standard the temporary should be kept alive until the expression ends. The crash we are observing indicates the UAF issue. The only way the variable, that access causes the segfault, can be deleted is by getting out of scope which in this situation should be guaranteed. Given our experience with coroutines and different types of lifecycle bugs that we found in past this is a poor man's effort to avoid the issue. Signed-off-by: Michał Maślanka <michal@redpanda.com>
new failures in https://buildkite.com/redpanda/redpanda/builds/48323#018f1a87-af9a-444d-9da5-c453f7f529a7:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess we can merge this, but it seems like a mistake. the complexity involved in keeping things like topics_table iterators consistent by tracking mutation revisions (and the fact that topics table mutable references are shared outside its implementation) seems like it is much more likely to be the source of a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we don't have any other good theories, I think we should just merge this and see what happens.. will at least confirm whether this has something to do with the lifetime of the planner object in this part of the code.
Given the comments above, we should at least not backport it right away. Let it bake in dev before we do so, @mmaslankaprv ? |
/backport v24.1.x |
/backport v23.3.x |
Failed to create a backport PR to v23.3.x branch. I tried:
|
We can leave it to bake. This change is however perfectly safe. It s a little bit a hit and miss approach. |
This change is a part of an effort to identify and fix rare segmentation fault in Redpanda that happens after it was suspended with
SIGSTOP
signal.According to the C++ standard the temporary should be kept alive until the expression ends. The crash we are observing indicates the UAF issue. The only way the variable, that access causes the segfault, can be deleted is by getting out of scope which in this situation should be guaranteed.
Given our experience with coroutines and different types of lifecycle bugs that we found in past this is a poor man's effort to avoid the issue.
Related issues:
#17751
#16510
#16533
#13301
#17751
Backports Required
Release Notes