c/balancer_backend: first initialize planner and then call plan #18091

Merged: 1 commit merged on Apr 28, 2024

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Apr 26, 2024

This change is part of an effort to identify and fix a rare segmentation fault in Redpanda that happens after the process has been suspended with the SIGSTOP signal.
According to the C++ standard, a temporary is kept alive until the end of the expression in which it is created. The crash we are observing points to a use-after-free (UAF). The only way the variable whose access causes the segfault could be destroyed is by going out of scope, and its lifetime should be guaranteed in this situation.

Given our experience with coroutines and the various lifecycle bugs we have found in the past, this is a poor man's attempt to avoid the issue.
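
To illustrate the lifetime rule discussed above, here is a minimal sketch in plain C++ (no coroutines, no Seastar). The `planner` struct and both functions are hypothetical stand-ins, not the actual Redpanda code or the actual diff. They only record the order of construction, use, and destruction for a temporary versus a named local.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real planner: it only records lifetime events.
struct planner {
    explicit planner(std::vector<std::string>& log) : _log(log) {
        _log.push_back("ctor");
    }
    ~planner() { _log.push_back("dtor"); }
    int plan() {
        _log.push_back("plan");
        return 42;
    }
    std::vector<std::string>& _log;
};

// plan() is called on a temporary. The temporary is destroyed at the end
// of the full-expression, i.e. before the next statement runs.
std::vector<std::string> plan_on_temporary() {
    std::vector<std::string> log;
    int result = planner(log).plan();
    log.push_back("after");
    (void)result;
    return log;
}

// The planner is first initialized as a named local and then plan() is
// called. It is destroyed only when its enclosing scope ends.
std::vector<std::string> plan_on_named_local() {
    std::vector<std::string> log;
    {
        planner p(log);
        int result = p.plan();
        log.push_back("after");
        (void)result;
    }
    return log;
}
```

In both variants the object outlives the call to `plan()`, which is why the standard suggests the original code should already have been safe. The named local simply extends the lifetime to the end of the scope and makes it explicit.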

Related issues:
#17751
#16510
#16533
#13301

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

Signed-off-by: Michał Maślanka <michal@redpanda.com>
@vbotbuildovich
Collaborator

new failures in https://buildkite.com/redpanda/redpanda/builds/48323#018f1a87-af9a-444d-9da5-c453f7f529a7:

"rptest.tests.upgrade_test.UpgradeBackToBackTest.test_upgrade_with_all_workloads.single_upgrade=False"

Member

@dotnwat dotnwat left a comment

i guess we can merge this, but it seems like a mistake. the complexity involved in keeping things like topics_table iterators consistent by tracking mutation revisions (and the fact that topics table mutable references are shared outside its implementation) seems like it is much more likely to be the source of a mistake.
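
For readers unfamiliar with the pattern referred to in this comment, the following is a rough sketch of revision-tracked iteration. The names `revisioned_table` and `checked_cursor` are hypothetical and are not Redpanda's `topics_table` API. The sketch only shows the general idea: every mutation bumps a counter, and a cursor refuses to be used once that counter has changed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical table that bumps a revision counter on every mutation.
class revisioned_table {
public:
    void upsert(const std::string& key, int value) {
        _data[key] = value;
        ++_revision; // conservative: bumped even when overwriting a key
    }
    void erase(const std::string& key) {
        if (_data.erase(key) > 0) {
            ++_revision;
        }
    }
    uint64_t revision() const { return _revision; }
    const std::map<std::string, int>& data() const { return _data; }

private:
    std::map<std::string, int> _data;
    uint64_t _revision = 0;
};

// Cursor that remembers the revision it was created at and throws if the
// table has been mutated since then.
class checked_cursor {
public:
    explicit checked_cursor(const revisioned_table& t)
      : _table(t), _it(t.data().begin()), _revision(t.revision()) {}

    bool done() const {
        check();
        return _it == _table.data().end();
    }
    // Precondition: !done()
    const std::string& key() const {
        check();
        return _it->first;
    }
    void next() {
        check();
        ++_it;
    }

private:
    void check() const {
        if (_table.revision() != _revision) {
            throw std::runtime_error("table mutated during iteration");
        }
    }
    const revisioned_table& _table;
    std::map<std::string, int>::const_iterator _it;
    uint64_t _revision;
};
```

The check only helps if every mutation path bumps the counter. A mutable reference handed out to other code can bypass it, which is the kind of risk the comment above points at.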

Contributor

@bharathv bharathv left a comment

Given we don't have any other good theories, I think we should just merge this and see what happens. It will at least confirm whether this has something to do with the lifetime of the planner object in this part of the code.

@piyushredpanda
Contributor

Given the comments above, we should at least not backport it right away. Let it bake in dev before we do so, @mmaslankaprv ?

@mmaslankaprv mmaslankaprv merged commit a1a5b38 into redpanda-data:dev Apr 28, 2024
16 of 22 checks passed
@vbotbuildovich
Collaborator

/backport v24.1.x

@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-18091-v23.3.x-165 remotes/upstream/v23.3.x
git cherry-pick -x a52d0ada31dd0c842333fadc2efc37d4aaf2b471

Workflow run logs.

@mmaslankaprv
Member Author

> Given the comments above, we should at least not backport it right away. Let it bake in dev before we do so, @mmaslankaprv ?

We can leave it to bake. The change is, however, perfectly safe. It's a bit of a hit-and-miss approach.

@mmaslankaprv mmaslankaprv deleted the balancer-variable branch April 29, 2024 13:20