Leadership pinning: implementation #23691

Merged
merged 16 commits into from
Oct 10, 2024

ztlpn
Contributor

@ztlpn ztlpn commented Oct 9, 2024

Jira ref: https://redpandadata.atlassian.net/browse/CORE-7022

TODO: add it to the list of enterprise features.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Features

  • Add leadership pinning: the ability to set preferred racks for topic partition leaders. To configure it, set the redpanda.leaders.preference topic config property or the default_leaders_preference cluster config property.
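
To illustrate the release note above, here is a minimal Python sketch of what a rack preference for partition leaders could mean. It is not the Redpanda implementation. The names `Replica`, `effective_preference`, and `preferred_leader_candidates` are hypothetical, and two behaviors are assumptions rather than facts from this PR: that the topic-level property overrides the cluster-level default, and that all replicas remain eligible when no replica sits in a preferred rack.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical model of a partition replica and the rack it lives in."""
    node_id: int
    rack: str


def effective_preference(
    topic_pref: Optional[FrozenSet[str]],
    cluster_default: FrozenSet[str],
) -> FrozenSet[str]:
    # Assumption: a topic-level preference, when set, overrides the
    # cluster-wide default.
    return topic_pref if topic_pref is not None else cluster_default


def preferred_leader_candidates(
    replicas: List[Replica],
    preferred_racks: FrozenSet[str],
) -> List[Replica]:
    # No preference configured: every replica may lead.
    if not preferred_racks:
        return list(replicas)
    pinned = [r for r in replicas if r.rack in preferred_racks]
    # Assumption: if no replica is in a preferred rack, fall back to all
    # replicas instead of leaving the partition without candidates.
    return pinned or list(replicas)
```

With replicas in racks A, B, and C and a preference for rack A, only the replica in rack A is returned as a leader candidate.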

@ztlpn ztlpn force-pushed the leadership-pinning-impl branch 2 times, most recently from ad58898 to ea5d50d Compare October 9, 2024 10:16
ztlpn added 12 commits October 9, 2024 16:19
This way several constraints can use it, not just the topic-aware
distribution constraint.
This is a struct that will hold everything needed for the leadership
pinning constraint.
Previously, there were two problems with muted groups:
1) If the transfer was not successful, we didn't mute the group.
2) If the transfer was successful, the group remained
   muted for a long time, preventing the balancer from reaching the optimum.

Make the balancer more aggressive by removing the group from the muted
set if we get a leadership notification after a successful transfer.
Usually this means that the health report hasn't been updated yet, so it
doesn't make sense to mute those groups for long.
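
The muting behavior described in this commit message can be sketched roughly as follows. This is an illustrative Python model, not the actual C++ balancer code. The class `MutedGroups`, its method names, and the injectable clock are all hypothetical, and muting after both successful and failed transfers is an assumption read from the two problems listed above.

```python
import time
from typing import Callable, Dict


class MutedGroups:
    """Hypothetical set of raft groups the balancer should leave alone."""

    def __init__(self, mute_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._mute_seconds = mute_seconds
        self._clock = clock
        self._muted_until: Dict[int, float] = {}  # group id -> expiry time

    def is_muted(self, group: int) -> bool:
        expiry = self._muted_until.get(group)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The mute interval has elapsed; forget the group.
            del self._muted_until[group]
            return False
        return True

    def on_transfer_finished(self, group: int, success: bool) -> None:
        # Assumed fix for problem 1: mute the group whether or not the
        # transfer succeeded, so a failing group is not retried at once.
        self._muted_until[group] = self._clock() + self._mute_seconds

    def on_leadership_notification(self, group: int) -> None:
        # Fix for problem 2: a notification after a successful transfer
        # means the move took effect, so unmute the group early.
        self._muted_until.pop(group, None)
```

The clock is injected only so the sketch can be exercised without sleeping.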
Clean up the execution model for balancing iterations:

1) Allow the balancing fiber to run to completion (with small
  mandatory intervals between iterations).
2) Don't ignore the timer if it fired while the fiber was active.
3) If we are not throttled or activating after acquiring controller
  leadership, allow the timer to be scheduled sooner (useful for notifications).
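
Points 1 and 2 of the list above can be sketched as a small state machine. This is a simplified, synchronous Python model under stated assumptions, not the Seastar fiber code from this PR. `BalancerScheduler`, `run_iteration`, and `pause` are hypothetical names, and point 3 (scheduling the timer sooner) is not modeled.

```python
from typing import Callable


class BalancerScheduler:
    """Hypothetical driver for balancing iterations."""

    def __init__(self, run_iteration: Callable[[], bool],
                 pause: Callable[[], None] = lambda: None) -> None:
        # run_iteration returns True if more balancing work remains.
        self._run_iteration = run_iteration
        # pause stands in for the small mandatory interval between iterations.
        self._pause = pause
        self._active = False
        self._pending_tick = False

    def on_timer(self) -> None:
        if self._active:
            # Point 2: remember the tick instead of dropping it.
            self._pending_tick = True
            return
        self._active = True
        try:
            # Point 1: keep iterating until there is nothing left to do.
            while True:
                self._pending_tick = False
                more_work = self._run_iteration()
                if not more_work and not self._pending_tick:
                    break
                self._pause()
        finally:
            self._active = False
```

In this model, a timer tick that arrives while an iteration is running causes one more iteration after the pause, instead of being lost.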
For now, just trigger the balancer soon after topic creation.
@ztlpn ztlpn force-pushed the leadership-pinning-impl branch from ea5d50d to a91893a Compare October 9, 2024 14:19
@ztlpn ztlpn requested a review from mmaslankaprv October 9, 2024 14:20
@vbotbuildovich
Collaborator

vbotbuildovich commented Oct 9, 2024

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56115#019271fd-bb8c-4ad1-b1ce-b7b2d3ad0ad9:

"rptest.tests.leadership_transfer_test.LeadershipPinningTest.test_leadership_pinning"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56152#019272b6-d818-4fd6-b67c-2038e07b2821:

"rptest.tests.leadership_transfer_test.LeadershipPinningTest.test_leadership_pinning"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56179#01927362-4df1-449b-ae90-49619d95286b:

"rptest.tests.leadership_transfer_test.LeadershipPinningTest.test_leadership_pinning"
"rptest.tests.leadership_transfer_test.LeadershipPinningTest.test_leadership_pinning"

@vbotbuildovich
Collaborator

Retry command for Build#56115

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest.test_leadership_pinning

@ztlpn ztlpn force-pushed the leadership-pinning-impl branch from a91893a to 26dbd35 Compare October 9, 2024 18:03
@vbotbuildovich
Collaborator

Retry command for Build#56152

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest.test_leadership_pinning

@ztlpn
Contributor Author

ztlpn commented Oct 9, 2024

/ci-repeat 1
skip-units
skip-redpanda-build
dt-repeat=10
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest

@vbotbuildovich
Collaborator

Retry command for Build#56179

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest.test_leadership_pinning
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest.test_leadership_pinning

@ztlpn ztlpn force-pushed the leadership-pinning-impl branch from 26dbd35 to 8bd5e66 Compare October 10, 2024 13:47
@ztlpn
Contributor Author

ztlpn commented Oct 10, 2024

/ci-repeat 1
skip-units
skip-redpanda-build
dt-repeat=10
tests/rptest/tests/leadership_transfer_test.py::LeadershipPinningTest

@ztlpn
Contributor Author

ztlpn commented Oct 10, 2024

OK, looks like the test is stable now.

@ztlpn ztlpn merged commit 4c8c4b7 into redpanda-data:dev Oct 10, 2024
16 checks passed
@ztlpn ztlpn deleted the leadership-pinning-impl branch October 10, 2024 15:32
3 participants