Transfer leadership before stepping down after reconfiguration #19966
Conversation
Force-pushed from 22917a8 to ff7751c
Does this make sense? If a node is removed from a raft configuration, why would it be a follower?
Michal was probably referring to the implementation detail there. The "becoming a follower" part of the implementation (see do_step_down("reason")) relinquishes leadership, letting a new leader take charge. This is the terminal state for that replica, as it can neither request votes (it is no longer a part of the configuration) nor receive any heartbeats, since the rest of the quorum has already forgotten about it. It will be GC-ed by the controller.
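The terminal state described above can be sketched in a few lines. This is a hypothetical model with illustrative names (`replica`, `do_step_down`, `can_request_votes` are not the actual Redpanda types), showing why a replica removed from the configuration stays a follower forever:

```cpp
#include <set>

// Hypothetical model, not the actual Redpanda implementation: a replica's
// view of leadership and of the current voter configuration.
struct replica {
    int id;
    bool is_leader = false;
    std::set<int> configuration; // voter ids in the current configuration

    // Sketch of do_step_down: relinquish leadership. A replica that is no
    // longer in the configuration cannot request votes and receives no
    // heartbeats, so this state is terminal until the controller GCs it.
    void do_step_down() { is_leader = false; }

    bool can_request_votes() const { return configuration.count(id) > 0; }
};
```

Once `do_step_down` runs on a replica outside the configuration, there is no code path that can make it a candidate or a leader again.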
Added a missing trigger of the Raft leadership notification after stepping down when a leader node is no longer part of the raft group configuration. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Force-pushed from ff7751c to dfb0533
/ci-repeat 1
if (_leader_id) {
    _leader_id = std::nullopt;
    trigger_leadership_notification();
}
Should this be part of do_step_down?
In some cases (when processing requests) we do not trigger the notification with no leader, but immediately update the leader with the new leader node id.
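The two paths in that answer can be sketched as follows. This is a hypothetical illustration (the `leadership` struct and method names are invented for the example) of why the reset-and-notify logic is not folded into the step-down path:

```cpp
#include <optional>

// Hypothetical sketch: two ways leadership state can change. Only the
// "no successor known" path should fire a leaderless notification.
struct leadership {
    std::optional<int> leader_id;
    int notifications = 0;

    void trigger_leadership_notification() { ++notifications; }

    // Path 1: step down with no successor known -> clear the leader and
    // notify listeners that the group is leaderless.
    void step_down_no_successor() {
        if (leader_id) {
            leader_id = std::nullopt;
            trigger_leadership_notification();
        }
    }

    // Path 2: a processed request already carries the new leader id ->
    // update directly, so listeners see one notification with the new
    // leader and never observe a leaderless gap.
    void update_leader(int new_leader) {
        leader_id = new_leader;
        trigger_leadership_notification();
    }
};
```

Folding path 1 into a common step-down helper would force path 2 to emit a spurious "no leader" notification before the real one.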
co_await stop_node(vn.id());
}

auto tolerance = 0.15;
nit: I have a feeling that this could be flaky in a noisy debug environment. In the end, what we care about is that this interval is much smaller than the election timeout; maybe we can test that directly.
I was worried about that, and this is why I expressed the expected value based on the leadership transfer that is executed right before the reconfiguration. I was thinking that in a debug environment the leadership transfer would also be slower, hence the test will self-adapt to the environment.
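The self-adapting bound described above can be expressed as a small predicate. This is a sketch of the idea only (the function name and signature are invented, not the test's actual code): the acceptable election time is derived from the leadership transfer measured in the same environment, scaled by the tolerance:

```cpp
#include <chrono>

using namespace std::chrono;

// Hypothetical sketch of the self-adapting assertion: instead of a fixed
// absolute bound, compare the post-removal election time against the
// leadership transfer time observed just before the reconfiguration.
bool election_time_acceptable(milliseconds transfer_time,
                              milliseconds election_time,
                              double tolerance = 0.15) {
    // Allow the election to take at most (1 + tolerance) times as long
    // as the leadership transfer measured in this environment.
    auto bound = duration_cast<milliseconds>(transfer_time * (1.0 + tolerance));
    return election_time <= bound;
}
```

Because `transfer_time` is measured in the same (possibly slow, debug) environment, the bound scales with it rather than being a hard-coded constant.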
When a node that is currently a raft group leader is not a part of the new configuration, it must step down and become a follower. When stepping down, a leader stops sending heartbeats to the followers, allowing them to trigger an election. The election starts only after an election timeout has elapsed on one of the followers. This makes the whole process slow, and during that time clients can not write to or read from the raft group, as it is leaderless. To address this issue, a new method of stepping down was introduced. The new step-down implementation, which is going to be used for reconfiguration, requests one of the followers to time out immediately and trigger a leader election. This speeds up the whole process and makes it much less disruptive, as the step down is now comparable to a leadership transfer. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Added a test validating that a leader election caused by removing the leader from the replica set takes a comparable amount of time to a leadership transfer. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Force-pushed from dfb0533 to 9c58109
/ci-repeat 1
ci failure: #19012
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to v24.1.x branch. I tried:
Failed to create a backport PR to v23.3.x branch. I tried:
When a node that is currently a raft group leader is not a part of the new
configuration, it must step down and become a follower. When stepping
down, a leader stops sending heartbeats to the followers, allowing them to
trigger an election. The election starts only after an election timeout
has elapsed on one of the followers. This makes the whole process
slow, and during that time clients can not write to or read from the
raft group, as it is leaderless. To address this issue, a new method of
stepping down was introduced. The new step-down implementation, which is
going to be used for reconfiguration, requests one of the followers to
time out immediately and trigger a leader election. This speeds up the
whole process and makes it much less disruptive, as the step down is now
comparable to a leadership transfer.
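The mechanism described above can be sketched minimally. This is a hypothetical illustration (the `leader`/`follower` structs and `on_timeout_now` are invented names, not the actual Redpanda code) of the difference between passively stopping heartbeats and actively asking one follower to time out:

```cpp
#include <vector>

// Hypothetical sketch of the faster step-down path: instead of silently
// stopping heartbeats and waiting for a full election timeout to elapse
// on some follower, the departing leader asks one follower to time out
// immediately and start an election.
struct follower {
    bool election_triggered = false;
    // Handler for a "timeout now" request: begin an election right away.
    void on_timeout_now() { election_triggered = true; }
};

struct leader {
    // New behavior on reconfiguration: pick one follower and request an
    // immediate timeout, making the step down comparable in latency to a
    // leadership transfer. (Old behavior: stop heartbeats and wait.)
    void step_down(std::vector<follower>& followers) {
        if (!followers.empty()) {
            followers.front().on_timeout_now();
        }
    }
};
```

In this sketch, exactly one follower starts an election immediately, so the leaderless window is bounded by one round of vote requests rather than by the election timeout.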
Time to elect a leader before the fix:
after reconfiguration:
with the fix:
Backports Required
Release Notes
Improvements