Made prevote phase not resetting last heartbeat timestamp #11726

Merged
merged 4 commits into from
Jul 11, 2023

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Jun 27, 2023

When a voter receives a vote request and votes for the candidate, it
updates its last heartbeat timestamp. If this happens during the prevote
phase in a deployment with an even number of nodes, it may lead to a
temporary livelock in which no leader can be elected.
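
A minimal sketch of the behaviour change, assuming a simplified voter model. `Voter`, `handle_vote_request`, and `election_due` are hypothetical names used for illustration only; this is not Redpanda's actual C++ code. The point it tries to show is that a granted prevote leaves the voter's own election timer untouched, while a granted real vote still refreshes it.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    """Toy model of a Raft voter (hypothetical, for illustration only)."""
    election_timeout: float       # seconds of silence before an election is due
    last_heartbeat: float = 0.0   # last time the election timer was refreshed

    def handle_vote_request(self, now: float, prevote: bool, grant: bool) -> bool:
        # Only a granted *real* vote refreshes the timestamp. A prevote is a
        # dry run, so it must not postpone this node's own election timer.
        if grant and not prevote:
            self.last_heartbeat = now
        return grant

    def election_due(self, now: float) -> bool:
        return now - self.last_heartbeat >= self.election_timeout


voter = Voter(election_timeout=1.5)

# Granting a prevote at t=1.0 leaves the timer counting from t=0.0.
voter.handle_vote_request(now=1.0, prevote=True, grant=True)
prevote_kept_timer = voter.election_due(now=1.5)

# Granting a real vote at t=1.0 restarts the timer from t=1.0.
voter.handle_vote_request(now=1.0, prevote=False, grant=True)
real_vote_reset_timer = not voter.election_due(now=1.5)
```

In this model, a voter that keeps granting prevotes can still time out and start its own election, which is the property the change is after.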

Fixes: #11657

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • faster controller failover

@mmaslankaprv mmaslankaprv changed the title from "Fix 11657" to "Made prevote phase not resetting last heartbeat timestamp" Jun 28, 2023
@mmaslankaprv mmaslankaprv marked this pull request as ready for review June 28, 2023 09:53
Made the entries indicating receipt of append entries and vote requests
more obvious.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Signed-off-by: Michal Maslanka <michal@redpanda.com>
When a voter receives a vote request and votes for the candidate, it
updates its last heartbeat timestamp. If this happens during the prevote
phase in a deployment with an even number of nodes, it may lead to a
temporary livelock in which no leader can be elected.

Fixes: redpanda-data#11657

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a test verifying that a controller is elected in a timely fashion
when some of the cluster nodes are down.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
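
A rough sketch of the kind of check such a test might perform, assuming a polling helper with a deadline. `wait_for_controller` and the `get_controller` callback are hypothetical names; the actual test in this PR is not shown here and may be structured differently.

```python
import time
from typing import Callable, Optional


def wait_for_controller(get_controller: Callable[[], Optional[int]],
                        timeout_sec: float,
                        backoff_sec: float = 0.1) -> int:
    """Poll until get_controller() returns a node id or the deadline passes."""
    deadline = time.monotonic() + timeout_sec
    while True:
        controller = get_controller()
        if controller is not None:
            return controller
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no controller elected within {timeout_sec}s")
        time.sleep(backoff_sec)


# Stand-in for a cluster that reports node 2 as controller on the third poll.
answers = iter([None, None, 2])
elected = wait_for_controller(lambda: next(answers),
                              timeout_sec=5.0, backoff_sec=0.01)
```

A test built this way fails with `TimeoutError` when no controller appears before the deadline, rather than hanging indefinitely.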
@mmaslankaprv
Member Author

/ci-repeat 1

@mmaslankaprv
Member Author

/ci-repeat 1

@mmaslankaprv
Member Author

/ci-repeat 1

@mmaslankaprv mmaslankaprv merged commit e2c9962 into redpanda-data:dev Jul 11, 2023
@mmaslankaprv mmaslankaprv deleted the fix-11657 branch July 11, 2023 08:02
@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

/backport v22.3.x

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11726-v23.1.x-26 remotes/upstream/v23.1.x
git cherry-pick -x 4ff810bc8b872fa0ae04533af8d911173fb7e916 7d68e46def71138a77cc71529039b66eb5324a3d 995c7750fdcd6315785c1b982dc415979facb31e c6555b6c6076f760902efb3b97aba87e118c4acf

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11726-v22.3.x-72 remotes/upstream/v22.3.x
git cherry-pick -x 4ff810bc8b872fa0ae04533af8d911173fb7e916 7d68e46def71138a77cc71529039b66eb5324a3d 995c7750fdcd6315785c1b982dc415979facb31e c6555b6c6076f760902efb3b97aba87e118c4acf

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

CI Failure (new_controller_available times out) in PartitionMovementTest.test_stale_node