Stop recovery when follower offset was already updated #15049

Merged
mmaslankaprv merged 2 commits into redpanda-data:dev from fix-14413
Nov 21, 2023

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Nov 20, 2023

The recovery and replicate stms are not synchronized. When both are
active at the same time, the same batch may be delivered to the follower
twice. This duplication is generally harmless, since Raft tolerates
message redelivery, but it may cause unnecessary truncation and
increased latency.

Added a check that validates the follower's expected log end offset right
before sending a recovery append entries request. This prevents sending
the same set of batches to the follower twice.
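
For readers unfamiliar with the code, the idea can be sketched roughly as below. This is a simplified illustration in Python, not the actual Redpanda C++ change: `Follower`, `recovery_step`, and the `on_read` hook are hypothetical names, batches are modelled as plain integer offsets, and `on_read` stands in for whatever the replicate path may do while recovery is reading from the log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Follower:
    """Hypothetical leader-side view of one follower."""
    expected_log_end_offset: int = -1
    received: List[int] = field(default_factory=list)

    def append(self, batches: List[int]) -> None:
        # Deliver batches and advance the expected log end offset.
        self.received.extend(batches)
        self.expected_log_end_offset = batches[-1]


def recovery_step(
    follower: Follower,
    log: List[int],
    on_read: Optional[Callable[[], None]] = None,
) -> bool:
    """Run one recovery round; return True if a request was sent."""
    # Remember where the follower was expected to end before reading.
    start = follower.expected_log_end_offset
    batches = [offset for offset in log if offset > start]
    if not batches:
        return False

    # Assumption: other work (e.g. a replicate request) may run here.
    if on_read is not None:
        on_read()

    # The check: if the offset moved while we were reading, the follower
    # already has these batches, so stop instead of sending them again.
    if follower.expected_log_end_offset != start:
        return False

    follower.append(batches)
    return True


log = [0, 1, 2, 3]
follower = Follower(expected_log_end_offset=1)
# The replicate path delivers offsets 2 and 3 while recovery is reading.
sent = recovery_step(follower, log, on_read=lambda: follower.append([2, 3]))
print(sent, follower.received)
```

In this sketch the recovery round is skipped and the follower receives offsets 2 and 3 only once. Without the re-check, the same two batches would be appended a second time.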

Fixes: #14413

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv mmaslankaprv merged commit 54724bd into redpanda-data:dev Nov 21, 2023
20 checks passed
@mmaslankaprv mmaslankaprv deleted the fix-14413 branch November 21, 2023 17:22
@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15049-v23.2.x-941 remotes/upstream/v23.2.x
git cherry-pick -x cb4e6075344b50cc3606e4d7946a4ad985404c4c 7bb54d3c5e747f0809e7d9fc74a36e2a8a34e25a

Workflow run logs.

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-15049-v23.1.x-120 remotes/upstream/v23.1.x
git cherry-pick -x cb4e6075344b50cc3606e4d7946a4ad985404c4c 7bb54d3c5e747f0809e7d9fc74a36e2a8a34e25a

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

CI Failure (vnode mismatch) in gtest_raft_rpunit
4 participants