raft: in append_entries skip batches that we already have #17895
Conversation
It is similar to for_each_ref, but advances only if the consumer returns ss::stop_iteration::no. I.e. the batch where the consumer stopped remains available for reading by subsequent consumers.
Extract configurations using a wrapping batch consumer instead.
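The "advances only on `ss::stop_iteration::no`" semantics described above can be sketched in plain C++ (hypothetical stand-in types, not the actual Seastar/Redpanda reader API): the batch on which a consumer stops is not consumed, so the next consumer sees it again.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the seastar/model types, for illustration only.
enum class stop_iteration { no, yes };
using batch = int;

struct reader {
    std::vector<batch> batches;
    std::size_t pos = 0;

    // Like for_each_ref, but advance past a batch only if the consumer
    // returned stop_iteration::no for it. The batch where the consumer
    // stopped remains available for subsequent consumers.
    void consume(const std::function<stop_iteration(const batch&)>& consumer) {
        while (pos < batches.size()) {
            if (consumer(batches[pos]) == stop_iteration::yes) {
                return; // do NOT advance past the stopping batch
            }
            ++pos;
        }
    }
};
```

With this contract, a first consumer that stops at some batch leaves it in place, and a second consumer picks up starting from that same batch.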
new failures in https://buildkite.com/redpanda/redpanda/builds/47884#018ee860-7f02-47bd-90b3-3b3f37d26d1c

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/47884#018ee860-7f09-431e-9e3a-b7eeb9d6d3f8
Force-pushed from 09b8df7 to 49d13fd
oh this is cool.
neat fix, mostly nits, makes sense to me.
    co_return reply;
}

co_return co_await do_append_entries(std::move(r));
I think the only deviation is that this do_append_entries call is now outside the try block (which is still logically the same).
Yeah, it is a bit weird that we printed an exception in the recursive call as a "truncation failure". OTOH it is written to not throw exceptions, so probably not a big difference.
Thanks. I had the same question.
"current state: {}", | ||
batch_prev_log_index, | ||
last_matched, | ||
meta()); |
nit: maybe useful to log current log offset state (lstats)
IMO what we have in meta() should be mostly enough (commit_index and dirty_offset are there).
src/v/raft/consensus.cc (Outdated)
struct find_mismatch_consumer {
    ss::future<ss::stop_iteration>
    operator()(const model::record_batch& b) {
        model::offset last_offset = last_matched
We have this method on record_batch:
model::offset last_offset() const { return _header.last_offset(); }
I'm curious if we can use that instead of computing it.
That's what I used initially :) But I forgot that normally replicated batches don't have base_offset set yet (even though recovery batches do!) and we have to calculate the offsets manually. This is actually pretty confusing, I wonder if we should add some non-serialized flag to the batch header indicating that the offsets are still not set.
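Since replicated batches arrive with base_offset still unset, the offsets have to be derived rather than read from the header. A minimal sketch of that manual computation (illustrative only, not the actual consensus.cc code; types are simplified stand-ins): walk the batches, accumulating each batch's record count on top of the request's prev_log_index.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for the batch header; only the field we need.
struct batch_header {
    int32_t record_count;
};

// Derive the last offset of each incoming batch from prev_log_index and
// the per-batch record counts, since header.base_offset() may be unset
// on batches that have not been assigned offsets yet.
std::vector<int64_t> derive_last_offsets(
  int64_t prev_log_index, const std::vector<batch_header>& batches) {
    std::vector<int64_t> last_offsets;
    int64_t last = prev_log_index;
    for (const auto& h : batches) {
        last += h.record_count; // last offset covered by this batch
        last_offsets.push_back(last);
    }
    return last_offsets;
}
```

For example, with prev_log_index = 10 and batches of 3, 1, and 2 records, the last offsets come out as 13, 14, and 16.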
Ok, 300 iterations of the
great!
This is important for the case when we already have _all_ batches locally (possible if e.g. the request was delayed/duplicated). In this case we don't want to truncate, otherwise we might lose already committed data. Fixes redpanda-data#17731
Force-pushed from 49d13fd to f0c5772
test failure is #17847 (and some result publishing woes)
/backport v23.3.x

/backport v23.2.x
Failed to create a backport PR to v23.3.x branch. I tried:

Failed to create a backport PR to v23.2.x branch. I tried:
This is important for the case when we already have all batches locally (possible if e.g. the request was delayed/duplicated). In this case we don't want to truncate, otherwise we might lose already committed data.
Fixes #17731
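The core of the fix can be sketched as follows (a simplified model under stated assumptions, not the actual consensus.cc code: entries are dense (offset, term) pairs starting at offset 0): skip incoming batches whose offset and term already match the local log, and truncate only on a genuine term mismatch. If every incoming batch is already present locally, as with a delayed or duplicated request, nothing is truncated, so committed data cannot be lost.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Simplified log entry: just an offset and the term it was written in.
struct entry {
    int64_t offset;
    int64_t term;
};

// Returns the offset at which to truncate the local log, or nullopt if
// no truncation is needed (either we only need to append, or all
// incoming entries are already present and matching).
std::optional<int64_t> find_truncation_point(
  const std::vector<entry>& log, const std::vector<entry>& incoming) {
    for (const auto& e : incoming) {
        if (e.offset >= static_cast<int64_t>(log.size())) {
            // Entry is beyond our log end: append from here, no truncation.
            return std::nullopt;
        }
        if (log[e.offset].term != e.term) {
            // Real divergence at an offset we already have: truncate here.
            return e.offset;
        }
        // Otherwise this batch matches what we already have: skip it.
    }
    // All incoming entries already present and matching: do not truncate.
    return std::nullopt;
}
```

Before the fix, a follower could reach the truncation path even when the request contained only batches it already had; with the skip, that case falls through to "nothing to truncate".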
Backports Required
Release Notes
Bug Fixes