cloud_storage: Fix offset for leader epoch request for read-replica #14403
Force-pushed from 8203a82 to 7a8d871
- _partition->is_remote_fetch_enabled()
-     && _partition->cloud_data_available()) {
+ is_read_replica
+     || (_partition->is_remote_fetch_enabled() && _partition->cloud_data_available())) {
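The condition in the diff above decides where the leader-epoch lookup gets its data. A minimal Python sketch of that predicate, assuming a hypothetical PartitionState record whose three flags stand in for is_read_replica, is_remote_fetch_enabled() and cloud_data_available() (this is an illustration, not Redpanda's C++ code):

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    # Hypothetical flags standing in for the C++ calls in the diff.
    is_read_replica: bool
    remote_fetch_enabled: bool
    cloud_data_available: bool


def use_cloud_for_epoch_lookup(p: PartitionState) -> bool:
    """Return True when the leader-epoch lookup should consult cloud storage.

    Before the change only the parenthesized clause existed, so a read
    replica that did not satisfy both conditions fell through to the
    local Raft group.
    """
    return p.is_read_replica or (p.remote_fetch_enabled and p.cloud_data_available)
```

With the new first clause, a read replica always takes the cloud path, regardless of the other two flags.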
I think that in the case is_read_replica && !cloud_data_available() we should explicitly return nullopt. At least, it seems that if the cloud partition is empty we could hit some asserts:
redpanda/src/v/cluster/partition.cc, line 315 in 39062df:
vassert(
  stmm.size() > 0,
In the test, if the request is performed before we sync, it returns -1, but I added the fix just in case.
An RRR (remote read replica) partition can return an incorrect offset for an epoch because it uses information from the local Raft group. This commit fixes the issue by always using data from cloud storage if the topic is a read replica.
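To illustrate what "using data from cloud storage" means for this request, here is a simplified sketch of answering an offset-for-leader-epoch lookup from per-term start offsets. The TermStarts layout and offset_for_leader_epoch function are hypothetical and do not reflect Redpanda's manifest format or API; the semantics are simplified (the end offset of a term is the first offset of the next higher term, or the partition end offset for the latest term). It also returns None for an empty cloud partition, in the spirit of the reviewer's suggestion to return nullopt rather than hit an assertion.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical, simplified view of cloud segment metadata: (term, base_offset)
# pairs sorted by term, where each term starts at base_offset.
TermStarts = List[Tuple[int, int]]


def offset_for_leader_epoch(
    term_starts: TermStarts, end_offset: int, requested_term: int
) -> Optional[int]:
    """Return the end offset of requested_term, or None if there is no data."""
    if not term_starts:
        # Empty cloud partition: report "no answer" instead of asserting.
        return None
    terms = [term for term, _ in term_starts]
    # Index of the first term strictly greater than the requested one.
    i = bisect_right(terms, requested_term)
    if i == len(term_starts):
        # The requested term is the latest known one (or newer).
        return end_offset
    return term_starts[i][1]
```

For example, with term_starts = [(1, 0), (3, 100), (5, 250)] and an end offset of 400, term 2 maps to 100 and term 5 maps to 400.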
Force-pushed from 7a8d871 to 523bb54
Makes sense to me. I'd maybe do a ci-repeat of just the new test to surface any potential flakiness.
wait_until(has_leader,
           timeout_sec=60,
           backoff_sec=10,
           err_msg="Failed to create a read-replica, no leadership")
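The wait_until call above polls a condition until it holds or a timeout expires. The following is a simplified stand-in written for illustration, not ducktape's implementation (the real helper differs in details such as its exception type and extra options); has_leader is the test's own predicate and is not reproduced here.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_sec: float,
    backoff_sec: float = 0.1,
    err_msg: str = "",
) -> None:
    """Poll condition() every backoff_sec until it is True or timeout_sec passes."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg)
```

With timeout_sec=60 and backoff_sec=10, leadership is checked roughly every 10 seconds for up to a minute. This coarse polling is what Admin.await_stable_leader, suggested in review, would replace.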
nit: could use Admin.await_stable_leader
I was running the test locally with
/backport v23.2.x
/backport v23.1.x
Failed to create a backport PR to the v23.1.x branch. I tried:
A read replica uses the local Raft group as the source of information for the OffsetForLeaderEpoch request. This PR fixes the problem and adds a reproducer in the form of a ducktape test.
Fixes #14402
Backports Required
Release Notes
Bug Fixes