Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15510: Fix follower's lastFetchedEpoch when fetch response has … #14457

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

chernyih
Copy link
Contributor

@chernyih chernyih commented Sep 27, 2023

…no record

A regression is introduced by #13843. When a fetch response has no record for a partition, validBytes is 0. In this case, we incorrectly set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala which is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

…no record

A regression is introduced by apache#13843.
When a fetch response has no record for a partition, validBytes is 0. We shouldn't
set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there
is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch
instead.
Copy link
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chernyih Thanks for the PR, LGTM

@rajinisivaram
Copy link
Contributor

@dajac Can you take a look as well, please? Thanks.

Copy link
Contributor

@divijvaidya divijvaidya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Though, I am having a hard time wondering about the impact of this bug. The next fetch call from follower will provide (incorrectly) an empty epoch for last fetched epoch. In which case, if the follower was diverging, it will not correctly reconcile since leader will not know about the divergence. But if the follower was diverging, then is there a case where the new leader would have returned empty records?

I understand that we have added a unit test, but I would feel much more comfortable if we add a test which would demonstrates the impact of this bug on log reconciliation and hence, would catch similar bugs in future.

@rajinisivaram
Copy link
Contributor

I couldn't think of a case where we might hit an actual truncation issue since divergence should have been detected in the first request after leader change, so will wait for @chernyih to respond. In any case, I think it makes sense to maintain the correct value for lastFetchedEpoch to avoid relying on timing assumptions that may not hold in future.

@divijvaidya
Copy link
Contributor

In any case, I think it makes sense to maintain the correct value for lastFetchedEpoch to avoid relying on timing assumptions that may not hold in future.

Totally agree. The intention of my comment earlier was to gauge the potential impact of this bug and prevent re-occurrence in future. I completely agree that it should be fixed nevertheless.

@viktorsomogyi
Copy link
Contributor

I also looked at how this lastFetchedEpoch is used in the next fetch but I don't think either it'd trigger truncation. Let's wait for @chernyih 's response. Since this is a small fix I think it'd be good to include it in the release.

@chernyih
Copy link
Contributor Author

@rajinisivaram @divijvaidya @viktorsomogyi Thank you for the review. After reading the comments and thinking more about it, it won't impact truncation of fetch since if a leader has returned empty records, it means there is no divergence.

@chernyih
Copy link
Contributor Author

The test failures do not look related. I compared with a few recent merged commit tests. For the failed tests that are not found in the recent commit tests, i ran those tests locally without failures.
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14457/1/tests/
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14457/2/tests

The second build was aborted. Didn't see any obvious error from the build log. I don't think the abort is related to this PR since the first build succeeded and saw another PR's build also was aborted. To be safe, we can rebuild. However, i couldn't find a way to rebuild.

From build log

Gradle Test Run :connect:runtime:test > Gradle Test Executor 83 > org.apache.kafka.connect.integration.BlockingConnectorTest > testBlockInSinkTaskStop SKIPPED
Daemon vm is shutting down... The daemon has exited normally or was terminated in response to a user interrupt.
----- End of the daemon log -----
FAILURE: Build failed with an exception.
* What went wrong:
Gradle build daemon disappeared unexpectedly (it may have been killed or may have crashed)

Copy link
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chernyih Thanks for the update. Test failures not related, merging to trunk.

@rajinisivaram rajinisivaram merged commit dedfed0 into apache:trunk Sep 28, 2023
1 check failed
@chernyih chernyih deleted the KAFKA-15510 branch September 28, 2023 16:08
rreddy-22 pushed a commit to rreddy-22/kafka-rreddy that referenced this pull request Oct 3, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Oct 4, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
k-wall pushed a commit to k-wall/kafka that referenced this pull request Nov 21, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Nov 24, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Nov 24, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Dec 5, 2023
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
giuseppelillo pushed a commit to aiven/kafka that referenced this pull request Jan 2, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Jan 2, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
giuseppelillo pushed a commit to aiven/kafka that referenced this pull request Jan 2, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Jan 11, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Jan 23, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Jan 24, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Feb 6, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
AnatolyPopov pushed a commit to aiven/kafka that referenced this pull request Feb 16, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request Apr 17, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request May 10, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request May 10, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request May 17, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request May 17, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
jeqo pushed a commit to aiven/kafka that referenced this pull request May 20, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
tvainika pushed a commit to aiven/kafka that referenced this pull request Jul 2, 2024
apache#14457)

When a fetch response has no record for a partition, validBytes is 0. We shouldn't set the last fetched epoch to logAppendInfo.lastLeaderEpoch.asScala since there is no record and it is Optional.empty. We should use currentFetchState.lastFetchedEpoch instead.

Reviewers: Divij Vaidya <diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants