roachtest: cdc/ledger/rangefeed=true failed #36077
SHA: https://github.com/cockroachdb/cockroach/commits/837e946efc272bd8a9e0e08484733f8755ff5ab1 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1247401&tab=buildLog
The first failure has no logs and is from long enough ago that it's missing all kinds of fixes. The most recent failure is on release-19.1 and should be fixed by backporting #36852, but I'm waiting for 19.1.1 to do that.
SHA: https://github.com/cockroachdb/cockroach/commits/46f8608c4fe2d94b771beb37bcee19136040fd74 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1253450&tab=buildLog
Fished this from the logs in the above failures:
If I'm tracing this right, it's from cockroach/pkg/cmd/roachtest/cdc.go Lines 928 to 931 in 04f4b54
Ooo and this is on master. Yesterday, I got a "descriptor not found" error that was missing the retryable tag, so there are two outstanding now. Sadly, in my logging to print the error details, I swapped which one should have the
Repro'd locally which pinned it down to coming out of the rangefeedImpl call on poller. I added more debugging inside that method, but in the meantime I suspect WaitForTS. cockroach/pkg/ccl/changefeedccl/poller.go Line 202 in 1841bc4
Aha! Got one coming out of
Oh, I bet it's this. I noticed that we weren't handling NotLeaseHolderError in cockroach/pkg/storage/store.go Lines 3125 to 3142 in 8dfd761
Sounds like we need to handle NotLeaseHolderError in this switch. @nvanbenschoten does that sound right to you? cockroach/pkg/kv/dist_sender_rangefeed.go Lines 159 to 194 in 8dfd761
Yes, that all sounds correct to me. It's a little strange that we need to handle
Would it be cleaner to remove the optimization and make that code return a RangeNotFoundError instead?
RangeFeeds work on followers, so they don't get NotLeaseHolderError in the normal course of things. However, if an uninitialized replica is contacted for a RangeFeed, it uses NotLeaseHolderError as a hint for where a better one might be found. The error handling in partialRangeFeed was missing this case, causing changefeeds to fail on what should be a retryable error. Touches cockroachdb#36077 Release note: None
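The missing case described in this commit message can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in dist_sender_rangefeed.go: the error structs, the `classify` helper, the `retry`/`fail` actions, and `errPermanent` are all hypothetical stand-ins that only borrow the error names from this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the error types named in this thread.
type RangeNotFoundError struct{ RangeID int64 }

func (e *RangeNotFoundError) Error() string {
	return fmt.Sprintf("r%d was not found", e.RangeID)
}

type NotLeaseHolderError struct {
	RangeID     int64
	HintStoreID int32 // where a better replica might be found
}

func (e *NotLeaseHolderError) Error() string {
	return fmt.Sprintf("r%d: not lease holder; try store %d", e.RangeID, e.HintStoreID)
}

// errPermanent is a hypothetical example of an error that should not be retried.
var errPermanent = errors.New("permanent failure")

type action int

const (
	fail action = iota
	retry
)

// classify decides whether a rangefeed error should be retried.
// Assumption: both range-level errors mean "look the range up again and retry".
func classify(err error) action {
	var rnf *RangeNotFoundError
	var nlh *NotLeaseHolderError
	switch {
	case errors.As(err, &rnf):
		return retry
	case errors.As(err, &nlh):
		// The case the thread identifies as missing: without it, this
		// error falls through to the default and fails the changefeed.
		return retry
	default:
		return fail
	}
}

func main() {
	errs := []error{
		&RangeNotFoundError{RangeID: 7},
		&NotLeaseHolderError{RangeID: 7, HintStoreID: 2},
		fmt.Errorf("rangefeed: %w", &NotLeaseHolderError{RangeID: 7, HintStoreID: 2}),
		errPermanent,
	}
	for _, err := range errs {
		fmt.Printf("%v: retry=%t\n", err, classify(err) == retry)
	}
}
```

Using `errors.As` rather than a plain type switch means the sketch still classifies the error correctly if a caller has wrapped it with `%w`.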
(*Store).Send has an optimization for uninitialized replicas to send back a NotLeaseHolderError with a hint of where an initialized replica might be found. RangeFeeds can always be served from followers and so don't otherwise return NotLeaseHolderError. This commit changes the corresponding codepath in (*Store).Send to return a RangeNotFoundError instead. Touches cockroachdb#36077 Release note: None
37099: storage: stop returning NotLeaseHolderError from (*Store).RangeFeed r=nvanbenschoten a=danhhz (*Store).Send has an optimization for uninitialized replicas to send back a NotLeaseHolderError with a hint of where an initialized replica might be found. RangeFeeds can always be served from followers and so don't otherwise return NotLeaseHolderError. This commit changes the corresponding codepath in (*Store).Send to return a RangeNotFoundError instead. Touches #36077 Release note: None Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
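The server-side change described above might look something like the following sketch. It assumes a toy `store` with a map of replicas; the `store`, `replica`, and `rangeFeed` names and the fields on `RangeNotFoundError` are hypothetical and do not reflect the real pkg/storage code.

```go
package main

import "fmt"

// Hypothetical stand-in for the real error type.
type RangeNotFoundError struct {
	RangeID int64
	StoreID int64
}

func (e *RangeNotFoundError) Error() string {
	return fmt.Sprintf("r%d was not found on s%d", e.RangeID, e.StoreID)
}

// replica is a toy replica; initialized is false until it has received its state.
type replica struct{ initialized bool }

type store struct {
	id       int64
	replicas map[int64]*replica
}

// rangeFeed rejects requests for missing or uninitialized replicas with a
// RangeNotFoundError instead of a NotLeaseHolderError hint, since a rangefeed
// can be served by any initialized replica, not only the leaseholder.
func (s *store) rangeFeed(rangeID int64) error {
	repl, ok := s.replicas[rangeID]
	if !ok || !repl.initialized {
		return &RangeNotFoundError{RangeID: rangeID, StoreID: s.id}
	}
	// A real implementation would start streaming events here.
	return nil
}

func main() {
	s := &store{id: 1, replicas: map[int64]*replica{
		1: {initialized: true},
		2: {initialized: false},
	}}
	for _, id := range []int64{1, 2, 3} {
		fmt.Printf("r%d: %v\n", id, s.rangeFeed(id))
	}
}
```

With this shape, the client only needs to treat RangeNotFoundError as retryable for the rangefeed path.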
A bunch of things got fixed, so at this point it seems like any new failures are likely to be new issues. Closing.
SHA: https://github.com/cockroachdb/cockroach/commits/6cac063ae1cb578130afbafb2abf4035268a10c9
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1194308&tab=buildLog