-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: fix job progress regressing #37009
Conversation
When the ChangeFrontier processor for a changefeed is started up, it creates a spanFrontier to track the overall progression for the feed. Each watched span is set to the empty timestamp so we can tell the difference between a rangefeed being behind and one still initializing. However, rangefeed will happily return resolved timestamps less than the one it was started with. If we feed this into the spanFrontier, it can report a timestamp behind the one everything started up at. In the case of a new changefeed, this is fine (though perhaps odd). In the case of a changefeed being restarted from the job's high-water, this means we can overwrite the high-water progress with the lower value, regressing it. At the least, this is very surprising UX, but it's easy to argue that it's a correctness issue. Touches cockroachdb#36879 Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress would regress when the job was restarted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/roachpb/api.proto, line 1766 at r2 (raw file):
// // Note that these resolved timestamps may be lower than the timestamp used in // the RangeFeedRequest used to start the RnageFeed.
s/RnageFeed/RangeFeed/
I was surprised that rangefeeds would return resolved timestamps from before the request timestamp and this expectation mismatch was a component in a changefeed bug. It's not clear yet if the contract is wrong or simply underdocumented, so document it louder for now and leave a TODO to reconsider the contract when rangefeeds have more users. Also a very small cleanup on the `(*DistSender).RangeFeed` api while I'm in here. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=nvanbenschoten
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/roachpb/api.proto, line 1766 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/RnageFeed/RangeFeed/
Done
37009: changefeedccl: fix job progress regressing r=nvanbenschoten a=danhhz When the ChangeFrontier processor for a changefeed is started up, it creates a spanFrontier to track the overall progression for the feed. Each watched span is set to the empty timestamp so we can tell the difference between a rangefeed being behind and one still initializing. However, rangefeed will happily return resolved timestamps less than the one it was started with. If we feed this into the spanFrontier, it can report a timestamp behind the one everything started up at. In the case of a new changefeed, this is fine (though perhaps odd). In the case of a changefeed being restarted from the job's high-water, this means we can overwrite the high-water progress with the lower value, regressing it. At the least, this is very surprising UX, but it's easy to argue that it's a correctness issue. Touches #36879 Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress would regress when the job was restarted. 37030: changefeedccl: log more details when returning with non-retryable errors r=tbg a=danhhz The `%v` and `%+v` were switched from what I intended in the line that logs retryable errors and the one that handles non-retryable errors. Release note: None Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Build succeeded |
When the ChangeFrontier processor for a changefeed is started up, it
creates a spanFrontier to track the overall progression for the feed.
Each watched span is set to the empty timestamp so we can tell the
difference between a rangefeed being behind and one still initializing.
However, rangefeed will happily return resolved timestamps less than the
one it was started with. If we feed this into the spanFrontier, it can
report a timestamp behind the one everything started up at. In the case
of a new changefeed, this is fine (though perhaps odd). In the case of a
changefeed being restarted from the job's high-water, this means we can
overwrite the high-water progress with the lower value, regressing it.
At the least, this is very surprising UX, but it's easy to argue that
it's a correctness issue.
Touches #36879
Release note (bug fix): Fixed a bug where
CHANGEFEED
job progresswould regress when the job was restarted.