changefeedccl: fix job progress regressing #37009

danhhz · 2019-04-22T23:12:51Z

When the ChangeFrontier processor for a changefeed is started up, it
creates a spanFrontier to track the overall progression for the feed.
Each watched span is set to the empty timestamp so we can tell the
difference between a rangefeed being behind and one still initializing.

However, rangefeed will happily return resolved timestamps less than the
one it was started with. If we feed this into the spanFrontier, it can
report a timestamp behind the one everything started up at. In the case
of a new changefeed, this is fine (though perhaps odd). In the case of a
changefeed being restarted from the job's high-water, this means we can
overwrite the high-water progress with the lower value, regressing it.
At the least, this is very surprising UX, but it's easy to argue that
it's a correctness issue.

Touches #36879

Release note (bug fix): Fixed a bug where CHANGEFEED job progress
would regress when the job was restarted.

When the ChangeFrontier processor for a changefeed is started up, it creates a spanFrontier to track the overall progression for the feed. Each watched span is set to the empty timestamp so we can tell the difference between a rangefeed being behind and one still initializing. However, rangefeed will happily return resolved timestamps less than the one it was started with. If we feed this into the spanFrontier, it can report a timestamp behind the one everything started up at. In the case of a new changefeed, this is fine (though perhaps odd). In the case of a changefeed being restarted from the job's high-water, this means we can overwrite the high-water progress with the lower value, regressing it. At the least, this is very surprising UX, but it's easy to argue that it's a correctness issue. Touches cockroachdb#36879 Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress would regress when the job was restarted.

cockroach-teamcity · 2019-04-22T23:13:00Z

This change is

nvanbenschoten

Reviewed 2 of 2 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)

pkg/roachpb/api.proto, line 1766 at r2 (raw file):

//
// Note that these resolved timestamps may be lower than the timestamp used in
// the RangeFeedRequest used to start the RnageFeed.

s/RnageFeed/RangeFeed/

I was surprised that rangefeeds would return resolved timestamps from before the request timestamp and this expectation mismatch was a component in a changefeed bug. It's not clear yet if the contract is wrong or simply underdocumented, so document it louder for now and leave a TODO to reconsider the contract when rangefeeds have more users. Also a very small cleanup on the `(*DistSender).RangeFeed` api while I'm in here. Release note: None

danhhz

TFTR!

bors r=nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/roachpb/api.proto, line 1766 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/RnageFeed/RangeFeed/

Done

37009: changefeedccl: fix job progress regressing r=nvanbenschoten a=danhhz When the ChangeFrontier processor for a changefeed is started up, it creates a spanFrontier to track the overall progression for the feed. Each watched span is set to the empty timestamp so we can tell the difference between a rangefeed being behind and one still initializing. However, rangefeed will happily return resolved timestamps less than the one it was started with. If we feed this into the spanFrontier, it can report a timestamp behind the one everything started up at. In the case of a new changefeed, this is fine (though perhaps odd). In the case of a changefeed being restarted from the job's high-water, this means we can overwrite the high-water progress with the lower value, regressing it. At the least, this is very surprising UX, but it's easy to argue that it's a correctness issue. Touches #36879 Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress would regress when the job was restarted. 37030: changefeedccl: log more details when returning with non-retryable errors r=tbg a=danhhz The `%v` and `%+v` were switched from what I intended in the line that logs retryable errors and the one that handles non-retryable errors. Release note: None Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>

craig · 2019-04-23T15:47:24Z

Build succeeded

GitHub CI (Cockroach)

danhhz requested review from nvanbenschoten and a team April 22, 2019 23:12

solongordon mentioned this pull request Apr 23, 2019

release: v19.1.0-rc.4 #37010

Closed

18 tasks

danhhz mentioned this pull request Apr 23, 2019

roachtest: cdc/crdb-chaos/rangefeed=true failed (skipped on release-19.1) #36905

Closed

nvanbenschoten approved these changes Apr 23, 2019

View reviewed changes

danhhz force-pushed the cdc_crdbchaos branch from eda9e25 to 6ee3261 Compare April 23, 2019 14:53

danhhz commented Apr 23, 2019

View reviewed changes

craig bot merged commit 6ee3261 into cockroachdb:master Apr 23, 2019

danhhz deleted the cdc_crdbchaos branch April 23, 2019 16:02

danhhz mentioned this pull request Apr 24, 2019

release-19.1: changefeedccl: fix job progress regressing #37091

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

changefeedccl: fix job progress regressing #37009

changefeedccl: fix job progress regressing #37009

danhhz commented Apr 22, 2019

cockroach-teamcity commented Apr 22, 2019

nvanbenschoten left a comment

danhhz left a comment

craig bot commented Apr 23, 2019

changefeedccl: fix job progress regressing #37009

changefeedccl: fix job progress regressing #37009

Conversation

danhhz commented Apr 22, 2019

cockroach-teamcity commented Apr 22, 2019

nvanbenschoten left a comment

Choose a reason for hiding this comment

danhhz left a comment

Choose a reason for hiding this comment

craig bot commented Apr 23, 2019

Build succeeded