Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: fix job progress regressing #37009

Merged
merged 2 commits into from
Apr 23, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Apr 22, 2019

When the ChangeFrontier processor for a changefeed is started up, it
creates a spanFrontier to track the overall progression for the feed.
Each watched span is set to the empty timestamp so we can tell the
difference between a rangefeed being behind and one still initializing.

However, rangefeed will happily return resolved timestamps less than the
one it was started with. If we feed this into the spanFrontier, it can
report a timestamp behind the one everything started up at. In the case
of a new changefeed, this is fine (though perhaps odd). In the case of a
changefeed being restarted from the job's high-water, this means we can
overwrite the high-water progress with the lower value, regressing it.
At the least, this is very surprising UX, but it's easy to argue that
it's a correctness issue.

Touches #36879

Release note (bug fix): Fixed a bug where CHANGEFEED job progress
would regress when the job was restarted.

When the ChangeFrontier processor for a changefeed is started up, it
creates a spanFrontier to track the overall progression for the feed.
Each watched span is set to the empty timestamp so we can tell the
difference between a rangefeed being behind and one still initializing.

However, rangefeed will happily return resolved timestamps less than the
one it was started with. If we feed this into the spanFrontier, it can
report a timestamp behind the one everything started up at. In the case
of a new changefeed, this is fine (though perhaps odd). In the case of a
changefeed being restarted from the job's high-water, this means we can
overwrite the high-water progress with the lower value, regressing it.
At the least, this is very surprising UX, but it's easy to argue that
it's a correctness issue.

Touches cockroachdb#36879

Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress
would regress when the job was restarted.
@danhhz danhhz requested review from nvanbenschoten and a team April 22, 2019 23:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/roachpb/api.proto, line 1766 at r2 (raw file):

//
// Note that these resolved timestamps may be lower than the timestamp used in
// the RangeFeedRequest used to start the RnageFeed.

s/RnageFeed/RangeFeed/

I was surprised that rangefeeds would return resolved timestamps from
before the request timestamp and this expectation mismatch was a
component in a changefeed bug. It's not clear yet if the contract is
wrong or simply underdocumented, so document it louder for now and leave
a TODO to reconsider the contract when rangefeeds have more users.

Also a very small cleanup on the `(*DistSender).RangeFeed` api while I'm
in here.

Release note: None
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/api.proto, line 1766 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/RnageFeed/RangeFeed/

Done

craig bot pushed a commit that referenced this pull request Apr 23, 2019
37009: changefeedccl: fix job progress regressing r=nvanbenschoten a=danhhz

When the ChangeFrontier processor for a changefeed is started up, it
creates a spanFrontier to track the overall progression for the feed.
Each watched span is set to the empty timestamp so we can tell the
difference between a rangefeed being behind and one still initializing.

However, rangefeed will happily return resolved timestamps less than the
one it was started with. If we feed this into the spanFrontier, it can
report a timestamp behind the one everything started up at. In the case
of a new changefeed, this is fine (though perhaps odd). In the case of a
changefeed being restarted from the job's high-water, this means we can
overwrite the high-water progress with the lower value, regressing it.
At the least, this is very surprising UX, but it's easy to argue that
it's a correctness issue.

Touches #36879

Release note (bug fix): Fixed a bug where `CHANGEFEED` job progress
would regress when the job was restarted.

37030: changefeedccl: log more details when returning with non-retryable errors r=tbg a=danhhz

The `%v` and `%+v` were switched from what I intended in the line that
logs retryable errors and the one that handles non-retryable errors.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@craig
Copy link
Contributor

craig bot commented Apr 23, 2019

Build succeeded

@craig craig bot merged commit 6ee3261 into cockroachdb:master Apr 23, 2019
@danhhz danhhz deleted the cdc_crdbchaos branch April 23, 2019 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants