
storage: Don't swallow ErrEpochAlreadyIncremented #36942

Merged
merged 1 commit into cockroachdb:master from epoch-already-incremented on Apr 19, 2019

Conversation

bdarnell
Contributor

IncrementEpoch was failing to distinguish between the request that
caused the increment and another caller making the same increment.
This is incorrect since a successful increment tells you when the
node was confirmed dead, while relying on another node's increment
leaves this uncertain.

In rare cases involving a badly overloaded cluster, this could result
in inconsistencies (non-serializable transactions) due to incorrect
timestamp cache management.

Fixes #35986

Release note (bug fix): Fix a rare inconsistency that could occur on
badly overloaded clusters.
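
As a rough illustration of the distinction described above, here is a minimal, self-contained Go sketch. It does not use CockroachDB's actual storage/liveness API; the `incrementEpoch` helper, the toy `liveness` struct, and the local sentinel error are hypothetical stand-ins for the real code paths.

```go
package main

import (
	"errors"
	"fmt"
)

// errEpochAlreadyIncremented stands in for the storage package's
// ErrEpochAlreadyIncremented; everything in this file is a simplified sketch.
var errEpochAlreadyIncremented = errors.New("epoch already incremented")

// liveness is a toy liveness record keyed by node ID.
type liveness struct {
	nodeID int
	epoch  int64
}

// incrementEpoch bumps the epoch of the record in store, but only if the
// caller's view of the record (expected) is still current. If another caller
// already performed the same increment, that is reported explicitly rather
// than being treated as this caller's success: only a successful increment
// tells the caller *when* the node was confirmed dead.
func incrementEpoch(store map[int]*liveness, expected liveness) error {
	cur, ok := store[expected.nodeID]
	if !ok {
		return fmt.Errorf("node %d not found", expected.nodeID)
	}
	if cur.epoch > expected.epoch {
		return errEpochAlreadyIncremented
	}
	cur.epoch++
	return nil
}

func main() {
	store := map[int]*liveness{1: {nodeID: 1, epoch: 5}}
	// The first caller wins the race and learns the node is dead as of now.
	fmt.Println(incrementEpoch(store, liveness{nodeID: 1, epoch: 5})) // <nil>
	// A second caller with the same (now stale) view gets the explicit error.
	fmt.Println(incrementEpoch(store, liveness{nodeID: 1, epoch: 5})) // epoch already incremented
}
```

The point of the sketch: only the caller whose own increment succeeded learns when the node was confirmed dead; a caller that lost the race gets the explicit error instead of silently inheriting the other caller's result.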

@bdarnell bdarnell requested review from tbg and a team April 18, 2019 22:12
@cockroach-teamcity
Member

This change is Reviewable

Member

@tbg tbg left a comment


While looking at this issue, I was surprised by how subtle the guarantee is that our status.Expiration becomes the base of the liveness CPut. Can you think of any assertions we could add to make sure this stays true?

At the very least, leave a comment on requestLeaseAsync to highlight the role of the status (I don't think we use much of it, so it could be tempting to refactor things there and introduce bugs by accident).

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/storage/replica_range_lease.go, line 311 at r1 (raw file):
The explanation of why we can't just go ahead is awkwardly broken up over these two paragraphs. Consider editing from this line on (and giving another pass for clarity to my text):

without having its epoch incremented, i.e. that it was in fact live at status.Timestamp. It would be incorrect to simply proceed to sending our lease request since our lease.Start may precede the start timestamp of the predecessor lease (i.e. likely that put into place by whoever beat us to the epoch increment), and so under this lease this node's timestamp cache would not necessarily reflect all reads served by the other leaseholder.
...
ErrEpochAlreadyIncremented is not an unusual error to observe, so we don't log it.
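
To make the handling discussed in this review thread concrete, here is a hedged caller-side sketch. It is not the real requestLeaseAsync code; `tryAcquireLease`, the toy `lease` struct, and the local sentinel error are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errEpochAlreadyIncremented mirrors the sentinel error discussed above; the
// rest of this file is an invented illustration, not CockroachDB code.
var errEpochAlreadyIncremented = errors.New("epoch already incremented")

// lease is a toy lease with only the field the example needs.
type lease struct {
	start int64 // proposed start timestamp, heavily simplified
}

// tryAcquireLease first tries to increment the old leaseholder's epoch and
// only then proposes a lease starting at proposedStart. If someone else beat
// us to the increment, we must not proceed: our proposedStart may precede the
// start of whatever lease that other node installed, so under our lease the
// timestamp cache would not necessarily reflect all reads the other
// leaseholder served.
func tryAcquireLease(increment func() error, proposedStart int64) (*lease, error) {
	if err := increment(); err != nil {
		// Losing this race is not unusual, so a real caller would not log it
		// loudly, but the error is returned rather than swallowed.
		return nil, err
	}
	return &lease{start: proposedStart}, nil
}

func main() {
	// Simulate losing the epoch-increment race to another node.
	lostRace := func() error { return errEpochAlreadyIncremented }
	if _, err := tryAcquireLease(lostRace, 100); errors.Is(err, errEpochAlreadyIncremented) {
		fmt.Println("lost the increment race; not requesting a lease with this start timestamp")
	}
}
```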

@tbg
Member

tbg commented Apr 19, 2019

CI failed because of a known flake (now skipped): #36948

@bdarnell
Contributor Author

I've updated the new comment, using part of your new text (minus the parenthetical about the node that completed the IncrementEpoch having a lease: if that node had managed to get a lease, our lease request would fail because PrevLease wouldn't match). The problematic case is when the node whose epoch we're trying to increment still holds the lease, and the node that wins the race to perform the IncrementEpoch loses the race to RequestLease.

I'm not sure what specifically you're looking for in a requestLeaseAsync comment, but later I'm going to do another pass over this file while it's fresh in my mind to document everything I can.
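
A toy illustration of the PrevLease point above, assuming a conditional, compare-and-swap-style lease request; `requestLease` and the `lease` struct here are hypothetical and far simpler than CockroachDB's actual RequestLease evaluation.

```go
package main

import "fmt"

// lease is a toy lease record; holder and seq stand in for the fields a real
// PrevLease comparison would look at.
type lease struct {
	holder int
	seq    int64
}

// requestLease installs next only if the currently installed lease still
// equals prev, mimicking the check that rejects our request when the node
// that won the epoch-increment race has already taken the lease itself.
func requestLease(installed *lease, prev, next lease) error {
	if *installed != prev {
		return fmt.Errorf("lease request rejected: expected prev %+v, found %+v", prev, *installed)
	}
	*installed = next
	return nil
}

func main() {
	// Our view of the previous lease when we started the acquisition.
	prev := lease{holder: 1, seq: 7}
	// Meanwhile, the node that won the IncrementEpoch race took the lease.
	cur := lease{holder: 2, seq: 8}
	fmt.Println(requestLease(&cur, prev, lease{holder: 3, seq: 9}))
}
```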

@bdarnell
Contributor Author

bors r=tbg

craig bot pushed a commit that referenced this pull request Apr 19, 2019
36942: storage: Don't swallow ErrEpochAlreadyIncremented r=tbg a=bdarnell

@craig
Contributor

craig bot commented Apr 19, 2019

Build succeeded

@craig craig bot merged commit 8de884a into cockroachdb:master Apr 19, 2019
@tbg
Member

tbg commented Apr 19, 2019

Comment looks really good now, thanks!

@bdarnell bdarnell deleted the epoch-already-incremented branch April 21, 2019 16:30