Skip to content

Commit

Permalink
Merge #40435 #40454
Browse files Browse the repository at this point in the history
40435: storage: be more resilient to learner snap conflicts r=tbg a=danhhz

The replica addition code first adds it as a raft learner, then hands it
a snapshot, then promotes it to a voter. For various unfortunate reasons
described in the code, we have to allow the raft snapshot queue to
_also_ send snapshots to learners. A recent etcd change exposed that
this code has always been brittle to the raft snapshot queue winning the
race and starting the snapshot first by making the race dramatically
more likely.

After this commit, learner replica addition grabs a (best effort) lock
before the conf change txn to add the learner is started. This prevents
the race when the raft leader (and thus the raft snapshot queue for that
range) is on the same node.

Closes #40207

Release note: None

40454: kv: improve unit testing around txnSpanRefresher r=nvanbenschoten a=nvanbenschoten

This was decently tested through its inclusion in the TxnCoordSender,
but since I'm planning on changing some behavior to address #36431, I
figured we should first build up a test suite in the same style that
we have for other transaction interceptors.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
  • Loading branch information
3 people committed Sep 4, 2019
3 parents db03a5c + 20ba7bc + 6ffbc65 commit 09457e4
Show file tree
Hide file tree
Showing 16 changed files with 752 additions and 62 deletions.
4 changes: 2 additions & 2 deletions Gopkg.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 5 additions & 6 deletions Gopkg.toml
Original file line number Diff line number Diff line change
Expand Up @@ -38,13 +38,12 @@ ignored = [

[[constraint]]
name = "go.etcd.io/etcd"
# We're stopping just shy of 4a2b4c8f7e0a3754fdd5a3341a27c2431c2a5385
# which picks up a fix to an inefficiency that at the time of writing
# triggers a bug:
# https://github.com/cockroachdb/cockroach/issues/40207
#
# The last time this was bumped forward, it was for a targeted fix of #40207,
# which temporarily prevented us from being compatible with etcd's HEAD. The
# PR fixing 40207 bumped it forward exactly one commit to minimize unrelated
# fallout. Feel free to move this back to `branch = "master"` at any time.
# branch = "master"
revision = "9b29151d3072511f574e7272a5348504086013fa"
revision = "4a2b4c8f7e0a3754fdd5a3341a27c2431c2a5385"

# Used for the API client; we want the latest.
[[constraint]]
Expand Down
25 changes: 25 additions & 0 deletions pkg/kv/txn_interceptor_pipeliner_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,37 @@ func (m *mockLockedSender) SendLocked(
return m.mockFn(ba)
}

// MockSend sets the mockLockedSender mocking function.
func (m *mockLockedSender) MockSend(
fn func(roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error),
) {
m.mockFn = fn
}

// Reset resets the mockLockedSender mocking function to a no-op.
func (m *mockLockedSender) Reset() {
m.mockFn = nil
}

// ChainMockSend sets a series of mocking functions on the mockLockedSender.
// The provided mocking functions are set in the order that they are provided
// and a given mocking function is set after the previous one has been called.
func (m *mockLockedSender) ChainMockSend(
fns ...func(roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error),
) {
for i := range fns {
i := i
fn := fns[i]
fns[i] = func(ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error) {
if i < len(fns)-1 {
m.mockFn = fns[i+1]
}
return fn(ba)
}
}
m.mockFn = fns[0]
}

func makeMockTxnPipeliner() (txnPipeliner, *mockLockedSender) {
mockSender := &mockLockedSender{}
return txnPipeliner{
Expand Down
3 changes: 1 addition & 2 deletions pkg/kv/txn_interceptor_span_refresher.go
Original file line number Diff line number Diff line change
Expand Up @@ -100,8 +100,6 @@ var MaxTxnRefreshSpansBytes = settings.RegisterIntSetting(
// the timestamp cache entries for these reads are updated and the transaction
// is free to update its provisional commit timestamp without needing to
// restart.
//
// TODO(nvanbenschoten): Unit test this file.
type txnSpanRefresher struct {
st *cluster.Settings
knobs *ClientTestingKnobs
Expand Down Expand Up @@ -315,6 +313,7 @@ func (sr *txnSpanRefresher) tryUpdatingTxnSpans(
}

// Refresh all spans (merge first).
// TODO(nvanbenschoten): actually merge spans.
refreshSpanBa := roachpb.BatchRequest{}
refreshSpanBa.Txn = refreshTxn
addRefreshes := func(refreshes []roachpb.Span, write bool) {
Expand Down
Loading

0 comments on commit 09457e4

Please sign in to comment.