Skip to content

Commit

Permalink
intentresolver: fix testrace flake by extending timeouts
Browse files Browse the repository at this point in the history
The unit tests which failed assumed that all messages intents in a single call
to ResolveIntents could be queued within the batch timeout and idle window which
is by default rather short (10 and 5 ms respectively). The testing validation
logic is rather strict and assumes that the batch will fire due to size rather
than time constraints. The fix is to allow the test to increase these values to
make the test more robust to load. Before this change the flake was reproducible
running testrace within several hundred iterations. After the test it has not
failed after running over 20k test race iterations from two concurrently running
executions.

Fixes #35064.

Release note: None
  • Loading branch information
ajwerner committed Feb 20, 2019
1 parent 1b0a449 commit aa1c66f
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 1 deletion.
5 changes: 4 additions & 1 deletion pkg/storage/intentresolver/intent_resolver.go
Original file line number Diff line number Diff line change
Expand Up @@ -82,14 +82,17 @@ const (
// defaultGCBatchWait is the default duration which the gc request batcher
// will wait between requests for a range before sending it.
defaultGCBatchWait = time.Second
)

var (
// intentResolutionBatchWait is used to configure the RequestBatcher which
// batches intent resolution requests across transactions. Intent resolution
// needs to occur in a relatively short period of time after the completion
// of a transaction in order to minimize the contention footprint of the write
// for other contending reads or writes. The chosen value was selected based
// on some light experimentation to ensure that performance does not degrade
// in the face of highly contended workloads.
// in the face of highly contended workloads. This setting is a var rather
// than a const in order to allow injection by unit tests.
intentResolutionBatchWait = 10 * time.Millisecond

// intentResolutionBatchIdle is similar to the above setting but is used when
Expand Down
19 changes: 19 additions & 0 deletions pkg/storage/intentresolver/intent_resolver_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -642,6 +642,8 @@ func TestCleanupIntents(t *testing.T) {
sendFuncs []sendFunc
expectedErr bool
expectedNum int
before func(tc *testCase)
after func()
}

cases := []testCase{
Expand Down Expand Up @@ -677,12 +679,23 @@ func TestCleanupIntents(t *testing.T) {
resolveIntentsSendFunc,
},
expectedNum: 3*intentResolverBatchSize + 3,
before: func(tc *testCase) {
// Under stress sometimes it can take more than 10ms to even call send.
// The batch wait is increased dramatically during this test to eliminate
// flakiness and ensure that all requests make it to the batcher in a timely
// manner.
resetWait := setDuration(&intentResolutionBatchWait, 500*time.Millisecond)
resetIdle := setDuration(&intentResolutionBatchIdle, 20*time.Millisecond)
tc.after = func() { resetWait(); resetIdle() }
},
},
}
stopper := stop.NewStopper()
defer stopper.Stop(context.Background())
for _, c := range cases {
t.Run("", func(t *testing.T) {
c.before(&c)
defer c.after()
ir := newIntentResolverWithSendFuncs(stopper, clock, newSendFuncs(c.sendFuncs...))
num, err := ir.CleanupIntents(context.Background(), c.intents, clock.Now(), roachpb.PUSH_ABORT)
assert.Equal(t, num, c.expectedNum, "number of resolved intents")
Expand Down Expand Up @@ -840,3 +853,9 @@ var (
return resp, nil
}
)

func setDuration(cur *time.Duration, new time.Duration) (cleanup func()) {
old := *cur
*cur = new
return func() { *cur = old }
}

0 comments on commit aa1c66f

Please sign in to comment.