Skip to content

Commit

Permalink
Merge #100541
Browse files Browse the repository at this point in the history
100541: logictest: disable circuit breakers in TestParallel r=yuzefovich a=yuzefovich

**roachtest: remove redundant enabling of circuit breakers in TPCC**

This commit removes the now redundant change to the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting
which enables the circuit breakers on slow replication. This setting is
already enabled by default.

Release note: None

**logictest: disable circuit breakers in TestParallel**

We've seen TestParallel fail a couple of times in the last few weeks
with an error like
```
(XXUUU) job 852880168431714305: could not mark as reverting: job 852880168431714305: select-job: replica unavailable: (n3,s3):3 unable to serve request to r9:/Table/{5-6} [(n1,s1):1, (n2,s2):2, (n3,s3):3, next=4, gen=4]: closed timestamp: 1680349889.289547265,0 (2023-04-01 11:51:29); raft status: {"id":"3","term":70,"vote":"3","commit":110,"lead":"0","raftState":"StateCandidate","applied":110,"progress":{},"leadtransferee":"0"}: have been waiting 62.00s for slow proposal RequestLease [/Table/5,/Min)
replica_raft.go:1387: in refreshProposalsLocked()
```
My hypothesis is that this failure is not concerning, and it simply
occurs rarely because TestParallel overloads the machine significantly.
To prevent this flake from occurring this commit disables the circuit
breakers on slow proposals.

Fixes: #100393.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
  • Loading branch information
craig[bot] and yuzefovich committed Apr 4, 2023
2 parents 15b482d + ae3e14d commit 7b3f394
Show file tree
Hide file tree
Showing 2 changed files with 8 additions and 13 deletions.
16 changes: 3 additions & 13 deletions pkg/cmd/roachtest/tests/tpcc.go
Original file line number Diff line number Diff line change
Expand Up @@ -93,11 +93,6 @@ type tpccOptions struct {
// also be doing a rolling-restart into the new binary while the cluster
// is running, but that feels like jamming too much into the tpcc setup.
Start func(context.Context, test.Test, cluster.Cluster)
// EnableCircuitBreakers causes the kv.replica_circuit_breaker.slow_replication_threshold
// setting to be populated, which enables per-Replica circuit breakers.
//
// TODO(tbg): remove this once https://github.com/cockroachdb/cockroach/issues/74705 is completed.
EnableCircuitBreakers bool
// SkipPostRunCheck, if set, skips post TPC-C run checks.
SkipPostRunCheck bool
DisableDefaultScheduledBackup bool
Expand Down Expand Up @@ -165,10 +160,6 @@ func setupTPCC(
opts.Start(ctx, t, c)
db := c.Conn(ctx, t.L(), 1)
defer db.Close()
if opts.EnableCircuitBreakers {
_, err := db.Exec(`SET CLUSTER SETTING kv.replica_circuit_breaker.slow_replication_threshold = '15s'`)
require.NoError(t, err)
}

if t.SkipInit() {
return
Expand Down Expand Up @@ -508,10 +499,9 @@ func registerTPCC(r registry.Registry) {
headroomWarehouses := int(float64(maxWarehouses) * 0.7)
t.L().Printf("computed headroom warehouses of %d\n", headroomWarehouses)
runTPCC(ctx, t, c, tpccOptions{
Warehouses: headroomWarehouses,
Duration: 120 * time.Minute,
SetupType: usingImport,
EnableCircuitBreakers: true,
Warehouses: headroomWarehouses,
Duration: 120 * time.Minute,
SetupType: usingImport,
})
},
})
Expand Down
5 changes: 5 additions & 0 deletions pkg/sql/logictest/parallel_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,11 @@ func (t *parallelTest) setup(ctx context.Context, spec *parTestSpec) {
r0.Exec(t, `UPDATE system.zones SET config = $2 WHERE id = $1`, objID, buf)
}

// Disable the circuit breakers on this cluster because they can lead to
// rare test flakes since the machine is likely to be overloaded when
// running TestParallel.
r0.Exec(t, `SET CLUSTER SETTING kv.replica_circuit_breaker.slow_replication_threshold = '0s'`)

if testing.Verbose() || log.V(1) {
log.Infof(t.ctx, "Creating database")
}
Expand Down

0 comments on commit 7b3f394

Please sign in to comment.