Skip to content

Commit

Permalink
roachtest: enable txn heartbeat loops for 1PC txns in kv/contention/n…
Browse files Browse the repository at this point in the history
…odes=4

I noticed when debugging issues in cockroachdb#45482 that unhandled deadlocks occasionally
resolved themselves because txns would eventually time out. This was because
we don't start the txn heartbeat loop for 1PC txns. In this kind of test, we
want any unhandled deadlocks to be as loud as possible, so just like we set
a very long txn expiration, we also enable the txn heartbeat loop for all
txns, even those that we expect will be 1PC.

This commit also drops the kv.lock_table.deadlock_detection_push_delay down
for the test, since it's already touching this code.
  • Loading branch information
nvanbenschoten committed Mar 3, 2020
1 parent 860c137 commit 898383f
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 8 deletions.
24 changes: 21 additions & 3 deletions pkg/cmd/roachtest/kv.go
Original file line number Diff line number Diff line change
Expand Up @@ -202,13 +202,31 @@ func registerKVContention(r *testRegistry) {
// If requests ever get stuck on a transaction that was abandoned
// then it will take 2m for them to get unstuck, at which point the
// QPS threshold check in the test is likely to fail.
args := startArgs("--env=COCKROACH_TXN_LIVENESS_HEARTBEAT_MULTIPLIER=120")
//
// Additionally, ensure that even transactions that issue a 1PC
// batch begin heartbeating. This ensures that if they end up in
// part of a dependency cycle, they can never be expire without
// being actively aborted.
args := startArgs(
"--env=COCKROACH_TXN_LIVENESS_HEARTBEAT_MULTIPLIER=120 COCKROACH_TXN_HEARTBEAT_DURING_1PC=true",
)
c.Start(ctx, t, args, c.Range(1, nodes))

conn := c.Conn(ctx, 1)
// Enable request tracing, which is a good tool for understanding
// how different transactions are interacting.
c.Run(ctx, c.Node(1),
`./cockroach sql --insecure -e "SET CLUSTER SETTING trace.debug.enable = true"`)
if _, err := conn.Exec(`
SET CLUSTER SETTING trace.debug.enable = true;
`); err != nil {
t.Fatal(err)
}
// Drop the deadlock detection delay because the test creates a
// large number transaction deadlocks.
if _, err := conn.Exec(`
SET CLUSTER SETTING kv.lock_table.deadlock_detection_push_delay = '5ms'
`); err != nil && !strings.Contains(err.Error(), "unknown cluster setting") {
t.Fatal(err)
}

t.Status("running workload")
m := newMonitor(ctx, c, c.Range(1, nodes))
Expand Down
15 changes: 10 additions & 5 deletions pkg/kv/txn_interceptor_heartbeater.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,12 +16,20 @@ import (
"time"

"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/util/envutil"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/stop"
opentracing "github.com/opentracing/opentracing-go"
)

// txnHeartbeatDuring1PC defines whether the txnHeartbeater should launch a
// heartbeat loop for 1PC transactions. The value defaults to false even though
// 1PC transactions leave intents around on retriable errors if the batch has
// been split between ranges and may be pushed when in lock wait-queues because
// we consider that unlikely enough so we prefer to not pay for a goroutine.
var txnHeartbeatFor1PC = envutil.EnvOrDefaultBool("COCKROACH_TXN_HEARTBEAT_DURING_1PC", false)

// txnHeartbeater is a txnInterceptor in charge of a transaction's heartbeat
// loop. Transaction coordinators heartbeat their transaction record
// periodically to indicate the liveness of their transaction. Other actors like
Expand Down Expand Up @@ -151,12 +159,9 @@ func (h *txnHeartbeater) SendLocked(
}

// Start the heartbeat loop if it has not already started.
//
// Note that we don't do it for 1PC txns: they only leave intents around on
// retriable errors if the batch has been split between ranges. We consider
// that unlikely enough so we prefer to not pay for a goroutine.
if !h.mu.loopStarted {
if _, haveEndTxn := ba.GetArg(roachpb.EndTxn); !haveEndTxn {
_, haveEndTxn := ba.GetArg(roachpb.EndTxn)
if !haveEndTxn || txnHeartbeatFor1PC {
if err := h.startHeartbeatLoopLocked(ctx); err != nil {
return nil, roachpb.NewError(err)
}
Expand Down

0 comments on commit 898383f

Please sign in to comment.