
kvserver: use non-cancelable contexts in raft application #75656

Closed
tbg opened this issue Jan 28, 2022 · 3 comments · Fixed by #78174

tbg commented Jan 28, 2022

Describe the problem

As of d064059, raft application of a local proposal happens under the proposer's context, which may well be canceled before or midway through application. However, raft application should not care about that cancellation, and we just diagnosed a bug where it did.

We have also had problems with cancellation on the raft goroutine in the past, and we check against it, but evidently not thoroughly enough yet.

Expected behavior

The raft apply loop should use a ctx that is not cancelable. It makes sense to derive certain parts from the proposer - such as a trace span - but not the cancellation policy. That is, in

```go
// createTracingSpans creates and assigns a new tracing span for each decoded
// command. If a command was proposed locally, it will be given a tracing span
// that follows from its proposal's span.
func (d *replicaDecoder) createTracingSpans(ctx context.Context) {
	const opName = "raft application"
	var it replicatedCmdBufSlice
	for it.init(&d.cmdBuf); it.Valid(); it.Next() {
		cmd := it.cur()
		if cmd.IsLocal() {
			cmd.ctx, cmd.sp = tracing.ChildSpan(cmd.proposal.ctx, opName)
		} else if cmd.raftCmd.TraceData != nil {
			// The proposal isn't local, and trace data is available. Extract
			// the remote span and start a server-side span that follows from it.
			spanMeta, err := d.r.AmbientContext.Tracer.ExtractMetaFrom(tracing.MapCarrier{
				Map: cmd.raftCmd.TraceData,
			})
			if err != nil {
				log.Errorf(ctx, "unable to extract trace data from raft command: %s", err)
			} else {
				cmd.ctx, cmd.sp = d.r.AmbientContext.Tracer.StartSpanCtx(
					ctx,
					opName,
					// NB: Nobody is collecting the recording of this span; we have no
					// mechanism for it.
					tracing.WithRemoteParent(spanMeta),
					tracing.WithFollowsFrom(),
				)
			}
		} else {
			cmd.ctx, cmd.sp = tracing.ChildSpan(ctx, opName)
		}
	}
}
```

we should not use a child context of cmd.proposal.ctx but create one derived from ctx (the worker ctx) or context.Background().
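For illustration, a minimal sketch of that change in the local-proposal branch, using only the `tracing.ChildSpan` helper already shown above; note that rooting the span in the worker ctx drops the link back to the proposal's span unless it is re-established via an explicit parent/follows-from option, which is elided here:

```go
// Sketch only: root the command's context in the worker ctx (or
// context.Background()) so the proposer's cancellation cannot reach raft
// application. Re-linking to the proposal's trace span would need an
// explicit tracing option, omitted in this sketch.
if cmd.IsLocal() {
	cmd.ctx, cmd.sp = tracing.ChildSpan(ctx, opName) // ctx is the worker ctx, not cmd.proposal.ctx
}
```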

Additional data / screenshots

We should also write a test that uses a TestingPostApplyHook to ensure that the ctx is not cancelable, or at least that the cancellation doesn't carry over to the child (or, more simply, that a value stashed in the proposer's context is not visible at apply time).
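For illustration, a rough sketch of the assertion such a test could make, assuming some hook (e.g. the TestingPostApplyHook mentioned above) hands it the apply-time context; the key type and error messages below are hypothetical:

```go
import (
	"context"
	"errors"
)

// proposerKey is a hypothetical sentinel key that the test would stash into
// the proposer's context before issuing the request.
type proposerKey struct{}

// checkApplyCtx asserts that the apply-time context is not cancelable and
// does not carry values from the proposer's context.
func checkApplyCtx(applyCtx context.Context) error {
	if applyCtx.Done() != nil {
		return errors.New("apply-time context is cancelable")
	}
	if applyCtx.Value(proposerKey{}) != nil {
		return errors.New("proposer context value leaked into raft application")
	}
	return nil
}
```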

@erikgrinaker didn't we put in some invariant check about this sometime recently?

Jira issue: CRDB-12769

tbg added the C-enhancement and A-kv-replication labels Jan 28, 2022
tbg self-assigned this Jan 28, 2022

erikgrinaker commented:

Yeah, in handleRaftReadyRaftMuLocked:

```go
func (r *Replica) handleRaftReadyRaftMuLocked(
	ctx context.Context, inSnap IncomingSnapshot,
) (handleRaftReadyStats, string, error) {
	// handleRaftReadyRaftMuLocked is not prepared to handle context cancellation,
	// so assert that it's given a non-cancellable context.
	if ctx.Done() != nil {
		return handleRaftReadyStats{}, "", errors.AssertionFailedf(
			"handleRaftReadyRaftMuLocked cannot be called with a cancellable context")
	}
```

We should either do something similar for command application, or derive a non-cancellable context.
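For the second option, here is a self-contained sketch of one way to derive such a context using only the standard library; the names are hypothetical and this is not CockroachDB's actual helper (newer Go releases provide context.WithoutCancel for the same purpose):

```go
import (
	"context"
	"time"
)

// noCancelCtx forwards values (log tags, tracing span, etc.) from its parent
// but reports no deadline and a nil Done channel, so the parent's
// cancellation never propagates to work running under it.
type noCancelCtx struct{ parent context.Context }

func (c noCancelCtx) Deadline() (deadline time.Time, ok bool) { return time.Time{}, false }
func (c noCancelCtx) Done() <-chan struct{}                   { return nil }
func (c noCancelCtx) Err() error                              { return nil }
func (c noCancelCtx) Value(key interface{}) interface{}       { return c.parent.Value(key) }

// withoutCancel returns a context that keeps parent's values but is never
// canceled and never expires.
func withoutCancel(parent context.Context) context.Context {
	return noCancelCtx{parent: parent}
}
```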

tbg added a commit to tbg/cockroach that referenced this issue Feb 22, 2022
We've seen in the events leading up to cockroachdb#75448 that a build-up of
consistency check computations on a node can severely impact node
performance. This commit attempts to address the main source of
that, while re-working the code for easier maintainability.

The way the consistency checker works is by replicating a command through
Raft that, on each Replica, triggers an async checksum computation, the
results of which the caller collects via `CollectChecksum` requests
addressed to each `Replica`.

If, for any reason, the caller does *not* wait to collect the checksums
but instead moves on to run another consistency check (perhaps on
another Range), these inflight computations can build up over time.

This was the main issue in cockroachdb#75448; we were accidentally canceling the
context on the leaseholder "right away", failing the consistency check
(but leaving it running on all other replicas), and moving on to the
next Range.
As a result, some (but, with spread-out leaseholders, ultimately all)
Replicas ended up with dozens of consistency check computations,
starving I/O and CPU. We "addressed" this by avoiding the errant ctx
cancellation (cockroachdb#75448; longer-term, cockroachdb#75656), but
this isn't a holistic fix yet.

In this commit, we make three main changes:

- give the inflight consistency check computations a clean API, which
  makes it much easier to understand "how it works".
- when returning from CollectChecksum (either on success or error,
  notably including context cancellation), cancel the corresponding
  consistency check. This solves the problem, *assuming* that
  CollectChecksum is reliably issued to each Replica.
- reliably issue CollectChecksum to each Replica on which a computation
  may have been triggered. When the caller's context is canceled, still
  do the call with a one-second-timeout one-off Context which should be
  good enough to make it to the Replica and short-circuit the call.

Release note: None
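The third change in the list above amounts to roughly the following pattern; this is a sketch with hypothetical names, not the actual CockroachDB code:

```go
import (
	"context"
	"time"
)

// collectChecksumWithFallback issues the (hypothetical) collect call even if
// the caller's ctx is already canceled, using a short-lived detached context
// so the remote replica still hears from us and can cancel its computation.
func collectChecksumWithFallback(ctx context.Context, collect func(context.Context) error) error {
	if ctx.Err() != nil {
		// The caller has given up; still reach out, but bound the attempt
		// to one second so we don't hang on an unreachable replica.
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(context.Background(), time.Second)
		defer cancel()
	}
	return collect(ctx)
}
```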

tbg commented Mar 1, 2022

Wrote a bit of code here, putting this here so I remember: https://github.com/cockroachdb/cockroach/compare/master...tbg:no-downstream-ctx?expand=1


tbg commented Mar 21, 2022

I updated the above prototype a little bit (removing the ctx altogether), but I don't think it's what we want for stability. Instead, I'll send a surgical PR that makes sure we don't set the caller's ctx into the proposal.

tbg added a commit to tbg/cockroach that referenced this issue Mar 22, 2022
Previously, for local proposals, we were threading the caller's context
down into Raft. This has caused a few bugs related to context
cancellation sensitivity - the execution of a raft command should not be
impacted by whether or not the client's context has already finished.

This commit derives the proposal context from the Replica's annotated
context, using a tracing span derived from the client context. That
way, the derived context is not cancelable, but contains the same
tracing Span (thus providing similar utility) as before. Notably,
the trace span should also already contain the log tags present
in the client's context.

The tracing has coverage through (at least) a few tests that make
assertions about trace events from below-raft showing up in a trace
collected by the client, which I verified by omitting the Span:

```
TestRaftSSTableSideloadingProposal
TestReplicaAbandonProposal
TestReplicaRetryRaftProposal
TestReplicaBurstPendingCommandsAndRepropose
TestReplicaReproposalWithNewLeaseIndexError
TestFailureToProcessCommandClearsLocalResult
TestDBAddSSTable
```

See cockroachdb#75656.

Release note: None
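The approach described in this commit message corresponds roughly to the following sketch; the helper name is hypothetical, and tracing.ContextWithSpan is an assumption about the tracing package's API rather than a confirmed part of the actual change:

```go
// Sketch of the idea: the proposal's context descends from the replica's
// annotated (non-cancelable) context, while its tracing span still derives
// from the client's span, so client cancellation cannot reach raft
// application but below-raft trace events still show up in the client's
// recording.
func makeProposalCtx(r *Replica, clientCtx context.Context) (context.Context, *tracing.Span) {
	// Span derived from the client's trace.
	_, sp := tracing.ChildSpan(clientCtx, "raft proposal")
	// Context derived from the replica's annotated context, not clientCtx.
	ctx := tracing.ContextWithSpan(r.AmbientContext.AnnotateCtx(context.Background()), sp)
	return ctx, sp
}
```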
craig bot closed this as completed in e0a4c51 Mar 22, 2022
blathers-crl bot pushed a commit that referenced this issue Mar 22, 2022