kvserver: use background context when applying snapshots #73484
Conversation
Force-pushed from cc765a5 to ea68173
Thanks for looking into this Erik!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
var err error
stats, expl, err = r.handleRaftReadyRaftMuLocked(ctx, inSnap)
if errors.IsAny(err, context.Canceled, context.DeadlineExceeded) {
This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?
Given the location where we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.
The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.
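To make the "disconnect the context" alternative concrete: below is a minimal, self-contained sketch of a cancellation-stripping wrapper, assuming only the standard library. This is the general shape of the `contextutil.WithoutCancel()` helper referenced later in this thread, not cockroach's actual API.

```go
package ctxutil

import (
	"context"
	"time"
)

// noCancelCtx forwards Value lookups to its parent (preserving log
// tags, tracing spans, and similar metadata) while reporting itself
// as never cancellable.
type noCancelCtx struct{ parent context.Context }

func (c noCancelCtx) Deadline() (time.Time, bool)       { return time.Time{}, false }
func (c noCancelCtx) Done() <-chan struct{}             { return nil }
func (c noCancelCtx) Err() error                        { return nil }
func (c noCancelCtx) Value(key interface{}) interface{} { return c.parent.Value(key) }

// WithoutCancel detaches ctx from its parent's cancellation while
// keeping its values: canceling the parent no longer affects the
// returned context.
func WithoutCancel(ctx context.Context) context.Context {
	return noCancelCtx{parent: ctx}
}
```

Go 1.21 eventually added `context.WithoutCancel` to the standard library with these exact semantics.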
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?
Given the location where we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.
The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.
Agreed with Nathan's reservations: we need to be explicit about which context cancellations we make graceful, or we make it harder to keep the system correct. It also seems more correct to detach the snapshot application's ctx from the caller; whether the caller goes away should have no say in whether the snapshot is going to be fully applied.
We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and, when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic. I went back and forth over this with Andrei a while back; I don't remember the PR right now but can find it if there's interest.)
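A sketch of the sanity check tbg describes, with hypothetical names (the real assertion would live inside handleRaftReadyRaftMuLocked and feed into the fatal-error path):

```go
package kvassert

import (
	"context"
	"errors"
)

// errCancellableCtx would, per the discussion above, propagate up and
// shut down the server, making this an assertion in effect.
var errCancellableCtx = errors.New(
	"handleRaftReadyRaftMuLocked called with a cancellable context")

// assertNonCancellable rejects any context that could ever be
// canceled. ctx.Done() returns nil for contexts that can never be
// canceled, so unlike checking ctx.Err(), this fires even before any
// cancellation actually happens.
func assertNonCancellable(ctx context.Context) error {
	if ctx.Done() != nil {
		return errCancellableCtx
	}
	return nil
}
```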
Force-pushed from ea68173 to 9c8e092
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
Yeah, it's clearly better to use a background context when applying the snapshot. I somehow missed that we'd received the entire snapshot by the time we called into this, and figured we'd already have to be tolerant to clients going away -- don't know why I was thinking that, since holding onto raftMu while streaming would clearly be horrible. And as @tbg points out, once we've received the snapshot there's no point in abandoning it just because the client happens to go away. Updated the PR.
We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic, I went back and forth over this with Andrei a while back, don't remember the PR right now but can find it if there's interest).
Added a commit for this. @knz has mentioned before that we may want to somehow mark contexts that are tied to clients, so that we can assert against them in code paths like these, but I can't remember where we discussed this.
Can you remind me the context of the convo? There's multiple things I had in mind and I'm not sure which one you're referring to.
Force-pushed from 4805333 to a829356
I think it was that we kicked off async cleanup tasks that inherited the client's context, possibly txn record cleanup (which would then never run in cases where the client disconnected in the middle of a txn).
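Distilled, the failure mode described here is the classic Go pitfall of async work inheriting a request-scoped context; the names below are hypothetical stand-ins:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleanupTxnRecord stands in for any background cleanup task.
func cleanupTxnRecord(ctx context.Context) {
	select {
	case <-time.After(10 * time.Millisecond): // pretend to do the cleanup
		fmt.Println("cleanup ran")
	case <-ctx.Done():
		fmt.Println("cleanup never ran:", ctx.Err())
	}
}

func main() {
	clientCtx, clientGone := context.WithCancel(context.Background())

	// Buggy: the cleanup inherits the client's context, so a client
	// disconnect mid-txn silently kills it.
	go cleanupTxnRecord(clientCtx)
	clientGone()
	time.Sleep(50 * time.Millisecond)

	// Fixed: the cleanup runs on a detached context.
	go cleanupTxnRecord(context.Background())
	time.Sleep(50 * time.Millisecond)
}
```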
Force-pushed from a829356 to 14c7c5a
Oh yes, right, we're already assuming that client context cancellation propagates. However, today, client context cancellation and the context cancellation that results from e.g. a network disconnect, a server shutdown, or the other places where we use cancellation are not distinguishable from each other - they are currently all reflected in the context interface as a generic context.Canceled. A way forward could be to link client context cancellation to a specific error wrapper returned by ctx.Err().
Haven't looked at it in a while but I think that's what I wanted in #66884.
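The distinction knz is asking for -- knowing *why* a context was canceled -- is something the Go standard library later grew first-class support for (Go 1.20's context.WithCancelCause). A sketch of how a client disconnect could be tagged, with an illustrative sentinel error:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrClientDisconnected marks cancellations caused by the client going
// away, as opposed to server shutdown or network errors (illustrative
// name, not a cockroach API).
var ErrClientDisconnected = errors.New("client disconnected")

func main() {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(ErrClientDisconnected)

	// ctx.Err() is still the generic context.Canceled...
	fmt.Println(ctx.Err())
	// ...but context.Cause recovers the specific reason.
	fmt.Println(errors.Is(context.Cause(ctx), ErrClientDisconnected)) // true
}
```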
Force-pushed from 14c7c5a to a24487a
Reviewed 1 of 2 files at r4, 2 of 2 files at r6, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
// handleRaftReadyRaftMuLocked is not prepared to handle context
// cancellation, so assert that it's never canceled.
if err := ctx.Err(); err != nil {
I thought we would check ctx.Done() == nil, i.e. check that the context cannot be canceled, which is an always-on dynamic check. Any reason you didn't do that?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:
Done may return nil if this context can never be canceled.
Sure we want to rely on this?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:
Done may return nil if this context can never be canceled.
Sure we want to rely on this?
Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.
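For reference, the behavior the nil check relies on is easy to see with the standard library alone:

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	// context.Background can never be canceled; its Done() is nil.
	fmt.Println(context.Background().Done() == nil) // true

	// A cancellable descendant returns a real channel even before
	// cancel is called, which is what makes the nil check an always-on
	// assertion rather than one that fires only after cancellation.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	fmt.Println(ctx.Done() == nil) // false
}
```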
Force-pushed from a24487a to b9ec482
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.
Sure, probably fine -- done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Sure, probably fine -- done.
Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:
pkg/cli/start.go, lines 407 to 408 in c3e8d85:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()
I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.
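One hypothetical shape of that suggestion: root the scheduler's context in Background and copy across only the values it needs (the key type here is a stand-in for whatever metadata the real context carries, and the Start signature in the comment is a sketch, not the actual code):

```go
package scheduler

import "context"

// detachedCtx builds a fresh, non-cancellable context and copies the
// listed values over from the (possibly cancellable) parent.
func detachedCtx(parent context.Context, keys ...interface{}) context.Context {
	ctx := context.Background()
	for _, k := range keys {
		if v := parent.Value(k); v != nil {
			ctx = context.WithValue(ctx, k, v)
		}
	}
	return ctx
}

// A raftScheduler.Start along these lines might then do:
//
//	func (s *raftScheduler) Start(ctx context.Context, stopper *stop.Stopper) {
//	    ctx = detachedCtx(ctx, someLogTagKey{})
//	    ...
//	}
```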
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:
pkg/cli/start.go, lines 407 to 408 in c3e8d85:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.
In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.
LGTM for first commit.
Force-pushed from b9ec482 to 541adf9
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.
Split out to #73554.
Reviewed 1 of 2 files at r4, 1 of 2 files at r6, 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
bors r=nvanbenschoten,tbg
Build succeeded.
73554: kvserver: use and assert non-cancellable Raft scheduler context r=tbg a=erikgrinaker

`handleRaftReadyRaftMuLocked` is not prepared to handle context cancellation. It is typically called via the Raft scheduler, which uses a background context, but can be called via other paths as well (e.g. snapshot application). This patch adds an assertion that the given context is not cancellable, and creates a new background context for the main scheduler code path instead of using the CLI's cancellable context.

Release note: None

---

Split off from #73484, see previous discussion there. Turns out that this fails because the Raft scheduler context is in fact cancellable. It's rooted at the CLI context: https://github.com/cockroachdb/cockroach/blob/c3e8d8568467809c50a8eb8911fd120fe22661bb/pkg/cli/start.go#L407-L408

There's a few different options here, including:

1. Fixing `handleRaftReady` to handle context cancellation safely.
2. Using a new background context for the Raft scheduler and populating it with necessary data from the passed context.
3. Getting @andreimatei's `contextutil.WithoutCancel()` from #66387 merged, and using it.

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Following a3fd4fb, context errors are now propagated back up the
stack when receiving snapshots. However, this triggered a
`maybeFatalOnRaftReadyErr` assertion which crashed the node.
`handleRaftReadyRaftMuLocked` (which is called directly when applying
snapshots) does not appear prepared to safely handle arbitrary context
cancellation, as it is typically called with the Raft scheduler's
background context. Furthermore, by the time we call it we have already
received the entire snapshot, so it does not seem useful to abort
snapshot application just because the caller goes away.
This patch therefore uses a new background context for applying
snapshots, disconnected from the client's context, once the entire
snapshot has been received.
Resolves #73371.
Resolves #73469.
Resolves #73482.
Release note: None