kvserver: use background context when applying snapshots #73484
Conversation
Force-pushed from cc765a5 to ea68173
Thanks for looking into this Erik!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
var err error
stats, expl, err = r.handleRaftReadyRaftMuLocked(ctx, inSnap)
if errors.IsAny(err, context.Canceled, context.DeadlineExceeded) {
This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?
Given the location where we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.
The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.
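To make the "disconnect the context" alternative concrete: below is a minimal, self-contained sketch of a cancellation-stripping wrapper, assuming only the standard library. This is the general shape of the `contextutil.WithoutCancel()` helper referenced later in this thread, not cockroach's actual API.

```go
package ctxutil

import (
	"context"
	"time"
)

// noCancelCtx forwards Value lookups to its parent (preserving log
// tags, tracing spans, and similar metadata) while reporting itself
// as never cancellable.
type noCancelCtx struct{ parent context.Context }

func (c noCancelCtx) Deadline() (time.Time, bool)       { return time.Time{}, false }
func (c noCancelCtx) Done() <-chan struct{}             { return nil }
func (c noCancelCtx) Err() error                        { return nil }
func (c noCancelCtx) Value(key interface{}) interface{} { return c.parent.Value(key) }

// WithoutCancel detaches ctx from its parent's cancellation while
// keeping its values: canceling the parent no longer affects the
// returned context.
func WithoutCancel(ctx context.Context) context.Context {
	return noCancelCtx{parent: ctx}
}
```

Go 1.21 eventually added `context.WithoutCancel` to the standard library with these exact semantics.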
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?
Given the location where we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.
The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.
Agreed with Nathan's reservations: we need to be explicit about which context cancellations we make graceful, or we make it harder to keep the system correct. It also seems more correct to detach the snapshot application's ctx from the caller; whether the caller goes away should have no say in whether the snapshot is going to be fully applied.
We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and, when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic. I went back and forth over this with Andrei a while back; I don't remember the PR right now but can find it if there's interest.)
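A sketch of the sanity check tbg describes, with hypothetical names (the real assertion would live inside handleRaftReadyRaftMuLocked and feed into the fatal-error path):

```go
package kvassert

import (
	"context"
	"errors"
)

// errCancellableCtx would, per the discussion above, propagate up and
// shut down the server, making this an assertion in effect.
var errCancellableCtx = errors.New(
	"handleRaftReadyRaftMuLocked called with a cancellable context")

// assertNonCancellable rejects any context that could ever be
// canceled. ctx.Done() returns nil for contexts that can never be
// canceled, so unlike checking ctx.Err(), this fires even before any
// cancellation actually happens.
func assertNonCancellable(ctx context.Context) error {
	if ctx.Done() != nil {
		return errCancellableCtx
	}
	return nil
}
```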
Force-pushed from ea68173 to 9c8e092
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
Yeah, it's clearly better to use a background context when applying the snapshot. I somehow missed that we'd received the entire snapshot by the time we called into this, and figured we'd already have to be tolerant to clients going away -- don't know why I was thinking that, since holding onto raftMu while streaming would clearly be horrible. And as @tbg points out, once we've received the snapshot there's no point in abandoning it just because the client happens to go away. Updated the PR.
We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic, I went back and forth over this with Andrei a while back, don't remember the PR right now but can find it if there's interest).
Added a commit for this. @knz has mentioned before that we may want to somehow mark contexts that are tied to clients, so that we can assert against them in code paths like these, but I can't remember where we discussed this.
Can you remind me the context of the convo? There's multiple things I had in mind and I'm not sure which one you're referring to.
Force-pushed from 4805333 to a829356
I think it was that we kicked off async cleanup tasks that inherited the client's context, possibly txn record cleanup (which would then never run in cases where the client disconnected in the middle of a txn).
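Distilled, the failure mode described here is the classic Go pitfall of async work inheriting a request-scoped context; the names below are hypothetical stand-ins:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleanupTxnRecord stands in for any background cleanup task.
func cleanupTxnRecord(ctx context.Context) {
	select {
	case <-time.After(10 * time.Millisecond): // pretend to do the cleanup
		fmt.Println("cleanup ran")
	case <-ctx.Done():
		fmt.Println("cleanup never ran:", ctx.Err())
	}
}

func main() {
	clientCtx, clientGone := context.WithCancel(context.Background())

	// Buggy: the cleanup inherits the client's context, so a client
	// disconnect mid-txn silently kills it.
	go cleanupTxnRecord(clientCtx)
	clientGone()
	time.Sleep(50 * time.Millisecond)

	// Fixed: the cleanup runs on a detached context.
	go cleanupTxnRecord(context.Background())
	time.Sleep(50 * time.Millisecond)
}
```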
Force-pushed from a829356 to 14c7c5a
Oh yes, right, we're already assuming that client context cancellation propagates. However, today, client context cancellation and the context cancellation that results from e.g. a network disconnect, a server shutdown, or the other places where we use cancellation are not distinguishable from each other - they are currently all reflected in the context interface as a generic context.Canceled. A way forward could be to link client context cancellation to a specific error wrapper returned by ctx.Err().
Haven't looked at it in a while but I think that's what I wanted in #66884.
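The distinction knz is asking for -- knowing *why* a context was canceled -- is something the Go standard library later grew first-class support for (Go 1.20's context.WithCancelCause). A sketch of how a client disconnect could be tagged, with an illustrative sentinel error:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrClientDisconnected marks cancellations caused by the client going
// away, as opposed to server shutdown or network errors (illustrative
// name, not a cockroach API).
var ErrClientDisconnected = errors.New("client disconnected")

func main() {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(ErrClientDisconnected)

	// ctx.Err() is still the generic context.Canceled...
	fmt.Println(ctx.Err())
	// ...but context.Cause recovers the specific reason.
	fmt.Println(errors.Is(context.Cause(ctx), ErrClientDisconnected)) // true
}
```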
Force-pushed from 14c7c5a to a24487a
Reviewed 1 of 2 files at r4, 2 of 2 files at r6, 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
// handleRaftReadyRaftMuLocked is not prepared to handle context
// cancellation, so assert that it's never canceled.
if err := ctx.Err(); err != nil {
I thought we would check ctx.Done() == nil, i.e. check that the context cannot be canceled, which is an always-on dynamic check. Any reason you didn't do that?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:
Done may return nil if this context can never be canceled.
Sure we want to rely on this?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:
Done may return nil if this context can never be canceled.
Sure we want to rely on this?
Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.
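For reference, the behavior the nil check relies on is easy to see with the standard library alone:

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	// context.Background can never be canceled; its Done() is nil.
	fmt.Println(context.Background().Done() == nil) // true

	// A cancellable descendant returns a real channel even before
	// cancel is called, which is what makes the nil check an always-on
	// assertion rather than one that fires only after cancellation.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	fmt.Println(ctx.Done() == nil) // false
}
```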
Force-pushed from a24487a to b9ec482
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.
Sure, probably fine -- done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Sure, probably fine -- done.
Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:
pkg/cli/start.go, lines 407 to 408 in c3e8d85:

ctx, cancel := context.WithCancel(context.Background())
defer cancel()
I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.
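One hypothetical shape of that suggestion: root the scheduler's context in Background and copy across only the values it needs (the key type here is a stand-in for whatever metadata the real context carries, and the Start signature in the comment is a sketch, not the actual code):

```go
package scheduler

import "context"

// detachedCtx builds a fresh, non-cancellable context and copies the
// listed values over from the (possibly cancellable) parent.
func detachedCtx(parent context.Context, keys ...interface{}) context.Context {
	ctx := context.Background()
	for _, k := range keys {
		if v := parent.Value(k); v != nil {
			ctx = context.WithValue(ctx, k, v)
		}
	}
	return ctx
}

// A raftScheduler.Start along these lines might then do:
//
//	func (s *raftScheduler) Start(ctx context.Context, stopper *stop.Stopper) {
//	    ctx = detachedCtx(ctx, someLogTagKey{})
//	    ...
//	}
```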
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:
pkg/cli/start.go, lines 407 to 408 in c3e8d85:
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.
In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.
LGTM for first commit.
Force-pushed from b9ec482 to 541adf9
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.
Split out to #73554.
Reviewed 1 of 2 files at r4, 1 of 2 files at r6, 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
bors r=nvanbenschoten,tbg
Build succeeded.
73554: kvserver: use and assert non-cancellable Raft scheduler context r=tbg a=erikgrinaker

`handleRaftReadyRaftMuLocked` is not prepared to handle context cancellation. It is typically called via the Raft scheduler, which uses a background context, but can be called via other paths as well (e.g. snapshot application). This patch adds an assertion that the given context is not cancellable, and creates a new background context for the main scheduler code path instead of using the CLI's cancellable context.

Release note: None

---

Split off from #73484, see previous discussion there. Turns out that this fails because the Raft scheduler context is in fact cancellable. It's rooted at the CLI context: https://github.com/cockroachdb/cockroach/blob/c3e8d8568467809c50a8eb8911fd120fe22661bb/pkg/cli/start.go#L407-L408

There's a few different options here, including:

1. Fixing `handleRaftReady` to handle context cancellation safely.
2. Using a new background context for the Raft scheduler and populating it with necessary data from the passed context.
3. Getting @andreimatei's `contextutil.WithoutCancel()` from #66387 merged, and using it.

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Following a3fd4fb, context errors are now propagated back up the
stack when receiving snapshots. However, this triggered a
`maybeFatalOnRaftReadyErr` assertion which crashed the node.
`handleRaftReadyRaftMuLocked` (which is called directly when applying
snapshots) does not appear prepared to safely handle arbitrary context
cancellation, as it is typically called with the Raft scheduler's
background context. Furthermore, by the time we call it we have already
received the entire snapshot, so it does not seem useful to abort
snapshot application just because the caller goes away.
This patch therefore uses a new background context for applying
snapshots, disconnected from the client's context, once the entire
snapshot has been received.
Resolves #73371.
Resolves #73469.
Resolves #73482.
Release note: None