-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: prevent reproposals of applied commands #40505
storage: prevent reproposals of applied commands #40505
Conversation
ed9aace
to
49836b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/replica_application_state_machine.go, line 906 at r1 (raw file):
// If any reproposals at a higher MaxLeaseIndex exist we know that they will // never successfully apply, remove them from the map to avoid future // reproposals.
Make a note of why we don't need to do this unconditionally. This will likely reference shouldRemove
in replicaDecoder.retrieveLocalProposals
.
pkg/storage/replica_test.go, line 11584 at r1 (raw file):
// The test does the following things: // // * Propose cmd at MaxLeaseIndex '0
'0
?
pkg/storage/replica_test.go, line 11589 at r1 (raw file):
// reproposal at a higher MaxLeaseIndex. // * Simultaneously update the lease sequence number on the replica so all // future commands will fail.
With what kind of error?
TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when cockroachdb#39425 was merged. Prior to that change there was an invariant that if a command was reproposed at a higher MaxLeaseIndex then the client would only be acknowledged by a command which applied at that higher MaxLeaseIndex. That change also worked to simplify context lifecycle management for replicatedCmd's by creating individual child spans for each application. This was not a complete solution however because all of the commands derived from the same proposal share a context when used for reproposals. As a consequence, commands which are reproposed and are at a higher MaxLeaseIndex than an already applied command would use a context which carried a tracing span which might already be finished. Several approaches were explored to fix this issue. The one chosen here seems to be the least invasive. The rotten test has been simplified to cover the now problematic case. The enabling mechanism for the testing is a hammer of a TestingKnob which will always refresh unconditionally all pending proposals in the proposals map at the end of a raft ready iteration. The new test fails reliably under stress in ~10s of iterations and <5s before making the change to delete proposals after they've been applied. An alternative approach would have been to partially revert cockroachdb#39425 and ensure that only commands which carry the highest MaxLeaseIndex for a proposal may be locally applied. If it's deemed cleaner and simpler then we can go with it. This approach prevents some reproposals and allows clients of commands which will fail due to non-equivalent lease changes to be acknowledged sooner of their need to retry. Fixes cockroachdb#40478 Release note: None
49836b2
to
0e81674
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=nvanbenschoten
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
40505: storage: prevent reproposals of applied commands r=nvanbenschoten a=ajwerner TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was merged. Prior to that change there was an invariant that if a command was reproposed at a higher MaxLeaseIndex then the client would only be acknowledged by a command which applied at that higher MaxLeaseIndex. That change also worked to simplify context lifecycle management for replicatedCmd's by creating individual child spans for each application. This was not a complete solution however because all of the commands derived from the same proposal share a context when used for reproposals. As a consequence, commands which are reproposed and are at a higher MaxLeaseIndex than an already applied command would use a context which carried a tracing span which might already be finished. Several approaches were explored to fix this issue. The one chosen here seems to be the least invasive. The rotten test has been simplified to cover the now problematic case. The enabling mechanism for the testing is a hammer of a TestingKnob which will always refresh unconditionally all pending proposals in the proposals map at the end of a raft ready iteration. The new test fails reliably under stress in ~10s of iterations and <5s before making the change to delete proposals after they've been applied. An alternative approach would have been to partially revert #39425 and ensure that only commands which carry the highest MaxLeaseIndex for a proposal may be locally applied. If it's deemed cleaner and simpler then we can go with it. This approach prevents some reproposals and allows clients of commands which will fail due to non-equivalent lease changes to be acknowledged sooner of their need to retry. Fixes #40478 Release note: None Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex then the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
worked to simplify context lifecycle management for replicatedCmd's by creating
individual child spans for each application. This was not a complete solution
however because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, commands which are
reproposed and are at a higher MaxLeaseIndex than an already applied command
would use a context which carried a tracing span which might already be
finished.
Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.
The rotten test has been simplified to cover the now problematic case. The
enabling mechanism for the testing is a hammer of a TestingKnob which will
always refresh unconditionally all pending proposals in the proposals map
at the end of a raft ready iteration. The new test fails reliably under
stress in ~10s of iterations and <5s before making the change to delete
proposals after they've been applied.
An alternative approach would have been to partially revert #39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If it's deemed cleaner and simpler then we can go with
it. This approach prevents some reproposals and allows clients of commands
which will fail due to non-equivalent lease changes to be acknowledged sooner
of their need to retry.
Fixes #40478
Release note: None