Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: prevent reproposals of applied commands #40505

Merged
merged 1 commit into from
Sep 5, 2019

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Sep 5, 2019

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex then the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
worked to simplify context lifecycle management for replicatedCmd's by creating
individual child spans for each application. This was not a complete solution
however because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, commands which are
reproposed and are at a higher MaxLeaseIndex than an already applied command
would use a context which carried a tracing span which might already be
finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now problematic case. The
enabling mechanism for the testing is a hammer of a TestingKnob which will
always refresh unconditionally all pending proposals in the proposals map
at the end of a raft ready iteration. The new test fails reliably under
stress in ~10s of iterations and <5s before making the change to delete
proposals after they've been applied.

An alternative approach would have been to partially revert #39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If it's deemed cleaner and simpler then we can go with
it. This approach prevents some reproposals and allows clients of commands
which will fail due to non-equivalent lease changes to be acknowledged sooner
of their need to retry.

Fixes #40478

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/replica_application_state_machine.go, line 906 at r1 (raw file):

		// If any reproposals at a higher MaxLeaseIndex exist we know that they will
		// never successfully apply, remove them from the map to avoid future
		// reproposals.

Make a note of why we don't need to do this unconditionally. This will likely reference shouldRemove in replicaDecoder.retrieveLocalProposals.


pkg/storage/replica_test.go, line 11584 at r1 (raw file):

// The test does the following things:
//
//  * Propose cmd at MaxLeaseIndex '0

'0?


pkg/storage/replica_test.go, line 11589 at r1 (raw file):

//    reproposal at a higher MaxLeaseIndex.
//  * Simultaneously update the lease sequence number on the replica so all
//    future commands will fail.

With what kind of error?

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when cockroachdb#39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex then the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
worked to simplify context lifecycle management for replicatedCmd's by creating
individual child spans for each application. This was not a complete solution
however because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, commands which are
reproposed and are at a higher MaxLeaseIndex than an already applied command
would use a context which carried a tracing span which might already be
finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now problematic case. The
enabling mechanism for the testing is a hammer of a TestingKnob which will
always refresh unconditionally all pending proposals in the proposals map
at the end of a raft ready iteration. The new test fails reliably under
stress in ~10s of iterations and <5s before making the change to delete
proposals after they've been applied.

An alternative approach would have been to partially revert cockroachdb#39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If it's deemed cleaner and simpler then we can go with
it. This approach prevents some reproposals and allows clients of commands
which will fail due to non-equivalent lease changes to be acknowledged sooner
of their need to retry.

Fixes cockroachdb#40478

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/fix-reuse-after-finish branch from 49836b2 to 0e81674 Compare September 5, 2019 15:21
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

craig bot pushed a commit that referenced this pull request Sep 5, 2019
40505: storage: prevent reproposals of applied commands r=nvanbenschoten a=ajwerner

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex then the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
worked to simplify context lifecycle management for replicatedCmd's by creating
individual child spans for each application. This was not a complete solution
however because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, commands which are
reproposed and are at a higher MaxLeaseIndex than an already applied command
would use a context which carried a tracing span which might already be
finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now problematic case. The
enabling mechanism for the testing is a hammer of a TestingKnob which will
always refresh unconditionally all pending proposals in the proposals map
at the end of a raft ready iteration. The new test fails reliably under
stress in ~10s of iterations and <5s before making the change to delete
proposals after they've been applied.

An alternative approach would have been to partially revert #39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If it's deemed cleaner and simpler then we can go with
it. This approach prevents some reproposals and allows clients of commands
which will fail due to non-equivalent lease changes to be acknowledged sooner
of their need to retry.

Fixes #40478

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Sep 5, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: panic due to tracing use after finish
3 participants