Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fix nil pointer panic during replica changes #40961

Conversation

ajwerner
Copy link
Contributor

Before this PR there was a bug whereby a command attempting to move a range out
of a joint config would fail to find a RangeDescriptor for a range because it
was racing with a merge which destroyed that range. The code used to panic.

This fix is simply detecting the nil and not attempting to move out of a
joint config on a range that no longer exists.

This PR is almost exclusively testing.

Fixes #40877.

Release justification: Fixes a panic.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I wish this stuff were easier to test.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/storage/client_replica_test.go, line 2311 at r1 (raw file):

	// We're going to have one goroutine trying to ADD the 4th node.
	// and another goroutine trying to move out of a joint config on both
	// sides and then

Sentence needs an end.


pkg/storage/client_replica_test.go, line 2320 at r1 (raw file):

	require.NoError(t, err)

	// Manually construct the batch because the (*DB).AdminChangeReplicas does

It does, but you need this magical hack

context.WithValue(ctx, "testing", "testing"),

@ajwerner ajwerner force-pushed the ajwerner/fix-nil-pointer-in-change-replicas-when-racing-with-merge branch from c060a7b to f50f5d8 Compare September 23, 2019 11:54
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that there's a bad case if we read a range descriptor for a different range. I added testing for that too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/client_replica_test.go, line 2311 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Sentence needs an end.

Done.


pkg/storage/client_replica_test.go, line 2320 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It does, but you need this magical hack

context.WithValue(ctx, "testing", "testing"),

Done.

@ajwerner ajwerner force-pushed the ajwerner/fix-nil-pointer-in-change-replicas-when-racing-with-merge branch from f50f5d8 to ffccfde Compare September 23, 2019 12:00
Before this PR there was a bug whereby a command attempting to move a range out
of a joint config would fail to find a RangeDescriptor for a range because it
was racing with a merge which destroyed that range. The code used to panic.

This fix is simply detecting the nil and not attempting to move out of a
joint config on a range that no longer exists. Another slightly less problematic
but still wrong case was if the merged range had been resplit at the same key.
In this case we might attempt to modify a different range.

This PR is almost exclusively testing.

Fixes cockroachdb#40877.

Release justification: Fixes a panic.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/fix-nil-pointer-in-change-replicas-when-racing-with-merge branch from ffccfde to 1f55bb1 Compare September 23, 2019 13:35
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/storage/replica_command.go, line 1547 at r2 (raw file):

		// NB: We might fail to find the range if the range has been merged away
		// in which case we definitely want to fail the check below.
		if kvDesc != nil && kvDesc.RangeID == referenceDesc.RangeID && chgs.leaveJoint() {

Good catch.

@ajwerner
Copy link
Contributor Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Sep 23, 2019
40961: storage: fix nil pointer panic during replica changes r=ajwerner a=ajwerner

Before this PR there was a bug whereby a command attempting to move a range out
of a joint config would fail to find a RangeDescriptor for a range because it
was racing with a merge which destroyed that range. The code used to panic.

This fix is simply detecting the nil and not attempting to move out of a
joint config on a range that no longer exists.

This PR is almost exclusively testing.

Fixes #40877.

Release justification: Fixes a panic.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Sep 23, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: nil pointer dereference in execChangeReplicasTxn
3 participants