
storage: work around can't-swap-leaseholder #40363

Merged 3 commits on Sep 3, 2019

Conversation

@tbg (Member) commented Aug 30, 2019

As of #40284, the replicate queue was issuing swaps (atomic add+remove)
during rebalancing. TestInitialPartitioning helpfully points out (once you
flip atomic rebalancing on) that when the replication factor is one, there
is no way to perform such an atomic swap, because it would necessarily have
to remove the leaseholder.

To work around this restriction (which, by the way, we dislike; see
#40333), fall back to just adding a replica in this case without also
removing one. In the next scanner cycle (which should happen immediately,
since we requeue the range) the range will be over-replicated, and hopefully
the lease will be transferred over and the original leaseholder removed.
I would be doubtful that this all works, except that it is how things worked
until #40284, so this PR really just falls back to the previous behavior in
cases where we can't do better.

Release note: None
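
To make the workaround concrete, here is a minimal sketch of the decision,
assuming made-up names (planRebalance, replicationChange) rather than the
actual replicate queue code:

```go
// Hypothetical sketch of the fallback; all names here are illustrative,
// not the real CockroachDB code.
package main

import "fmt"

type replicationChange struct {
	addTarget    string
	removeTarget string // empty means "add only", no atomic swap
}

// planRebalance falls back to an add-only change when the swap would have to
// remove the leaseholder (the only replica at replication factor one). It
// returns true when the range should be requeued so the next scanner cycle
// can transfer the lease and remove the now-redundant original leaseholder.
func planRebalance(numReplicas int, leaseholder, addTarget, removeTarget string) (replicationChange, bool) {
	if numReplicas == 1 && removeTarget == leaseholder {
		return replicationChange{addTarget: addTarget}, true
	}
	return replicationChange{addTarget: addTarget, removeTarget: removeTarget}, false
}

func main() {
	chg, requeue := planRebalance(1, "s1", "s2", "s1")
	fmt.Printf("%+v requeue=%v\n", chg, requeue) // {addTarget:s2 removeTarget:} requeue=true
}
```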

@cockroach-teamcity (Member) commented:

This change is Reviewable

@nvanbenschoten (Member) left a comment

:lgtm:

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replicate_queue.go, line 515 at r3 (raw file):

// likely to be the leaseholder), then this removal would fail. Instead, this
// method will attempt to transfer the lease away, and returns true to indicate
// to the caller that it should not pursue the current replication change further.

"because it is no longer the leaseholder"


pkg/storage/replicate_queue.go, line 739 at r3 (raw file):

				// only, which should succeed, and the next time we touch this
				// range, we will have one more replica and hopefully it will
				// take the lease and remove the current leaseholder.

I'm surprised that this case doesn't hit an error when it calls maybeTransferLeaseAway. Could you mention what we expect to happen when you call that?
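
(For context, a minimal sketch of the contract under discussion, with stub
types rather than the real replicate_queue.go signatures: the method returns
true once it has moved the lease, and the caller then drops the change
instead of erroring out.)

```go
// Hypothetical sketch of the maybeTransferLeaseAway contract; stub types
// only, not the actual replicate_queue.go code.
package main

import (
	"context"
	"fmt"
)

type storeID int

type replicateQueue struct {
	leaseholder storeID
}

// maybeTransferLeaseAway transfers the lease if the removal target is the
// current leaseholder. It returns true (and no error) in that case, telling
// the caller to abandon the current replication change: this replica is no
// longer the leaseholder and cannot carry the change out itself.
func (rq *replicateQueue) maybeTransferLeaseAway(ctx context.Context, removeTarget storeID) (bool, error) {
	if rq.leaseholder != removeTarget {
		return false, nil // not removing the leaseholder; proceed as usual
	}
	rq.leaseholder = 0 // pretend the lease moved to another store
	return true, nil
}

func main() {
	rq := &replicateQueue{leaseholder: 1}
	done, err := rq.maybeTransferLeaseAway(context.Background(), 1)
	fmt.Println(done, err) // true <nil>: caller stops; the new leaseholder's queue retries
}
```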

There may be nothing to roll back, so don't log unconditionally.

Release note: None

This was showing up a lot in TestInitialPartitioning. If we're trying to
remove something but nothing needs to be removed, that seems OK (though
there is some question of why we're hitting this regularly).

Release note: None
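
Both fixes amount to tolerating no-ops; a minimal sketch, assuming
illustrative names (rollback, removeReplica) rather than the actual code:

```go
// Hypothetical sketch of the two no-op fixes above; names are illustrative.
package main

import "log"

type change struct{ target string }

// rollback undoes any applied changes, but only logs when there is actually
// something to roll back (first commit message above).
func rollback(applied []change) {
	if len(applied) == 0 {
		return // nothing was applied; stay quiet
	}
	log.Printf("rolling back %d change(s)", len(applied))
	// ... undo the applied changes ...
}

// removeReplica treats "nothing to remove" as success rather than an error
// (second commit message above).
func removeReplica(current map[string]bool, target string) error {
	if !current[target] {
		return nil // already gone; a no-op removal is OK
	}
	delete(current, target)
	return nil
}

func main() {
	rollback(nil)                                        // no log output
	_ = removeReplica(map[string]bool{"s1": true}, "s2") // no-op, no error
}
```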
@tbg (Member, Author) left a comment

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@tbg (Member, Author) commented Sep 3, 2019

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Sep 3, 2019
40363: storage: work around can't-swap-leaseholder r=nvanbenschoten a=tbg


Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig (Contributor) commented Sep 3, 2019

Build succeeded

@craig craig bot merged commit 3686af8 into cockroachdb:master Sep 3, 2019
craig bot pushed a commit that referenced this pull request Sep 4, 2019
40370: storage: prepare for kv.atomic_replication_changes=true r=nvanbenschoten a=tbg

First three commits are #40363.

----

This PR enables atomic replication changes by default. But most of it is
just dealing with the fallout of doing so:

1. we don't handle removal of multiple learners well at the moment. This will
   be fixed more holistically in #40268, but it's not worth waiting for that
   because it's easy for us to just avoid the problem.
2. tests that carry out splits become quite flaky because, at the beginning of
   a split, we transition out of a joint config if we see one, and due to
   the initial upreplication we often do. If we lose the race against the
   replicate queue, the split catches an error for no good reason.
   I took this as an opportunity to refactor the descriptor comparisons
   and to make this specific case a no-op, which also makes it easier to
   avoid this general class of conflict where it's avoidable in the future
   (see the sketch below).

There are probably some more problems that will only become apparent over time,
but it's quite simple to turn the cluster setting off again and to patch things
up if we hit them.

Release note (general change): atomic replication changes are now enabled
by default.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
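
A minimal sketch of the joint-config check described in item 2 above, with
stub types standing in for the real replica descriptors:

```go
// Hypothetical sketch of checking for a joint configuration before a split;
// stub types only, not the actual CockroachDB descriptor code.
package main

import "fmt"

type replicaType int

const (
	voterFull     replicaType = iota
	voterIncoming             // entering the range as part of an atomic change
	voterOutgoing             // leaving the range as part of an atomic change
)

type rangeDescriptor struct {
	replicas []replicaType
}

// inJointConfig reports whether an atomic replication change is still in
// flight, i.e. some replica is entering or leaving the voter set. A split
// must leave such a config first; if the replicate queue already left it,
// that should be treated as a no-op rather than an error.
func (d rangeDescriptor) inJointConfig() bool {
	for _, t := range d.replicas {
		if t == voterIncoming || t == voterOutgoing {
			return true
		}
	}
	return false
}

func main() {
	d := rangeDescriptor{replicas: []replicaType{voterFull, voterIncoming}}
	fmt.Println(d.inJointConfig()) // true: leave the joint config before splitting
}
```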