kv: switching from ZONE to REGION survival causes unexpected data movement #63810

Closed
nvanbenschoten opened this issue Apr 17, 2021 · 2 comments · Fixed by #89650
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-multiregion

Comments

nvanbenschoten (Member) commented Apr 17, 2021

When switching from ZONE to REGION survival in a 3-region cluster, only a single snapshot should be necessary per range. This is because we switch from a topology that looks like:

region 1: voter (leaseholder), voter, voter
region 2: non-voter
region 3: non-voter

to a topology that looks like:

region 1: voter (leaseholder), voter
region 2: voter, voter
region 3: voter

So if both non-voters are promoted to voters, only one snapshot should be necessary: the one to add the second voter in region 2. Furthermore, we could do something smart about how we send that snapshot to avoid the WAN traffic - #42491. But let's ignore that for now.
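For concreteness, here's a minimal sketch of a promotion-first planner that reproduces the one-snapshot plan described above. This is illustrative only, not CockroachDB's allocator code; every type and function name below is invented:

```go
package main

import "fmt"

type replicaType int

const (
	voter replicaType = iota
	nonVoter
)

type replica struct {
	region string
	typ    replicaType
}

// planVoters emits one action per missing voter, promoting an existing
// non-voter in the same region whenever one is available, since a promotion
// reuses data already on the store and therefore needs no snapshot.
func planVoters(current []replica, votersWanted map[string]int) []string {
	voters := map[string]int{}
	nonVoters := map[string]int{}
	for _, r := range current {
		if r.typ == voter {
			voters[r.region]++
		} else {
			nonVoters[r.region]++
		}
	}
	var plan []string
	for _, region := range []string{"region 1", "region 2", "region 3"} {
		for voters[region] < votersWanted[region] {
			if nonVoters[region] > 0 {
				nonVoters[region]--
				plan = append(plan, "promote non-voter in "+region+" (no snapshot)")
			} else {
				plan = append(plan, "add new voter in "+region+" (snapshot)")
			}
			voters[region]++
		}
		// Excess voters (region 1 here) just get removed; removals never
		// require snapshots.
	}
	return plan
}

func main() {
	// ZONE-survival layout: 3 voters in region 1, non-voters in regions 2 and 3.
	current := []replica{
		{"region 1", voter}, {"region 1", voter}, {"region 1", voter},
		{"region 2", nonVoter}, {"region 3", nonVoter},
	}
	// REGION-survival target: 2 voters in region 1, 2 in region 2, 1 in region 3.
	target := map[string]int{"region 1": 2, "region 2": 2, "region 3": 1}
	for _, step := range planVoters(current, target) {
		fmt.Println(step)
	}
}
```

Running this prints one promotion per existing non-voter plus a single "add new voter in region 2 (snapshot)" step, matching the expected single snapshot.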

In one of my tests, this is not what I saw. After switching from ZONE to REGION survival, each range took the following steps:

1. add new voter in region 2
2. add new voter in region 3
3. remove non-voter in region 2
4. remove non-voter in region 3
5. move voter from region 1 to region 2

This resulted in a total of 3 range snapshots (steps 1, 2, and 5 each add a replica to a store with no existing copy of the data), all sent over the WAN. This is a decent amount of wasted data movement, given that we had two perfectly good non-voting replicas that we could have promoted. Do we understand why we made these decisions?

r6455_manual_enqueue_logs.txt
r6455 Range _ Debug _ Cockroach Console Before.pdf
r6455 Range _ Debug _ Cockroach Console After.pdf

Here's the log from a second instance that hurts even more, because it includes a non-voter that is removed and then quickly replaced by a voter on the same node.

r6456_manual_enqueue_logs.txt

Note: this is an inefficiency, but certainly nothing we need to rush to fix for v21.1.0. Everything still worked; it was just less efficient than I was hoping.

Jira issue: CRDB-6780

nvanbenschoten added the C-enhancement and A-kv-distribution labels on Apr 17, 2021
aayushshah15 (Contributor) commented Apr 17, 2021

What we’re missing here is that voter rebalancing doesn’t opportunistically prefer stores that already have non-voters. Currently, we will only promote an existing non-voter if we happen to pick a store that has one.

I think another scenario where this inefficiency is unfortunate is when switching the active region for a table or when switching the primary region for a database. If we were smarter about incorporating the presence of these non-voters in our rebalancing decisions, we could save on a WAN snapshot per range in that scenario as well. Moreover, we should be able to move the lease over to the new primary region in essentially the time it takes to push two commands (the ChangeReplicas, which wouldn't involve a snapshot, and the TransferLease) through Raft. That would be pretty sweet.
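As a sketch of that fast path, here's a self-contained toy model. The helper names (promoteNonVoter, transferLease) are invented for illustration and are not CockroachDB APIs; they just stand in for the ChangeReplicas and TransferLease commands mentioned above. The point it illustrates: both steps operate on a replica that already holds the range's data, so the whole switch costs two Raft commands and zero snapshots.

```go
package main

import (
	"context"
	"fmt"
)

type rangeState struct {
	voters    map[string]bool // region -> has a voter
	nonVoters map[string]bool // region -> has a non-voter
	lease     string          // region currently holding the lease
}

// promoteNonVoter stands in for a ChangeReplicas that swaps a non-voter to a
// voter in place. No snapshot: the store already has the range's data.
func promoteNonVoter(ctx context.Context, r *rangeState, region string) error {
	if !r.nonVoters[region] {
		return fmt.Errorf("no non-voter to promote in %s", region)
	}
	delete(r.nonVoters, region)
	r.voters[region] = true
	return nil
}

// transferLease stands in for a TransferLease to a voter in the target region.
func transferLease(ctx context.Context, r *rangeState, region string) error {
	if !r.voters[region] {
		return fmt.Errorf("no voter in %s to take the lease", region)
	}
	r.lease = region
	return nil
}

func main() {
	r := &rangeState{
		voters:    map[string]bool{"region 1": true},
		nonVoters: map[string]bool{"region 2": true},
		lease:     "region 1",
	}
	ctx := context.Background()
	// Two commands, zero snapshots:
	if err := promoteNonVoter(ctx, r, "region 2"); err != nil {
		panic(err)
	}
	if err := transferLease(ctx, r, "region 2"); err != nil {
		panic(err)
	}
	fmt.Println("lease now in", r.lease)
}
```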

nvanbenschoten (Member, Author) commented
Thanks @aayushshah15, that's all helpful. Is the idea of opportunistically preferring stores that already have non-voters (so that we can exploit promotion/demotion) tracked somewhere, or should we make this the tracking issue for it?

Also, for completeness, here's the behavior of the inverse change, moving from REGION to ZONE survival:

r6456_manual_enqueue_again_logs.txt

We make the following changes, resulting in 3 range snapshots, 2 of which cross the WAN:

1. remove voter in region 2
2. remove voter in region 3
3. add new non-voter in region 2
4. add new non-voter in region 3
5. add new voter in region 1

Ideally, I think we'd do the following, resulting in only 1 range snapshot (the snapshot arithmetic for both plans is sketched after this list):

1. demote voter to non-voter in region 2
2. demote voter to non-voter in region 3
3. add new voter in region 1
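To make the snapshot arithmetic explicit, here's a tiny illustrative sketch that tags each step of the two plans above with whether it creates a brand-new replica (and therefore sends a snapshot); removals, promotions, and demotions all reuse existing state:

```go
package main

import "fmt"

// step is one replication change; needsSnapshot is true only when a brand-new
// replica is created on a store that has no copy of the range's data.
type step struct {
	desc          string
	needsSnapshot bool
}

func countSnapshots(plan []step) int {
	n := 0
	for _, s := range plan {
		if s.needsSnapshot {
			n++
		}
	}
	return n
}

func main() {
	observed := []step{
		{"remove voter in region 2", false},
		{"remove voter in region 3", false},
		{"add new non-voter in region 2", true}, // crosses the WAN
		{"add new non-voter in region 3", true}, // crosses the WAN
		{"add new voter in region 1", true},     // stays within region 1
	}
	ideal := []step{
		{"demote voter to non-voter in region 2", false},
		{"demote voter to non-voter in region 3", false},
		{"add new voter in region 1", true}, // stays within region 1
	}
	fmt.Println("observed plan:", countSnapshots(observed), "snapshots") // 3
	fmt.Println("ideal plan:   ", countSnapshots(ideal), "snapshot")     // 1
}
```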
