kv: switching from ZONE to REGION survival causes unexpected data movement #63810

Closed
nvanbenschoten opened this issue Apr 17, 2021 · 2 comments · Fixed by #89650
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-multiregion

Comments

nvanbenschoten (Member) commented Apr 17, 2021

When switching from ZONE to REGION survival in a 3-region cluster, only a single snapshot should be necessary per range. This is because we switch from a topology that looks like:

region 1: voter (leaseholder), voter, voter
region 2: non-voter
region 3: non-voter

to a topology that looks like:

region 1: voter (leaseholder), voter
region 2: voter, voter
region 3: voter

So if both non-voters are promoted to voters, only one snapshot should be necessary: the one to add the second voter in region 2. Furthermore, we could do something smart about how we send that snapshot to avoid the WAN traffic - #42491. But let's ignore that for now.
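For concreteness, here's a minimal sketch of a promotion-first planner that reproduces the one-snapshot plan described above. This is illustrative only, not CockroachDB's allocator code; every type and function name below is invented:

```go
package main

import "fmt"

type replicaType int

const (
	voter replicaType = iota
	nonVoter
)

type replica struct {
	region string
	typ    replicaType
}

// planVoters emits one action per missing voter, promoting an existing
// non-voter in the same region whenever one is available, since a promotion
// reuses data already on the store and therefore needs no snapshot.
func planVoters(current []replica, votersWanted map[string]int) []string {
	voters := map[string]int{}
	nonVoters := map[string]int{}
	for _, r := range current {
		if r.typ == voter {
			voters[r.region]++
		} else {
			nonVoters[r.region]++
		}
	}
	var plan []string
	for _, region := range []string{"region 1", "region 2", "region 3"} {
		for voters[region] < votersWanted[region] {
			if nonVoters[region] > 0 {
				nonVoters[region]--
				plan = append(plan, "promote non-voter in "+region+" (no snapshot)")
			} else {
				plan = append(plan, "add new voter in "+region+" (snapshot)")
			}
			voters[region]++
		}
		// Excess voters (region 1 here) just get removed; removals never
		// require snapshots.
	}
	return plan
}

func main() {
	// ZONE-survival layout: 3 voters in region 1, non-voters in regions 2 and 3.
	current := []replica{
		{"region 1", voter}, {"region 1", voter}, {"region 1", voter},
		{"region 2", nonVoter}, {"region 3", nonVoter},
	}
	// REGION-survival target: 2 voters in region 1, 2 in region 2, 1 in region 3.
	target := map[string]int{"region 1": 2, "region 2": 2, "region 3": 1}
	for _, step := range planVoters(current, target) {
		fmt.Println(step)
	}
}
```

Running this prints one promotion per existing non-voter plus a single "add new voter in region 2 (snapshot)" step, matching the expected single snapshot.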

In one of my tests, this is not what I saw. After switching from ZONE to REGION survival, each range took the following steps:

1. add new voter in region 2
2. add new voter in region 3
3. remove non-voter in region 2
4. remove non-voter in region 3
5. move voter from region 1 to region 2

This resulted in a total of 3 range snapshots (steps 1, 2, and 5 each add a replica to a store with no existing copy of the data), all sent over the WAN. This is a decent amount of wasted data movement, given that we had two perfectly good non-voting replicas that we could have promoted. Do we understand why we made these decisions?

r6455_manual_enqueue_logs.txt
r6455 Range _ Debug _ Cockroach Console Before.pdf
r6455 Range _ Debug _ Cockroach Console After.pdf

Here's the log from a second instance that hurts even more, because it includes a non-voter that is removed and then quickly replaced by a voter on the same node.

r6456_manual_enqueue_logs.txt

Note: this is an inefficiency, but certainly nothing we need to rush to fix for v21.1.0. Everything still worked; it was just less efficient than I was hoping.

Jira issue: CRDB-6780

nvanbenschoten added the C-enhancement and A-kv-distribution labels on Apr 17, 2021
aayushshah15 (Contributor) commented Apr 17, 2021

What we’re missing here is that voter rebalancing doesn’t opportunistically prefer stores that already have non-voters. Currently, we will only promote an existing non-voter if we happen to pick a store that has one.

I think another scenario where this inefficiency is unfortunate is when switching the active region for a table or when switching the primary region for a database. If we were smarter about incorporating the presence of these non-voters in our rebalancing decisions, we could save on a WAN snapshot per range in that scenario as well. Moreover, we should be able to move the lease over to the new primary region in essentially the time it takes to push two commands (the ChangeReplicas, which wouldn't involve a snapshot, and the TransferLease) through Raft. That would be pretty sweet.
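As a sketch of that fast path, here's a self-contained toy model. The helper names (promoteNonVoter, transferLease) are invented for illustration and are not CockroachDB APIs; they just stand in for the ChangeReplicas and TransferLease commands mentioned above. The point it illustrates: both steps operate on a replica that already holds the range's data, so the whole switch costs two Raft commands and zero snapshots.

```go
package main

import (
	"context"
	"fmt"
)

type rangeState struct {
	voters    map[string]bool // region -> has a voter
	nonVoters map[string]bool // region -> has a non-voter
	lease     string          // region currently holding the lease
}

// promoteNonVoter stands in for a ChangeReplicas that swaps a non-voter to a
// voter in place. No snapshot: the store already has the range's data.
func promoteNonVoter(ctx context.Context, r *rangeState, region string) error {
	if !r.nonVoters[region] {
		return fmt.Errorf("no non-voter to promote in %s", region)
	}
	delete(r.nonVoters, region)
	r.voters[region] = true
	return nil
}

// transferLease stands in for a TransferLease to a voter in the target region.
func transferLease(ctx context.Context, r *rangeState, region string) error {
	if !r.voters[region] {
		return fmt.Errorf("no voter in %s to take the lease", region)
	}
	r.lease = region
	return nil
}

func main() {
	r := &rangeState{
		voters:    map[string]bool{"region 1": true},
		nonVoters: map[string]bool{"region 2": true},
		lease:     "region 1",
	}
	ctx := context.Background()
	// Two commands, zero snapshots:
	if err := promoteNonVoter(ctx, r, "region 2"); err != nil {
		panic(err)
	}
	if err := transferLease(ctx, r, "region 2"); err != nil {
		panic(err)
	}
	fmt.Println("lease now in", r.lease)
}
```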

nvanbenschoten (Member, Author) commented
Thanks @aayushshah15, that's all helpful. Is the idea of opportunistically preferring stores that already have non-voters (so that we can exploit promotion/demotion) tracked somewhere, or should we make this the tracking issue for it?

Also, for completeness, here's the behavior of the inverse change, moving from REGION to ZONE survival:

r6456_manual_enqueue_again_logs.txt

We make the following changes, resulting in 3 range snapshots, 2 of which cross the WAN:

1. remove voter in region 2
2. remove voter in region 3
3. add new non-voter in region 2
4. add new non-voter in region 3
5. add new voter in region 1

Ideally, I think we'd do the following, resulting in only 1 range snapshot (the snapshot arithmetic for both plans is sketched after this list):

1. demote voter to non-voter in region 2
2. demote voter to non-voter in region 3
3. add new voter in region 1
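To make the snapshot arithmetic explicit, here's a tiny illustrative sketch that tags each step of the two plans above with whether it creates a brand-new replica (and therefore sends a snapshot); removals, promotions, and demotions all reuse existing state:

```go
package main

import "fmt"

// step is one replication change; needsSnapshot is true only when a brand-new
// replica is created on a store that has no copy of the range's data.
type step struct {
	desc          string
	needsSnapshot bool
}

func countSnapshots(plan []step) int {
	n := 0
	for _, s := range plan {
		if s.needsSnapshot {
			n++
		}
	}
	return n
}

func main() {
	observed := []step{
		{"remove voter in region 2", false},
		{"remove voter in region 3", false},
		{"add new non-voter in region 2", true}, // crosses the WAN
		{"add new non-voter in region 3", true}, // crosses the WAN
		{"add new voter in region 1", true},     // stays within region 1
	}
	ideal := []step{
		{"demote voter to non-voter in region 2", false},
		{"demote voter to non-voter in region 3", false},
		{"add new voter in region 1", true}, // stays within region 1
	}
	fmt.Println("observed plan:", countSnapshots(observed), "snapshots") // 3
	fmt.Println("ideal plan:   ", countSnapshots(ideal), "snapshot")     // 1
}
```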
