
storage: refactor rebalanceCandidates selection to do less comparing across localities #20751

Closed
a-robinson opened this issue Dec 15, 2017 · 5 comments

@a-robinson

When deciding whether to rebalance from one store to another, our comparisons of which stores are too full or too empty don't do a great job of considering localities and diversity. We construct a single list of all non-dead stores in the cluster, then compute various averages based on those stores to use in picking rebalance candidates.

However, this can be quite unfair if the localities contain very different numbers of nodes. For example, if localities A and B have 2 nodes each and locality C has 4 nodes, we'd expect the nodes in locality C to be half as full as the nodes in A and B. Comparing them directly is inaccurate, and can lead to situations where we're too eager to rebalance between nodes in C and not willing enough to rebalance between nodes within A or B.

#20241 is a great reproduction of this. I've been fixing the replica thrashing there by extending #18364 and will send that change out soon, but it still leaves the issue of us being less willing to rebalance between nodes in the smaller localities. Thinking about the problem has made it clear that the current structure of the code isn't really aligned with the problem when localities are in play. Rather than one large StoreList containing all the stores, I think we want something like separate StoreLists per locality (or per set of "equivalent" localities) so that we don't label all stores in A and B as overfull and all stores in C as underfull.
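To make the per-locality StoreList idea concrete, here is a minimal sketch in Go. None of these names (`Store`, `storeListsByLocality`, `overfullWithinLocality`) are the real allocator API; they just illustrate grouping stores by locality and judging fullness against locality-local averages rather than the cluster-wide average:

```go
package allocator

// Store is a hypothetical stand-in for a store descriptor.
type Store struct {
	ID         int
	Locality   string // e.g. "A", "B", or "C"
	RangeCount int
}

// storeListsByLocality partitions the cluster's stores into one list per
// locality, standing in for "separate StoreLists per locality".
func storeListsByLocality(stores []Store) map[string][]Store {
	byLoc := make(map[string][]Store)
	for _, s := range stores {
		byLoc[s.Locality] = append(byLoc[s.Locality], s)
	}
	return byLoc
}

// overfullWithinLocality reports whether s holds noticeably more ranges
// than the average store in its own locality. Comparing against peers
// rather than the cluster mean keeps a lightly loaded big locality (C)
// from making every store in a small locality (A or B) look overfull.
func overfullWithinLocality(s Store, peers []Store, threshold float64) bool {
	if len(peers) == 0 {
		return false
	}
	total := 0
	for _, p := range peers {
		total += p.RangeCount
	}
	mean := float64(total) / float64(len(peers))
	return float64(s.RangeCount) > mean*(1+threshold)
}
```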

@cuongdo - this could make for a good introduction to the allocator code for someone - it should take on the order of a week. It'd be really nice to get this into 2.0 given the focus on global deployments, but it isn't absolutely necessary.

@bdarnell

> For example, if localities A and B have 2 nodes each and locality C has 4 nodes, we'd expect the nodes in locality C to be half as full as the nodes in A and B.

On the other hand, sometimes we do want to compare across localities. If there are four localities, and A and B have 2 nodes each while C and D have 4, we'd want things to adjust so that each node is equally full even across localities (in the absence of other constraints).
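A small worked check of that point (the numbers are assumed for illustration, not taken from the issue): with 12 nodes split A(2), B(2), C(4), D(4) and 3 replicas per range, equal per-node fullness means each node holds 3/12 = 0.25 replicas per range, so a locality's expected share is 0.25 × its node count:

```go
// equalizedShares computes, for each locality, the expected number of
// replicas per range if every node in the cluster is equally full.
// Purely illustrative; not allocator code.
func equalizedShares(nodesPerLocality map[string]int, replicasPerRange int) map[string]float64 {
	totalNodes := 0
	for _, n := range nodesPerLocality {
		totalNodes += n
	}
	perNode := float64(replicasPerRange) / float64(totalNodes)
	shares := make(map[string]float64)
	for loc, n := range nodesPerLocality {
		shares[loc] = perNode * float64(n)
	}
	return shares
}

// equalizedShares(map[string]int{"A": 2, "B": 2, "C": 4, "D": 4}, 3)
// => {"A": 0.5, "B": 0.5, "C": 1, "D": 1}: every range keeps one replica
// in C and one in D, while A and B each host a replica of only half the
// ranges. Reaching that state requires comparing across localities.
```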

@a6802739

@a-robinson, could I have a try?

@a-robinson

> On the other hand, sometimes we do want to compare across localities. If there are four localities, and A and B have 2 nodes each while C and D have 4, we'd want things to adjust so that each node is equally full even across localities (in the absence of other constraints).

Yeah, that's where something like the sets of "equivalent" localities that I mentioned would come into play. For example, if a range had replicas in localities A, B, and C, but not in D, we might use the equivalence classes {A, D}, {B, D}, {C, D} when deciding whether a rebalance would be beneficial.
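A minimal sketch of that idea (the function name and shape are made up for illustration, not the real allocator API): each locality that holds a replica is grouped with the localities that currently hold none, since those are the only places its replica could move without reducing diversity:

```go
// equivalenceClasses pairs each locality that holds a replica of a range
// with the localities that hold none, e.g. replicas in {A, B, C} out of
// {A, B, C, D} yield the classes {A, D}, {B, D}, {C, D}.
func equivalenceClasses(replicaLocs, allLocs []string) [][]string {
	has := make(map[string]bool, len(replicaLocs))
	for _, l := range replicaLocs {
		has[l] = true
	}
	var empty []string // localities with no replica of this range
	for _, l := range allLocs {
		if !has[l] {
			empty = append(empty, l)
		}
	}
	classes := make([][]string, 0, len(replicaLocs))
	for _, l := range replicaLocs {
		// Stores in l only need to be compared against stores in the
		// currently empty localities, not the whole cluster.
		classes = append(classes, append([]string{l}, empty...))
	}
	return classes
}
```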

> @a-robinson, could I have a try?

Of course! If you want to talk more about the problem at all, just let me know.

@a6802739

@a-robinson if we use separate StoreLists per locality, how could we know whether a store within each locality is overfull or underfull?

And what do you mean by a set of "equivalent" localities?

Thanks!

@a-robinson

@a6802739 I'm actually working on this right now as part of #19985. I'll send something out for it this week.

a-robinson assigned a-robinson and unassigned a6802739 on Jan 30, 2018
a-robinson added a commit to a-robinson/cockroach that referenced this issue Feb 6, 2018
This is the core work required for cockroachdb#19985 (but without exposing it to
users yet), and fixes cockroachdb#20751 along the way.

Release note: None
a-robinson added a commit to a-robinson/cockroach that referenced this issue Feb 21, 2018
This creates a new allowable format for zone config constraints, which
lets the user apply different sets of constraints to different numbers
of replicas within a zone. See the included tests for what the new format
looks like.

Fixes cockroachdb#19985

It also improves the handling of localities that are more or less full
than one another for some reason outside the system's control (e.g. if
one has more nodes than the others, or if some constraints have required
more ranges to be placed in one). In such cases, we now do a better job
of balancing ranges amongst the nodes that are more or less full,
because we compare them against each other rather than against the
StoreList averages of all the stores in the cluster.

Fixes cockroachdb#20751

This also deprecates positive (non-required, non-prohibited) constraints.
New positive constraints cannot be set, and existing positive
constraints will be ignored.

Release note (cli change): Replication zone constraints can now be
specified on a per-replica basis, meaning you can configure some
replicas in a zone's ranges to follow one set of constraints and other
replicas to follow other constraints.

Release note (backwards-incompatible change): Positive constraints in
replication zone configs no longer work. They have never been
documented or officially supported, but will no longer be allowed at
all. Any existing positive constraints will be ignored.