Constraints to de-prioritize nodes from becoming shard allocation targets #43350
Comments
Pinging @elastic/es-distributed
Thanks for the suggestion @vigyasharma. You've clearly put some effort into this idea. We discussed this recently as a team and drew two conclusions:
To expand on this second point, it sounds like you might be misusing the `total-shards-per-node` setting. Would you please describe your cluster and its configuration in significantly more detail? Also, would you prepare a failing test case (e.g. an …)?
Thanks @DaveCTurner for discussing this with the team, and apologies for the delay in response (I was on vacation). I don't think this should weaken the … Let me provide more context on where this change helps.

The problem with … In such scenarios, overall cluster workload may go down because one team scales down its traffic, while other teams are still unaware of these changes. This makes revising the setting at the index level non-trivial and a cluster-management overhead.

Most users add this setting at the index level to avoid colocating shards on one node during node replacements and new node additions (i.e. as the workaround to the problem addressed in this issue). If the shard balancer can handle this implicitly, it removes the cognitive and management overhead of setting or revising these limits. Admins can scale down their fleet if net workload is reduced, without individual indexes getting impacted.
Welcome back @vigyasharma, hope you had a good break. The concerns you describe are broadly addressed by the plan for finer-grained constraints laid out in #17213 (comment) and (as I commented there in reply to you a few months back) predicting the constraints dynamically is out of scope for now. Let's walk before we start to try and run, and let's try and keep this conversation focussed on this particular issue. Just a reminder that we're waiting for more details about your proposal.
Closing this as no further details were provided.
The Problem
In clusters where each node has a reasonable number of shards, when a node is replaced (or a small number of nodes added), all shards of newly created indexes get allocated on the new, empty node(s). This puts a lot of stress on the single (or few) new node(s), deteriorating overall cluster performance.
This happens because the `index-balance` factor in the weight function is unable to offset `shard-balance` when the other nodes are filled with shards. The new node therefore always has the minimum weight and gets picked as the target for allocation, until it holds approximately the mean number of shards per node in the cluster. Further, `index-balance` does kick in once the node is relatively full (compared to other nodes), and it then moves out the recently allocated shards. Thus for the same shards, which are newly created and usually receiving indexing traffic, we end up doing the allocation work twice.
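For context, the balancer weight behind this behaviour is, roughly, a weighted sum of how far a node's total shard count and its per-index shard count sit from the cluster averages. A minimal sketch with made-up numbers (simplified from `BalancedShardsAllocator`; 0.45 and 0.55 are the default shard-balance and index-balance factors) shows why a new, empty node keeps the minimum weight:

```java
public class BalancerWeightExample {

    // Simplified form of the balancer weight: a weighted sum of how far the node's
    // total shard count and its per-index shard count are from the cluster averages.
    static float weight(float thetaShard, float thetaIndex,
                        int nodeShards, float avgShardsPerNode,
                        int nodeIndexShards, float avgIndexShardsPerNode) {
        return thetaShard * (nodeShards - avgShardsPerNode)
             + thetaIndex * (nodeIndexShards - avgIndexShardsPerNode);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: three nodes with 30 shards each plus one new, empty node.
        // A brand-new index is being allocated, so per-index counts are 0 everywhere.
        float avgShardsPerNode = (30 + 30 + 30 + 0) / 4.0f;   // 22.5
        float thetaShard = 0.45f / (0.45f + 0.55f);            // default shard-balance factor
        float thetaIndex = 0.55f / (0.45f + 0.55f);            // default index-balance factor

        float emptyNode = weight(thetaShard, thetaIndex, 0, avgShardsPerNode, 0, 0f);
        float fullNode  = weight(thetaShard, thetaIndex, 30, avgShardsPerNode, 0, 0f);

        // Prints roughly -10.1 vs 3.4: the empty node keeps the minimum weight even
        // after receiving several shards of the new index, so it keeps being
        // selected as the allocation target.
        System.out.println("empty node: " + emptyNode + ", existing node: " + fullNode);
    }
}
```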
Current Workaround
A workaround for this is to set the `total-shards-per-node` index/cluster setting, which limits the number of shards of an index on a single node. This, however, is a hard limit enforced by allocation deciders. If it is breached on all nodes, the shards go unassigned, causing yellow/red clusters. Configuring this setting requires careful calculation around the number of nodes, and it must be revised when the cluster is scaled down.
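For illustration, assuming this refers to the standard `index.routing.allocation.total_shards_per_node` setting (there is also a cluster-level `cluster.routing.allocation.total_shards_per_node` variant), the index-level limit is applied like this:

```
PUT /my-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
```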
Proposal
We propose an allocation constraint mechanism that de-prioritizes nodes from getting picked for allocation if they breach certain constraints. Whenever an allocation constraint is breached for a shard (or index) on a node, we add a high positive constant to that node's weight. The increased weight makes the node less preferable for allocating the shard. Unlike deciders, however, this is not a hard filter: if no other nodes are eligible to accept shards (say, due to deciders like disk watermarks), the shard can still be allocated on a node with breached constraints.
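A minimal sketch of this mechanism, using hypothetical names (`AllocationConstraint`, `NodeShardCounts`, `ConstraintAwareWeight`, `CONSTRAINT_WEIGHT`) that are purely illustrative and not taken from the actual Elasticsearch code:

```java
import java.util.List;
import java.util.Map;

// Hypothetical: a constraint is just a boolean predicate over (node, index).
interface AllocationConstraint {
    boolean isBreached(NodeShardCounts node, String index);
}

// Minimal per-node view: total shards on the node and shards per index on the node.
record NodeShardCounts(int totalShards, Map<String, Integer> shardsPerIndex) {}

class ConstraintAwareWeight {
    // Large enough that one breach dominates ordinary shard-count differences,
    // but finite: a node with breached constraints can still be chosen if the
    // allocation deciders rule out every other node.
    private static final float CONSTRAINT_WEIGHT = 1_000_000f;

    private final List<AllocationConstraint> constraints;

    ConstraintAwareWeight(List<AllocationConstraint> constraints) {
        this.constraints = constraints;
    }

    float weight(NodeShardCounts node, String index, float balancerWeight) {
        long breached = constraints.stream().filter(c -> c.isBreached(node, index)).count();
        // Each breached constraint pushes the node one "step" up; nodes on a lower
        // step always sort before nodes on a higher step.
        return balancerWeight + breached * CONSTRAINT_WEIGHT;
    }
}
```

Keeping the penalty finite, rather than treating a breach as a hard filter, is what preserves the behaviour described above: a constrained node remains a valid last resort.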
Index Shards Per Node Constraint
This constraint controls the number of shards of an index on a single node. `indexShardsPerNodeConstraint` is breached if the number of shards of an index allocated on a node exceeds the average shards per node for that index. When `indexShardsPerNodeConstraint` is breached, shards of the newly created index get assigned to other nodes instead, preventing indexing hot spots. After unassigned-shard allocation, rebalancing fills up the node with other index shards, without having to move out the already allocated shards. Since this does not prevent allocation, we do not run into unassigned shards due to breached constraints.

The allocator flow now becomes:
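A rough sketch of one way this flow could look, reusing the hypothetical `AllocationConstraint`, `NodeShardCounts`, and `ConstraintAwareWeight` types from the sketch above (deciders still act as the hard filter; the constraint only reorders the remaining candidates):

```java
import java.util.Comparator;
import java.util.Map;

class ConstraintAwareAllocator {

    // Hypothetical constraint: breached when a node already holds more shards of
    // the index than the average number of that index's shards per data node.
    static AllocationConstraint indexShardsPerNode(int indexShardCount, int dataNodeCount) {
        float avg = (float) indexShardCount / dataNodeCount;
        return (node, index) -> node.shardsPerIndex().getOrDefault(index, 0) > avg;
    }

    // For each unassigned shard: among the nodes the deciders left eligible, pick
    // the one with the lowest constraint-aware weight. A node with breached
    // constraints is only chosen when every "clean" node has been filtered out.
    static String pickTarget(String index,
                             Map<String, NodeShardCounts> eligibleNodes,
                             Map<String, Float> balancerWeights,
                             ConstraintAwareWeight weightFunction) {
        return eligibleNodes.entrySet().stream()
            .min(Comparator.comparingDouble((Map.Entry<String, NodeShardCounts> e) ->
                weightFunction.weight(e.getValue(), index, balancerWeights.get(e.getKey()))))
            .map(Map.Entry::getKey)
            .orElseThrow(); // no eligible nodes at all: the shard stays unassigned
    }
}
```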
Extension
This framework can be extended with other rules by modeling them as boolean constraints. One possible example is primary shard density on a node, as required by issue #41543: a constraint that requires `#primaries on a node <= avg-primaries on a node` could prevent scenarios with all primaries on a few nodes and all replicas on others.
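In the same hypothetical shape as the sketches above, and assuming the per-node view were extended with a primary-shard count, such a rule would be just another predicate:

```java
// Hypothetical: breached when a node holds more primaries than the average number
// of primaries per data node (the skew described in #41543). Assumes the node
// view also exposes a primaryShards() count alongside the per-index counts.
static AllocationConstraint primariesPerNode(int totalPrimaries, int dataNodeCount) {
    float avgPrimaries = (float) totalPrimaries / dataNodeCount;
    return (node, index) -> node.primaryShards() > avgPrimaries;
}
```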
Comparing Multiple Constraints
For every constraint breached, we add a high positive constant to the weight function. This is a step-function approach where all nodes on a lower step (lower weight) are considered more eligible than those on a higher step. The high positive constant ensures that a node's weight doesn't reach the next step unless it breaches another constraint.

Since the constant is added for every constraint breached, i.e. `c * HIGH_CONSTANT`, nodes with one constraint breached are preferred over nodes with two constraints breached, and so on. All nodes with the same number of constraints breached simply fall back to the first part of the weight function, which is based on shard count.

We could potentially assign different weights to different constraints, with some sort of ranking among them, but this can quickly become a maintenance and extension nightmare: adding a new constraint would require going through every other constraint's weight and deciding on the right weight and place in the priority order.
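A small worked example of the resulting ordering, with made-up base weights:

```java
public class ConstraintOrderingExample {
    public static void main(String[] args) {
        final float HIGH_CONSTANT = 1_000_000f;
        // Made-up balancer base weights and breach counts for three candidate nodes.
        float nodeA =   3.4f + 0 * HIGH_CONSTANT;  //         3.4  -> no breaches
        float nodeB = -10.1f + 1 * HIGH_CONSTANT;  //   999 989.9  -> one breach
        float nodeC = -12.0f + 2 * HIGH_CONSTANT;  // 1 999 988.0  -> two breaches
        // Breach count dominates the ordering (A before B before C); among nodes
        // with the same breach count, the ordinary shard-count-based weight decides.
        System.out.printf("A=%.1f  B=%.1f  C=%.1f%n", nodeA, nodeB, nodeC);
    }
}
```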
Next Steps
If this idea makes sense and seems like a useful addition for the Elasticsearch community, we can raise a PR for these changes.
This change is targeted at solving the problems listed in #17213, #37638, #41543, #12279, and #29437.