A method to reduce the time cost to update cluster state #46941
Pinging @elastic/es-distributed
I don't fully understand how these numbers add up, but I can say that this is an unreasonably large number of shards per node. Since #34892 (i.e. 7.0.0) you are forbidden from having so many shards in a cluster of this size.

Since #44433 (i.e. 7.4.0) the `cluster.routing.allocation.disk.include_relocations` setting is deprecated, because ignoring relocating shards lets a node overshoot its disk watermarks. I can't recommend setting it to `false` and don't think it will be acceptable as a default.

There may be improvements to make in this area, but we need to investigate whether the issue described here still exists in clusters running a more recent version and with a configuration that's within recommended limits.
The time spent finding the initialising and relocating shards when computing relocations is heavily dependent on the number of shards per node. Measuring no-op reroutes in which there are no shards moving (the common case) and the disk threshold decider is in play shows that the savings are significant when you have too many shards per node, but much less dramatic in clusters that respect the 1000-shards-per-node limit. We could for instance precompute the sets of `INITIALIZING` and `RELOCATING` shards on each node.
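As a rough illustration of that idea, here is a minimal self-contained sketch with hypothetical names (not the actual Elasticsearch change, which later landed as the commit quoted below): a `RoutingNode`-like holder maintains the moving shards in dedicated side sets as shards are added and removed, so callers iterate only the few moving shards instead of every shard on the node.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of per-node precomputation: the INITIALIZING and
// RELOCATING shards are kept in side sets that are updated as shards are
// added and removed, instead of being rediscovered by a full scan.
class NodeShards {
    enum State { STARTED, INITIALIZING, RELOCATING }
    record Shard(String id, State state) {}

    private final Set<Shard> all = new LinkedHashSet<>();
    private final Set<Shard> initializing = new LinkedHashSet<>();
    private final Set<Shard> relocating = new LinkedHashSet<>();

    void add(Shard shard) {
        all.add(shard);
        if (shard.state() == State.INITIALIZING) {
            initializing.add(shard);
        } else if (shard.state() == State.RELOCATING) {
            relocating.add(shard);
        }
    }

    void remove(Shard shard) {
        all.remove(shard);
        initializing.remove(shard);
        relocating.remove(shard);
    }

    // Callers now pay O(#moving shards) instead of O(#shards on the node).
    Set<Shard> initializingShards() { return Collections.unmodifiableSet(initializing); }
    Set<Shard> relocatingShards() { return Collections.unmodifiableSet(relocating); }
}
```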
I look forward to your reply.
@kkewwei we discussed this today and think it could be worthwhile to explore adding some precomputation of the `INITIALIZING` and `RELOCATING` shard sets on each node. We also agreed that setting `cluster.routing.allocation.disk.include_relocations` to `false` by default is not an acceptable option.
@DaveCTurner, I would like to do it.
Thanks for volunteering @kkewwei, looking forward to seeing your PR.
Today a couple of allocation deciders iterate through all the shards on a node to find the `INITIALIZING` or `RELOCATING` ones, and this can slow down cluster state updates in clusters with very high-density nodes holding many thousands of shards even if those shards belong to closed or frozen indices. This commit pre-computes the sets of `INITIALIZING` and `RELOCATING` shards to speed up this search. Closes #46941 Relates #48579 Co-authored-by: "hongju.xhj" <hongju.xhj@alibaba-inc.com>
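To make the effect of that change concrete, here is a hypothetical, heavily simplified view of the decider-side loop once such precomputed sets exist (the names and the exact size accounting are assumptions, not the real `DiskThresholdDecider` code): it touches only the moving shards rather than all ~6,000 shards on the node.

```java
import java.util.Set;

// Hypothetical, simplified decider-side computation over precomputed sets.
class FastSizeOfRelocatingShards {
    record MovingShard(String id, long expectedSizeInBytes) {}

    // initializing/relocating are the precomputed per-node sets, so the
    // cost is proportional to the number of moving shards (usually few),
    // not to the total number of shards on the node.
    static long sizeOfRelocatingShards(Set<MovingShard> initializing, Set<MovingShard> relocating) {
        long total = 0;
        for (MovingShard shard : initializing) {
            total += shard.expectedSizeInBytes();
        }
        for (MovingShard shard : relocating) {
            total += shard.expectedSizeInBytes();
        }
        return total;
    }
}
```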
ES version: 5.6.8
JVM version: JDK 1.8.0_112
OS version: Linux
Description of the problem including expected versus actual behavior:
As is known, updating the cluster state on the master node can take a long time, which seriously limits the size and stability of the cluster. In our production cluster of 50 nodes, 3,000 indices, and 60,000 shards, updating the cluster state takes 15s+, and the experience of creating and deleting indices is very poor.

To find out why updating the cluster state costs so much time, I captured the thread stack of the update task.
I tried several times and always got the same thread stack; it seems that `DiskThresholdDecider.sizeOfRelocatingShards` costs too much time. The code shows that, to test whether a shard can remain on its node, we compute the size of the relocating shards, which means fetching all the shards of the node (about 6,000 shards on one node) and checking whether each one is `RELOCATING` or `INITIALIZING`. And that is for a single shard: with 60,000 shards to test, this amounts to 60,000 × 6,000 state checks, which costs far too much time. A simplified sketch of this pattern is shown below.
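A minimal self-contained sketch of the pattern described above (hypothetical names; the real method is `DiskThresholdDecider.sizeOfRelocatingShards`, which works on `ShardRouting` instances):

```java
import java.util.List;

// Sketch of the hot loop: for EVERY shard decision, all shards on the
// node are scanned to find the few that are INITIALIZING or RELOCATING.
class SlowSizeOfRelocatingShards {
    enum State { STARTED, INITIALIZING, RELOCATING }
    record Shard(String id, State state, long expectedSizeInBytes) {}

    static long sizeOfRelocatingShards(List<Shard> shardsOnNode) {
        long total = 0;
        for (Shard shard : shardsOnNode) {  // ~6,000 iterations per call
            if (shard.state() == State.INITIALIZING || shard.state() == State.RELOCATING) {
                total += shard.expectedSizeInBytes();
            }
        }
        // Called once per shard decision: ~60,000 decisions x ~6,000
        // shards = ~360,000,000 state checks per cluster state update.
        return total;
    }
}
```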
"cluster.routing.allocation.disk.include_relocations":"false"
. when i set it to be false, the time to update cluster state decreases from 15s to 3s which has achives better result.if we could set the
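For reference, this is how the workaround can be applied dynamically via the cluster settings API (note the comment above: the setting was deprecated in 7.4.0 precisely because skipping relocating shards lets nodes overshoot the disk watermarks):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.include_relocations": "false"
  }
}
```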
We could set `cluster.routing.allocation.disk.include_relocations` to `false` by default, since most users never change the defaults. Or we could record the relocating and initializing shards of every node in the cluster state, so that we would not have to find them by scanning all the shards every time the cluster state is updated.