Computing IndicesQueryCache stats is O(N²) in shard count #97222

Open
Tracked by #77466
DaveCTurner opened this issue Jun 29, 2023 · 1 comment · May be fixed by #97600
Labels
>bug :Data Management/Stats Statistics tracking and retrieval APIs Team:Data Management Meta label for data/management team

Comments

@DaveCTurner
Contributor

DaveCTurner commented Jun 29, 2023

We loop over all shards in org.elasticsearch.indices.IndicesService#statsByShard, calling indexShardStats for each one, but further down the stack in org.elasticsearch.indices.IndicesQueryCache#getStats we loop over all the shards again to compute the portion of the shared RAM usage to attribute to the current shard. These days a node can hold many thousands of shards, so this nested iteration does work that is quadratic in the shard count and consumes considerable resources.

We have the same loop in org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction#nodeOperation.

We have effectively the same loop in org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction#shardOperation, but this one is trickier because the outer loop is within TransportBroadcastByNodeAction which doesn't currently have a facility for sharing any context between invocations on different shards.
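To make the cost concrete, here is a minimal sketch of the pattern described above. It is not the Elasticsearch code: the class, the method names, the `Map<String, Long>` of per-shard cache sizes, and the rule of splitting shared RAM in proportion to cache size are all assumptions chosen for illustration. The first variant rescans every shard inside each per-shard call. The second computes the total once in the outer loop and passes it down, which is the kind of shared context the TransportBroadcastByNodeAction case would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedRamAttribution {

    // Counts how many shard entries are visited while summing cache sizes.
    static long innerIterations = 0;

    // Hypothetical attribution rule: split shared RAM in proportion to cache size,
    // falling back to an even split when every cache is empty.
    static long attribute(long shardSize, long total, int shardCount, long sharedRamBytes) {
        double weight = total == 0 ? 1.0 / shardCount : (double) shardSize / total;
        return Math.round(weight * sharedRamBytes);
    }

    // Sums the cache sizes of all shards: one pass, O(N).
    static long totalCacheSize(Map<String, Long> cacheSizes) {
        long total = 0;
        for (long size : cacheSizes.values()) {
            innerIterations++;
            total += size;
        }
        return total;
    }

    // Quadratic shape: the outer loop visits each shard, and each visit rescans all shards.
    static Map<String, Long> statsQuadratic(Map<String, Long> cacheSizes, long sharedRamBytes) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : cacheSizes.entrySet()) {
            long total = totalCacheSize(cacheSizes); // repeated for every shard
            result.put(e.getKey(), attribute(e.getValue(), total, cacheSizes.size(), sharedRamBytes));
        }
        return result;
    }

    // Linear shape: the total is computed once and shared across all per-shard calls.
    static Map<String, Long> statsLinear(Map<String, Long> cacheSizes, long sharedRamBytes) {
        long total = totalCacheSize(cacheSizes); // computed once
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : cacheSizes.entrySet()) {
            result.put(e.getKey(), attribute(e.getValue(), total, cacheSizes.size(), sharedRamBytes));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++) {
            sizes.put("shard-" + i, (long) (i % 7));
        }
        innerIterations = 0;
        statsQuadratic(sizes, 1L << 20);
        System.out.println("quadratic inner iterations: " + innerIterations);
        innerIterations = 0;
        statsLinear(sizes, 1L << 20);
        System.out.println("linear inner iterations: " + innerIterations);
    }
}
```

Both variants return the same per-shard values; only the number of entries visited differs, growing with the square of the shard count in the first variant and linearly in the second.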

Relates #77466.

@DaveCTurner DaveCTurner added >bug :Data Management/Stats Statistics tracking and retrieval APIs labels Jun 29, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 29, 2023
@HiDAl HiDAl linked a pull request Jul 12, 2023 that will close this issue
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Aug 25, 2023
We don't need to build a map of all the shards just to compute the
shared RAM usage. With this commit we just compute the values for that
calculation directly.

This computation is still O(N²) in shard count (elastic#97222) but with a much
smaller constant now.
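
The change this commit message describes can be sketched as follows. This is an illustration under assumptions, not the committed code: `Counters`, `Snapshot`, both method names, and the proportional weighting are hypothetical. The first method copies every shard's counters into a fresh map before summing; the second sums the live counters directly. Each call is still O(N) in shard count, so the overall stats computation stays quadratic, but the direct version allocates nothing per shard.

```java
import java.util.HashMap;
import java.util.Map;

public class DirectSharedRam {

    // Hypothetical mutable per-shard counters kept by the cache.
    static final class Counters {
        long cacheSize;
        Counters(long cacheSize) {
            this.cacheSize = cacheSize;
        }
    }

    // Hypothetical immutable snapshot of one shard's counters.
    static final class Snapshot {
        final long cacheSize;
        Snapshot(long cacheSize) {
            this.cacheSize = cacheSize;
        }
    }

    // Shared weighting rule (an assumption): proportional to cache size,
    // even split if all caches are empty, everything if there are no shards.
    static long weigh(long ownSize, long total, int shardCount, long sharedRamBytes) {
        if (shardCount == 0) {
            return sharedRamBytes;
        }
        double weight = total == 0 ? 1.0 / shardCount : (double) ownSize / total;
        return Math.round(weight * sharedRamBytes);
    }

    // Before: build a map of snapshots for all shards, then sum over it.
    static long sharedRamViaMap(Map<String, Counters> all, String shard, long sharedRamBytes) {
        Map<String, Snapshot> snapshots = new HashMap<>();
        for (Map.Entry<String, Counters> e : all.entrySet()) {
            snapshots.put(e.getKey(), new Snapshot(e.getValue().cacheSize));
        }
        long total = 0;
        for (Snapshot s : snapshots.values()) {
            total += s.cacheSize;
        }
        Snapshot own = snapshots.get(shard);
        return weigh(own == null ? 0 : own.cacheSize, total, snapshots.size(), sharedRamBytes);
    }

    // After: sum the live counters directly, with no intermediate map or snapshots.
    static long sharedRamDirect(Map<String, Counters> all, String shard, long sharedRamBytes) {
        long total = 0;
        for (Counters c : all.values()) {
            total += c.cacheSize;
        }
        Counters own = all.get(shard);
        return weigh(own == null ? 0 : own.cacheSize, total, all.size(), sharedRamBytes);
    }
}
```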
DaveCTurner added a commit that referenced this issue Aug 29, 2023
We don't need to build a map of all the shards just to compute the
shared RAM usage. With this commit we just compute the values for that
calculation directly.

This computation is still O(N²) in shard count (#97222) but with a much
smaller constant now.
@andreidan andreidan self-assigned this Oct 12, 2023
3 participants