execution_hint: 'map' loads global ords when it doesn't need to #37705
Pinging @elastic/es-analytics-geo
Just a clarification note for anyone working on this in the future: the hint itself works as expected. The issue is that the relationship between global ords and the aggregator isn't what you'd expect.
polyfractal changed the title from "execution_hint: 'map' ignored in aggregation" to "execution_hint: 'map' loads global ords when it doesn't need to" on Jan 22, 2019
jimczi added a commit that referenced this issue on Feb 1, 2019
The terms aggregator loads the global ordinals to retrieve the cardinality of the field to aggregate on. This information is then used to select the strategy to use for the aggregation (breadth_first or depth_first). However, this should be avoided if the execution_hint is explicitly set to map, since this mode doesn't really need the global ordinals. Since we still need the cardinality of the field, this change picks the maximum cardinality in the segments as an estimate of the total cardinality to select the strategy to use (breadth_first or depth_first). This estimate is only used if the execution hint is set to map; otherwise the global ordinals are still used to retrieve the accurate cardinality. Closes #37705
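The idea in this commit message can be illustrated with a rough Python sketch. This is not the Elasticsearch code (which is Java); the function names, the `load_global_ordinal_count` callback, and the rule that compares cardinality to the requested size are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def field_cardinality(
    segment_cardinalities: Sequence[int],
    execution_hint: Optional[str],
    load_global_ordinal_count: Callable[[], int],
) -> int:
    """Return the cardinality used to choose the collection strategy.

    Hypothetical helper: with execution_hint == "map", the expensive
    global-ordinals load is skipped and the largest per-segment
    cardinality is used as an estimate instead.
    """
    if execution_hint == "map":
        # Cheap estimate: no global ordinals are built.
        return max(segment_cardinalities, default=0)
    # Accurate, but requires building or loading global ordinals.
    return load_global_ordinal_count()


def pick_collect_mode(cardinality: int, requested_size: int) -> str:
    """Simplified, assumed rule: prefer breadth_first when the field has
    more distinct values than the number of buckets requested."""
    return "breadth_first" if cardinality > requested_size else "depth_first"


if __name__ == "__main__":
    def expensive_loader() -> int:
        # Stand-in for the global-ordinals build described in the issue.
        return 300_000_000

    estimate = field_cardinality([120, 95, 210], "map", expensive_loader)
    print(estimate, pick_collect_mode(estimate, requested_size=10))
```

Since every distinct value in a segment is also a distinct value in the whole shard, the per-segment maximum never exceeds the true cardinality, so it is a lower bound rather than an exact count.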
(corrected description) 'execution_hint': 'map' loads global ordinals, even though they are not required. This feature is documented here.
I have a client with an index containing hundreds of millions of documents. Within this index, there is a high-cardinality field with hundreds of millions of possible values. When the client executes a query that matches a few hundred documents and then runs a terms aggregation on the high-cardinality field, Elasticsearch rebuilds global ordinals, which can take 15 seconds. (In fact, it rebuilds the global ordinals even if the query matches zero documents.) There are several options for solving this issue:
1. Wait 15 seconds for global ordinals to be built when the aggregation executes (not acceptable, and not a real solution).
2. Enable eager global ordinals and increase the refresh interval to minimize the impact of constantly rebuilding global ordinals (not ideal, because of the wait to see results and the constant work of rebuilding global ordinals).
3. Use 'map' to only evaluate documents that match the query when running the terms aggregation (doesn't work).
4. Do a hack: use a script to return the value for the terms aggregation, which forces global ordinals to be ignored, as they don't exist for a script-generated field (this works, but feels hacky).
To give context, this is for a bank. A given client will want to see all the IBAN numbers they have transferred to. There are hundreds of millions of IBAN numbers, but each client will have used only on the order of hundreds.
I am currently using option (4) to work around the fact that (3) does not work. Ideally, I would like to use (3), execution_hint: map, to solve this issue.
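For readers comparing the two approaches, here is a minimal Python sketch of what the request bodies for options (3) and (4) might look like. The field names `iban` and `client_id`, the aggregation name `ibans`, and the bucket size are hypothetical and not taken from the actual index; only the `execution_hint` and `script` parameters of the terms aggregation are standard.

```python
import json


def terms_agg_with_map_hint(field: str, size: int = 500) -> dict:
    """Option (3): terms aggregation on the field with execution_hint 'map'."""
    return {"terms": {"field": field, "size": size, "execution_hint": "map"}}


def terms_agg_with_script(field: str, size: int = 500) -> dict:
    """Option (4): the workaround, returning the value from a script so the
    aggregation does not rely on global ordinals for the field."""
    return {
        "terms": {
            "script": {"lang": "painless", "source": f"doc['{field}'].value"},
            "size": size,
        }
    }


def search_body(client_id: str, agg: dict) -> dict:
    """Wrap an aggregation in a search that matches one client's documents.

    'client_id' is an assumed field name used only for illustration.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {"client_id": client_id}},
        "aggs": {"ibans": agg},
    }


if __name__ == "__main__":
    print(json.dumps(search_body("client-42", terms_agg_with_map_hint("iban")), indent=2))
    print(json.dumps(search_body("client-42", terms_agg_with_script("iban")), indent=2))
```

Either body would be sent to the index's `_search` endpoint. The two differ only in how the terms aggregation obtains its values.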
This was discussed in the #elasticsearch slack channel on Jan 22, 2019