Explore option of supporting more flexible search types #12316

clintongormley · 2015-07-17T12:17:42Z

Today we have query_then_fetch and query_and_fetch. This limits the types of search functionality we can support. For instance, if you want to auto-adjust the bucket interval so that your documents fit neatly into 10 buckets, you first need to determine the min and max values in order to calculate the correct interval (e.g. see #9572 and #9531).

This requires two round trips:

1. first determine the min/max values
2. calculate the required interval
3. do a second trip to bucket documents per interval
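
The two round trips above can be sketched as follows. This is only an illustration under stated assumptions: shards are modelled as plain Python lists of numeric values, and the names `shard_min_max`, `shard_histogram`, and `auto_histogram` are hypothetical, not part of any Elasticsearch API.

```python
def shard_min_max(shard):
    """Round trip 1 (per shard): report the local min and max."""
    return min(shard), max(shard)


def shard_histogram(shard, origin, interval, num_buckets):
    """Round trip 2 (per shard): count values per bucket of the agreed interval."""
    counts = [0] * num_buckets
    for value in shard:
        # The global max would land in bucket `num_buckets`; clamp it into the last one.
        idx = min(int((value - origin) / interval), num_buckets - 1)
        counts[idx] += 1
    return counts


def auto_histogram(shards, num_buckets=10):
    """Coordinating-node logic: two round trips with an interval computed in between."""
    # Round trip 1: gather min/max from every non-empty shard.
    stats = [shard_min_max(s) for s in shards if s]
    lo = min(s[0] for s in stats)
    hi = max(s[1] for s in stats)

    # Between trips: derive the interval (fall back to 1.0 if all values are equal).
    interval = (hi - lo) / num_buckets or 1.0

    # Round trip 2: every shard buckets with the same origin and interval,
    # so the per-shard counts can simply be summed.
    merged = [0] * num_buckets
    for shard in shards:
        for i, count in enumerate(shard_histogram(shard, lo, interval, num_buckets)):
            merged[i] += count
    return lo, interval, merged
```

For example, `auto_histogram([[0, 5, 10], [20, 55], [99, 100]])` yields an origin of 0 and an interval of 10.0, with the seven values spread over ten buckets.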

Or to improve term count accuracy in a terms agg, you could:

1. retrieve e.g. the top 20 terms from each shard
2. choose the top 10 overall
3. do a second trip (if needed) to get accurate counts for all terms
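
A minimal sketch of this refinement, assuming each shard is just a list of term occurrences and using hypothetical helper names (`shard_top_terms`, `shard_counts_for`, `refined_top_terms`). Note that it only makes the counts of the chosen terms exact; it does not guarantee that the chosen terms are the true top 10.

```python
from collections import Counter


def shard_top_terms(shard, shard_size):
    """Trip 1 (per shard): the local top `shard_size` terms with their counts."""
    return Counter(shard).most_common(shard_size)


def shard_counts_for(shard, terms):
    """Trip 2 (per shard): exact local counts for the requested terms only."""
    local = Counter(shard)
    return {term: local[term] for term in terms}


def refined_top_terms(shards, size=10, shard_size=20):
    # Trip 1: merge the truncated per-shard lists. Counts may be too low here,
    # because a term can fall outside the top `shard_size` on some shards.
    approx = Counter()
    for shard in shards:
        for term, count in shard_top_terms(shard, shard_size):
            approx[term] += count
    candidates = [term for term, _ in approx.most_common(size)]

    # Trip 2: ask every shard for its exact count of each chosen term.
    exact = Counter()
    for shard in shards:
        exact.update(shard_counts_for(shard, candidates))
    return exact.most_common()
```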

Or to guarantee that you get the top 10 terms overall:

1. first trip retrieves the top 20 terms per shard
2. calculate the overall top 10
3. take the doc count of the 10th term -> 10th_count
4. second trip retrieves all terms that have a doc count of at least 10th_count / num_shards on any shard
5. third trip calculates accurate counts for all the terms returned by the second trip
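
The three trips can be sketched like this, again modelling shards as lists of term occurrences; `exact_top_terms` is a hypothetical name. The threshold works because the merged counts from the first trip are lower bounds: any term whose true total reaches 10th_count must have at least 10th_count / num_shards documents on at least one shard.

```python
from collections import Counter


def exact_top_terms(shards, size=10, shard_size=20):
    # Stand-in for the per-shard term dictionaries.
    local = [Counter(shard) for shard in shards]

    # Trip 1: top `shard_size` terms per shard, merged on the coordinating node.
    approx = Counter()
    for counts in local:
        for term, count in counts.most_common(shard_size):
            approx[term] += count
    top = approx.most_common(size)
    # Count of the `size`-th term; 0 if fewer terms came back (keep everything).
    nth_count = top[-1][1] if len(top) == size else 0

    # Trip 2: every shard returns all terms at or above the per-shard threshold.
    threshold = nth_count / len(shards)
    candidates = set()
    for counts in local:
        candidates.update(t for t, n in counts.items() if n >= threshold)

    # Trip 3: exact counts for every candidate, then the final top `size`.
    exact = Counter({t: sum(counts[t] for counts in local) for t in candidates})
    return exact.most_common(size)
```

With shards `["x"] * 4 + ["y"] * 3` and `["z"] * 4 + ["y"] * 3`, `size=1` and `shard_size=1`, the first trip alone would pick `x` with a count of 4, while the three-trip version returns `y` with its true count of 6.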

Multiple search phases would also help with clustering algorithms.

colings86 · 2015-07-24T10:11:07Z

#10217 will be required before we do this: decisions on how many phases are needed will have to be made on the coordinating node, so the query needs to be parsed there first.

Also, this could get very complex. Term count accuracy would require re-running the parent aggregations in the accuracy round to get the right context (the right documents) for the terms aggregation to work on. It would also require running the sub-aggregations in the accuracy round (and not in the initial round) to get the right values for them. This gets even more complex if multiple terms aggregations are nested, all with accuracy set to true.

brettlyman · 2016-06-06T16:16:27Z

We're seeing the same problem mentioned in #1305, which was closed because facets were deprecated, but we're using terms aggregations. We have a pretty complex setup with multiple shards and replicas per index, and the field being aggregated is a nested document.

When we do the terms aggregation we often see buckets with wrong counts, or even no buckets returned at all. If we change the terms aggregation to a filter aggregation looking for a specific value in the nested document that should result in a bucket, we get hits returned. Note that we're not looking for "top X" buckets, just returning all buckets and trying to get an accurate count.

I believe our queries were fine up until a couple of weeks ago, so perhaps there's a shard/routing/etc. setting that causes this to happen? Otherwise, please add my +1 to the request for a parameter to force accurate results, even though execution would be slower.

colings86 · 2018-03-13T12:15:11Z

@clintongormley do you think this could now be closed since we have the composite aggregation?

clintongormley · 2018-03-13T12:23:04Z

@colings86 these changes are all about the top-n results, which you can't get with the composite agg without retrieving all results. I think these requests are still valid.

colings86 · 2018-03-13T12:23:53Z

@elastic/es-search-aggs

javanna · 2022-10-13T08:58:27Z

This is a rather old issue that had no activity in a long while. There are no concrete plans to work on addressing it at this time, hence I am closing it.

clintongormley added the >enhancement, high hanging fruit, discuss, and :Search/Search labels Jul 17, 2015

This was referenced Jul 17, 2015

terms facet gives wrong count with n_shards > 1 #1305

Closed

Terms agg: calculate aggs on 'other' bucket #12411

Closed

jpountz mentioned this issue Jul 31, 2015

Support dynamic interval and fixed buckets for histogram aggregation #9572

Closed

clintongormley added the Meta label Jan 26, 2016

clintongormley mentioned this issue Apr 7, 2016

Simplify ordering support on terms aggregations #17588

Closed

jccq mentioned this issue Jul 5, 2016

Added option to display 'Others' buckets in the pie chart elastic/kibana#7464

Closed

jccq mentioned this issue Oct 18, 2016

add auto_scale option to histogram configuration to auto scale interval elastic/kibana#8139

Closed

clintongormley added the :Analytics/Aggregations label Nov 28, 2016

gingerwizard mentioned this issue Mar 30, 2017

Distinct Terms Aggregations #23818

Closed

mrec mentioned this issue Apr 21, 2017

Enhancement: Range agg specified as max bucket count rather than explicit ranges #24254

Closed

colings86 removed the discuss label Mar 13, 2018

rjernst added the Team:Analytics and Team:Search labels May 4, 2020

javanna closed this as not planned Oct 13, 2022
