
Build global ordinals terms bucket from matching ordinals #30166

Merged

jimczi merged 6 commits into elastic:master from global_ordinal_loop on Apr 27, 2018

Conversation

@jimczi (Contributor) commented Apr 26, 2018

The global ordinals terms aggregator has an option to remap global ordinals to dense ordinals that match the request. This mode is picked automatically when the terms aggregator is a child of another bucket aggregator, or when it needs to defer buckets to an aggregation that is used in the ordering of the terms.
However, when building the final buckets, this aggregator loops over all possible global ordinals rather than using the hash map that was built to remap the ordinals.
For fields with high cardinality this is highly inefficient and can lead to slow responses even when the number of terms that match the query is low.
This change fixes this performance issue by using the hash table of matching ordinals to perform the pruning of the final buckets for the `terms` and `significant_terms` aggregations.
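In outline, the fix bounds the final pruning loop by the number of collected buckets instead of the full global-ordinal range. Here is a minimal sketch of the idea (simplified from the actual patch; the method signature and the trailing bucket-building step are placeholders for illustration, not the real aggregator code):

```
import org.elasticsearch.common.util.LongHash;

// Sketch only: bucketOrds is the LongHash that remaps global ordinals to
// dense, request-local bucket ordinals; it is null when no remapping was
// needed.
static void pruneBuckets(LongHash bucketOrds, long valueCount, long minDocCount) {
    final boolean needsFullScan = bucketOrds == null || minDocCount == 0;
    final long maxId = needsFullScan ? valueCount : bucketOrds.size();
    for (long ord = 0; ord < maxId; ord++) {
        final long globalOrd;
        final long bucketOrd;
        if (needsFullScan) {
            // Visit every global ordinal; find() returns -1 when the term
            // collected no documents.
            globalOrd = ord;
            bucketOrd = bucketOrds == null ? ord : bucketOrds.find(ord);
        } else {
            // Only iterate the dense ids that actually collected documents
            // and translate each one back to its global ordinal.
            bucketOrd = ord;
            globalOrd = bucketOrds.get(ord);
        }
        // ... apply include/exclude and doc-count thresholds, then build
        // the bucket for (globalOrd, bucketOrd) as before ...
    }
}
```

With remapping active, the loop runs over `bucketOrds.size()` entries, which is bounded by the number of terms that matched the query rather than by the field's cardinality.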
I ran a simple benchmark with 1M documents, each containing 0 to 10 keywords randomly selected from 1M unique terms.
This field is used to perform a multi-level terms aggregation, using Rally to collect the response times.
The aggregation below is the two-level example that was used in the benchmark:

"aggregations":{
   "1":{
      "terms":{
         "field":"keyword"
      },
      "aggregations":{
         "2":{
            "terms":{
               "field":"keyword"
            }
         }
      }
   }
}
| Levels of aggregation | 50th percentile, master (ms) | 50th percentile, patch (ms) |
| --- | --- | --- |
| 2 | 640.41 | 577.499 |
| 3 | 2239.66 | 600.154 |
| 4 | 14141.2 | 703.512 |

Closes #30117

jimczi added 2 commits April 26, 2018 09:52
@elasticmachine (Collaborator)

Pinging @elastic/es-search-aggs

```
@@ -103,11 +101,22 @@ public SignificantStringTerms buildAggregation(long owningBucketOrdinal) throws

     BucketSignificancePriorityQueue<SignificantStringTerms.Bucket> ordered = new BucketSignificancePriorityQueue<>(size);
     SignificantStringTerms.Bucket spare = null;
-    for (long globalTermOrd = 0; globalTermOrd < valueCount; ++globalTermOrd) {
-        if (includeExclude != null && !acceptedGlobalOrdinals.get(globalTermOrd)) {
+    boolean needsFullSan = bucketOrds == null || bucketCountThresholds.getMinDocCount() == 0;
```
Contributor:

Typo - "needsFullScan"

@jpountz (Contributor) left a comment

Thanks @jimczi!

```
-    for (long globalTermOrd = 0; globalTermOrd < valueCount; ++globalTermOrd) {
-        if (includeExclude != null && !acceptedGlobalOrdinals.get(globalTermOrd)) {
+    boolean needsFullScan = bucketOrds == null || bucketCountThresholds.getMinDocCount() == 0;
+    long maxId = needsFullScan ? valueCount : bucketOrds.size();
```
Contributor:

let's make both final

```
-    for (long globalTermOrd = 0; globalTermOrd < valueCount; ++globalTermOrd) {
-        if (includeExclude != null && !acceptedGlobalOrdinals.get(globalTermOrd)) {
+    boolean needsFullScan = bucketOrds == null || bucketCountThresholds.getMinDocCount() == 0;
+    long maxId = needsFullScan ? valueCount : bucketOrds.size();
```
Contributor:

let's make them final?

@pakein commented Apr 26, 2018

We get globalOrd with `globalOrd = bucketOrds.get(bucketOrd)`. If globalOrd == 0, it may mean either that the bucket exists with globalOrd = 0, or that the bucket does not exist. When globalOrd == 0 matches, couldn't the result be duplicated?

@jimczi (Contributor, Author) commented Apr 26, 2018

We use `bucketOrds#get` on a bucket ord that is guaranteed to exist, since we only iterate the existing slots in the LongHash and the ids in the hash are dense. The returned long is undefined if the slot is empty, so 0 does not indicate a non-existing bucket, but we are safe here since we always use filled slots.
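For illustration, a minimal, hypothetical sketch of the LongHash contract this relies on (the demo class and the `BigArrays.NON_RECYCLING_INSTANCE` setup are assumptions for the example, not code from the patch):

```
import org.elasticsearch.common.util.BigArrays;
import org.elasticsearch.common.util.LongHash;

public class LongHashDemo {
    public static void main(String[] args) {
        // LongHash assigns dense ids 0..size()-1 as keys are added, so
        // iterating that range only ever touches filled slots.
        LongHash bucketOrds = new LongHash(1, BigArrays.NON_RECYCLING_INSTANCE);
        long id = bucketOrds.add(42L);            // first key -> dense id 0
        assert bucketOrds.get(id) == 42L;         // a filled slot returns its key
        // get() on an id that was never assigned is undefined, which is why a
        // returned 0 cannot by itself mean "no bucket"; the aggregator only
        // ever passes filled ids.
        for (long i = 0; i < bucketOrds.size(); i++) {
            long globalOrd = bucketOrds.get(i);   // always a filled slot
            System.out.println(i + " -> " + globalOrd);
        }
        bucketOrds.close();
    }
}
```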

@pakein commented Apr 26, 2018

Thanks @jimczi. I read the code carefully; it was my mistake.

@jimczi merged commit c08daf2 into elastic:master on Apr 27, 2018
@jimczi deleted the global_ordinal_loop branch on Apr 27, 2018 at 13:26
jimczi added a commit that referenced this pull request Apr 27, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 27, 2018
* master: (7173 commits)
  Bump changelog version to 6.4 (elastic#30217)
  [DOCS] Adds native realm security settings (elastic#30186)
  Test: Switch painless test to 1 shard
  CCS: Drop http address from remote cluster info (elastic#29568)
  Reindex: Fold "from old" tests into reindex module (elastic#30142)
  Convert FieldCapabilitiesResponse to a ToXContentObject. (elastic#30182)
  [DOCS] Added 'on a single shard' to description of max_thread_count. Closes 28518 (elastic#29686)
  [TEST] Redirect links to new locations (elastic#30179)
  Move repository-s3 fixture tests to QA test project (elastic#29372)
  Fail snapshot operations early on repository corruption (elastic#30140)
  Docs: Document `failures` on reindex and friends
  Build global ordinals terms bucket from matching ordinals (elastic#30166)
  Watcher: Ensure mail message ids are unique per watch action (elastic#30112)
  REST: Remove GET support for clear cache indices (elastic#29525)
  SQL: Correct error message (elastic#30138)
  Require acknowledgement to start_trial license (elastic#30135)
  Fix a bug in FieldCapabilitiesRequest#equals and hashCode. (elastic#30181)
  SQL: Add BinaryMathProcessor to named writeables list (elastic#30127)
  Tests: Use buildDir as base for generated-resources (elastic#30191)
  Fix SliceBuilderTests#testRandom failures
  ...
@mattweber (Contributor)

Thank you @jimczi and @pakein!

I came across this issue when investigating a performance problem with one of my customers' aggregations, and applied this patch to a forked terms aggregation in a plugin for ES 6.2.1. For our use-case, which is 4 levels of terms aggregations against ~3M docs with the 4th level being a very high cardinality field, we went from ~350s down to ~20s!!

Any chance of getting this in before 6.4.0, as it has the potential for such massive performance improvements with no side effects?

@jimczi (Contributor, Author) commented May 3, 2018

> Any chance of getting this in before 6.4.0, as it has the potential for such massive performance improvements with no side effects?

It was too late for 6.3.0 and it's not a critical bug (though this is arguable ;)), hence 6.4.0.

> For our use-case, which is 4 levels of terms aggregations against ~3M docs with the 4th level being a very high cardinality field, we went from ~350s down to ~20s!!

Nice numbers! Though 20s for 3M docs seems quite high, even with 4 levels. How many terms do you retrieve per level?

@jimczi (Contributor, Author) commented May 3, 2018

I discussed with @colings86 and we've decided to backport this PR to 6.3.1. It's really too late for 6.3.0, but if there is a 6.3.1 (which is not guaranteed) this PR will be included.

@mattweber (Contributor)

@jimczi Thank you, 6.3.1 is perfectly fine, and I understand it is not guaranteed to happen.

For our use-case, the first 3 levels retrieve 100 terms each, and the 4th is unbounded (essentially all doc ids for the bucket). It's this 4th-level id collection that is the issue, and I am working on trying to get them to remove this "requirement".

@bleskes bleskes added v6.3.0 and removed v6.3.1 labels May 16, 2018
@bleskes bleskes added v6.3.1 and removed v6.3.0 labels May 16, 2018
jimczi added a commit that referenced this pull request May 16, 2018
@fredgalvao

Does the tag swap plus the 6.3 commit mean anything in regard to the possibility of having this in 6.3.0? Did anything change?
This is such a huge win and an amazing optimization!

@jimczi jimczi added v6.3.0 and removed v6.3.1 labels May 28, 2018
@jimczi (Contributor, Author) commented May 28, 2018

@fredgalvao Yes, this change will be in 6.3.0.

Linked issue: string terms is very slow when there are millions of buckets (#30117)