
Latency improvements to Multi Term Aggregations #14993

Open · wants to merge 17 commits into main

Conversation

@expani commented Jul 29, 2024

Description

This PR introduces the following improvements:

  • Reduces the latency of multi term aggregation queries by 7-10 seconds on the workloads tested below
  • Reduces the memory footprint of multi term aggregation queries by cutting down its allocations

Testing was done on a c5.9xlarge instance with 20 GB of JVM heap and the store type set to mmapfs, so that EBS latency does not skew the results. This was verified by running lsof on the OS pid to confirm that all index files are memory-mapped (the type column shows mem in that case).

The numbers below are averaged over 20 iterations of each type of query, with and without the changes.

Workload | Field1 | Field2 | Without Changes | With Changes
big5 | agent_name | host_name | 236 secs | 226 secs
big5 | process_name | agent_id | 45 secs | 38 secs
nyc_taxi | store_and_fwd_flag | payment_type | 53 secs | 44 secs

Sample Aggregation Query
curl -k -H 'Content-Type: application/json' https://localhost:9200/nyc_taxis/_search -u 'admin:xxx' -d '{
  "aggs": {
    "flag_and_payment_type": {
      "multi_terms": {
        "terms": [{
          "field": "store_and_fwd_flag"
        }, {
          "field": "payment_type"
        }]
      }
    }
  }
}'

Multi term aggregation goes through all the docs emitted by the collector of the filter query (a MatchAllDocsQuery if no filter is present).

For every document emitted by the collector, it generates the cartesian product of the values of all the fields present in the aggregation here, as sketched below.
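
For illustration only (this is not the actual MultiTermsAggregator code), here is a minimal sketch of how the per-field values of one document expand into composite keys via a cartesian product; the sample values in main are hypothetical:

import java.util.ArrayList;
import java.util.List;

public class CompositeKeyExample {

    // Expands the per-field value lists of one document into every composite key.
    // e.g. [[N, Y], [1, 2]] -> [N, 1], [N, 2], [Y, 1], [Y, 2]
    static List<List<Object>> cartesianProduct(List<List<Object>> perFieldValues) {
        List<List<Object>> keys = new ArrayList<>();
        keys.add(new ArrayList<>());
        for (List<Object> fieldValues : perFieldValues) {
            List<List<Object>> expanded = new ArrayList<>(keys.size() * fieldValues.size());
            for (List<Object> prefix : keys) {
                for (Object value : fieldValues) {
                    List<Object> key = new ArrayList<>(prefix); // one fresh key per combination
                    key.add(value);
                    expanded.add(key);
                }
            }
            keys = expanded;
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical per-document values for store_and_fwd_flag and payment_type.
        List<Object> flagValues = List.of("N", "Y");
        List<Object> paymentValues = List.of(1, 2);
        List<List<Object>> doc = List.of(flagValues, paymentValues);
        cartesianProduct(doc).forEach(System.out::println);
    }
}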

A deep copy is generated for every composite key here, which is eventually copied again (only the first time) while adding it to the bucket. This PR refactors the code to remove the need for a deep copy of every composite key.

We also perform a deep copy of the field values retrieved by Lucene here. This is only essential for fields with multiple values in a document and can be avoided for fields with a single value in a document; a sketch of that idea follows.
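
A hedged sketch of that single-value shortcut, not the PR's actual code, assuming OpenSearch's SortedBinaryDocValues API (advanceExact / docValueCount / nextValue) and Lucene's BytesRef.deepCopyOf; the class and method names here are made up for illustration:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.util.BytesRef;
import org.opensearch.index.fielddata.SortedBinaryDocValues;

class SingleValueCopySketch {

    // nextValue() returns a BytesRef backed by a buffer the iterator reuses, so
    // multi-valued fields need a deep copy per value. A single value can be handed
    // out directly as long as it is fully consumed before advancing to the next doc.
    static List<BytesRef> valuesForDoc(SortedBinaryDocValues docValues, int doc) throws IOException {
        List<BytesRef> values = new ArrayList<>();
        if (docValues.advanceExact(doc)) {
            int count = docValues.docValueCount();
            if (count == 1) {
                values.add(docValues.nextValue());                        // no copy for single-valued fields
            } else {
                for (int i = 0; i < count; i++) {
                    values.add(BytesRef.deepCopyOf(docValues.nextValue())); // copy each reused buffer
                }
            }
        }
        return values;
    }
}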

Allocation Profiling

For the Big5 benchmark fields process_name and agent_id (LOW CARDINALITY)

Deep copies of the composite key, and of values for single-valued fields, take around 25% of the overall allocations for a multi term aggregation query.

[allocation profiling screenshot]

Collecting all the composite keys for every document in a list here also takes around 9% of the overall allocations.

[allocation profiling screenshot]

For the Big5 benchmark fields agent_name and host_name (HIGH CARDINALITY)

19% of overall allocations are spent in the deep copy of the composite key.

[allocation profiling screenshot]

Collecting all composite keys takes around 9%, same as before.

Also, the for-each loop that goes over the field values of a document here contributes 17% of overall allocations because it creates a new Iterator every time. Changed it to use a regular indexed for loop; see the sketch after the screenshot below.

[allocation profiling screenshot]
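
To make the iterator point concrete, here is a minimal, self-contained illustration (not the aggregator's actual hot loop): a for-each over a List allocates an Iterator on every pass, while an index-based loop over a RandomAccess list such as ArrayList does not.

import java.util.List;

public class LoopAllocationExample {

    // The enhanced for loop desugars to values.iterator(), allocating a new Iterator per call.
    static long sumForEach(List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // An index-based loop avoids the Iterator allocation entirely.
    static long sumIndexed(List<Long> values) {
        long sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> values = List.of(1L, 2L, 3L);
        System.out.println(sumForEach(values) + " == " + sumIndexed(values));
    }
}

The JIT can sometimes eliminate such short-lived iterators via escape analysis, but the allocation profile above shows that did not happen on this hot path, which runs for every collected document.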

Testing

  • I ensured that the aggregation results were the same with and without my changes for queries on different fields of the Big5 dataset.
  • Testing was done for concurrent search using different field combinations and the results were the same.

Will go through the existing integration tests and UTs for multi term aggregations to make sure no corner cases are left uncovered.

Signed-off-by: expani <anijainc@amazon.com>
github-actions bot (Contributor) commented
❌ Gradle check result for fb84412: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bowenlan-amzn added the v2.17.0 and backport 2.x (Backport to 2.x branch) labels on Jul 31, 2024
@bowenlan-amzn (Member) left a comment


Looks good! Like how you document the profiling results to explain this!

@sandeshkr419 (Contributor) commented Jul 31, 2024

Thanks @expani for the code changes and the detailed explanation on the PR - I have some minor comments, mainly on refactoring. Please add relevant comments/javadocs to help future developers understand the minor optimizations and the utilities for the various low-level operations.

Did we check performance on any data which has multi-valued fields as well? If not, let us propose a change in OSB for multi-valued fields in some workloads, in case we don't have any such available workloads.

Also, let us iterate CI to green and check if we have solid code coverage as well.

@expani (Author) commented Jul 31, 2024

Thanks for taking the time to review @bowenlan-amzn and @sandeshkr419

I will add the required javadocs and comments as suggested.
I have explained the reasoning behind the code structure; let me know if you feel otherwise.

Did we check performance on any data which has multi-valued fields as well? If not, let us propose a change in OSB for multi-valued fields in some workloads, in case we don't have any such available workloads.

Checked with a few multi-valued fields, but only for correctness and not from a performance perspective.
Will check with the OSB team for any such existing workloads, as I need them for other possible optimisations as well.

The current CI seems to be failing due to 2 tests that have been reported as flaky by multiple other folks.

> Task :server:internalClusterTest

Tests with failures:
 - org.opensearch.action.admin.indices.create.RemoteSplitIndexIT.classMethod
 - org.opensearch.remotestore.RemoteStoreStatsIT.testDownloadStatsCorrectnessSinglePrimarySingleReplica

Will check on the coverage of the existing tests and add any if required.

github-actions bot (Contributor) commented Sep 6, 2024

❌ Gradle check result for 73b9bf5: FAILURE


@expani (Author) commented Sep 6, 2024

Creating this comment to record all the build failures in the PR after rebasing with mainline; all of these tests execute just fine locally:

  1. RemoteStoreMigrationSettingsUpdateIT
    The cluster takes longer to become green, causing the test to time out while waiting for the cluster to become green.
./gradlew :server:internalClusterTest --tests "org.opensearch.remotemigration.RemoteStoreMigrationSettingsUpdateIT.*"

https://github.com/opensearch-project/OpenSearch/actions/runs/10735985172/job/29774489951?pr=14993

github-actions bot (Contributor) commented Sep 6, 2024

❕ Gradle check result for 25617bd: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions bot (Contributor) commented
❌ Gradle check result for 6645e73: FAILURE


github-actions bot (Contributor) commented
❌ Gradle check result for 4b11dcb: FAILURE


@expani expani closed this Sep 10, 2024
@expani expani reopened this Sep 10, 2024
github-actions bot (Contributor) commented
✅ Gradle check result for 4b11dcb: SUCCESS

@sandeshkr419 (Contributor) commented
@expani Can you please add a changelog entry - that should help in closing this out.

github-actions bot (Contributor) commented
✅ Gradle check result for 4b11dcb: SUCCESS

Labels
backport 2.x (Backport to 2.x branch) · Performance (This is for any performance related enhancements or bugs) · v2.18.0 (Issues and PRs related to version 2.18.0)
Projects
Status: In Progress

6 participants