Query-level resource usages tracking #13172

Merged

@ansjcy ansjcy commented Apr 12, 2024

Description

  • Capture resource usage before a task finishes on a data node; more specifically, record the usage just before the phase response is sent from the query/fetch/.. phase.
  • Piggyback the resource usage data on the shard search response headers.
  • On the coordinator node, gather the data nodes' search task resource usage from the headers and store it in the search context, so that the query-level resource usage can be inferred.
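
The three steps above can be pictured with a small, self-contained sketch. It is not the code from this PR: the class names, the header key `TASK_RESOURCE_USAGE`, and the comma-separated encoding are all hypothetical and chosen only for illustration. It assumes each shard-level task reports one CPU figure and one memory figure, and that the coordinator simply sums them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryResourceUsageSketch {

    // Hypothetical header key; the key actually used by the PR is not shown in this thread.
    static final String USAGE_HEADER = "TASK_RESOURCE_USAGE";

    /** CPU time and memory attributed to one shard-level task (illustrative type). */
    static final class TaskUsage {
        final String phase;
        final long cpuNanos;
        final long memoryBytes;

        TaskUsage(String phase, long cpuNanos, long memoryBytes) {
            this.phase = phase;
            this.cpuNanos = cpuNanos;
            this.memoryBytes = memoryBytes;
        }

        // Assumed wire format: "phase,cpuNanos,memoryBytes".
        String encode() {
            return phase + "," + cpuNanos + "," + memoryBytes;
        }

        static TaskUsage decode(String value) {
            String[] parts = value.split(",");
            return new TaskUsage(parts[0], Long.parseLong(parts[1]), Long.parseLong(parts[2]));
        }
    }

    /** Data node side: attach the usage to the response headers just before the phase response is sent. */
    static Map<String, String> buildShardResponseHeaders(TaskUsage usage) {
        Map<String, String> headers = new HashMap<>();
        headers.put(USAGE_HEADER, usage.encode());
        return headers;
    }

    /** Coordinator side: stand-in for the search context that collects per-task usages. */
    static final class SearchContextUsages {
        private final List<TaskUsage> usages = new ArrayList<>();

        // Read the usage header from one shard response, if present.
        void onShardResponse(Map<String, String> headers) {
            String value = headers.get(USAGE_HEADER);
            if (value != null) {
                usages.add(TaskUsage.decode(value));
            }
        }

        // Query-level totals are inferred by summing the shard-level figures.
        long totalCpuNanos() {
            return usages.stream().mapToLong(u -> u.cpuNanos).sum();
        }

        long totalMemoryBytes() {
            return usages.stream().mapToLong(u -> u.memoryBytes).sum();
        }
    }

    public static void main(String[] args) {
        SearchContextUsages context = new SearchContextUsages();
        context.onShardResponse(buildShardResponseHeaders(new TaskUsage("query", 1_200_000L, 4096L)));
        context.onShardResponse(buildShardResponseHeaders(new TaskUsage("fetch", 300_000L, 1024L)));
        System.out.println("cpu=" + context.totalCpuNanos() + " mem=" + context.totalMemoryBytes());
    }
}
```

A response without the header is ignored rather than treated as an error, so a shard that does not report usage does not break the aggregation in this sketch.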

Related Issues

Resolves #12399

Benchmark tests

I ran extensive benchmark tests and merged the results by averaging over multiple runs. Here are the test results:

baseline-resulsts.txt
feature-ressults.txt

I don't see a significant impact on search latency with this change.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for 32f9a75: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Compatibility status:

Checks if related components are compatible with change 32f9a75

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git]

@ansjcy changed the title from "Query-level resource usages tracking" to "[DRAFT] Query-level resource usages tracking" on Apr 15, 2024
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
…s tracking (opensearch-project#14085)

* Query-level resource usages tracking (opensearch-project#13172)

* Query-level resource usages tracking

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* Moving TaskResourceTrackingService to clusterService

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* use shard response header to piggyback task resource usages

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* split changes for query insights plugin

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* improve the supplier logic and other misc items

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* track resource usage for failed requests

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* move resource usages interactions into TaskResourceTrackingService

Signed-off-by: Chenyang Ji <cyji@amazon.com>

---------

Signed-off-by: Chenyang Ji <cyji@amazon.com>
(cherry picked from commit 3d1fa98)

* fix concurrent modification issue in thread context (opensearch-project#14084)

Signed-off-by: Chenyang Ji <cyji@amazon.com>
(cherry picked from commit c8f0b6d)

* consume query level cpu and memory usage in query insights (opensearch-project#13739)

* consume query level cpu and memory usage in query insights

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* handle failed requests metrics in query insights

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* refactor the code to make it more maintainable

Signed-off-by: Chenyang Ji <cyji@amazon.com>

---------

Signed-off-by: Chenyang Ji <cyji@amazon.com>
(cherry picked from commit 04a417a)

* fix japicmp check for threadContext

Signed-off-by: Chenyang Ji <cyji@amazon.com>
(cherry picked from commit b403fdc)
Signed-off-by: kkewwei <kkewwei@163.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
* Query-level resource usages tracking

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* Moving TaskResourceTrackingService to clusterService

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* use shard response header to piggyback task resource usages

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* split changes for query insights plugin

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* improve the supplier logic and other misc items

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* track resource usage for failed requests

Signed-off-by: Chenyang Ji <cyji@amazon.com>

* move resource usages interactions into TaskResourceTrackingService

Signed-off-by: Chenyang Ji <cyji@amazon.com>

---------

Signed-off-by: Chenyang Ji <cyji@amazon.com>
Labels
backport 2.x (Backport to 2.x branch), backport-failed, enhancement (Enhancement or improvement to existing feature or request), Search:Query Insights, v2.15.0 (Issues and PRs related to version 2.15.0)
Development

Successfully merging this pull request may close these issues.

[Query Insights] Capture query-level resource usage metrics