Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/sbp cancellation #13474

Merged
merged 16 commits into from
Jun 21, 2024
Merged

Conversation

kaushalmahi12
Copy link
Contributor

@kaushalmahi12 kaushalmahi12 commented Apr 30, 2024

Description

This PR is to address and fix the BUG: #13295

Changes

  • Refactor SearchBackpressureService to introduce resource wise cancellation when node in duress because of the resource
  • Move all resourceTrackers into a single class
  • Put the logic to calculate whether a resource usage is breaching for a task behind an interface and make it a instance member
  • Add an UT to cover the mentioned bug scenario

New Logic for Cancellation

SBP_cancellation

Related Issues

Resolves #13295

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for aa4fd2b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for bf11c85: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 5bcac55: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ticheng-aws
Copy link
Contributor

Hi @kaushalmahi12, thank you for submitting this PR. Would you mind also creating a cancellation logic diagram similar to
what you've previously done #13295 (comment)? It would really help us grasp the changes for search backpressure.

Copy link
Contributor

❌ Gradle check result for 6b1c658: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for cd3e65b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 2c3c4bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 3c3c64e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Copy link
Contributor

❕ Gradle check result for 214feba: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Contributor

❌ Gradle check result for becc022: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Copy link
Contributor

✅ Gradle check result for bd55e42: SUCCESS

Copy link
Contributor

✅ Gradle check result for 0e38dee: SUCCESS

@sohami
Copy link
Collaborator

sohami commented Jun 21, 2024

❌ Gradle check result for becc022: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Failure related to IndicesRequestCacheIT timeout which was recently fixed by #14369

@sohami sohami merged commit bcccedb into opensearch-project:main Jun 21, 2024
29 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 21, 2024
* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
(cherry picked from commit bcccedb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sohami pushed a commit that referenced this pull request Jun 24, 2024
* Bug/sbp cancellation (#13474)

* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
(cherry picked from commit bcccedb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Bug/sbp cancellation (#13474)

* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix compilation error

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Jul 12, 2024
* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
* Bug/sbp cancellation (opensearch-project#13474)

* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
(cherry picked from commit bcccedb)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Bug/sbp cancellation (opensearch-project#13474)

* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix compilation error

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: kkewwei <kkewwei@163.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
* change cancellation logic to fix disparity bw trackers and resource duress

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional tests for searchBackpressureService and refactor code

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add enumMap instead of list for tracking taskResourceUsageTrackets

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add nodeNotInDuress test for nodeDuressTrackers class

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add entry in CHANGELOG

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* remove wildcard import

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* streamline imports

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* address comments

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add additional test case to test the circuit breaker for SBP logic

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add missing javadoc to resourece type enum

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* add javadoc to a method

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

* fix javadoc warnings

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>

---------

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
@github-actions github-actions bot added bug Something isn't working enhancement Enhancement or improvement to existing feature or request Search:Performance Search:Resiliency labels Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch bug Something isn't working enhancement Enhancement or improvement to existing feature or request Search:Performance Search:Resiliency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG][Search Backpressure] High Heap Usage Cancellation Due to High Node-Level CPU Utilization
4 participants