ClusterManagerTaskThrottler Improvements #15508

sumitasr

@sumitasr sumitasr commented Aug 29, 2024

Description

ClusterManagerTaskThrottler Improvements

  • Add a shallow check in ClusterManagerTaskThrottler's onBeginSubmit method before computeIfPresent, to avoid taking the lock when the queue is full.

  • Remove stack trace filling in ClusterManagerThrottlingException.

  • Lower the log level from WARN to TRACE to reduce logging overhead.

  • Add a unit test that verifies throttling during the shallow check while a different thread holds the lock on the task count map.
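
For readers unfamiliar with the pattern, the shallow check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name ThrottlerSketch, its methods register, onBeginProcessing and pending, and the use of IllegalStateException in place of ClusterManagerThrottlingException are all assumptions made for the example. It relies on the fact that ConcurrentHashMap.get does not block, while computeIfPresent runs its remapping function under a per-bin lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the throttler; names and types are illustrative only. */
class ThrottlerSketch {
    private final ConcurrentMap<String, Long> tasksCount = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Long> thresholds = new ConcurrentHashMap<>();

    /** Registers a throttled task type with its pending-task limit. */
    void register(String taskKey, long threshold) {
        tasksCount.put(taskKey, 0L);
        thresholds.put(taskKey, threshold);
    }

    void onBeginSubmit(String taskKey) {
        Long threshold = thresholds.get(taskKey);
        if (threshold == null) {
            return; // task type is not throttled
        }
        // Shallow check: a plain get() does not block, so a full queue is
        // rejected without contending for the lock used by computeIfPresent.
        Long current = tasksCount.get(taskKey);
        if (current != null && current >= threshold) {
            throw new IllegalStateException("Throttling limit reached for " + taskKey);
        }
        // Authoritative check: the remapping function runs atomically, so the
        // limit is re-validated here before the counter is incremented.
        tasksCount.computeIfPresent(taskKey, (key, count) -> {
            if (count >= threshold) {
                throw new IllegalStateException("Throttling limit reached for " + taskKey);
            }
            return count + 1;
        });
    }

    /** Called when a task leaves the pending queue. */
    void onBeginProcessing(String taskKey) {
        tasksCount.computeIfPresent(taskKey, (key, count) -> Math.max(0L, count - 1));
    }

    long pending(String taskKey) {
        return tasksCount.getOrDefault(taskKey, 0L);
    }
}
```

The shallow read can be stale, which is why the check inside computeIfPresent is kept: the first check only saves work when the queue is already full, and the second one remains the source of truth.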

Related Issues

#13741

Testing

  • Thread dumps show no BLOCKED threads after the code changes were deployed to cluster manager nodes.
  • ClusterManagerThrottlingException no longer fills in stack traces.
  • Added unit tests.
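
As a point of reference, one common way to get an exception that does not fill in its stack trace is to override fillInStackTrace. The sketch below shows that technique in isolation; the class ThrottlingExceptionSketch is hypothetical and is not the actual ClusterManagerThrottlingException, and whether this PR uses exactly this approach is an assumption.

```java
/** Hypothetical exception used only to illustrate skipping stack trace capture. */
class ThrottlingExceptionSketch extends RuntimeException {
    ThrottlingExceptionSketch(String message) {
        super(message);
    }

    // Walking the stack is the expensive part of constructing an exception.
    // Returning `this` without calling super skips that work, which can help
    // when the exception is thrown very frequently on a hot path.
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}
```

The trade-off is that such an exception carries only its message, so the message has to identify the throttled task type on its own.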

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sumitasr sumitasr changed the title ClusterManagerTaskThrottler Improvements [Draft - Do not review] ClusterManagerTaskThrottler Improvements Aug 29, 2024
@sumitasr sumitasr force-pushed the add_changes_for_task_throttle_improvements branch from 767f990 to b627fac Compare August 29, 2024 17:36
  + Add shallow check in ClusterManagerTaskThrottler's onBeginSubmit method before computeIfPresent to avoid lock when queue is full
  + Remove stack trace filling in ClusterManagerThrottlingException

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
@sumitasr sumitasr force-pushed the add_changes_for_task_throttle_improvements branch from b627fac to 2ada278 Compare August 29, 2024 17:39

❌ Gradle check result for 767f990: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


❌ Gradle check result for b627fac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sumitasr sumitasr changed the title [Draft - Do not review] ClusterManagerTaskThrottler Improvements [Draft] ClusterManagerTaskThrottler Improvements Aug 29, 2024
Signed-off-by: Sumit Bansal <sumitsb@amazon.com>

✅ Gradle check result for 2ada278: SUCCESS


❕ Gradle check result for 27b68c5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Aug 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.90%. Comparing base (0753461) to head (7b98efe).
Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15508      +/-   ##
============================================
+ Coverage     71.85%   71.90%   +0.04%     
- Complexity    64054    64090      +36     
============================================
  Files          5269     5269              
  Lines        299679   299687       +8     
  Branches      43311    43310       -1     
============================================
+ Hits         215343   215480     +137     
+ Misses        66693    66480     -213     
- Partials      17643    17727      +84     


@sumitasr sumitasr force-pushed the add_changes_for_task_throttle_improvements branch from 1ca0ac0 to 6249251 Compare September 3, 2024 05:41

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 1ca0ac0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
@sumitasr sumitasr force-pushed the add_changes_for_task_throttle_improvements branch from 6249251 to d80be09 Compare September 3, 2024 06:43
@sumitasr sumitasr changed the title [Draft] ClusterManagerTaskThrottler Improvements ClusterManagerTaskThrottler Improvements Sep 3, 2024
Signed-off-by: Sumit Bansal <sumitsb@amazon.com>

sumitasr commented Sep 4, 2024

❌ Gradle check result for 37ea333: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

org.opensearch.backwards.IndexingIT.testUpdateSnapshotStatus
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.status/10_basic/Get snapshot status}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.clone/10_basic/Clone a snapshot}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.create/10_basic/Create a snapshot and clean up repository}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.get/10_basic/Get snapshot info}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.restore/10_basic/Create a snapshot and then restore it}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search_pipeline/10_basic/Test Put Versioned Pipeline}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.get/10_basic/Get snapshot info contains include_global_state}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=search.aggregation/230_composite/Composite aggregation with nested parent}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=cat.snapshots/10_basic/Test cat snapshots output}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.get/10_basic/Get snapshot info with metadata}
org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=snapshot.create/10_basic/Create a snapshot}

Flaky #14302 and #14294


github-actions bot commented Sep 4, 2024

❕ Gradle check result for 7b98efe: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@shwetathareja shwetathareja merged commit 17b5f98 into opensearch-project:main Sep 4, 2024
33 of 34 checks passed
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15508-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 17b5f98c5e28c5b56fe88d39513c4ce119600b9c
# Push it to GitHub
git push --set-upstream origin backport/backport-15508-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15508-to-2.x.

sumitasr added a commit to sumitasr/OpenSearch that referenced this pull request Sep 4, 2024
* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
(cherry picked from commit 17b5f98)

sumitasr commented Sep 4, 2024

Raised backport - #15647

sumitasr added a commit to sumitasr/OpenSearch that referenced this pull request Sep 4, 2024
* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
sumitasr added a commit to sumitasr/OpenSearch that referenced this pull request Sep 4, 2024
* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
sumitasr added a commit to sumitasr/OpenSearch that referenced this pull request Sep 4, 2024
* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
shwetathareja pushed a commit that referenced this pull request Sep 5, 2024
…5671)

* ClusterManagerTaskThrottler Improvements (#15508)

* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
shwetathareja pushed a commit that referenced this pull request Sep 9, 2024
* ClusterManagerTaskThrottler Improvements (#15508)
Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
(cherry picked from commit 17b5f98)
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
* [ClusterManagerTaskThrottler Improvements] : Add shallow check in ClusterManagerTaskThrottler to fail fast before computeIfPresent to avoid lock when queue is full

Signed-off-by: Sumit Bansal <sumitsb@amazon.com>
Labels
backport 2.x Backport to 2.x branch backport-failed
4 participants