Add log for ShardLockException in InternalTestCluster #13632

Open
wants to merge 5 commits into
base: main

@Hailong-am Hailong-am commented May 13, 2024

Description

Log the ShardLockObtainFailedException in InternalTestCluster. This helps troubleshoot failed integration test (IT) cases.

Related Issues

#13628

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Hailong Cui <ihailong@amazon.com>
Contributor

❌ Gradle check result for c6960c2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -2770,6 +2770,7 @@ public synchronized void assertAfterTest() throws Exception {
     try {
         env.shardLock(id, "InternalTestCluster assert after test", TimeUnit.SECONDS.toMillis(5)).close();
     } catch (ShardLockObtainFailedException ex) {
+        logger.error("Obtained shard lock failed", ex);
Collaborator

Shouldn't the fail call below already fail the test with the needed logging?

Author

The fail function logs something like java.lang.AssertionError: Shard [.plugins-ml-config][0] is still locked after 5 sec waiting. That message includes the shard information, but it does not say why the shard is still locked or who holds the lock. Surfacing that is the main purpose of adding this log, because the exception carries this information:

org.opensearch.env.ShardLockObtainFailedException: [.plugins-ml-config][0]: obtaining shard lock for [InternalTestCluster assert after test] timed out after [5000ms], lock already held for [starting shard] with age [20365ms]

Collaborator

Maybe improve the fail logging instead?

Author

Sounds good. I changed the PR to improve the existing fail assertion message instead.
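
For readers following the thread, here is a minimal sketch of what folding the exception detail into the existing failure message might look like. It is not the code from this PR: LockObtainFailedStandIn is a hypothetical stand-in for org.opensearch.env.ShardLockObtainFailedException, and describe and lockedFailure are hypothetical helper names, chosen so the example compiles without OpenSearch on the classpath.

```java
// Hypothetical stand-in for org.opensearch.env.ShardLockObtainFailedException,
// so that this sketch compiles without OpenSearch on the classpath.
class LockObtainFailedStandIn extends Exception {
    LockObtainFailedStandIn(String message) {
        super(message);
    }
}

class ShardLockFailureDemo {

    // Hypothetical helper: builds the assertion text and appends the exception
    // message, which names the current lock holder and the age of the lock.
    static String describe(String shardId, long waitedSeconds, Exception cause) {
        return "Shard " + shardId + " is still locked after " + waitedSeconds
            + " sec waiting, caused by: " + cause.getMessage();
    }

    // Hypothetical helper: wraps the text in an AssertionError and keeps the
    // original exception as the cause so its stack trace is not lost.
    static AssertionError lockedFailure(String shardId, long waitedSeconds, Exception cause) {
        return new AssertionError(describe(shardId, waitedSeconds, cause), cause);
    }

    public static void main(String[] args) {
        // Message text copied from the exception quoted earlier in this thread.
        Exception ex = new LockObtainFailedStandIn(
            "[.plugins-ml-config][0]: obtaining shard lock for [InternalTestCluster assert after test] "
                + "timed out after [5000ms], lock already held for [starting shard] with age [20365ms]"
        );
        AssertionError error = lockedFailure("[.plugins-ml-config][0]", 5, ex);
        System.out.println(error.getMessage());
    }
}
```

Passing the exception as the cause keeps its stack trace in the test report, while appending its message puts the lock holder and lock age on the first line of the failure.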

Collaborator

@gaobinlong gaobinlong left a comment

LGTM. You may want to merge the latest main branch into your branch to rerun the Gradle checks and make sure all checks pass.

Contributor

❌ Gradle check result for 300ecc9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 35df6aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for 0aed43a: SUCCESS

codecov bot commented May 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.56%. Comparing base (b15cb0c) to head (0aed43a).
Report is 462 commits behind head on main.

Current head 0aed43a differs from pull request most recent head 35df6aa

Please upload reports for the commit 35df6aa to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13632      +/-   ##
============================================
+ Coverage     71.42%   71.56%   +0.14%     
- Complexity    59978    61158    +1180     
============================================
  Files          4985     5059      +74     
  Lines        282275   287522    +5247     
  Branches      40946    41646     +700     
============================================
+ Hits         201603   205773    +4170     
- Misses        63999    64792     +793     
- Partials      16673    16957     +284     


@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Jul 24, 2024