[BUG] StableClusterManagerDisruptionIT.testStaleClusterManagerNotHijackingMajority (Random Test Failure) #1565

Closed · CEHENKLE opened this issue on Nov 16, 2021 · 7 comments · Fixed by #13463
Labels: bug (Something isn't working), Cluster Manager, flaky-test (Random test failure that succeeds on second run)


@CEHENKLE (Member)

Describe the bug
Random test failure. Please dig in and figure out what went wrong :(

https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1066_reports.zip

CEHENKLE added the bug, untriaged, and flaky-test labels and removed the untriaged label on Nov 16, 2021.
@tlfeng (Collaborator)

tlfeng commented Mar 13, 2022

Adding more information:

https://ci.opensearch.org/logs/ci/workflow/OpenSearch_CI/PR_Checks/Gradle_Check/gradle_check_1066.log

> Task :server:internalClusterTest

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.discovery.StableMasterDisruptionIT.testStaleMasterNotHijackingMajority" -Dtests.seed=69CC1732A5C19596 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sv -Dtests.timezone=America/Mexico_City -Druntime.java=17

org.opensearch.discovery.StableMasterDisruptionIT > testStaleMasterNotHijackingMajority FAILED
    java.lang.AssertionError: node_t2: [Tuple [v1=node_t1, v2=null]]
        at __randomizedtesting.SeedInfo.seed([69CC1732A5C19596:36CA5A3D841A4A9A]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.opensearch.discovery.StableMasterDisruptionIT.lambda$testStaleMasterNotHijackingMajority$5(StableMasterDisruptionIT.java:253)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1048)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1021)
        at org.opensearch.discovery.StableMasterDisruptionIT.testStaleMasterNotHijackingMajority(StableMasterDisruptionIT.java:250)

@saratvemulapalli (Member)

#2541 (comment)

Poojita-Raj changed the title from "[BUG] StableMasterDisruptionIT.testStaleMasterNotHijackingMajority (Random Test Failure)" to "[BUG] StableClusterManagerDisruptionIT.testStaleClusterManagerNotHijackingMajority (Random Test Failure)" on Nov 12, 2022.
@Poojita-Raj (Contributor)

Renamed the test to follow the new naming convention, which uses "cluster manager" instead of "master" node.

@dreamer-89 (Member)

One more occurrence: #6838 (comment)

@rahulkarajgikar (Contributor)

Checking

@rahulkarajgikar (Contributor)

Ran 5000 iterations of the test locally and did not see any failures:

 $ ./gradlew ':server:internalClusterTest' --tests "org.opensearch.discovery.StableClusterManagerDisruptionIT.testStaleClusterManagerNotHijackingMajority" -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sv -Dtests.timezone=America/Mexico_City -Druntime.java=17 -Dtests.iters=5000 -Dtests.timeoutSuite=180000000!
Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details

> Configure project :
========================= WARNING =========================
         Backwards compatibility tests are disabled!
See https://github.com/opensearch-project/OpenSearch/issues/4173
===========================================================
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 8.4
  OS Info               : Mac OS X 14.3.1 (aarch64)
  Runtime JDK Version   : 17 (Amazon Corretto JDK)
  Runtime java.home     : /Library/Java/JavaVirtualMachines/amazon-corretto-17.jdk/Contents/Home
  Gradle JDK Version    : 21 (Amazon Corretto JDK)
  Gradle java.home      : /Library/Java/JavaVirtualMachines/amazon-corretto-21.jdk/Contents/Home
  Random Testing Seed   : 9F886D8E98DA3AB1
  In FIPS 140 mode      : false
=======================================
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.BootstrapForTesting (file:/Users/karajgik/workplace/OpenSearch_karajgik/OpenSearch/test/framework/build/distributions/framework-3.0.0-SNAPSHOT.jar)
WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.BootstrapForTesting
WARNING: System::setSecurityManager will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/Users/karajgik/.gradle/wrapper/dists/gradle-8.4-all/56r6xik2f6skrm47et0ibifug/gradle-8.4/lib/plugins/gradle-testing-base-8.4.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL in 17h 59m 55s
55 actionable tasks: 1 executed, 54 up-to-date

@rahulkarajgikar (Contributor)

rahulkarajgikar commented Apr 30, 2024

The test sets the cluster publish timeout to 1s. I was able to reproduce the failure only after lowering the cluster publish timeout to 10ms.

Although I could not reproduce the error with the default values, I will raise a PR that increases the cluster publish timeout in the test to 2s to get rid of the flakiness.

Project status: ✅ Done

8 participants