Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AUTOCUT] Gradle Check Flaky Test Report for RemoteSegmentTransferTrackerTests #14325

Open
opensearch-ci-bot opened this issue Jun 13, 2024 · 4 comments · Fixed by #15187
Open
Assignees
Labels
autocut flaky-test Random test failure that succeeds on second run Storage:Remote >test-failure Test failure from CI, local build, etc.

Comments

@opensearch-ci-bot
Copy link
Collaborator

opensearch-ci-bot commented Jun 13, 2024

Flaky Test Report for RemoteSegmentTransferTrackerTests

Noticed the RemoteSegmentTransferTrackerTests has some flaky, failing tests that failed during post-merge actions.

Details

Git Reference Merged Pull Request Build Details Test Name
16c8806 13724 41044 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
21d3aaa 14399 41157 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
51af2e2 14660 42106 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
68bdb77 15143 44533 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
bd0deec 13932 39630 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
c92e125 14361 41039 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
c9ff7ce 14827 42817 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate
e076e66 15348 44994 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testGetInflightUploadBytes
ffa67f9 14963 43644 org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate

The other pull requests, besides those involved in post-merge actions, that contain failing tests with the RemoteSegmentTransferTrackerTests class are:

For more details on the failed tests refer to OpenSearch Gradle Check Metrics dashboard.

@lukas-vlcek
Copy link
Contributor

If no one is working on this one I would like to give it a try. Feel free to assign to me.

lukas-vlcek added a commit to lukas-vlcek/OpenSearch that referenced this issue Aug 9, 2024
Current implementation of [`RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/test/java/org/opensearch/index/remote/RemoteSegmentTransferTrackerTests.java#L139) test rely on some assumptions about how fast the testing code will finish in JVM. Moreover it does not precisely control boundaries of the time span, specifically the start of the span because it is determined by internal implementation of [`RemoteSegmentTransferTracker.getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) which indirectly makes call to `System.nanoTime()`.

This commit loosens the assumption that the test code execution will finish within +/-20ms. Instead it only assumes that the execution time span won't be shorter than predefined (and controlled) thread sleep interval and any larger interval value is considered a success.

The whole point of this test is not to verify execution speed with defined precision. Instead the point is that the [`getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) method returns either 0 (for specific conditions) or possitive number (assuming that `remoteRefreshStartTimeMs` is not greater than `System.nanoTime()`).

Closes: opensearch-project#14325

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
lukas-vlcek added a commit to lukas-vlcek/OpenSearch that referenced this issue Aug 13, 2024
Current implementation of [`RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/test/java/org/opensearch/index/remote/RemoteSegmentTransferTrackerTests.java#L139) test rely on some assumptions about how fast the testing code will finish in JVM. Moreover it does not precisely control boundaries of the time span, specifically the start of the span because it is determined by internal implementation of [`RemoteSegmentTransferTracker.getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) which indirectly makes call to `System.nanoTime()`.

This commit loosens the assumption that the test code execution will finish within +/-20ms. Instead it only assumes that the execution time span won't be shorter than predefined (and controlled) thread sleep interval and any larger interval value is considered a success.

The whole point of this test is not to verify execution speed with defined precision. Instead the point is that the [`getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) method returns either 0 (for specific conditions) or possitive number (assuming that `remoteRefreshStartTimeMs` is not greater than `System.nanoTime()`).

Closes: opensearch-project#14325

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
lukas-vlcek added a commit to lukas-vlcek/OpenSearch that referenced this issue Aug 13, 2024
Current implementation of [`RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/test/java/org/opensearch/index/remote/RemoteSegmentTransferTrackerTests.java#L139) test rely on some assumptions about how fast the testing code will finish in JVM. Moreover it does not precisely control boundaries of the time span, specifically the start of the span because it is determined by internal implementation of [`RemoteSegmentTransferTracker.getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) which indirectly makes call to `System.nanoTime()`.

This commit loosens the assumption that the test code execution will finish within +/-20ms. Instead it only assumes that the execution time span won't be shorter than predefined (and controlled) thread sleep interval and any larger interval value is considered a success.

The whole point of this test is not to verify execution speed with defined precision. Instead the point is that the [`getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) method returns either 0 (for specific conditions) or possitive number (assuming that `remoteRefreshStartTimeMs` is not greater than `System.nanoTime()`).

Closes: opensearch-project#14325

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
lukas-vlcek added a commit to lukas-vlcek/OpenSearch that referenced this issue Aug 13, 2024
Current implementation of [`RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/test/java/org/opensearch/index/remote/RemoteSegmentTransferTrackerTests.java#L139) test rely on some assumptions about how fast the testing code will finish in JVM. Moreover it does not precisely control boundaries of the time span, specifically the start of the span because it is determined by internal implementation of [`RemoteSegmentTransferTracker.getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) which indirectly makes call to `System.nanoTime()`.

This commit loosens the assumption that the test code execution will finish within +/-20ms. Instead it only assumes that the execution time span won't be shorter than predefined (and controlled) thread sleep interval and any larger interval value is considered a success.

The whole point of this test is not to verify execution speed with defined precision. Instead the point is that the [`getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) method returns either 0 (for specific conditions) or possitive number (assuming that `remoteRefreshStartTimeMs` is not greater than `System.nanoTime()`).

Closes: opensearch-project#14325

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
lukas-vlcek added a commit to lukas-vlcek/OpenSearch that referenced this issue Aug 14, 2024
Current implementation of [`RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/test/java/org/opensearch/index/remote/RemoteSegmentTransferTrackerTests.java#L139) test rely on some assumptions about how fast the testing code will finish in JVM. Moreover it does not precisely control boundaries of the time span, specifically the start of the span because it is determined by internal implementation of [`RemoteSegmentTransferTracker.getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) which indirectly makes call to `System.nanoTime()`.

This commit loosens the assumption that the test code execution will finish within +/-20ms. Instead it only assumes that the execution time span won't be shorter than predefined (and controlled) thread sleep interval and any larger interval value is considered a success.

The whole point of this test is not to verify execution speed with defined precision. Instead the point is that the [`getTimeMsLag()`](https://github.com/opensearch-project/OpenSearch/blob/2b17902643738f0d2a75ade7c85cbca94d18ce49/server/src/main/java/org/opensearch/index/remote/RemoteSegmentTransferTracker.java#L262) method returns either 0 (for specific conditions) or possitive number (assuming that `remoteRefreshStartTimeMs` is not greater than `System.nanoTime()`).

Closes: opensearch-project#14325

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
@dblock dblock removed the untriaged label Sep 9, 2024
@dblock
Copy link
Member

dblock commented Sep 9, 2024

[Catch All Triage - 1, 2, 3, 4, 5]

@prachi-gaonkar
Copy link
Contributor

Hi Team, we are also getting the same error on ppc64le VM

RemoteSegmentTransferTrackerTests > testComputeTimeLagOnUpdate STANDARD_ERROR
REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate" -Dtests.seed=1489833A008E8B2 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=bg-BG -Dtests.timezone=Indian/Kerguelen -Druntime.java=21

RemoteSegmentTransferTrackerTests > testComputeTimeLagOnUpdate FAILED
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([1489833A008E8B2:531AB8310E2D8828]:0)
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.opensearch.index.remote.RemoteSegmentTransferTrackerTests.testComputeTimeLagOnUpdate(RemoteSegmentTransferTrackerTests.java:157)

@prachi-gaonkar
Copy link
Contributor

Hi Team
is there any updated on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autocut flaky-test Random test failure that succeeds on second run Storage:Remote >test-failure Test failure from CI, local build, etc.
Projects
Status: 🏗 In progress
7 participants