-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test #50818
Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test #50818
Conversation
This change adds some randomness and cleanup step to TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in attempt to make them stable. Reletes to elastic#50781
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM) |
@elasticmachine run elasticsearch-ci/2 |
@probakowski can you explain what makes this test fail maybe? It's hard for me to gauge this PR without an understanding of the failure. |
@probakowski I can't tell either how this fixes the test, is there a problem with the SLM policy not using the same ID or repository between tests? |
As I can't reproduce it locally (neither can @martijnvg or PR CI) this is in guess territory. |
@probakowski how much of the default timeout that you want to up to 2min here are we using on a normal test run, did you try that out? Maybe there is some cron issue here (just guessing because we had that) and we can reproduce this by moving to half the current timeout or so? |
@original-brownbear default is 10s, in my local env I can go to as low as 2s without errors and to 1s with ~50% success rate. Initially I wanted to use 20s for starters but then I found multiple places in the same class that use 2min with the same context (waiting for snapshot to happen) introduced by @dakrone. So I went with the same value for consistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for timeout increases. I think if it keeps failing we should change the ILM completed
step to also include the output of the SLM policy retrieval (since that would hopefully include the state of the last manually executed snapshot)
… and testWaitForSnapshotSlmExecutedBefore test (elastic#50818) * Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test This change adds some randomness and cleanup step to TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in attempt to make them stable. Reletes to elastic#50781 * Formatting changes * Longer timeout
This change adds some randomness and cleanup step to TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in attempt to make them stable.
Reletes to #50781