
Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test #50818

Merged

Conversation

probakowski
Contributor

This change adds some randomness and a cleanup step to the TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in an attempt to make them stable.

Relates to #50781

@probakowski probakowski added >test Issues or PRs that are addressing/adding tests v7.6.0 v8.0.0 :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Jan 9, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@probakowski
Contributor Author

@elasticmachine run elasticsearch-ci/2

@original-brownbear
Member

@probakowski can you explain what makes this test fail? It's hard for me to gauge this PR without an understanding of the failure.

@dakrone
Member

dakrone commented Jan 9, 2020

@probakowski I can't tell how this fixes the test either. Is there a problem with the SLM policy not using the same ID or repository between tests?

@probakowski
Contributor Author

As I can't reproduce it locally (neither can @martijnvg or PR CI), this is guess territory.
My idea is that hammering the same repo multiple times in a row may lengthen the time it takes to create a snapshot, or that taking a snapshot in the intake job just takes longer.
I've seen a longer timeout (2 minutes) in testDeleteDuringSnapshot, so I assume it could be connected. I've extended the timeout in my tests to match.
All changes in this PR should be safe in the sense that they shouldn't make things worse, but they may improve stability.

@original-brownbear
Member

@probakowski how much of the default timeout, which you want to raise to 2min here, do we actually use on a normal test run? Did you try that out? Maybe there is some cron issue here (just guessing, because we've had that before) and we could reproduce this by moving to half the current timeout or so?

@probakowski
Contributor Author

probakowski commented Jan 10, 2020

@original-brownbear The default is 10s. In my local env I can go as low as 2s without errors, and down to 1s with a ~50% success rate. Initially I wanted to start with 20s, but then I found multiple places in the same class that use 2min in the same context (waiting for a snapshot to happen), introduced by @dakrone. So I went with the same value for consistency.
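
For readers unfamiliar with the pattern being tuned here, a minimal sketch of a retry-until-deadline wait follows. It is an illustration only: `BusyWait` and its `assertBusy` method are hypothetical stand-ins written for this example, not the helper the Elasticsearch test framework actually uses, and the backoff schedule is an assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not the Elasticsearch test framework's implementation. */
final class BusyWait {
    private BusyWait() {}

    /**
     * Runs {@code check} repeatedly until it stops throwing AssertionError
     * or the timeout elapses, in which case the last failure is rethrown.
     */
    static void assertBusy(Runnable check, long maxWait, TimeUnit unit) {
        final long deadlineNanos = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1; // assumed backoff: start small, double up to 1s
        while (true) {
            try {
                check.run();
                return; // condition met
            } catch (AssertionError lastFailure) {
                long remainingNanos = deadlineNanos - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw lastFailure; // out of time: surface the last failure
                }
                long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                sleep(Math.min(sleepMillis, remainingMillis));
                sleepMillis = Math.min(sleepMillis * 2, 1_000);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

With a helper like this, a larger timeout such as `BusyWait.assertBusy(check, 2, TimeUnit.MINUTES)` costs nothing on a healthy run, because the loop returns as soon as the check passes; the extra time is only spent when the snapshot is slow.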

@probakowski probakowski changed the title Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test Jan 13, 2020
Member

@dakrone dakrone left a comment


LGTM for the timeout increases. I think if it keeps failing, we should change the ILM completed step to also include the output of the SLM policy retrieval (since that would hopefully include the state of the last manually executed snapshot).

@probakowski probakowski merged commit 36079d4 into elastic:master Jan 14, 2020
@probakowski probakowski deleted the fix-ilm-wait-for-snapshot-tests branch January 14, 2020 00:08
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
… and testWaitForSnapshotSlmExecutedBefore test (elastic#50818)

* Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test

This change adds some randomness and a cleanup step to the TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in an attempt to make them stable.

Relates to elastic#50781

* Formatting changes

* Longer timeout
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >test Issues or PRs that are addressing/adding tests v7.6.0 v8.0.0-alpha1

5 participants