Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test #50818

probakowski · 2020-01-09T20:14:49Z

This change adds some randomness and a cleanup step to the TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in an attempt to make them stable.

Relates to #50781


elasticmachine · 2020-01-09T20:17:47Z

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

probakowski · 2020-01-09T20:20:40Z

@elasticmachine run elasticsearch-ci/2

original-brownbear · 2020-01-09T20:22:45Z

@probakowski can you explain what makes this test fail maybe? It's hard for me to gauge this PR without an understanding of the failure.

dakrone · 2020-01-09T20:48:43Z

@probakowski I can't tell how this fixes the test either. Is there a problem with the SLM policy not using the same ID or repository between tests?

probakowski · 2020-01-09T23:18:37Z

As I can't reproduce it locally (neither can @martijnvg or PR CI), this is in guess territory.
My idea is that hammering the same repo multiple times in a row may lengthen the time it takes to create a snapshot, or that taking a snapshot in the intake job simply takes longer.
I've seen a longer timeout (2 minutes) in testDeleteDuringSnapshot, so I assume it may be connected. I've extended the timeout in my tests to match.
All changes in this PR should be safe in the sense that they shouldn't make things worse, but they may improve stability.

original-brownbear · 2020-01-10T10:48:44Z

@probakowski how much of the default timeout, which you want to raise to 2min here, do we use on a normal test run? Did you try that out? Maybe there is some cron issue here (just guessing because we had that before) and we can reproduce this by moving to half the current timeout or so?

probakowski · 2020-01-10T21:33:25Z

@original-brownbear the default is 10s. In my local env I can go as low as 2s without errors, and 1s gives a ~50% success rate. Initially I wanted to start with 20s, but then I found multiple places in the same class, introduced by @dakrone, that use 2min in the same context (waiting for a snapshot to happen). So I went with the same value for consistency.
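To make the timeout discussion concrete, here is a minimal standalone sketch of a poll-until-deadline assertion. It is not the Elasticsearch test framework's code: the class `BusyWaitSketch`, the `Check` interface, and the `awaitCondition` helper are hypothetical names invented for illustration, and the backoff values are assumptions. It shows why raising the limit from 10s to 2min should not slow down a passing run, since the loop returns as soon as the check succeeds.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BusyWaitSketch {

    /** Hypothetical check type: throws AssertionError while the condition is not yet met. */
    @FunctionalInterface
    public interface Check {
        void run() throws Exception;
    }

    /**
     * Re-runs the check with a growing sleep between attempts until it passes
     * or the deadline is reached; the last AssertionError is rethrown on timeout.
     */
    public static void awaitCondition(Check check, long maxWait, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1;
        while (true) {
            try {
                check.run();
                return; // condition met, stop waiting immediately
            } catch (AssertionError e) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw e; // out of time, surface the last failure
                }
                long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                Thread.sleep(Math.min(sleepMillis, remainingMillis));
                sleepMillis = Math.min(sleepMillis * 2, 500); // assumed cap on the backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "the snapshot has completed": passes on the third poll.
        AtomicInteger polls = new AtomicInteger();
        awaitCondition(() -> {
            if (polls.incrementAndGet() < 3) {
                throw new AssertionError("snapshot not finished yet");
            }
        }, 2, TimeUnit.MINUTES);
        System.out.println("condition met after " + polls.get() + " polls");
    }
}
```

With this shape, the 2 minute value only matters on a slow CI worker; a fast local run pays for just the polls it needs.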


dakrone

LGTM for timeout increases. I think if it keeps failing we should change the ILM completed step to also include the output of the SLM policy retrieval (since that would hopefully include the state of the last manually executed snapshot)
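One possible reading of this suggestion, sketched as standalone Java: when the assertion on the ILM step fails, attach whatever the SLM policy lookup returned to the failure message, so a CI failure carries the snapshot state with it. The class `DiagnosticAssertSketch`, the `assertWithState` helper, and the placeholder state string are all hypothetical; the real test would fetch the policy through its own REST client.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class DiagnosticAssertSketch {

    /**
     * Fails with a message that includes extra diagnostic state.
     * The state supplier is only invoked when the condition is false,
     * so a passing assertion does not pay for the extra lookup.
     */
    public static void assertWithState(BooleanSupplier condition, String what, Supplier<String> stateDump) {
        if (!condition.getAsBoolean()) {
            throw new AssertionError(what + "; SLM policy state: " + stateDump.get());
        }
    }

    public static void main(String[] args) {
        // Placeholder for the SLM policy retrieval output (assumption, not real output).
        Supplier<String> slmPolicyState = () -> "<output of SLM policy retrieval>";
        try {
            assertWithState(() -> false, "index did not reach the completed step", slmPolicyState);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```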


probakowski added 2 commits January 9, 2020 21:09

Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test

149e129

This change adds some randomness and a cleanup step to the TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore tests in an attempt to make them stable. Relates to elastic#50781

Merge branch 'master' into fix-ilm-wait-for-snapshot-tests

d485c43

probakowski requested review from dakrone and original-brownbear January 9, 2020 20:14

Formatting changes

ba3ed66

probakowski added the >test, v7.6.0, v8.0.0, and :Data Management/ILM+SLM labels Jan 9, 2020

Longer timeout

8ebff17

probakowski changed the title ~~Fix flaky TimeSeriesLifecycleActionsIT#testWaitForSnapshot test~~ Increase timeouts in TimeSeriesLifecycleActionsIT#testWaitForSnapshot and testWaitForSnapshotSlmExecutedBefore test Jan 13, 2020

Merge remote-tracking branch 'origin/master' into fix-ilm-wait-for-snapshot-tests

f2e7fa2

dakrone approved these changes Jan 13, 2020


probakowski merged commit 36079d4 into elastic:master Jan 14, 2020

probakowski deleted the fix-ilm-wait-for-snapshot-tests branch January 14, 2020 00:08

probakowski mentioned this pull request Jan 14, 2020

[CI] TimeSeriesLifecycleActionsIT testWaitForSnapshot failure #50781

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
