[CI] org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress failing in master #48441

Closed
dliappis opened this issue Oct 24, 2019 · 7 comments
Assignees
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >test-failure Triaged test failures from CI

Comments

@dliappis
Contributor

Example build failure

https://gradle-enterprise.elastic.co/s/7suf37f4fip6c

This test has failed before, and #48219 was expected to have fixed all of those failures.

Reproduction line

Unable to reproduce over 100 iterations, either locally or on an OpenSUSE 15-1 CI worker.

./gradlew ':x-pack:plugin:ilm:test' --tests "org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress" \
  -Dtests.seed=6383DEA8FA5C40E0 \
  -Dtests.security.manager=true \
  -Dtests.locale=ro \
  -Dtests.timezone=US/Mountain \
  -Dcompiler.java=12 \
  -Druntime.java=11

Example relevant log:

09:17:17   1> [2019-10-24T00:17:14,903][ERROR][o.e.x.s.SLMSnapshotBlockingIntegTests] [testRetentionWhileSnapshotInProgress] expected a snapshot but it was missing
09:17:17   1> org.elasticsearch.snapshots.SnapshotMissingException: [my-repo:snap-uqq_izxhqsw3pwrl3sfntg] is missing
09:17:17   1> 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:219) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:105) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:65) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:166) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
09:17:17   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
09:17:17   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
09:17:17   1> 	at java.lang.Thread.run(Thread.java:834) [?:?]

Frequency

3 times since Oct 24 2019

@dliappis dliappis added >test-failure Triaged test failures from CI :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Oct 24, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@dliappis
Contributor Author

cc @gwbrown since you worked on this in #48219

dliappis added a commit to dliappis/elasticsearch that referenced this issue Oct 24, 2019
dliappis added a commit that referenced this issue Oct 24, 2019
@dliappis
Contributor Author

Looking at this further, the failure seems to have started after #47766 was merged; cc @russcam

@gwbrown
Contributor

gwbrown commented Oct 24, 2019

From the build scan, it looks like the error that's causing the failure is:

org.elasticsearch.repositories.RepositoryException: [my-repo] Could not determine repository generation from root blobs
at __randomizedtesting.SeedInfo.seed([6383DEA8FA5C40E0:1E59C0B220286E4B]:0)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:906)
at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:163)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:201)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:105)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:65)
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:166)
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:834)

There are several SnapshotMissingExceptions in the logs from the test, but that's to be expected: this test waits for a snapshot to be created, and if that takes time we'll see some of those exceptions in the logs.

This looks like it's due to the problem fixed in #48433, so I'm going to unmute this in master now that #48433 has been merged. I'll leave this issue open as that fix hasn't yet been backported to all impacted branches.

@alpar-t
Contributor

alpar-t commented Oct 25, 2019

It seems that it failed again after that fix was merged: https://gradle-enterprise.elastic.co/s/3fy7jhk2xxi4a (sorry, this is based only on timing; I didn't get the chance to look at the commits)

@gwbrown
Contributor

gwbrown commented Oct 26, 2019

It looks like there was another call in this test that needed to handle RepositoryException. I've opened #48548, which addresses that and similar issues in another test (see #46021).
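For readers following along, the retry behavior discussed in this thread can be sketched roughly as below. This is not the actual test code or the change in #48548. The two exception classes are hypothetical stand-ins for the Elasticsearch types of the same name, and `SnapshotPolling.awaitSnapshot` is an illustrative helper that does not exist in the codebase. It assumes the fix amounts to treating RepositoryException as retryable in the same way as SnapshotMissingException, which the thread implies but does not spell out.

```java
import java.util.function.Supplier;

// Hypothetical stand-ins for the Elasticsearch exception types named in this thread.
class SnapshotMissingException extends RuntimeException {
    SnapshotMissingException(String message) { super(message); }
}

class RepositoryException extends RuntimeException {
    RepositoryException(String message) { super(message); }
}

// Illustrative helper only; the name and signature are assumptions.
class SnapshotPolling {
    /**
     * Calls statusCheck until it returns a value or maxAttempts is used up.
     * Both exception types are treated as "not ready yet" and retried;
     * any other exception propagates to the caller immediately.
     */
    static <T> T awaitSnapshot(Supplier<T> statusCheck, int maxAttempts, long sleepMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return statusCheck.get();
            } catch (SnapshotMissingException | RepositoryException e) {
                // Remember the failure so it can be reported if we give up.
                last = e;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    throw new AssertionError("interrupted while waiting for snapshot", ie);
                }
            }
        }
        throw new AssertionError("snapshot did not appear after " + maxAttempts + " attempts", last);
    }
}
```

With a helper shaped like this, a status check that throws either exception on early attempts still succeeds once the snapshot becomes visible, while an unrelated exception fails the test right away instead of being swallowed.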

@gwbrown
Contributor

gwbrown commented Oct 28, 2019

Now that #48548 has been merged and backported, this failure should be fixed. I'm going to close this; please reopen if the failure shows up again.

@gwbrown gwbrown closed this as completed Oct 28, 2019