[CI] Frequent failures in MixedClusterClientYamlTestSuiteIT running snapshot yaml tests #59986
Labels
:Distributed Coordination/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
>test-failure
Triaged test failures from CI
Comments
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
There's a real bug in the |
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Jul 21, 2020
There were two subtle bugs here from backporting elastic#56911 to 7.x:
1. We passed `null` for the `shards` map, which is no longer nullable when creating `SnapshotsInProgress.Entry`. Fixed by passing an empty map, as the old `null` handling did.
2. The removal of a failed `INIT` state snapshot from the cluster state tried to remove it from the finalization loop (the set of repository names that are currently finalizing). This trips an assertion, since the snapshot failed before its repository was put into the set. I made the logic ignore the set when removing a failed `INIT` state snapshot, restoring the logic to exactly what it was before the concurrent snapshots backport, to be on the safe side.
Also added tests that explicitly call the old code paths. As can be seen from initially missing this, the BwC tests only run in the configuration "new version master, old version nodes" every so often, so a deterministic test for the old state machine seems the safest bet here.
Closes elastic#59986
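The two fixes described in the commit message can be illustrated with a minimal, self-contained Java sketch. It is not the actual Elasticsearch code: `Entry`, `currentlyFinalizing`, `newInitEntry`, and `removeSnapshot` are hypothetical stand-ins that only model the non-nullable `shards` map and the set of finalizing repository names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class InitSnapshotCleanupSketch {

    /** Hypothetical stand-in for SnapshotsInProgress.Entry; models only the non-null shards map. */
    static final class Entry {
        final String repository;
        final Map<String, String> shards;

        Entry(String repository, Map<String, String> shards) {
            this.repository = repository;
            // Assumption: after the backport the map is no longer nullable.
            this.shards = Objects.requireNonNull(shards, "shards must not be null");
        }
    }

    /** Hypothetical stand-in for the set of repository names that are currently finalizing. */
    static final Set<String> currentlyFinalizing = new HashSet<>();

    /** Fix 1: pass an empty map instead of null when creating an INIT-state entry. */
    static Entry newInitEntry(String repository) {
        return new Entry(repository, Collections.emptyMap());
    }

    /**
     * Fix 2: a snapshot that failed in INIT never had its repository added to the
     * finalizing set, so skip the set (and its assertion) in that case.
     */
    static void removeSnapshot(Entry entry, boolean failedInInit) {
        if (failedInInit == false) {
            boolean removed = currentlyFinalizing.remove(entry.repository);
            if (removed == false) {
                throw new AssertionError("repository was not finalizing: " + entry.repository);
            }
        }
        // Removing the entry from the cluster state itself is omitted in this sketch.
    }

    public static void main(String[] args) {
        Entry entry = newInitEntry("test_repo");
        removeSnapshot(entry, true); // does not touch the finalizing set
        System.out.println("shards=" + entry.shards + ", finalizing=" + currentlyFinalizing);
    }
}
```

Calling `removeSnapshot(entry, false)` for a repository that was never added to the set throws the `AssertionError`, which mirrors the kind of tripped assertion the commit message describes.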
original-brownbear added a commit that referenced this issue on Jul 22, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Jul 22, 2020
Fixed by #60006
original-brownbear added a commit that referenced this issue on Jul 22, 2020
Build scan:
https://gradle-enterprise.elastic.co/s/xbmzozo4o7tgw
https://gradle-enterprise.elastic.co/s/hymnioyws47co
https://gradle-enterprise.elastic.co/s/ixigz3nn3jsmg
Repro lines:
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.get/10_basic/Get snapshot info when verbose is false}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.create/10_basic/Create a snapshot}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.create/10_basic/Create a snapshot for missing index}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=cat.snapshots/10_basic/Test cat snapshots output}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.restore/10_basic/Create a snapshot and then restore it}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.status/10_basic/Get snapshot status}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.get/10_basic/Get snapshot info}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v7.1.1#mixedClusterTest' --tests "org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" \
  -Dtests.method="test {p0=snapshot.get/10_basic/Get snapshot info contains include_global_state}" \
  -Dtests.seed=41E70BDA7F09EFAD \
  -Dtests.security.manager=true \
  -Dtests.locale=ja-JP \
  -Dtests.timezone=Europe/Dublin \
  -Dtests.distribution=default \
  -Druntime.java=8
Reproduces locally?:
No
Applicable branches:
Seen across all active 7.x branches.
Failure history:
Several similar-looking failures over the last few days; the three most recent are linked in the build scans above.
Failure excerpt: