
[CI] DedicatedClusterSnapshotRestoreIT.testMasterShutdownDuringSnapshot failure #51253

Closed

andreidan opened this issue Jan 21, 2020 · 2 comments · Fixed by #51270
Assignees: original-brownbear
Labels: :Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), >test-failure (Triaged test failures from CI)

Comments

@andreidan (Contributor) commented:

Encountered this failure on a feature branch: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/14196/

A build scan is available here: https://gradle-enterprise.elastic.co/s/y3sp3pow27tp2

andreidan added the :Distributed Coordination/Snapshot/Restore and >test-failure labels on Jan 21, 2020
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

andreidan changed the title from "DedicatedClusterSnapshotRestoreIT.testMasterShutdownDuringSnapshot failure" to "[CI] DedicatedClusterSnapshotRestoreIT.testMasterShutdownDuringSnapshot failure" on Jan 21, 2020
original-brownbear self-assigned this on Jan 21, 2020
@original-brownbear (Member) commented:

Looks like this may have been introduced by #50788 ... will create a fix shortly

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Jan 21, 2020:

On master failover we have to resend all the shard-failed messages,
but the transport requests remain the same in the eyes of `equals`.
If the master failover is registered and the requests to the new master
are sent before all the callbacks have executed and the request to the
old master has been removed from the deduplicator, then the requests to the
new master will incorrectly fail and the snapshot will get stuck.

Closes elastic#51253
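
To make the failure mode concrete, here is a minimal sketch of deduplication keyed on request equality. The class and method names are hypothetical and the structure is deliberately simplified; this is not the actual Elasticsearch `TransportRequestDeduplicator` implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified deduplicator: tracks in-flight requests by the
// request object itself, relying on its equals()/hashCode().
final class RequestDeduplicator<R> {
    private final Map<R, Boolean> inFlight = new ConcurrentHashMap<>();

    /** Returns true if the caller should actually send this request. */
    boolean shouldSend(R request) {
        // A resend that is equals() to a still-pending request is treated as
        // a duplicate and suppressed, even if the pending copy was addressed
        // to a master node that has since failed over.
        return inFlight.putIfAbsent(request, Boolean.TRUE) == null;
    }

    /** Must be called from the response/failure callback of the original send. */
    void onComplete(R request) {
        inFlight.remove(request);
    }
}
```

In the scenario described in the commit message, the shard-failed request sent to the old master is still tracked as in-flight when the failover triggers a resend. Because the resent request is `equals()` to the pending one, it is treated as a duplicate, so its outcome is tied to a request addressed to a master that will never answer, and the snapshot stalls.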
original-brownbear added a commit that referenced this issue Jan 22, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jan 22, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jan 22, 2020
original-brownbear added a commit that referenced this issue Jan 22, 2020
original-brownbear added a commit that referenced this issue Jan 22, 2020