Shard state transitions should be edge-triggered rather than level-triggered #82185

DaveCTurner · 2022-01-04T09:39:07Z

Today, when each node receives a cluster state update, it compares the states of its shards in the new routing table to their expected states and triggers a shard-started or shard-failed transition if they don't match. We then capture the transition and suppress it if a duplicate request is already in flight (#31313 for shard-failed transitions, #82089 for shard-started ones).

This is pretty ugly. These transitions may be a long way down the master's queue, so we may trigger (and then suppress) many duplicate requests. I think this mechanism dates back to a time when cluster state updates could occasionally be lost. Those problems are fixed today, so we should move to a system that sends the state update request only at the shard state transition itself and then relies on the fact that this request will eventually complete (possibly unsuccessfully, requiring a retry).
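To make the contrast concrete, here is a minimal sketch of the two approaches. It is not Elasticsearch code: `FakeMaster`, `LevelTriggeredNode`, `EdgeTriggeredNode` and all method names are hypothetical, and the model ignores threading and the real transport layer. It only illustrates why the level-triggered check produces duplicates that must be suppressed while the master's queue is backed up, whereas the edge-triggered version sends one request per transition and retries on failure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; none of this is Elasticsearch's actual API.
final class ShardTransitionSketch {

    enum ShardState { INITIALIZING, STARTED }

    /** Stands in for the master: records every shard-started request it receives. */
    static final class FakeMaster {
        final List<String> requests = new ArrayList<>();

        void shardStarted(String shardId) {
            requests.add(shardId);
        }
    }

    /** Level-triggered: re-checks the routing table on every cluster state update. */
    static final class LevelTriggeredNode {
        private final FakeMaster master;
        private final Set<String> inFlight = new HashSet<>();
        int suppressed = 0;

        LevelTriggeredNode(FakeMaster master) {
            this.master = master;
        }

        void onClusterState(String shardId, ShardState routingTableState, ShardState localState) {
            if (localState == ShardState.STARTED && routingTableState != localState) {
                if (inFlight.add(shardId)) {
                    master.shardStarted(shardId); // first mismatch: send the request
                } else {
                    suppressed++; // later mismatches: triggered, then thrown away
                }
            }
        }

        void onResponse(String shardId) {
            inFlight.remove(shardId);
        }
    }

    /** Edge-triggered: sends exactly once, when the local shard changes state. */
    static final class EdgeTriggeredNode {
        private final FakeMaster master;

        EdgeTriggeredNode(FakeMaster master) {
            this.master = master;
        }

        void onLocalShardStarted(String shardId) {
            master.shardStarted(shardId);
        }

        // The request eventually completes; if it completed unsuccessfully, retry.
        void onRequestFailed(String shardId) {
            master.shardStarted(shardId);
        }
    }

    public static void main(String[] args) {
        FakeMaster levelMaster = new FakeMaster();
        LevelTriggeredNode level = new LevelTriggeredNode(levelMaster);
        // Five unrelated cluster state updates arrive while the request is still queued.
        for (int i = 0; i < 5; i++) {
            level.onClusterState("shard-0", ShardState.INITIALIZING, ShardState.STARTED);
        }
        System.out.println("level-triggered: sent=" + levelMaster.requests.size()
            + " suppressed=" + level.suppressed);

        FakeMaster edgeMaster = new FakeMaster();
        EdgeTriggeredNode edge = new EdgeTriggeredNode(edgeMaster);
        edge.onLocalShardStarted("shard-0"); // cluster state updates no longer matter here
        System.out.println("edge-triggered: sent=" + edgeMaster.requests.size());
    }
}
```

In this toy model both variants send a single request to the master, but the level-triggered node has to track in-flight requests and discard four redundant triggers to get there. The edge-triggered node needs neither the comparison nor the deduplication set, at the cost of depending on every request eventually completing.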


elasticmachine · 2022-01-04T09:39:20Z

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine added the Team:Distributed (Obsolete) label Jan 4, 2022

DaveCTurner added the >enhancement label Jan 4, 2022

DaveCTurner mentioned this issue Jan 4, 2022

Deduplicate Shard Started Requests #82089

Merged

DaveCTurner mentioned this issue Jan 19, 2022

Make Shard Started Response Handling only Return after the CS Update Completes #82790

Merged

DaveCTurner added the >tech debt label Jul 27, 2022
