Shard state transitions should be edge-triggered rather than level-triggered #82185

Open
DaveCTurner opened this issue Jan 4, 2022 · 1 comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >tech debt

Comments

@DaveCTurner

Today when each node receives a cluster state update it compares the states of its shards in the new routing table to their expected states, and triggers a shard-started or shard-failed transition if they don't match. We then capture the transition and suppress it if a duplicate request is already in flight (#31313 for shard-failed transitions, #82089 for shard-started ones).
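The level-triggered behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch code: the class `LevelTriggeredSketch`, its methods, and the string shard IDs are all hypothetical names chosen for the example, and the real in-flight deduplication lives in the changes referenced above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names are real Elasticsearch classes.
final class LevelTriggeredSketch {
    enum ShardState { INITIALIZING, STARTED }

    // Shards for which a transition request is already in flight to the master.
    private final Set<String> inFlight = new HashSet<>();
    int sent = 0;
    int suppressed = 0;

    // Called on every cluster state update: compares the level (current state)
    // rather than reacting to the moment the shard changed state.
    void applyClusterState(String shardId, ShardState inRoutingTable, ShardState local) {
        if (inRoutingTable == local) {
            return; // routing table agrees with the node, nothing to do
        }
        if (inFlight.add(shardId)) {
            sent++; // first request for this mismatch
        } else {
            suppressed++; // duplicate of a request that is still in flight
        }
    }

    // Called when the master has processed the request, successfully or not.
    void onRequestCompleted(String shardId) {
        inFlight.remove(shardId);
    }

    public static void main(String[] args) {
        LevelTriggeredSketch node = new LevelTriggeredSketch();
        // Five unrelated cluster state updates arrive while the master has not
        // yet processed the shard-started request.
        for (int i = 0; i < 5; i++) {
            node.applyClusterState("[index][0]", ShardState.INITIALIZING, ShardState.STARTED);
        }
        System.out.println("sent=" + node.sent + " suppressed=" + node.suppressed);
    }
}
```

In this sketch every cluster state update re-evaluates the mismatch, so the number of suppressed duplicates grows with the number of updates applied before the master handles the request.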

This is pretty ugly. These transitions may be a long way down the master's queue, so while we wait for them to be processed we may trigger (and then suppress) many duplicate requests. I think this mechanism dates back to a time when cluster state updates could occasionally be lost, but those problems are fixed today. We should therefore move to a system that triggers the state update request only at the shard state transition itself, and then relies on the fact that this request will eventually complete (possibly unsuccessfully, requiring a retry).
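For illustration, an edge-triggered version might look something like the sketch below. It assumes a hypothetical `sendToMaster` callback that returns `true` once the master has acknowledged the request, and it retries synchronously with an arbitrary cap (`MAX_ATTEMPTS`) purely to keep the example short; a real implementation would be asynchronous, and none of these names come from Elasticsearch.

```java
import java.util.function.Predicate;

// Hypothetical sketch: none of these names are real Elasticsearch classes.
final class EdgeTriggeredSketch {
    enum ShardState { INITIALIZING, STARTED }

    // Arbitrary cap so the sketch terminates; an assumption, not a proposal.
    private static final int MAX_ATTEMPTS = 10;

    // Assumed callback: returns true if the master acknowledged the request.
    private final Predicate<ShardState> sendToMaster;
    private ShardState local = ShardState.INITIALIZING;
    int attempts = 0;

    EdgeTriggeredSketch(Predicate<ShardState> sendToMaster) {
        this.sendToMaster = sendToMaster;
    }

    // The only place a request is triggered: the moment the local state changes.
    void onLocalTransition(ShardState next) {
        if (next == local) {
            return; // no edge, so no request
        }
        local = next;
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            attempts++;
            if (sendToMaster.test(next)) {
                return; // the request completed successfully
            }
            // the request completed unsuccessfully: retry
        }
        throw new IllegalStateException("master did not acknowledge " + next);
    }

    // Applying a cluster state deliberately sends nothing, even if the routing
    // table still shows the old state; the outstanding request covers it.
    void applyClusterState(ShardState inRoutingTable) {
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated master that fails the first two requests, then succeeds.
        EdgeTriggeredSketch node = new EdgeTriggeredSketch(state -> ++calls[0] > 2);
        node.onLocalTransition(ShardState.STARTED);
        for (int i = 0; i < 5; i++) {
            node.applyClusterState(ShardState.INITIALIZING);
        }
        System.out.println("attempts=" + node.attempts);
    }
}
```

Here the number of requests depends only on how many retries the transition needs, not on how many cluster state updates the node applies in the meantime.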

@elasticmachine

Pinging @elastic/es-distributed (Team:Distributed)
