Harden ClusterSpec CoordinatedShutdown #6034

@Arkatufus Arkatufus commented Jul 5, 2022

The probable cause for failure is a bit convoluted, but it started here:

else if (s.Members.Any(m => m.UniqueAddress.Equals(_cluster.SelfUniqueAddress)
    && (m.Status == MemberStatus.Leaving || m.Status == MemberStatus.Exiting ||
        m.Status == MemberStatus.Down)))

The cluster shutdown watcher actor treats MemberStatus.Leaving as a signal that the cluster shutdown is complete. If CoordinatedShutdown completes before the cluster exit flow does, it triggers the Shutdown method registered here:

system.RegisterOnTermination(Shutdown);

invoking a hard ClusterDaemon stop here:

System.Stop(_clusterDaemons);

stopping the ClusterDomainEventPublisher, which publishes a hard-coded membership state change here:

PublishChanges(_emptyMembershipState);

If the node's status is still MemberStatus.Leaving at this point, this publishes a MemberRemoved event with MemberStatus.Leaving as its previous status.
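
A minimal, self-contained sketch of that last step, assuming a simplified diff between the old membership and the empty one. The types here (`Member`, `MemberRemoved`, `MembershipDiff`) are stand-ins written for illustration, not the Akka.NET classes, and the node address is made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the cluster types (assumption: not the Akka.NET classes).
public enum MemberStatus { Joining, Up, Leaving, Exiting, Down, Removed }

public sealed record Member(string Address, MemberStatus Status);

public sealed record MemberRemoved(Member Member, MemberStatus PreviousStatus);

public static class MembershipDiff
{
    // Every member present in the old state but absent from the new state is
    // reported as removed, carrying the last status it had in the old state.
    public static IReadOnlyList<MemberRemoved> RemovedMembers(
        IReadOnlyCollection<Member> oldMembers,
        IReadOnlyCollection<Member> newMembers)
    {
        var remaining = new HashSet<string>(newMembers.Select(m => m.Address));
        return oldMembers
            .Where(m => !remaining.Contains(m.Address))
            .Select(m => new MemberRemoved(m with { Status = MemberStatus.Removed }, m.Status))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical address; the local node is still Leaving when the publisher stops.
        var current = new[] { new Member("akka.tcp://sys@localhost:2551", MemberStatus.Leaving) };
        var empty = Array.Empty<Member>();

        foreach (var e in MembershipDiff.RemovedMembers(current, empty))
            Console.WriteLine($"MemberRemoved({e.Member.Address}, previous: {e.PreviousStatus})");
    }
}
```

In this model the removal event carries whatever status the node last had, so a spec that asserts on the previous status would need to tolerate Leaving as well as Exiting.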

I'm not sure if this is the real intended behaviour, but this is what it is as of today.

@Aaronontheweb

> I'm not sure if this is the real intended behaviour, but this is what it is as of today.

Yes, this is designed to allow the cluster to leave quickly in the event that the node shuts down ahead of the Exiting gossip being fully circulated.

@Aaronontheweb Aaronontheweb left a comment

LGTM

@Aaronontheweb Aaronontheweb merged commit 0efdc18 into akkadotnet:dev Jul 6, 2022
@Arkatufus Arkatufus deleted the async_testkit/fix_ClusterSpec_CoordinatedShutdown branch February 27, 2023 17:31