Akka.Cluster: unable to mark member as Leaving if another instance of member with same Address is being marked as DOWN
#7370
Version Information
Version of Akka.NET? v1.5.30
Which Akka.NET Modules? Akka.Cluster, Akka.Cluster.Sharding
Describe the bug
It's possible for multiple `Member`s in the cluster to have the same `Address`, usually after a node is rebooted and the old incarnation of the node hasn't been evicted from the cluster yet. This is why Akka.Cluster uses a separate `UniqueAddress` construct: to help us identify when these types of situations occur and to distinguish between two instances of nodes with the same `Address`.

We just recently fixed one error impacted by this non-uniqueness constraint in Akka.Cluster.Sharding with #7367, and it also looks like there are issues with Akka.Cluster's `ClusterDaemon` code itself, where it's susceptible to these types of problems. For instance:
akka.net/src/core/Akka.Cluster/ClusterDaemon.cs
Lines 1635 to 1655 in 5e2bd0e
This is a really subtle issue, but basically: we should be iterating through EACH of these members, THEN applying the condition, and THEN removing them. The way the loop is designed right now is guaranteed to produce a `System.InvalidOperationException: Invalid member status transition Down -> Leaving` error if there are multiple instances of the node in the gossip at this time.
Expected behavior
Should be able to change status of `Member`s without error even if there are multiple instances of the same `Address` in use inside Akka.Cluster.
Actual behavior
The daemons crash and Akka.Cluster destabilizes.
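A minimal sketch of how this kind of failure can arise, and of the per-member check described above. Everything here is a hypothetical stand-in written for illustration (`MemberState`, `NodeId`, `ClusterMember`, `LeavingSketch`, and the simplified transition table); it is not the code in `ClusterDaemon.cs`, which may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; NOT the real Akka.Cluster types.
public enum MemberState { Joining, Up, Leaving, Exiting, Down, Removed }

// Address plus an incarnation id, playing the role that UniqueAddress plays.
public sealed record NodeId(string Address, int Uid);

public sealed record ClusterMember(NodeId Id, MemberState State)
{
    // Simplified, assumed transition table: Down may only move to Removed.
    private static readonly Dictionary<MemberState, MemberState[]> Allowed = new()
    {
        [MemberState.Joining] = new[] { MemberState.Up, MemberState.Leaving, MemberState.Down, MemberState.Removed },
        [MemberState.Up] = new[] { MemberState.Leaving, MemberState.Down, MemberState.Removed },
        [MemberState.Leaving] = new[] { MemberState.Exiting, MemberState.Down, MemberState.Removed },
        [MemberState.Exiting] = new[] { MemberState.Removed, MemberState.Down },
        [MemberState.Down] = new[] { MemberState.Removed },
        [MemberState.Removed] = Array.Empty<MemberState>(),
    };

    // Returns a copy in the new state, or throws if the transition is not allowed.
    public ClusterMember WithState(MemberState next)
    {
        if (State == next) return this;
        if (!Allowed[State].Contains(next))
            throw new InvalidOperationException(
                $"Invalid member status transition {State} -> {next}");
        return this with { State = next };
    }
}

public static class LeavingSketch
{
    private static bool CanLeave(ClusterMember m) =>
        m.State is MemberState.Joining or MemberState.Up;

    // Fragile: checks that SOME member at the address can leave, then
    // transitions EVERY member at that address, including a Down incarnation.
    public static List<ClusterMember> MarkLeavingByAddress(
        IEnumerable<ClusterMember> members, string address)
    {
        var list = members.ToList();
        if (!list.Any(m => m.Id.Address == address && CanLeave(m))) return list;
        return list
            .Select(m => m.Id.Address == address ? m.WithState(MemberState.Leaving) : m)
            .ToList();
    }

    // Safer: evaluates the state condition for each member before transitioning it.
    public static List<ClusterMember> MarkLeavingPerMember(
        IEnumerable<ClusterMember> members, string address) =>
        members
            .Select(m => m.Id.Address == address && CanLeave(m)
                ? m.WithState(MemberState.Leaving)
                : m)
            .ToList();
}
```

Given two members at the same address, one `Down` (old incarnation) and one `Up` (new incarnation), `MarkLeavingByAddress` throws `InvalidOperationException` with the `Down -> Leaving` message, while `MarkLeavingPerMember` leaves the `Down` member untouched and moves only the `Up` member to `Leaving`.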
Environment
Environments with stable addresses (e.g. `StatefulSet`s in Kubernetes or bare metal) are susceptible to this problem; dynamically addressed environments are not.