Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of #5967 - Allow PersistentShardCoordinator to tolerate duplicate ShardHomeAllocated messages #5970

Conversation

Arkatufus
Copy link
Contributor

Backport of #5967

…erate duplicate ShardHomeAllocated messages

// per https://github.com/akkadotnet/akka.net/issues/5604
// we're going to allow new value to overwrite previous
var newRegions = Regions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple questions: 1) Is this safe all the times?; 2) Will there be a way to disable this new behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Not sure - the alternative is defaulting the current behavior, which is that the ShardCoordinator blows up and the cluster needs to be restarted and all of the ShardCoordinator data deleted. Given that, I think this probably is safer than the current defaults. The worst case scenario I can imagine here is that the Shard was actually allocated to two different nodes, but that would if that were the case then the sharding system would already be in very bad shape (i.e. violating its consistency rules) and this would allow the newest home for the shard to supersede the old one, which would stop receiving message traffic. In the logs where I saw this occuring it was duplicate records for the same Shard in the same ShardRegion IActorRef - so this change would just make the recovery idempotent.

  2. We could add it, but I don't think it should be necessary.

@Aaronontheweb
Copy link
Member

@ismaelhamed I think I can make this change safer by narrowing the conditions during which we de-duplicate ShardHomeAllocated messages - which is we don't throw when the exact data is already inside the PersistentShardCoordinator.State since that's a no-op. That should open this system up to less unpredictable behavior since the shard locations are identical.

@Aaronontheweb Aaronontheweb mentioned this pull request May 31, 2022
4 tasks
@Arkatufus
Copy link
Contributor Author

Could be superseeded by #5976

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants