After using cluster sharding for a while in our CI environment, we occasionally see the following errors, which prevent the cluster from functioning:
Exception in ReceiveRecover when replaying event type ["Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated"] with sequence number [105] for persistenceId ["/system/sharding/customerCoordinator/singleton/coordinator"]
{
"Depth": 0,
"ClassName": "",
"Message": "Region [akka.tcp://imburse@10.240.0.108:8081/system/sharding/customer#1665482693] not registered\nParameter name: e",
"Source": "Akka.Cluster.Sharding",
"StackTraceString": " at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)\n at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)\n at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)\n at Akka.Persistence.Eventsourced.<>c__DisplayClass91_0.<Recovering>b__1(Receive receive, Object message)",
"RemoteStackTraceString": "",
"RemoteStackIndex": -1,
"HResult": -2147024809,
"HelpURL": null
}
We're unable to reproduce this error consistently; however, it appears to happen during a release. Our assumption is that one or more of our pods becomes unavailable during the release before the sharding event journal has been written in a good state. This leaves a corrupted event journal, which causes us problems.
Currently the only remedy for this situation is to drop all sharding records from the event journal and let the system start from scratch.
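For anyone hitting the same corruption, here is a sketch of the cleanup we mean, assuming the default Akka.Persistence.SqlServer table names (`EventJournal`, `SnapshotStore`) — adjust to your schema and take a backup first:

```sql
-- Remove only the shard coordinator's records, leaving application events intact.
-- PersistenceIds under /system/sharding/ belong to the sharding coordinators.
DELETE FROM EventJournal  WHERE PersistenceId LIKE '/system/sharding/%';
DELETE FROM SnapshotStore WHERE PersistenceId LIKE '/system/sharding/%';
```

Run this only while the cluster is fully stopped, so no coordinator is mid-write.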
I have the data from the EventJournal table below.
@ctrlaltdan do you even need to use Akka.Persistence here? There's an alternative mode which utilizes Akka.DistributedData for sharding. The only downside is that it doesn't let you use the remember-entities option (yet).
You can set it up with akka.cluster.sharding.state-store-mode = ddata.
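A minimal HOCON sketch of that setting in context (the surrounding structure is the standard `akka.cluster.sharding` section, not config taken from this issue):

```hocon
akka {
  cluster {
    sharding {
      # keep shard allocation state in Distributed Data instead of the persistent journal
      state-store-mode = ddata
      # remember-entities is not supported with ddata at this point, so leave it off
      remember-entities = off
    }
  }
}
```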
@Horusiath Yeah I'll give that a go when we schedule some time to upgrade our projects to the 1.3.11 release. We are using Akka.Persistence for saving our own state but we have no requirement to use remember-entities. Thanks for the tip.
Do you have any documentation weighing up the pros/cons of these two options? I'm pretty sold on avoiding SQL Server/external storage where possible. It would be good to understand any implications for the system if we use the ddata option.
System specs
netcoreapp2.0, specifically the microsoft/dotnet:2.0.9-runtime image to side-step DotNetty issues

Potentially related issues
#3414
#3204