
Cluster sharding deserialization issue #3664

Closed
ctrlaltdan opened this issue Nov 28, 2018 · 5 comments

Comments

@ctrlaltdan

ctrlaltdan commented Nov 28, 2018

After using cluster sharding for a while in our CI environment, we occasionally see the following errors, which prevent the cluster from functioning:

Exception in ReceiveRecover when replaying event type ["Akka.Cluster.Sharding.PersistentShardCoordinator+ShardHomeAllocated"] with sequence number [105] for persistenceId ["/system/sharding/customerCoordinator/singleton/coordinator"]

{
  "Depth": 0,
  "ClassName": "",
  "Message": "Region [akka.tcp://imburse@10.240.0.108:8081/system/sharding/customer#1665482693] not registered\nParameter name: e",
  "Source": "Akka.Cluster.Sharding",
  "StackTraceString": "   at Akka.Cluster.Sharding.PersistentShardCoordinator.State.Updated(IDomainEvent e)\n   at Akka.Cluster.Sharding.PersistentShardCoordinator.ReceiveRecover(Object message)\n   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)\n   at Akka.Persistence.Eventsourced.<>c__DisplayClass91_0.<Recovering>b__1(Receive receive, Object message)",
  "RemoteStackTraceString": "",
  "RemoteStackIndex": -1,
  "HResult": -2147024809,
  "HelpURL": null
}

We're unable to reproduce this error consistently; however, it appears to happen during a release. Our assumption is that one or more of our pods becomes unavailable during the release before the sharding event journal has been written in a good state. This leaves a corrupted event journal, which is what causes the problems above.

Currently the only remedy for this situation is to drop all sharding records from the event journal and let the system start from scratch.
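A rough sketch of what that cleanup looks like, assuming the default Akka.Persistence.SqlServer table names (EventJournal, SnapshotStore, Metadata) and the coordinator persistence id from the error above; verify the persistence ids in your own database before deleting anything:

```sql
-- Cleanup sketch: remove the sharding coordinator's persisted state so the
-- coordinator rebuilds from an empty journal on the next cluster start.
-- Table and column names assume the Akka.Persistence.SqlServer defaults.
DELETE FROM EventJournal
WHERE PersistenceId = '/system/sharding/customerCoordinator/singleton/coordinator';

DELETE FROM SnapshotStore
WHERE PersistenceId = '/system/sharding/customerCoordinator/singleton/coordinator';

-- Depending on the plugin version there may also be a Metadata table tracking the
-- highest sequence number per persistence id; clear the matching row there as well.
DELETE FROM Metadata
WHERE PersistenceId = '/system/sharding/customerCoordinator/singleton/coordinator';
```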

I have the data from the EventJournal table below.

| Ordering | PersistenceId | SequenceNr | Timestamp | IsDeleted | Manifest | Payload | Tags | SerializerId |
|---|---|---|---|---|---|---|---|---|
| 136 | /system/sharding/customerCoordinator/singleton/coordinator | 105 | 636783291061582824 | 0 | AF | 0x0A0233361248616B6B612E7463703A2F2F696D62757273654031302E3234302E302E3130383A383038312F73797374656D2F7368617264696E672F637573746F6D65722331363635343832363933 | NULL | 13 |
| 140 | /system/sharding/customerCoordinator/singleton/coordinator | 106 | 636784965012362487 | 0 | AB | 0x0A47616B6B612E7463703A2F2F696D62757273654031302E3234302E302E3132323A383038312F73797374656D2F7368617264696E672F637573746F6D657223323432333036343237 | NULL | 13 |
| 141 | /system/sharding/customerCoordinator/singleton/coordinator | 107 | 636784965012552987 | 0 | AC | 0x0A4C616B6B612E7463703A2F2F696D62757273654031302E3234302E302E35333A383038312F73797374656D2F7368617264696E672F637573746F6D657250726F78792332313137333131313037 | NULL | 13 |
| 142 | /system/sharding/customerCoordinator/singleton/coordinator | 108 | 636784965012782458 | 0 | AC | 0x0A4D616B6B612E7463703A2F2F696D62757273654031302E3234302E302E3132363A383038312F73797374656D2F7368617264696E672F637573746F6D657250726F78792331323135343337333535 | NULL | 13 |
| 143 | /system/sharding/customerCoordinator/singleton/coordinator | 109 | 636784965126035293 | 0 | AD | 0x0A46616B6B612E7463703A2F2F696D62757273654031302E3234302E302E33303A383038312F73797374656D2F7368617264696E672F637573746F6D657223393630383739383036 | NULL | 13 |
| 144 | /system/sharding/customerCoordinator/singleton/coordinator | 110 | 636784965229904859 | 0 | AB | 0x0A46616B6B612E7463703A2F2F696D62757273654031302E3234302E302E31343A383038312F73797374656D2F7368617264696E672F637573746F6D657223373231343333303930 | NULL | 13 |

System specs

  • Deployed to Kubernetes in docker containers (as pods)
  • Using netcoreapp2.0, specifically the microsoft/dotnet:2.0.9-runtime image to side-step dotnetty issues
  • Using the following package dependencies:
    • Akka 1.3.10
    • Akka.Bootstrap.Docker 0.1.3
    • Akka.Cluster.Sharding 1.3.10-beta
    • Akka.Cluster.Tools 1.3.10
    • Akka.Logger.Serilog 1.3.9
    • Akka.Persistence.SqlServer 1.3.7
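For context, a minimal sketch of the kind of Akka.Persistence.SqlServer configuration implied by those packages; the plugin ids and class names are the library defaults, while the connection string and exact values are illustrative assumptions rather than the actual config used here:

```hocon
# Illustrative sketch only; the connection string is a placeholder.
akka.persistence {
  journal {
    plugin = "akka.persistence.journal.sql-server"
    sql-server {
      class = "Akka.Persistence.SqlServer.Journal.SqlServerJournal, Akka.Persistence.SqlServer"
      connection-string = "<sql-server-connection-string>"
      auto-initialize = on
      table-name = EventJournal   # the table dumped above
    }
  }
  snapshot-store {
    plugin = "akka.persistence.snapshot-store.sql-server"
    sql-server {
      class = "Akka.Persistence.SqlServer.Snapshot.SqlServerSnapshotStore, Akka.Persistence.SqlServer"
      connection-string = "<sql-server-connection-string>"
      auto-initialize = on
      table-name = SnapshotStore
    }
  }
}
```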

Potentially related issues

#3414

#3204

@Aaronontheweb
Member

Thanks @ctrlaltdan - we'll take a look at this. It's extremely annoying that you have to re-dump all of that data. We'll fix that.

@Horusiath
Contributor

@ctrlaltdan do you even need to use Akka.Persistence here? There's an alternative mode which utilizes Akka.DistributedData for sharding. The only downside is that it doesn't let you use the remember-entities option (yet).

You can set it up with akka.cluster.sharding.state-store-mode = ddata.
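In HOCON that looks roughly like the following; only state-store-mode comes from the suggestion above, and the remember-entities line just restates the caveat:

```hocon
akka.cluster.sharding {
  # Keep coordinator state in Akka.DistributedData instead of the persistence journal,
  # so no coordinator events are written to the EventJournal table.
  state-store-mode = ddata

  # Caveat from above: remember-entities is not supported with ddata (yet).
  remember-entities = off
}
```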

@ctrlaltdan
Author

@Horusiath Yeah I'll give that a go when we schedule some time to upgrade our projects to the 1.3.11 release. We are using Akka.Persistence for saving our own state but we have no requirement to use remember-entities. Thanks for the tip.

Do you have any documentation weighing up the pros/cons of these two options? I'm pretty sold on avoiding SQL Server/external storage where possible. It would be good to understand any implications for the system if we use the ddata option.

@Aaronontheweb
Member

This issue and #3414 are definitely the same bug.

@Aaronontheweb
Member

This is now resolved as of #3744.
