Exception in PersistentShardCoordinator ReceiveRecover #3414
Comments
Going to look into this while I'm at it with #3455.
The issue is that the … There are three possible causes of this:
Going to eliminate number 2 first, since that's the simplest; will look into the others next.
Manually verified the output of this spec: akka.net/src/contrib/cluster/Akka.Cluster.Sharding.Tests/ClusterShardingMessageSerializerSpec.cs, lines 54 to 78 in e15b935.
Can vouch for its accuracy: the sharding serializer appears to be working correctly.
This is probably issue #3204. Going to create some reproduction specs and then see where things go.
Working on a fun reproduction of this using an actual integration test against SQL Server spun up via …
I was able to reproduce this issue on my end with my copy of the above project (https://github.com/Aaronontheweb/AkkaClusterSharding3414Repro) and received the same failure-to-recover error message.
I've attached the data from the EventJournal database in hopes of finding more information on what is causing this behavior.
@izavala I'll deserialize the data gathered from the repo here and see what's up; that should paint a clearer picture of what's going on.
Wrote a custom tool using Akka.Persistence.Query and ran it against the dataset that produced this error: https://github.com/Aaronontheweb/Cluster.Sharding.Viewer. Attached is the output. Haven't analyzed it yet, but this is the same data from @izavala's reproduction.
Worth noting in these logs: no snapshots were ever saved for the …
So the logs we've produced confirm that #3204 is the issue: the exception in recovery only occurs when it's the same node, with the same address, trying to deserialize its own …
Moving this to 1.4.0; the changes are too big to put into a point release. We're going to need to make a lot of changes to the serialization system for …
Hey cool, I've run into this one too. We're still in a prototype phase, but it was on my list of issues to address when moving toward production readiness. @Aaronontheweb, since you're making serialization system changes, just a heads-up: with your netstandard2.0 update in #3668, the difference between Framework and Core disappears. See my commit referencing this issue for the code that removes the difference.
I've been able to verify via Aaronontheweb/AkkaClusterSharding3414Repro#10 that #3744 resolves this issue. I'm not done with #3744 yet; I still need to make sure this works with …
* fixed typo in RemoteActorRefProvider comment
* Working on #3414: bringing SerializeWithTransport API up to par with JVM
* added spec to help validate CurrentTransportInformation issues, based on the equivalent JVM spec
* working on bringing serialization up to snuff
* brought serialization class up to snuff
* wrapping up RemoteActorRefProvider implementation
* WIP
* cleaning up Serialization class
* looks like there's a Lazy<SerializationInfo> translation from Scala to C# that we haven't quite done
* fixed Serialization class
* fixed bug with Akka.Remote.Serialization.SerializationTransportInformationSpec
* forced a couple of specs using default akka.remote configs to run sequentially. This was done in order to avoid the two specs trying to bind on the same port at the same time.
* added serialization verification to the Akka.Persistence.TCK
* fixed issues with default Akka.Persistence.TCK specs
* fixed IActorRef serialization support in Akka.Persistence journals and snapshot stores
* fixed compilation issues
* fixed Akka.Sql.Common serialization in a backwards-compatible fashion
* had to disable serialization specs for Sql Journals
* Added API approvals
* updated creator and serialize-all-messages serialization
* added ITestOutputHelper to Akka.Cluster.Sharding.Tests.SupervisionSpec
* made changes to LocalSnapshotSerializer
* fixed bug in WithTransport method
* updated Akka.Remote MessageSerializer
This is now resolved as of #3744.
Hey guys, since this issue has been fixed, I recommend updating the README at https://github.com/petabridge/akkadotnet-cluster-workshop, since the end of it still points to this issue as an active one.
Akka 1.3.5
This morning, while making some provisioning changes, we ended up in a state where two single-node clusters were running that pointed to the same database. After fixing the error and starting only a single node, the underlying Akka code was failing to recover.
My expectation is that, when two nodes attempt to own the same data, one would eventually see a journal write error, since the journal sequence number would not be unique, and that the ActorSystem would then shut itself down. On recovery, it should always be able to get back into a consistent state.
In our case, it was caused by a user error, but this could easily occur in the case of a network partition where two nodes claim to own the same underlying dataset.
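The expectation above relies on the journal refusing a second event with the same persistence id and sequence number. Below is a minimal sketch of that idea. It assumes a SQL-backed journal whose table has a composite primary key on those two columns.

The table name, the column names, and the "coordinator-1" id are hypothetical and are not taken from any Akka.Persistence plugin schema. Python's sqlite3 only stands in for the real database.

```python
import sqlite3


def open_journal() -> sqlite3.Connection:
    """Create an in-memory journal table keyed on (persistence_id, sequence_nr)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_journal ("
        " persistence_id TEXT NOT NULL,"
        " sequence_nr INTEGER NOT NULL,"
        " payload BLOB NOT NULL,"
        " PRIMARY KEY (persistence_id, sequence_nr))"
    )
    return conn


def write_event(conn: sqlite3.Connection, persistence_id: str, sequence_nr: int, payload: bytes) -> bool:
    """Append one event; return False if that sequence number is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO event_journal (persistence_id, sequence_nr, payload) VALUES (?, ?, ?)",
                (persistence_id, sequence_nr, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # Another writer already persisted this (persistence_id, sequence_nr);
        # in the reporter's expected behaviour, this node should stop here.
        return False


if __name__ == "__main__":
    journal = open_journal()
    # Two nodes that recovered the same state both believe the next sequence number is 1.
    print(write_event(journal, "coordinator-1", 1, b"node-a"))  # True
    print(write_event(journal, "coordinator-1", 1, b"node-b"))  # False
```

This only surfaces the conflict when both writers race for the same sequence number. Whether a particular journal plugin enforces such a constraint depends on the plugin and its configuration, and so does what the persistent actor does after a write failure.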