Loading snapshot error #57

Closed
ctrlaltdan opened this issue Aug 23, 2018 · 11 comments
@ctrlaltdan

After running my application for a while I reach the 1000 journal limit and a coordinator snapshot is taken.

Every subsequent time I load the application from this point I receive the error below.

I'm wondering if this is a hocon configuration issue?

Many thanks

Setup:

  • Akka.NET 1.3.8
  • Akka.Persistence.PostgreSql 1.3.8
[12:02:19 ERR] Persistence failure when replaying events for persistenceId [/system/sharding/collectCoordinator/singleton/coordinator]. Last known sequence number [0]
System.TypeLoadException: Could not load type 'AA' from assembly 'Akka.Persistence.PostgreSql, Version=1.3.8.0, Culture=neutral, PublicKeyToken=null'.
   at System.RuntimeTypeHandle.GetTypeByName(String name, Boolean throwOnError, Boolean ignoreCase, Boolean reflectionOnly, StackCrawlMarkHandle stackMark, IntPtr pPrivHostBinder, Boolean loadTypeFromPartialName, ObjectHandleOnStack type, ObjectHandleOnStack keepalive)
   at System.RuntimeTypeHandle.GetTypeByName(String name, Boolean throwOnError, Boolean ignoreCase, Boolean reflectionOnly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean loadTypeFromPartialName)
   at System.RuntimeType.GetType(String typeName, Boolean throwOnError, Boolean ignoreCase, Boolean reflectionOnly, StackCrawlMark& stackMark)
   at System.Type.GetType(String typeName, Boolean throwOnError)
   at Akka.Persistence.PostgreSql.Snapshot.PostgreSqlQueryExecutor.ReadSnapshot(DbDataReader reader)
   at Akka.Persistence.Sql.Common.Snapshot.AbstractQueryExecutor.SelectSnapshotAsync(DbConnection connection, CancellationToken cancellationToken, String persistenceId, Int64 maxSequenceNr, DateTime maxTimestamp)
   at Akka.Persistence.Sql.Common.Snapshot.SqlSnapshotStore.LoadAsync(String persistenceId, SnapshotSelectionCriteria criteria)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)
   at Akka.Util.Internal.AtomicState.CallThrough[T](Func`1 task)

Hocon:

akka {

    loglevel = INFO,
	
    loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"],

    actor {
        provider = cluster
    }
    
    remote {
        dot-netty.tcp {
            hostname = "127.0.0.1"
            port = 0
        }
    }

    cluster {

        ...

        auto-down-unreachable-after = 5s
		
        run-coordinated-shutdown-when-down = on

        sharding {
            remember-entities = on
            journal-plugin-id = "akka.persistence.journal.sharding"
            snapshot-plugin-id = "akka.persistence.snapshot-store.sharding"
        }
    }

    persistence {

        journal {
            plugin = "akka.persistence.journal.postgresql"
            postgresql {
                class = "Akka.Persistence.PostgreSql.Journal.PostgreSqlJournal, Akka.Persistence.PostgreSql"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-string = "User ID=example;Password=example;Host=localhost;Port=5432;Database=example;"
                connection-timeout = 30s
                schema-name = public
                table-name = event_journal
                auto-initialize = off
                timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
                metadata-table-name = metadata
                stored-as = BYTEA
            }
            sharding {
                class = "Akka.Persistence.PostgreSql.Journal.PostgreSqlJournal, Akka.Persistence.PostgreSql"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-string = "User ID=example;Password=example;Host=localhost;Port=5432;Database=example;"
                connection-timeout = 30s
                schema-name = public
                table-name = sharding_event_journal
                auto-initialize = off
                timestamp-provider = "Akka.Persistence.Sql.Common.Journal.DefaultTimestampProvider, Akka.Persistence.Sql.Common"
                metadata-table-name = sharding_metadata
                stored-as = BYTEA
            }
        }

        snapshot-store {
            plugin = "akka.persistence.snapshot-store.postgresql"
            postgresql {
                class = "Akka.Persistence.PostgreSql.Snapshot.PostgreSqlSnapshotStore, Akka.Persistence.PostgreSql"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-string = "User ID=example;Password=example;Host=localhost;Port=5432;Database=example;"
                connection-timeout = 30s
                schema-name = public
                table-name = snapshot_store
                auto-initialize = off
                stored-as = BYTEA
            }
            sharding {
                class = "Akka.Persistence.PostgreSql.Snapshot.PostgreSqlSnapshotStore, Akka.Persistence.PostgreSql"
                plugin-dispatcher = "akka.actor.default-dispatcher"
                connection-string = "User ID=example;Password=example;Host=localhost;Port=5432;Database=example;"
                connection-timeout = 30s
                schema-name = public
                table-name = sharding_snapshot_store
                auto-initialize = off
                stored-as = BYTEA
            }
        }
    }
}
@Horusiath
Contributor

The exception points to a problem deserializing type AA, which the runtime is trying to locate in the Akka.Persistence.PostgreSql assembly. I guess this is your custom type. Which serializer have you picked for it?

@ctrlaltdan
Author

Hey @Horusiath

I should have mentioned this is the system shard coordinator which is failing to start.

From what I can see, it has correctly stored the key AA as its pseudo "type": https://github.com/akkadotnet/akka.net/blob/dev/src/contrib/cluster/Akka.Cluster.Sharding/Serialization/ClusterShardingMessageSerializer.cs#L28. However, it looks to me like Akka.Persistence.PostgreSql is trying to load a CLR type named after this constant value rather than following the deserialization code path.
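The mismatch described above can be illustrated without Akka at all. This is a minimal sketch, not the plugin's code: `ManifestLookupSketch` and `ResolvesAsClrType` are hypothetical names, and the only assumption is that the stored manifest is passed to `System.Type.GetType`, as the stack trace suggests.

```csharp
using System;

public static class ManifestLookupSketch
{
    // Returns true when the manifest string can be resolved as a CLR type name.
    // With throwOnError: false, Type.GetType returns null for an unknown name
    // instead of throwing TypeLoadException.
    public static bool ResolvesAsClrType(string manifest)
    {
        return Type.GetType(manifest, throwOnError: false) != null;
    }

    public static void Main()
    {
        // "AA" is a short manifest key, not a type name, so the lookup fails.
        Console.WriteLine(ResolvesAsClrType("AA"));
        // A real type name resolves normally.
        Console.WriteLine(ResolvesAsClrType("System.String"));
    }
}
```

A short key such as AA only has meaning to the serializer that wrote it, so a reader of the snapshot row has to hand the manifest back to that serializer instead of treating it as a type name.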

My setup doesn't make use of any custom serialization. Everything should be using the defaults, which I believe are Protobuf.

Thanks for your help!

@Horusiath
Contributor

Horusiath commented Aug 24, 2018

@dubs999 Cluster sharding has its own dedicated serializer, which interprets manifests differently from the default Akka message serializer. That error message means the default serializer is being used to deserialize the message.

Most probably you haven't included ClusterSharding.DefaultConfig() as a fallback configuration in your actor system:

var system = ActorSystem.Create("system", myConfig.WithFallback(ClusterSharding.DefaultConfig()));

@ctrlaltdan
Author

ctrlaltdan commented Aug 24, 2018

Hey @Horusiath

I've copied and pasted the setup I'm using below. I already have the fallback configuration declared. Sorry if this is going off topic, but we didn't cover persistence module setup in quite as much detail during Aaron's course.

Are there any additional fallback modules that are required for persistence/sharding?

var config = ConfigurationFactory
    .ParseString(Hocon())
    .WithFallback(ClusterSharding.DefaultConfig());

var system = ActorSystem.Create("my-system", config);

var sharding = ClusterSharding.Get(system);
var settings = ClusterShardingSettings
    .Create(system)
    .WithRole("transaction-v1");

ShardRegion = sharding.Start(
    "collect",
    Props.Create<CollectTransactionActor>(),
    settings,
    new MessageExtractor());

@ctrlaltdan
Author

ctrlaltdan commented Aug 24, 2018

I've got a simple app which can repro this bug fairly easily. I've lowered the threshold at which a snapshot is taken to 10 to make the turnaround a bit more rapid.

https://github.com/dubs999/akka.net-cluster-sharding

Steps

  1. Load app
  2. Hit localhost:4010/[any-string]
  3. Repeat step 2 a few times so that the coordinator reaches 10+ journal entries (restarts also seem to cause more journal entries; I'm not 100% sure what the criteria are for creating these entries).
  4. Stop the app, restart it and see the Cluster application error on startup.

If anything obvious is missing from this example, pointing it out would be super helpful.

Let me know if I can help in any other way.

Dan

Horusiath added the bug label Aug 24, 2018
@Horusiath
Contributor

You're right, @dubs999 . This is a bug.

@ctrlaltdan
Author

Thanks @Horusiath

I guess I have a couple of options.

  1. I push up the limit at which a snapshot occurs to a really high number and run my application off of journal entries (until such time as the bug is fixed).
  2. I use an alternative persistence plugin.
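Option 1 could be expressed as a configuration fragment along these lines. This is a sketch: it assumes `akka.cluster.sharding.snapshot-after` is the setting behind the 1000-entry limit mentioned at the top of the issue, and the value shown is only illustrative.

```hocon
akka.cluster.sharding {
    # Assumed setting: number of persisted coordinator events before a snapshot is taken.
    # A very large value keeps the coordinator recovering from journal entries only.
    snapshot-after = 1000000
}
```

The trade-off is that coordinator recovery replays every journal entry, so startup slows down as the journal grows.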

Do you have customers who are using persistent sharding in production? If so, do you know which persistence plugin they are using?

We may also have some time to contribute a fix; however, our release deadline is fairly tight, and I assume a public build won't be available in the very near term.

Many thanks

@Horusiath
Contributor

I know of several projects using Akka.Persistence.SqlServer. If I'm right, the Redis plugin should also work, and the MongoDB plugin was modified some time ago specifically to support cluster sharding. Another option is to run cluster sharding with akka.cluster.sharding.state-store-mode = ddata, which doesn't need any persistence; however, it won't allow you to use the akka.cluster.sharding.remember-entities option.
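For reference, the ddata alternative mentioned above might look like this when applied to the configuration from the original report. It is a sketch only; `remember-entities` is switched off because, as noted, it cannot be combined with this mode in this version.

```hocon
akka.cluster.sharding {
    # Keep coordinator state in distributed data instead of the journal and snapshot store.
    state-store-mode = ddata
    # Not available together with ddata mode here, so the original "on" is turned off.
    remember-entities = off
}
```

In this mode the `journal-plugin-id` and `snapshot-plugin-id` entries under `sharding` are no longer what the coordinator recovers from, so the failing snapshot read is avoided.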

@ctrlaltdan
Author

I've submitted a PR which fixes the issue (when tested locally). I don't have a great deal of exposure to the Akka framework, so apologies if the fix isn't robust enough.

#58

Thank you for all your help guys!

@Aaronontheweb
Member

Closed via #60
