EventStream subscription does not get automatically removed on actor death causing memory leak #5717
Comments
We automatically do this today akka.net/src/core/Akka/Event/EventBusUnsubscriber.cs Lines 29 to 149 in 7123d0f
At least we are supposed to. I'll go take a look.
Reproduced `IActorRef` leak inside the `EventStream`
Reproduced here: #5720
Great! Yes, I should have said that the documentation says you do it automatically, but it doesn't work, as you've now reproduced too. I couldn't figure out how the code I included above sends the registration message, only that it never appears to send it to the unsubscriber. I did verify that the unsubscriber actor itself is running, but it doesn't react to the death of an EventStream subscriber actor.
I believe the answer for that is this code: akka.net/src/core/Akka/Event/EventStream.cs Lines 131 to 159 in d055f46
I've been debugging concurrent + distributed systems full-time in .NET for about a decade - I have no idea how on earth that code is supposed to be remotely safe, let alone how it's supposed to work in a linear non-concurrent flow. I'm going to simplify it.
akka.net/src/core/Akka/Event/EventBusUnsubscriber.cs Lines 110 to 128 in d055f46
The unsubscriber is handling its own custom
Have this all fixed now in #5720
Akka.NET v1.4.34 (discovered in v1.4.17, which is what we were using, but verified to still exist in the latest release)
Running on .NET Core 3.1
If an Actor subscribes to a message Type on the EventStream and doesn't unsubscribe before dying, the actor is garbage collected but the subscription remains, and it keeps the ActorCell alive forever along with the Props used to create it.
In our app we have a set of Job actors that are loaded on demand and stay in memory until they have been idle for a time, at which point they kill themselves. This works great, except that the Props (ours are unique per instance due to a GUID in the parameters) and the ActorCell stay in memory forever because the EventStream subscription never gets freed. The EventStream subscription dictionaries continue to grow in size as well. We have 1000s of Job actors churning 24/7. Over a 12 hour period I measured on average 2000 live Job actors at any given moment and 68,000 dead and gone at the end of the period. However, 70,000 ActorCells and Props were still alive, resulting in about 30MB/24 hours of memory leakage at the current churn rate. The total leakage depends on the ActorCell overhead and the size of the Props, as the contents of the Props are kept alive too - we had 7000 extra GUIDs alive after 12 hours as well, since they are in the Props.
To repro this I wrote a test program that slowly creates and then slowly kills 100 actors (slowly, so I can see what's going on and snapshot it easily). It can be configured to not subscribe at all, to subscribe in PreStart and unsubscribe in PostStop, or to only subscribe and never unsubscribe. I monitored the memory with SciTech Memory Profiler, snapshotting before, during and after each case.
If set to either not subscribe at all or to sub and unsub, all is well. All Props and ActorCells are gone except for the few used by Akka internally and my Manager Actor that creates my 100 instances.
If instances subscribe and are killed without unsubscribing (like our app), all ActorCells and Props stay around forever although the Actors are gone. It's the EventStream subscriptions holding the reference to the IActorRef (ActorCell) that's doing it. As you can see below I've got 107 ActorCells left. There are just the expected 7 when I unsub manually or don't sub at all.
I ran my test app linked directly against the Akka.NET source code so I could debug into the Akka code. The Register message that should ask for automatic unsubscription is never received by the EventBusUnsubscriber, so it never calls Context.Watch on the actors and therefore doesn't notice when they die while still subscribed. I verified that the EventBusUnsubscriber is running; it just never does anything. I am unable to figure out why the Register message is never received, but I think it never gets sent due to some issue with the code in RegisterWithUnsubscriber in EventStream.cs. The IF never seems to evaluate to TRUE as far as I've seen. I can't figure out what this code is trying to avoid doing, but it seems to avoid it all the time...it's too clever for me :-)
Explicitly unsubscribing Self in the PostStop of the Actor 100% fixes the issue.
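For anyone who needs the workaround before a fix ships, here is a minimal sketch of what I mean. `JobActor` and `JobEvent` are hypothetical names used only for illustration, not types from our app or from the repro; the only Akka.NET calls assumed are `EventStream.Subscribe(IActorRef, Type)` and `EventStream.Unsubscribe(IActorRef)`.

```csharp
using Akka.Actor;

// Hypothetical message type published on the EventStream (illustration only).
public sealed class JobEvent { }

// Hypothetical actor standing in for one of the Job actors described above.
public sealed class JobActor : ReceiveActor
{
    public JobActor()
    {
        Receive<JobEvent>(_ => { /* handle the event */ });
    }

    protected override void PreStart()
    {
        // Subscribe this actor to JobEvent messages on the EventStream.
        Context.System.EventStream.Subscribe(Self, typeof(JobEvent));
    }

    protected override void PostStop()
    {
        // Remove all of this actor's subscriptions so the EventStream
        // no longer holds a reference to its IActorRef after it stops.
        Context.System.EventStream.Unsubscribe(Self);
        base.PostStop();
    }
}
```

With the `PostStop` override in place, my repro goes back to the expected baseline of ActorCells; without it, the subscription outlives the actor.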
Here is the code I believe the problem lies in.
Thanks
Tom
*edit - added Repro
AkkaMemoryLeakRepro.zip