Server memory increases when the client program repeats ActorSelection and Terminated #3430
Thanks for the profiling dump, it's very helpful. We are investigating this issue.
Ah, I see you have updated your starting post. Yeah, I came to the same conclusion: massive amounts of byte[] are being rooted inside the DotNetty codebase, and at first glance their PoolThreadCache seems to be the culprit. I'm still doing some experimentation to see whether it's really being rooted, or whether it just needs some time to be GC'ed. For example, I wonder what happens when you do a manual GC.Collect.
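One quick way to check whether those byte[] chunks are genuinely rooted or just waiting for a collection is to force a full, blocking GC (including large-object-heap compaction) and compare snapshots before and after. A minimal sketch of that kind of probe (the helper name and the MB formatting are just illustrative):

```csharp
using System;
using System.Runtime;

static class GcProbe
{
    // Forces a full blocking collection, including large-object-heap compaction,
    // so a follow-up memory snapshot only shows objects that are actually rooted.
    public static void ForceFullCollect()
    {
        long before = GC.GetTotalMemory(forceFullCollection: false);

        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true);
        GC.WaitForPendingFinalizers();
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true);

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Managed heap: {before / 1024 / 1024} MB -> {after / 1024 / 1024} MB");
    }
}
```

If memory stays high even after this, the byte[] arrays are still reachable from somewhere (e.g. a thread-local cache) rather than merely uncollected.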
@iris4865 how long did you let your sample run before getting into the GB memory range? What I have done is clone your repo and then:
In all cases memory usage was stable. Can you provide more specifics about your test environment?
@Danthar I tried a manual GC.Collect. The number of GC.Collect calls has increased, but memory has still increased. Profiling result image (after 1 hour 55 minutes):
This is very detailed, thank you! Looks to me like an issue with how buffers are being released in DotNetty... I think this may have been fixed in their v0.4.8 branch, but I'll need to confirm.
Ah, I didn't dig deep enough into that memory dump. Totally missed that it ran for over 2 hours. Very odd behavior though, for it to balloon like that after 2 hours.
It's interesting to note that at a certain point in time we start seeing these types of errors, and that seems to correlate strongly with the sudden increase in memory consumption. Whether it's the cause or a symptom, I don't know. /cc @Aaronontheweb
Alright @iris4865, after some internal debate we decided we are sticking with the conclusion that it's something in DotNetty.
Thank you for the suggestion. I wonder about your two options.
I will also try to test it with the new config flag.
@iris4865 haven't tested it myself yet - I'm working on a custom diagnostic transport.
https://github.com/Aaronontheweb/Akka.Remote.DiagnosticDotNettyTransport - finished the first cut of my diagnostic transport. I'm running the code sample you sent me with it now and I'm going to leave it for a couple of hours. So far, I'm not seeing anything that looks like a memory leak - the dumps we're taking of the buffer pools look like they're empty most of the time. I'll let you know what I turn up, though. The large object heap remains steady at 32 MB, which is where all of the pooled buffers live. FYI, I'm running on .NET 4.6.1 since I don't have 4.7.1 installed.
Alrighty, so the system was totally stable for about the first hour or so, but then something happened, randomly by the looks of it, that created a massive spike in memory consumption very quickly. Memory usage jumped to about 1.8 GB and then leveled off.
It took about 10 minutes to go from ~100 MB total memory to 1.8 GB total memory, and it looks like nearly all of it went into the large object heap. I'm going to analyze the snapshot in a second, but here's the key bit: it took a random event of some kind to suddenly trigger this massive increase in memory consumption. This tells me that we're looking at a race condition somewhere inside DotNetty's buffer pooling code.
The relevant DotNetty metric reporting code that produced the data I captured in the previous comment can be found here:

```csharp
public string DumpStats()
{
int heapArenasLen = this.heapArenas?.Length ?? 0;
StringBuilder buf = new StringBuilder(512)
.Append(heapArenasLen)
.Append(" heap arena(s):")
.Append(StringUtil.Newline);
if (heapArenasLen > 0)
{
// ReSharper disable once PossibleNullReferenceException
foreach (PoolArena<byte[]> a in this.heapArenas)
{
buf.Append(a);
}
}
int directArenasLen = this.directArenas?.Length ?? 0;
buf.Append(directArenasLen)
.Append(" direct arena(s):")
.Append(StringUtil.Newline);
if (directArenasLen > 0)
{
// ReSharper disable once PossibleNullReferenceException
foreach (PoolArena<byte[]> a in this.directArenas)
{
buf.Append(a);
}
}
return buf.ToString();
}
```

Looks to me like the issue is that, rather suddenly, DotNetty started allocating much larger heap pool buffers than it had been doing before. The average pool size jumped up to about 16 MB right at the mark when memory consumption started climbing...
When I start the app clean the very first time, the total amount of buffer capacity is meant to be capped at roughly 16 MB total.
That clip from above is the readout for all 32 of the individual heap buffers... When the memory spike happened, we were reading about 3x the amount of data inside the first chunk compared to that same chunk at startup. There's also a correlation with the number of allocated objects - it's roughly 3x higher during the period of high memory consumption than it was during the "steady state" of the first hour or so of execution.
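For anyone wanting to capture readouts like the ones discussed above on a schedule, a small sketch along these lines works; it assumes `DumpStats()` is exposed publicly on the allocator (as in the snippet earlier) and uses the shared `PooledByteBufferAllocator.Default` instance, which may not be the exact allocator a given Akka.NET version wires into the transport:

```csharp
using System;
using System.Threading;
using DotNetty.Buffers;

static class AllocatorMonitor
{
    // Logs the pooled allocator's arena statistics at a fixed interval so that a
    // sudden growth in chunk sizes (like the 16 MB jump described above) shows up
    // in the log output with a timestamp.
    public static Timer Start(TimeSpan interval)
    {
        var allocator = PooledByteBufferAllocator.Default;
        return new Timer(_ =>
        {
            Console.WriteLine($"[{DateTime.UtcNow:O}] {allocator.DumpStats()}");
        }, null, TimeSpan.Zero, interval);
    }
}
```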
When I took my first memory snapshot earlier... there were only two [...]
I have a theory that this might be something we introduced in 1.3.7 as a result of my trying to work around some other DotNetty buffer-related issues in 1.3.6. I'm going to test it now using my instrumented transport and let you know what I find.
@iris4865 and I are on the same team. We tested the proposed option.
OK, this is caused by a bug I introduced in v1.3.6. I was able to apply a fix for it in my DiagnosticDotNettyTransport project and demonstrate that the issue did not occur again within 15-16 hours of continuous use. PR to come to Akka.NET shortly.
Welp, I'm not so sure my original theory was right. I left a comment on #3436 about why. I'm running a new test using your reproduction sample with the latest DotNetty v0.4.8 bits. Going to let it run for 24 hours and let you know what I find!
I tested with the pooling config. Memory usage is stable when the pooling setting is disabled, but I think this is only a temporary solution. Profiling link: https://raw.githubusercontent.com/iris4865/IssueStorages/master/AkkaRemote_1.3.6/ActorSelection/ActorSelection-Akka_1.3.6-DotNetty-0.4.6-pool%3Dfalse.dmw @Aaronontheweb
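For reference, a sketch of how that pooling switch can be applied from HOCON when constructing the server's ActorSystem. The key name `enable-pooling` under `akka.remote.dot-netty.tcp` is an assumption here and should be verified against the Remote.conf of the Akka.NET version in use:

```csharp
using Akka.Actor;
using Akka.Configuration;

class Server
{
    static void Main()
    {
        // "enable-pooling" is the assumed flag name for turning DotNetty's pooled
        // byte-buffer allocator off; check it against your Akka.NET version.
        var config = ConfigurationFactory.ParseString(@"
            akka {
                actor.provider = remote
                remote.dot-netty.tcp {
                    hostname = ""127.0.0.1""
                    port = 8081
                    enable-pooling = false
                }
            }");

        var system = ActorSystem.Create("Server", config);
        system.WhenTerminated.Wait();
    }
}
```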
@Aaronontheweb
@iris4865 looks like my fixes in #3436 really did help - I do see some slight increases in memory pool usage over time from DotNetty, but we're talking 32 MB to 76 MB over the span of 36 hours, not the 1.6 GB we were looking at earlier. I was able to reproduce this using both v0.4.6 and v0.4.8. I think this is because the singleton I'm using in #3436 actually behaves differently than the allocator we construct today. What we currently use (in 1.3.6 only): https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Buffers/PooledByteBufferAllocator.cs#L132. What we are reverting back to using in 1.3.7, if #3436 gets merged in, is a configuration that prefers direct buffers when available and thus utilizes the pool less frequently, depending on the runtime.
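To make the distinction concrete, here is a rough illustration of the two configurations being contrasted; it is a sketch of the idea rather than the exact code paths in Akka.Remote, and the per-instance constructor arguments are deliberately left at their defaults rather than the ones 1.3.6 actually passes:

```csharp
using DotNetty.Buffers;

static class AllocatorChoice
{
    // Roughly the 1.3.6 behaviour: a freshly constructed pooled allocator whose
    // arenas are private to that instance, so chunks it caches cannot be reused
    // by anything else in the process.
    public static IByteBufferAllocator PerInstance() => new PooledByteBufferAllocator();

    // Roughly the 1.3.7 / #3436 behaviour: the process-wide shared singleton,
    // whose arenas are shared by every channel that uses it and which prefers
    // direct buffers where the runtime supports them.
    public static IByteBufferAllocator Shared() => PooledByteBufferAllocator.Default;
}
```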
I've been taking a look at the DotNetty single-thread event executor and wondering whether, between what we do with Akka.Remote and what DotNetty does internally, it's possible that we're churning threads - i.e. killing off less-used threads and creating new threads later when there's more work to be done. https://github.com/Azure/DotNetty/blob/dev/src/DotNetty.Common/Concurrency/SingleThreadEventExecutor.cs - source for the STEE in DotNetty. Looks like they keep the threads and park them, periodically, with timeouts. The reason I mention this is that DotNetty's internal shared resources, such as buffer pools, operate using thread-local storage, and I'm pretty sure that the custom [...] So it's possible those actors could drive up memory consumption if those old [...] Compounding this to some extent... DotNetty also uses a custom [...] When using direct buffers, we avoid all of this - because nothing is pooled into thread-local storage. I think I'm going to need to do some testing against some of these threading scenarios and see if I can demonstrate a leak with the thread-local storage one way or another.
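A contrived sketch of the thread-churn concern, using plain `ThreadLocal<T>` from the BCL rather than DotNetty's own thread-local machinery to keep it self-contained: every thread that touches the "pool" gets its own arena-sized cache, and those cached chunks stay alive for as long as the owning thread does, so spawning and parking many threads multiplies the retained buffer memory.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ThreadLocalCacheDemo
{
    // Stand-in for a per-thread buffer cache: each thread that allocates gets
    // its own 16 MB "chunk", mimicking an arena-sized thread-local cache.
    private static readonly ThreadLocal<byte[]> Cache =
        new ThreadLocal<byte[]>(() => new byte[16 * 1024 * 1024], trackAllValues: true);

    public static void Main()
    {
        var ready = new CountdownEvent(32);
        var threads = new List<Thread>();

        for (int i = 0; i < 32; i++)
        {
            var t = new Thread(() =>
            {
                Cache.Value[0] = 1;                 // touching the cache creates another chunk
                ready.Signal();
                Thread.Sleep(Timeout.Infinite);      // parked thread keeps its cache alive
            }) { IsBackground = true };
            t.Start();
            threads.Add(t);
        }

        ready.Wait();
        // 32 parked threads => 32 chunks (~512 MB) retained by thread-local caches.
        Console.WriteLine($"Cached chunks alive: {Cache.Values.Count}");
    }
}
```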
Closed it because I forgot to update my commit message :p
OK, this makes me feel like I'm not crazy - looks like gRPC ran into a similar problem with [...] That's exactly the sort of behavior I'm seeing during these long-running memory tests. It's not as severe when running with the changes from #3436, which BTW are available in the latest nightlies now @iris4865 - you can get them here: http://getakka.net/community/getting-access-to-nightly-builds.html
@Aaronontheweb test project with nightly version |
@iris4865 so it looks like the changes there had a positive effect. That's probably going to be the best we can do in the short run - I think we're going to want to write some tests to check for leaks in DotNetty's pooling code, but there's not much we can do to influence their release cycle other than submit pull requests and ask nicely. Worth noting: it's possible that memory consumption will increase over time when using Akka.Remote, since we do things like cache deserialized ActorPaths. When you're churning through new connections the way your reproduction sample does, that cache will slowly fill up over time. That could also be responsible for a small gradual memory increase (2-4 MB), but nothing dramatic like a big 16 MB chunk (which is the size of the heap arenas DotNetty allocates). I'm going to open some issues on the DotNetty project about this, but I think we've resolved the major issues in the short run - do you agree?
@Aaronontheweb Let's meet again in the next issue. ^^
Sounds like a plan! Closing.
Akka version: 1.3.6
Akka.Remote: 1.3.6 (with DotNetty 0.4.6)
Platform: .NET Framework 4.7
Project link: https://github.com/iris4865/IssueStorages/tree/master/AkkaRemote_1.3.6/ActorSelection
Server memory usage increases when the client program connects and disconnects. At the start of the program, memory usage was 35.85 MB; after the ActorSelection test it had increased to 1.88 GB. I did profiling with JetBrains dotMemory 2018.1. I think that DotNetty.Buffers is not being garbage collected.
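For readers who don't want to clone the linked project, a condensed sketch of the client-side pattern it exercises: build a remoting-enabled ActorSystem, resolve a remote actor via ActorSelection, send it a message, then tear the whole system down so the server observes the association drop (and any watchers receive Terminated), and repeat. The server address, port, and actor name below are placeholders, not the exact values from the repository.

```csharp
using System;
using Akka.Actor;
using Akka.Configuration;

class ClientLoop
{
    static void Main()
    {
        var config = ConfigurationFactory.ParseString(@"
            akka {
                actor.provider = remote
                remote.dot-netty.tcp {
                    hostname = ""127.0.0.1""
                    port = 0
                }
            }");

        for (var i = 0; ; i++)
        {
            var system = ActorSystem.Create("Client", config);

            // Placeholder address and actor name - substitute the server's real path.
            var selection = system.ActorSelection(
                "akka.tcp://Server@127.0.0.1:8081/user/echo");
            var remote = selection.ResolveOne(TimeSpan.FromSeconds(3)).Result;

            remote.Tell($"ping {i}");

            // Terminating the whole system closes the remote connection, so the
            // server side sees the disconnect on every iteration.
            system.Terminate().Wait();
        }
    }
}
```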
P.S. I think Akka version 1.1.3 (with Akka.Remote 1.1.3 and Helios 2.1.2) does not have this problem, so I am using version 1.1.3 temporarily.