This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Memory Leak in 1.1.0 #1260

Closed
ghost opened this issue Dec 15, 2016 · 49 comments

@ghost

ghost commented Dec 15, 2016

Since updating to 1.1.0 (on Win10 64-bit), my server's memory usage keeps growing at about 0.5 MB per minute. Is there a new setting that must be configured, or is there a memory leak in 1.1.0?

It's very easy to reproduce:

  1. Create a new "ASP.NET Core Web Application" in VS2015 (the wizard will create a 1.0.1 application). Start the application; the "Diagnostic Tools" window in VS2015 will show a usage of about ~245.2 MB and stay there.
  2. Now update the app to netcoreapp1.1. (As a further test, leave Microsoft.AspNetCore.Server.Kestrel at version 1.0.0 and it will still not use more memory.)
  3. Use the new 1.1.0 version of Microsoft.AspNetCore.Server.Kestrel and start the "Diagnostic Tools" window again: it will start at ~245.2 MB again, but then the "Process Memory" graph keeps increasing.
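
For reference, the host the 1.x template generates looks roughly like this (a sketch of the 1.x hosting pattern, not the exact wizard output, and it assumes the template's Startup class); the leak shows up with nothing beyond these defaults:

using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        // Plain 1.x-style host: Kestrel plus the template's Startup class,
        // no custom server configuration.
        var host = new WebHostBuilder()
            .UseKestrel()
            .UseStartup<Startup>()
            .Build();

        host.Run();
    }
}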
@ghost ghost changed the title Memmory Leak in 1.1.0 Memory Leak in 1.1.0 Dec 15, 2016
@pakrym
Contributor

pakrym commented Dec 15, 2016

I wasn't able to reproduce it locally. Can you provide your project.json, please?

@pakrym
Contributor

pakrym commented Dec 15, 2016

Okay, I think I can see it. It's much slower on my machine, and it looks like it leaks native memory, because the managed heap does not change between memory snapshots.

@CesarBS could it be something related to timers?

/cc @halter73
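
For anyone who wants to confirm that kind of diagnosis on their own app: compare the managed heap size against the whole process's working set, and if only the latter grows, the leak is in native memory. A minimal sketch (a hypothetical helper, not part of Kestrel):

using System;
using System.Diagnostics;
using System.Threading;

class MemoryWatch
{
    static void Main()
    {
        while (true)
        {
            // Managed heap after a full collection vs. total process memory.
            long managed = GC.GetTotalMemory(forceFullCollection: true);
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            Console.WriteLine($"managed: {managed / 1024} KB, working set: {workingSet / 1024} KB");
            Thread.Sleep(5000);
        }
    }
}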

@pakrym
Contributor

pakrym commented Dec 15, 2016

Yep, there is a leak and it is related to timers. Changing public const long HeartbeatMilliseconds = 1000; to 10 makes it really obvious:

image
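
(Not the actual Kestrel bug, but the general shape of a heartbeat-driven native leak looks like the sketch below: an allocation on every timer tick with no matching free, which is why shrinking the interval 100x makes the growth roughly 100x faster.)

using System;
using System.Runtime.InteropServices;
using System.Threading;

class HeartbeatLeakSketch
{
    // Dropping this from 1000 to 10 makes the growth ~100x faster.
    const int HeartbeatMilliseconds = 1000;

    static void Main()
    {
        // Every tick allocates unmanaged memory and never frees it, so the
        // managed heap stays flat while the process working set climbs.
        var timer = new Timer(_ => Marshal.AllocHGlobal(4096),
                              null, 0, HeartbeatMilliseconds);

        Console.ReadLine();
        GC.KeepAlive(timer);
    }
}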

@cesarblum
Contributor

@pakrym I'll look into it when I'm in the office.

@agoretsky

After upgrading all our app dependencies to 1.1.0, we also got a memory leak on Linux (the process keeps growing after start), but I was not able to reproduce it on Windows, and unfortunately there are no tools like dotMemory for profiling memory on Linux, so we rolled back to 1.0.1. I hope it's related and will be fixed.

@benaadams
Contributor

@agoretsky is this from cert use? (Related bug and repro you raised in coreclr: https://github.com/dotnet/coreclr/issues/8660.) Or an unrelated issue?

@agoretsky

agoretsky commented Dec 16, 2016

@benaadams, yes, it is due to the issue I raised.

@ghost
Author

ghost commented Dec 19, 2016

I built the 1.1 server with the code changes from PR 1261, and the leak is gone.

@pakrym
Contributor

pakrym commented Dec 19, 2016

@meichtf thank you for testing.

@pakrym pakrym closed this as completed Dec 19, 2016
@ghost
Author

ghost commented Dec 19, 2016

I hope to see this in the 1.1.1 release.

@markvincze

markvincze commented May 19, 2017

Hey all,
Was this actually fixed in 1.1.1? The same thing is happening with my API on Linux (which is, by the way, completely idle except for its status endpoint, which only returns a hard-coded 200 and is polled periodically): its memory usage is slowly but steadily increasing.

@avezenkov

.NET Core 1.1.1, a self-contained package, Kestrel, Linux: no requests, just leaving the process running, and memory grows slowly until it's all eaten by the dotnet process.

@markvincze

Hi @pakrym,

Do you have any hints on how to avoid this problem? We are running some ASP.NET Core APIs in Docker images (based on Debian), and they all show constantly increasing memory usage, even in our development and staging environments, where they are basically idle except for health check calls.

We are seeing memory usage patterns like these (these are from 2 different APIs, and they are constantly being restarted because they run out of memory):

image

Is this a tracked issue on Linux that will be solved eventually? Might the 2.0 release bring any remedy? Or should this not happen at all, meaning we're doing something wrong?

Thanks in advance!

@markvincze

And just one additional piece of info: we have one single API which is not built with C# and ASP.NET Core MVC, but with F# and Suave (although still on top of Kestrel and ASP.NET Core). Its memory usage is completely stable, even in production, where it has some (although not huge) load.

image

@avezenkov

REOPEN!

@halter73
Member

halter73 commented Aug 3, 2017

@avezenkov Feel free to open a new issue with more details. It should have information about what the project looks like. Is it a template project? If so, which one, and did you make any modifications? If not, could you upload the project to a GitHub repo? What packages and runtime are you using? Are you running a debug or release configuration? Are you using dotnet run or dotnet myAlreadyBuilt.dll? At what rate is the memory usage growing? What type of memory is growing (e.g. size/resident/shared/etc.)? Which distro and version are you running on?

After the memory has grown for a while, could you take a memory dump using gcore? If I'm lucky I might be able to use that to see what's on the managed heap. If you want to take a stab at it yourself, you can read guides here and here that go into detail on how to debug dotnet linux processes using lldb and libsosplugin.so.

I don't want to overwhelm you. If any of these details are too difficult to come by, you can leave them out for now and maybe I'll still be able to get a repro. But right now, I have almost nothing to go on.

We investigated this issue, fixed it, and verified the fix prior to releasing 1.1.1. I haven't seen the memory leak on Linux (which I test on a lot) since then, but that's without running MVC or other higher-level services, since I primarily work on lower levels of the stack. Just now, I reinstalled 1.1.1 on my Debian 8.9 VM and printed the /proc/<PID>/statm statistics of an idle ASP.NET Core/Kestrel process for over 30 minutes. The numbers haven't budged since a minute after the process started.

@avezenkov

@halter73 I see. Thanks for your efforts. I'll try to isolate the problem, describe the details, upload a demo project, etc. .NET Core is our hope for a minimal, modular, stable, multiplatform framework.

@avezenkov

@halter73 And thanks for the Linux debugging guides. I'll create a new issue once I have more details.

@markvincze

markvincze commented Aug 10, 2017

Btw, I have some new information about our experience; I'll share it for future reference, and to see if someone has an idea about it.

It seems that the APIs we have don't increase their memory usage infinitely; it's just that their usage is unexpectedly high. We are running these APIs in Kubernetes on Google Cloud. Previously we had their memory limit set to 500 MB, and they were exceeding it and being constantly restarted.
Then we increased the limit to 1000 MB, and since then they work fine; their memory usage seems to peak around 650 MB and is not increasing much further.

This is an example of the recent memory usage of some of the instances.

image

This usage still seems excessive, and I couldn't find an explanation for it. We can only reproduce it in Kubernetes on GCE, and nowhere else. I tried running and load testing the same API:

  • in dev configuration in VS
  • on Windows with a production build
  • on Ubuntu with a production build
  • in Docker, using the actual production image

But in none of these tests have I seen the same excessive memory usage; in all of them the memory consumption peaked at around ~150 MB.

Did anyone else encounter something similar? (The last difference I can think of between the tests and the production scenario is that Google Cloud uses a specialized OS, the Container-Optimized OS, to run the production Docker images, but I haven't got around to investigating how I could test with that.)

@benaadams
Contributor

Sounds like the memory cap on the container is not being communicated to the GC? (So it goes over 500MB thinking it has more?)

@markvincze

@benaadams I was thinking about that.

In Kubernetes the specified memory limit is passed to Docker via the --memory flag, so when we used 500 MB as the Kubernetes limit, that should have applied to the Docker container.
And I've tried running this API locally with Docker, passing --memory 100M; the app nicely stayed under 100 MB and kept working fine. (And with --memory 4096M, it still only used ~150 MB.)

Do you know how we could verify that this is not the problem?

@Drawaes
Contributor

Drawaes commented Aug 10, 2017

I presume Docker locally and the cloud test are both x64, so I will ignore that for now. What is the reported core count in both local and remote tests? It has a decent impact on the memory held by the GC.

@davidfowl
Member

The change was here dotnet/coreclr#10064

@markvincze

Hi @Drawaes,

I tried printing Environment.ProcessorCount in all environments; I see the following:

  • On my machine, just doing dotnet run, the value is 4.
  • On my machine, with Docker, it's 1.
  • On Google Cloud Kubernetes it's 8.

Can this cause the issue? Are there any other diagnostics I should print?

Btw, what I tried in the meantime was using different base Docker images. Originally I was using a custom image on top of Debian with a self-contained app (published with the --runtime flag). Then I also tried microsoft/dotnet:1.1.2-runtime and gcr.io/google-appengine/aspnetcore:1.1 (publishing without --runtime and just running the DLL with dotnet), but all of them produced the same memory usage.

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

This makes a big difference. By default ASP.NET Core runs on the server GC, which tends to let memory use grow a lot more, preferring throughput over memory use. However, it doesn't matter what you set the garbage collector to (workstation or server): it will go to workstation mode if there is a single CPU. Try forcing your app to workstation GC and re-running on Google Cloud, or resource-limit your container to a single CPU. If you want to force your application, you can do it in the csproj with this setting:

<PropertyGroup> 
    <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>

That should give you the same behaviour as you see locally. (You need to make sure your launchSettings.json isn't overriding this value as well.)

Try that and see if the results match what you see on docker locally.
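
One way to double-check which mode actually took effect at runtime (since other settings can override the csproj value) is to log GCSettings.IsServerGC at startup; a minimal sketch:

using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // True means server GC (one heap per logical core); false means workstation GC.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
    }
}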

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

For your reference, to see the difference between the server and workstation GC, just look at the size of the ephemeral segments each allocates.

Your local Docker, because of the single CPU, will be running the workstation GC, and the Google Cloud one will be running the server GC in the "> 4 logical CPUs" category (2 GB ephemeral segment). Using the workstation GC on multiple processors will use a bit more CPU but will keep your memory use lower (more collections, etc.).
https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals#the_managed_heap

image

@benaadams
Contributor

benaadams commented Aug 13, 2017

The container memory cap was merged 23 Mar 2017
1.1.0 was released 16 Nov 2016

Guess you might have to use .NET Core 2.0 to have the caps work? (You can still use the same version of ASP.NET Core.)

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

Sure, but the behaviour was okay in containers locally, where there was a single CPU, but not in the Google containers, where 8 "CPUs" were visible to the container. So while you may still have the "cap is not seen correctly" issue, the workstation GC is saving your bacon locally and keeping the memory usage low.

@markvincze

@Drawaes @benaadams,

Thanks a lot for the help, guys; I didn't know about the difference the number of CPUs makes.
I'll try the csproj change first, and then also an update to the 2.0 preview. I'll keep this thread updated on how it goes.

Btw, @Drawaes, if I understand correctly, the problem might be that in Google Cloud .NET is using the Server GC, thus allocating much more memory. And what I'd have to try is to force the Workstation GC, right? Then shouldn't the flag in the csproj be false?

<PropertyGroup> 
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

Yes, sorry... it should be false... typo :) Good luck, and do tell us how you get on.

@markvincze

markvincze commented Aug 13, 2017

@Drawaes,

The change to Workstation GC seems to have helped! This is what the memory usage looks like after the deployment.

image

(This is still on the development deployment, so this API is not getting any real load. I'll test it on production soon.)

Btw, before doing the csproj change, I printed System.Runtime.GCSettings.IsServerGC, and I got true in all environments (locally debugging with VS, locally running with Docker, running in Kubernetes). So it didn't use workstation mode even in local Docker, where the reported CPU count was 1.

Do I understand correctly that the problem was caused by the combination of these two things?

  • Server GC tends to be more "greedy" and allocates more aggressively, so the memory usage increases to a much higher value than it does with Workstation GC (which happened to be around 600MB on my environment).
  • .NET 1.1 does not adhere to the memory limits on Linux, but this will be fixed in 2.0.

So I should be able to switch back to Server GC once I've upgraded to .NET 2.0, and it'll respect the memory limit coming with --memory, right?

@benaadams
Contributor

benaadams commented Aug 13, 2017

Users allocate, GC doesn't 😉

I'd phrase it more as

  • Server GC trades memory in favor of throughput; you will have a lower max (app) throughput with Workstation GC - but use less memory
  • .NET 1.1 does not adhere to the memory limits set by the docker container, this is fixed in 2.0

This might be a good primer to show the difference: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals

Generally you want to be using Server GC (the defaults) for a server app.

I have a request in for manual limits: https://github.com/dotnet/coreclr/issues/11338

@markvincze

@benaadams thanks for the clarification!

Just a quick update: I also deployed to production, where the API has some load (~2 requests per second), and the memory usage seems to be stable so far:

image

@Drawaes
Contributor

Drawaes commented Aug 14, 2017

I am not 100% sure that server GC with 8 cores but a 1 GB limit is ever a good idea. The core-to-memory ratio is out of whack. You get heaps per core in server GC.

@JorritSalverda

I'm trying to figure out whether the CPU limit in Kubernetes actually limits the number of cores the process runs on, or is just a cap on how much time a process is allowed to spend across all cores of the host combined.

The particular application @markvincze mentions has a Kubernetes resource limit of 250m, which is a quarter of a core. However, the host is an 8-core machine, hence the Environment.ProcessorCount property shows 8 cores, even though the app will never be able to use all 8 cores to the max. It might be able to use a fraction of each core though, making it crucial that the .NET framework treats it as an 8-core machine.

@Drawaes
Contributor

Drawaes commented Aug 14, 2017

It will be treated as 8-core. The issue is 8 cores with 1 GB of RAM: you have 125 MB per core, and with the server GC that is never going to be a good situation. I would always recommend the workstation GC in that scenario.

@benaadams
Contributor

@markvincze .NET Core 2.0, which has the fix, has just been released: https://blogs.msdn.microsoft.com/dotnet/2017/08/14/announcing-net-core-2-0/

(ASP.NET Core 2.0 has also just been released: https://blogs.msdn.microsoft.com/webdev/2017/08/14/announcing-asp-net-core-2-0/)

@markvincze

@benaadams thanks for the info! (I'm working on upgrading my API already 🙂)

@Mookker

Mookker commented Aug 15, 2017

Hello @markvincze! Did

<PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

help in 1.1.0, or did you have to upgrade to 2.0?

@Drawaes
Contributor

Drawaes commented Aug 15, 2017

It helped according to his report back.

@Mookker

Mookker commented Aug 15, 2017

@Drawaes thank you, will try it

@markvincze

@Mookker,

Yes, it helped. Since then I've upgraded to 2.0, but I still had to keep <ServerGarbageCollection>false</ServerGarbageCollection>; otherwise the memory usage was still very high.

@tmds
Contributor

tmds commented Aug 16, 2017

Detecting the Docker memory limit is in 2.0 (dotnet/coreclr@b511095).
Detecting the Docker CPU limit isn't in 2.0 (dotnet/coreclr@df214e6).

@Drawaes
Contributor

Drawaes commented Aug 16, 2017

The memory limit was potentially an issue, but I think Google was actually reporting 8 cores to the container. If that is the case, then the server GC is never going to be comfortable with a 500 MB memory limit. It would be interesting to hear back what Environment.ProcessorCount reports now.

@benaadams
Contributor

benaadams commented Aug 18, 2017

Yes, it helped. Since then I upgraded to 2.0, but I still had to keep <ServerGarbageCollection>false</ServerGarbageCollection>, otherwise the memory usage was still very high.

@markvincze does it hit the memory threshold with 2.0 and get restarted, or just use all the memory? Hopefully it doesn't hit it and force a restart (probably via OutOfMemoryException).

Else that's probably a bug 😄

@markvincze

@benaadams After upgrading to 2.0, if I switch to Server GC, it still runs over the limit and gets restarted. For example, with the limit set to 300 MB it looks like this (the graph is a bit flaky):

image

@benaadams
Contributor

Probably should raise an issue in https://github.com/dotnet/coreclr about that

@markvincze

The interesting thing is that I can't reproduce the same issue with local Docker on my machine. Even if I pass in a small value with --memory, like 150m, the container doesn't crash with an out-of-memory error. If I check docker stats, I see that the memory usage only increases until it reaches the specified limit, stops exactly there, and the API keeps working.

I'm now trying to find a way to verify whether Kubernetes is actually passing the memory resource limit through as --memory.
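
One way to check from inside the container, assuming cgroup v1 (which is what Docker and Kubernetes used at the time), is to read the limit the kernel actually applied; a sketch:

using System;
using System.IO;

class CgroupLimit
{
    static void Main()
    {
        // cgroup v1 path; a huge value (~9223372036854771712) means no limit was set.
        const string path = "/sys/fs/cgroup/memory/memory.limit_in_bytes";
        if (File.Exists(path))
        {
            long limit = long.Parse(File.ReadAllText(path).Trim());
            Console.WriteLine($"Container memory limit: {limit / (1024 * 1024)} MB");
        }
        else
        {
            Console.WriteLine("No cgroup v1 memory limit file found.");
        }
    }
}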

@benaadams
Contributor

Looks like someone else might be having a container issue: dotnet/core#871
