This repository has been archived by the owner on Dec 18, 2018. It is now read-only.

Memory Leak in 1.1.0 #1260

Closed
ghost opened this issue Dec 15, 2016 · 49 comments

@ghost

ghost commented Dec 15, 2016

Since updating to 1.1.0 (on Win10 64-bit), my server's memory usage keeps growing at about 0.5 MB per minute. Is there a new setting that must be configured, or is there a memory leak in 1.1.0?

It's very easy to reproduce:

  1. Create a new "ASP.NET Core Web Application" in VS2015 (the wizard will create a 1.0.1 application). Start the application; the "Diagnostic Tools" window in VS2015 will show a usage of about ~245.2 MB and stay there.
  2. Now update the app to netcoreapp1.1. (As a further test, leave Microsoft.AspNetCore.Server.Kestrel at version 1.0.0 and it will still not use more memory.)
  3. Use the new 1.1.0 version of Microsoft.AspNetCore.Server.Kestrel and start the "Diagnostic Tools" window again: it will start at ~245.2 MB again, but then the "Process Memory" graph keeps increasing.
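
For reference, the host the 1.x template generates looks roughly like this (a sketch of the 1.x hosting pattern, not the exact wizard output, and it assumes the template's Startup class); the leak shows up with nothing beyond these defaults:

using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        // Plain 1.x-style host: Kestrel plus the template's Startup class,
        // no custom server configuration.
        var host = new WebHostBuilder()
            .UseKestrel()
            .UseStartup<Startup>()
            .Build();

        host.Run();
    }
}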
@ghost ghost changed the title Memmory Leak in 1.1.0 Memory Leak in 1.1.0 Dec 15, 2016
@pakrym
Contributor

pakrym commented Dec 15, 2016

I wasn't able to reproduce it locally. Can you provide your project.json, please?

@pakrym
Contributor

pakrym commented Dec 15, 2016

Okay, I think I can see it. It's much slower on my machine, and it looks like it leaks native memory, because the managed heap does not change between memory snapshots.

@CesarBS could it be something related to timers?

/cc @halter73
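
For anyone who wants to confirm that kind of diagnosis on their own app: compare the managed heap size against the whole process's working set, and if only the latter grows, the leak is in native memory. A minimal sketch (a hypothetical helper, not part of Kestrel):

using System;
using System.Diagnostics;
using System.Threading;

class MemoryWatch
{
    static void Main()
    {
        while (true)
        {
            // Managed heap after a full collection vs. total process memory.
            long managed = GC.GetTotalMemory(forceFullCollection: true);
            long workingSet = Process.GetCurrentProcess().WorkingSet64;
            Console.WriteLine($"managed: {managed / 1024} KB, working set: {workingSet / 1024} KB");
            Thread.Sleep(5000);
        }
    }
}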

@pakrym
Contributor

pakrym commented Dec 15, 2016

Yep, there is a leak and it is related to timers. Changing public const long HeartbeatMilliseconds = 1000; to 10 makes it really obvious:

image
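
(Not the actual Kestrel bug, but the general shape of a heartbeat-driven native leak looks like the sketch below: an allocation on every timer tick with no matching free, which is why shrinking the interval 100x makes the growth roughly 100x faster.)

using System;
using System.Runtime.InteropServices;
using System.Threading;

class HeartbeatLeakSketch
{
    // Dropping this from 1000 to 10 makes the growth ~100x faster.
    const int HeartbeatMilliseconds = 1000;

    static void Main()
    {
        // Every tick allocates unmanaged memory and never frees it, so the
        // managed heap stays flat while the process working set climbs.
        var timer = new Timer(_ => Marshal.AllocHGlobal(4096),
                              null, 0, HeartbeatMilliseconds);

        Console.ReadLine();
        GC.KeepAlive(timer);
    }
}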

@cesarblum
Contributor

@pakrym I'll look into it when I'm in the office.

@agoretsky

After upgrading all our app dependencies to 1.1.0, we also got a memory leak on Linux (the process keeps growing after start), but I was not able to reproduce it on Windows, and unfortunately there are no tools like dotMemory for profiling memory on Linux, so we rolled back to 1.0.1. I hope it's related and will be fixed.

@benaadams
Contributor

@agoretsky is this from cert use? (Related bug and repro you raised in coreclr: https://github.com/dotnet/coreclr/issues/8660.) Or an unrelated issue?

@agoretsky

agoretsky commented Dec 16, 2016

@benaadams, yes, it is due to the issue I raised.

@ghost
Author

ghost commented Dec 19, 2016

I built the 1.1 server with the code changes from PR 1261, and the leak is gone.

@pakrym
Contributor

pakrym commented Dec 19, 2016

@meichtf thank you for testing.

@pakrym pakrym closed this as completed Dec 19, 2016
@ghost
Author

ghost commented Dec 19, 2016

I hope to see this in the 1.1.1 release.

@markvincze

markvincze commented May 19, 2017

Hey all,
Was this actually fixed in 1.1.1? The same thing is happening with my API on Linux (which is, by the way, completely idle except for its status endpoint, which only returns a hard-coded 200 and is polled periodically): its memory usage is slowly but steadily increasing.

@avezenkov

.NET Core 1.1.1, a self-contained package, Kestrel, Linux: no requests, just leaving the process running, and memory grows slowly until it's all eaten by the dotnet process.

@markvincze

Hi @pakrym,

Do you have any hints on how to avoid this problem? We are running some ASP.NET Core APIs in Docker images (based on Debian), and they all show constantly increasing memory usage, even in our development and staging environments, where they are basically idle except for health check calls.

We are seeing memory usage patterns like these (these are from 2 different APIs, and they are constantly being restarted because they run out of memory):

image

Is this a tracked issue on Linux that will be solved eventually? Might the 2.0 release bring any remedy? Or should this not happen at all, meaning we're doing something wrong?

Thanks in advance!

@markvincze

And just one additional piece of info: we have one single API which is not built with C# and ASP.NET Core MVC, but with F# and Suave (although still on top of Kestrel and ASP.NET Core). Its memory usage is completely stable, even in production, where it has some (although not huge) load.

image

@avezenkov

REOPEN!

@halter73
Member

halter73 commented Aug 3, 2017

@avezenkov Feel free to open a new issue with more details. It should have information about what the project looks like. Is it a template project? If so, which one, and did you make any modifications? If not, could you upload the project to a GitHub repo? What packages and runtime are you using? Are you running a debug or release configuration? Are you using dotnet run or dotnet myAlreadyBuilt.dll? At what rate is the memory usage growing? What type of memory is growing (e.g. size/resident/shared/etc.)? Which distro and version are you running on?

After the memory has grown for a while, could you take a memory dump using gcore? If I'm lucky I might be able to use that to see what's on the managed heap. If you want to take a stab at it yourself, you can read guides here and here that go into detail on how to debug dotnet linux processes using lldb and libsosplugin.so.

I don't want to overwhelm you. If any of these details are too difficult to come by, you can leave them out for now and maybe I'll still be able to get a repro. But right now, I have almost nothing to go on.

We investigated this issue, fixed it, and verified the fix prior to releasing 1.1.1. I haven't seen the memory leak on Linux (which I test on a lot) since then, but that's without running MVC or other higher-level services, since I primarily work on lower levels of the stack. Just now, I reinstalled 1.1.1 on my Debian 8.9 VM and printed the /proc/<PID>/statm statistics of an idle ASP.NET Core/Kestrel process for over 30 minutes. The numbers haven't budged since a minute after the process started.

@avezenkov

@halter73 I see. Thanks for your efforts. I'll try to isolate the problem, describe the details, upload a demo project, etc. .NET Core is our hope for a minimal, modular, stable, multiplatform framework.

@avezenkov

@halter73 And thanks for the Linux debugging guides. I'll create a new issue once I have more details.

@markvincze

markvincze commented Aug 10, 2017

Btw, I have some new information about our experience; I'll share it for future reference, and to see if someone has an idea about it.

It seems that the APIs we have don't increase their memory usage infinitely; it's just that their usage is unexpectedly high. We are running these APIs in Kubernetes on Google Cloud. Previously we had their memory limit set to 500 MB, and they were exceeding it and being constantly restarted.
Then we increased the limit to 1000 MB, and since then they work fine; their memory usage seems to peak around 650 MB and is not increasing much further.

This is an example of the recent memory usage of some of the instances.

image

This usage still seems excessive, and I couldn't find an explanation for it. We can only reproduce it in Kubernetes on GCE, and nowhere else. I tried running and load testing the same API:

  • in dev configuration in VS
  • on Windows with a production build
  • on Ubuntu with a production build
  • in Docker, using the actual production image

But in none of these tests have I seen the same excessive memory usage; in all of them the memory consumption peaked at around ~150 MB.

Did anyone else encounter something similar? (The last difference I can think of between the tests and the production scenario is that Google Cloud uses a specialized OS, the Container-Optimized OS, to run the production Docker images, but I haven't got around to investigating how I could test with that.)

@benaadams
Contributor

Sounds like the memory cap on the container is not being communicated to the GC? (So it goes over 500MB thinking it has more?)

@markvincze

@benaadams I was thinking about that.

In Kubernetes the specified memory limit is passed to Docker via the --memory flag, so when we used 500 MB as the Kubernetes limit, that should have applied to the Docker container.
And I've tried running this API locally with Docker, passing --memory 100M; the app nicely stayed under 100 MB and kept working fine. (And with --memory 4096M, it still only used ~150 MB.)

Do you know how we could verify that this is not the problem?

@Drawaes
Contributor

Drawaes commented Aug 10, 2017

I presume Docker locally and the cloud test are both x64, so I will ignore that for now. What is the reported core count in both local and remote tests? It has a decent impact on the memory held by the GC.

@davidfowl
Member

The change was here dotnet/coreclr#10064

@markvincze

Hi @Drawaes,

I tried printing Environment.ProcessorCount in all environments; I see the following:

  • On my machine, just doing dotnet run, the value is 4.
  • On my machine, with Docker, it's 1.
  • On Google Cloud Kubernetes it's 8.

Can this cause the issue? Are there any other diagnostics I should print?

Btw, what I tried in the meantime was using different base Docker images. Originally I was using a custom image on top of Debian with a self-contained app (published with the --runtime flag). Then I also tried microsoft/dotnet:1.1.2-runtime and gcr.io/google-appengine/aspnetcore:1.1 (publishing without --runtime and just running the DLL with dotnet), but all of them produced the same memory usage.

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

This makes a big difference. By default ASP.NET Core runs on the server GC, which tends to let memory use grow a lot more, preferring throughput over memory use. However, it doesn't matter what you set the garbage collector to (workstation or server): it will go to workstation mode if there is a single CPU. Try forcing your app to workstation GC and re-running on Google Cloud, or resource-limit your container to a single CPU. If you want to force your application, you can do it in the csproj with this setting:

<PropertyGroup> 
    <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>

That should give you the same behaviour as you see locally. (You need to make sure your launchSettings.json isn't overriding this value as well.)

Try that and see if the results match what you see on docker locally.
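
One way to double-check which mode actually took effect at runtime (since other settings can override the csproj value) is to log GCSettings.IsServerGC at startup; a minimal sketch:

using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        // True means server GC (one heap per logical core); false means workstation GC.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
    }
}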

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

For your reference, to see the difference between the server and workstation GC, just look at the size of the ephemeral segments each allocates.

Your local Docker, because of the single CPU, will be running the workstation GC, and the Google Cloud one will be running the server GC in the "> 4 logical CPUs" category (2 GB ephemeral segment). Using the workstation GC on multiple processors will use a bit more CPU but will keep your memory use lower (more collections, etc.).
https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals#the_managed_heap

image

@benaadams
Contributor

benaadams commented Aug 13, 2017

The container memory cap was merged 23 Mar 2017
1.1.0 was released 16 Nov 2016

Guess you might have to use .NET Core 2.0 to have the caps work? (You can still use the same version of ASP.NET Core.)

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

Sure, but the behaviour was okay in containers locally, where there was a single CPU, but not in the Google containers, where 8 "CPUs" were visible to the container. So while you may still have the "cap is not seen correctly" issue, the workstation GC is saving your bacon locally and keeping the memory usage low.

@markvincze

@Drawaes @benaadams,

Thanks a lot for the help, guys; I didn't know about the difference the number of CPUs makes.
I'll try the csproj change first, and then also an update to the 2.0 preview. I'll keep this thread updated on how it goes.

Btw, @Drawaes, if I understand correctly, the problem might be that in Google Cloud .NET is using the Server GC, thus allocating much more memory. And what I'd have to try is to force the Workstation GC, right? Then shouldn't the flag in the csproj be false?

<PropertyGroup> 
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

@Drawaes
Contributor

Drawaes commented Aug 13, 2017

Yes, sorry... it should be false... typo :) Good luck, and do tell us how you get on.

@markvincze

markvincze commented Aug 13, 2017

@Drawaes,

The change to Workstation GC seems to have helped! This is what the memory usage looks like after the deployment.

image

(This is still on the development deployment, so this API is not getting any real load. I'll test it on production soon.)

Btw, before doing the csproj change, I printed System.Runtime.GCSettings.IsServerGC, and I got true in all environments (locally debugging with VS, locally running with Docker, running in Kubernetes). So it didn't use workstation mode even in local Docker, where the reported CPU count was 1.

Do I understand correctly that the problem was caused by the combination of these two things?

  • Server GC tends to be more "greedy" and allocates more aggressively, so the memory usage increases to a much higher value than it does with Workstation GC (which happened to be around 600MB on my environment).
  • .NET 1.1 does not adhere to the memory limits on Linux, but this will be fixed in 2.0.

So I should be able to switch back to Server GC once I've upgraded to .NET 2.0, and it'll respect the memory limit coming with --memory, right?

@benaadams
Contributor

benaadams commented Aug 13, 2017

Users allocate, GC doesn't 😉

I'd phrase it more as

  • Server GC trades memory in favor of throughput; you will have a lower max (app) throughput with Workstation GC - but use less memory
  • .NET 1.1 does not adhere to the memory limits set by the docker container, this is fixed in 2.0

This might be a good primer to show the difference: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals

Generally you want to be using Server GC (the defaults) for a server app.

I have a request in for manual limits: https://github.com/dotnet/coreclr/issues/11338

@markvincze

@benaadams thanks for the clarification!

Just a quick update: I also deployed to production, where the API has some load (~2 requests per second), and the memory usage seems to be stable so far:

image

@Drawaes
Contributor

Drawaes commented Aug 14, 2017

I am not 100% sure that server GC with 8 cores but a 1 GB limit is ever a good idea. The core-to-memory ratio is out of whack. You get heaps per core in server GC.

@JorritSalverda

I'm trying to figure out whether the CPU limit in Kubernetes actually limits the number of cores the process runs on, or is just a cap on how much time a process is allowed to spend across all cores of the host combined.

The particular application @markvincze mentions has a Kubernetes resource limit of 250m, which is a quarter of a core. However, the host is an 8-core machine, hence the Environment.ProcessorCount property shows 8 cores, even though the app will never be able to use all 8 cores to the max. It might be able to use a fraction of each core though, making it crucial that the .NET framework treats it as an 8-core machine.

@Drawaes
Contributor

Drawaes commented Aug 14, 2017

It will be treated as 8-core. The issue is 8 cores with 1 GB of RAM: you have 125 MB per core, and with the server GC that is never going to be a good situation. I would always recommend the workstation GC in that scenario.

@benaadams
Contributor

@markvincze .NET Core 2.0, which has the fix, has just been released: https://blogs.msdn.microsoft.com/dotnet/2017/08/14/announcing-net-core-2-0/

(ASP.NET Core 2.0 has also just been released: https://blogs.msdn.microsoft.com/webdev/2017/08/14/announcing-asp-net-core-2-0/)

@markvincze

@benaadams thanks for the info! (I'm working on upgrading my API already 🙂)

@Mookker

Mookker commented Aug 15, 2017

Hello @markvincze! Did

<PropertyGroup>
    <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>

help in 1.1.0, or did you have to upgrade to 2.0?

@Drawaes
Contributor

Drawaes commented Aug 15, 2017

It helped according to his report back.

@Mookker

Mookker commented Aug 15, 2017

@Drawaes thank you, will try it

@markvincze

@Mookker,

Yes, it helped. Since then I've upgraded to 2.0, but I still had to keep <ServerGarbageCollection>false</ServerGarbageCollection>; otherwise the memory usage was still very high.

@tmds
Contributor

tmds commented Aug 16, 2017

Detecting the Docker memory limit is in 2.0 (dotnet/coreclr@b511095).
Detecting the Docker CPU limit isn't in 2.0 (dotnet/coreclr@df214e6).

@Drawaes
Contributor

Drawaes commented Aug 16, 2017

The memory limit was potentially an issue, but I think Google was actually reporting 8 cores to the container. If that is the case, then the server GC is never going to be comfortable with a 500 MB memory limit. It would be interesting to hear back what Environment.ProcessorCount reports now.

@benaadams
Contributor

benaadams commented Aug 18, 2017

Yes, it helped. Since then I upgraded to 2.0, but I still had to keep <ServerGarbageCollection>false</ServerGarbageCollection>, otherwise the memory usage was still very high.

@markvincze does it hit the memory threshold with 2.0 and get restarted, or just use all the memory? Hopefully it doesn't hit it and force a restart (probably via OutOfMemoryException).

Else that's probably a bug 😄

@markvincze

@benaadams After upgrading to 2.0, if I switch to Server GC, it still runs over the limit and gets restarted. For example, with the limit set to 300 MB it looks like this (the graph is a bit flaky):

image

@benaadams
Contributor

Probably should raise an issue in https://github.com/dotnet/coreclr about that

@markvincze

The interesting thing is that I can't reproduce the same issue with local Docker on my machine. Even if I pass in a small value with --memory, like 150m, the container doesn't crash with an out-of-memory error. If I check docker stats, I see that the memory usage only increases until it reaches the specified limit, stops exactly there, and the API keeps working.

I'm now trying to find a way to verify whether Kubernetes is actually passing the memory resource limit through as --memory.
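
One way to check from inside the container, assuming cgroup v1 (which is what Docker and Kubernetes used at the time), is to read the limit the kernel actually applied; a sketch:

using System;
using System.IO;

class CgroupLimit
{
    static void Main()
    {
        // cgroup v1 path; a huge value (~9223372036854771712) means no limit was set.
        const string path = "/sys/fs/cgroup/memory/memory.limit_in_bytes";
        if (File.Exists(path))
        {
            long limit = long.Parse(File.ReadAllText(path).Trim());
            Console.WriteLine($"Container memory limit: {limit / (1024 * 1024)} MB");
        }
        else
        {
            Console.WriteLine("No cgroup v1 memory limit file found.");
        }
    }
}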

@benaadams
Contributor

Looks like someone else might be having a container issue: dotnet/core#871
