
.NET 6 container throws OutOfMemoryException #58974

Open
Bio2hazard opened this issue Sep 10, 2021 · 39 comments
@Bio2hazard

Description

When running inside a Docker container that has a hard memory limit set, GCMemoryInfo reports a HighMemoryLoadThresholdBytes value that is higher than the memory actually available.

Reproduction Steps

  1. Use the default multistage Docker build described at https://aka.ms/containerfastmode - I noticed the issue with mcr.microsoft.com/dotnet/runtime:6.0-bullseye-slim
  2. Create a simple net6.0 console app
  3. Add:
var gcMemInfo = GC.GetGCMemoryInfo();
Console.WriteLine($"Total Available:{gcMemInfo.TotalAvailableMemoryBytes}");
Console.WriteLine($"High Memory Threshold:{gcMemInfo.HighMemoryLoadThresholdBytes}");
  4. Build the container, e.g. docker build -t highmemtest -f .\Dockerfile .
  5. Run the container with a memory limit, e.g. docker run --name highmemtest -it --memory=3g --rm highmemtest

With a limit of 3g I observe the following values:

   "totalAvailableMemoryBytes":2415919104,
"highMemoryLoadThresholdBytes":2899102924

Expected behavior

I would expect the high memory load threshold to be less than the available memory, so it has a chance to kick in and run the GC more aggressively prior to going OOM.

To that end, I would expect to see values similar to it running outside a container in windows, where I observe:

   "totalAvailableMemoryBytes":16909012992,
"highMemoryLoadThresholdBytes":15218111692

Or running it inside a container without a memory limit, where I observe:

   "totalAvailableMemoryBytes":13191815168,
"highMemoryLoadThresholdBytes":11872633651

Configuration

.NET Version: net6.0
OS: Observed on both Windows 10 and Amazon Linux 2 when running the mcr.microsoft.com/dotnet/runtime:6.0-bullseye-slim docker image
Arch: x64
Specific to: cgroup memory limit when running docker container

Regression?

I did not confirm it, but I suspect .NET 5 exhibits the same behavior.

Other information

https://docs.microsoft.com/en-us/dotnet/core/run-time-config/garbage-collector#heap-limit

The section for heap limit states:

The default value, which only applies in certain cases, is the greater of 20 MB or 75% of the memory limit on the container. The default value applies if:
-    The process is running inside a container that has a specified memory limit.
-    System.GC.HeapHardLimitPercent is not set.

When running the container with --memory=3g, /sys/fs/cgroup/memory/memory.limit_in_bytes is 3221225472
75% of that is 2415919104, which exactly matches the value of TotalAvailableMemoryBytes on GCMemoryInfo

https://docs.microsoft.com/en-us/dotnet/core/run-time-config/garbage-collector#high-memory-percent

The section for High Memory Percent states:

By default, when the physical memory load reaches 90%, garbage collection becomes more aggressive about doing full, compacting garbage collections to avoid paging.

90% of the aforementioned 3221225472 is 2899102924 which matches the value of HighMemoryLoadThresholdBytes on GCMemoryInfo

I think the disconnect here is that HighMemoryLoadThresholdBytes does not take into consideration that the heap limit is 75% when running inside a container with a memory limit, and instead assumes a heap limit of 100%. It should either get the same container-awareness logic that TotalAvailableMemoryBytes has (e.g. when running inside a container that has a specified memory limit set, default to 68%), or simply default to 90% of the calculated TotalAvailableMemoryBytes value.
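For reference, here is a minimal sketch (my own illustration, not code from the repro) that makes the arithmetic above visible by comparing GC.GetGCMemoryInfo() against the cgroup limit:

using System;
using System.IO;

// cgroup v1 path used throughout this issue; cgroup v2 exposes memory.max instead.
long limit = long.Parse(File.ReadAllText("/sys/fs/cgroup/memory/memory.limit_in_bytes").Trim());

GCMemoryInfo info = GC.GetGCMemoryInfo();

// Default heap hard limit in a memory-limited container: 75% of the limit.
Console.WriteLine($"75% of cgroup limit:          {limit * 75 / 100}");
Console.WriteLine($"TotalAvailableMemoryBytes:    {info.TotalAvailableMemoryBytes}");

// Default high-memory threshold: 90% of the limit, i.e. above the heap hard limit.
Console.WriteLine($"90% of cgroup limit:          {limit * 90 / 100}");
Console.WriteLine($"HighMemoryLoadThresholdBytes: {info.HighMemoryLoadThresholdBytes}");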

dotnet-issue-labeler bot added the area-GC-coreclr and untriaged labels Sep 10, 2021

@Bio2hazard
Author

So I think the bug is in /src/coreclr/gc/gc.cpp, which is unfortunately too large to display on GitHub.

Now, I'm not a C++ dev, but I think I figured out what's going on.

In gc.cpp, lines 42772-42776 have the logic that sets the heap limit to 75%:

if (gc_heap::is_restricted_physical_mem)
{
    uint64_t physical_mem_for_gc = gc_heap::total_physical_mem * (uint64_t)75 / (uint64_t)100;
    gc_heap::heap_hard_limit = (size_t)max ((20 * 1024 * 1024), physical_mem_for_gc);
}

Lines 42978-42995 have the logic that sets the high memory percentage to 90%:

else
{
    // We should only use this if we are in the "many process" mode which really is only applicable
    // to very powerful machines - before that's implemented, temporarily I am only enabling this for 80GB+ memory.
    // For now I am using an estimate to calculate these numbers but this should really be obtained
    // programmatically going forward.
    // I am assuming 47 processes using WKS GC and 3 using SVR GC.
    // I am assuming 3 in part due to the "very high memory load" is 97%.
    int available_mem_th = 10;
    if (gc_heap::total_physical_mem >= ((uint64_t)80 * 1024 * 1024 * 1024))
    {
        int adjusted_available_mem_th = 3 + (int)((float)47 / (float)(GCToOSInterface::GetTotalProcessorCount()));
        available_mem_th = min (available_mem_th, adjusted_available_mem_th);
    }

    gc_heap::high_memory_load_th = 100 - available_mem_th;
    gc_heap::v_high_memory_load_th = 97;
}

It's a bit confusing, but the comment actually only applies to the inner if (gc_heap::total_physical_mem >= ((uint64_t)80 * 1024 * 1024 * 1024)) part.

The important part is Line 42986 int available_mem_th = 10; and then Line 42993 gc_heap::high_memory_load_th = 100 - available_mem_th;

So that's where the 90% comes from.

Fix

A possible fix would be: if the physical memory is restricted (i.e. a container memory limit is set), default the high memory threshold to 68% instead of 90%, to complement the 75% max heap size. This would be the code change for that:

else
    {
        int available_mem_th = 10;
        // If the hard limit is specified, default to 68% instead of 90% of physical memory
        if (gc_heap::is_restricted_physical_mem)
        {
            available_mem_th = 32;
        }

        // We should only use this if we are in the "many process" mode which really is only applicable
        // to very powerful machines - before that's implemented, temporarily I am only enabling this for 80GB+ memory.
        // For now I am using an estimate to calculate these numbers but this should really be obtained
        // programmatically going forward.
        // I am assuming 47 processes using WKS GC and 3 using SVR GC.
        // I am assuming 3 in part due to the "very high memory load" is 97%.
        if (gc_heap::total_physical_mem >= ((uint64_t)80 * 1024 * 1024 * 1024))
        {
            int adjusted_available_mem_th = 3 + (int)((float)47 / (float)(GCToOSInterface::GetTotalProcessorCount()));
            available_mem_th = min (available_mem_th, adjusted_available_mem_th);
        }

        gc_heap::high_memory_load_th = 100 - available_mem_th;
        gc_heap::v_high_memory_load_th = 97;
    }

That would resolve the issue; however, I think it is indicative of a bigger problem: memory_load is used throughout the file to determine what percentage of memory is in use, and it seems to be based purely on total_physical_mem, not heap_hard_limit.

I wonder if

void gc_heap::get_memory_info (uint32_t* memory_load,
                               uint64_t* available_physical,
                               uint64_t* available_page_file)
{
    GCToOSInterface::GetMemoryStatus(is_restricted_physical_mem ? total_physical_mem  : 0,  memory_load, available_physical, available_page_file);
}

could be updated to GCToOSInterface::GetMemoryStatus(heap_hard_limit ? heap_hard_limit : 0, memory_load, available_physical, available_page_file); or something along those lines, though I suspect that would have wide-reaching consequences that would require a very thorough audit.

@Maoni0
Member

Maoni0 commented Sep 11, 2021

this is by design. the default limit is set to be 75% assuming that there's native memory usage in the container (container in this context means an environment with a memory limit). we do not assume that all available memory is available for the GC to use. 75% is a number we picked - it's meant to be sufficient for general scenarios. you can change this default by specifying hardlimit.

outside a container, we did not have a default percentage for limit to begin with. it's true the same principle applies outside container env as well. but we only added this default limit concept when we added container support. adding it outside container env might regress existing apps and since you could use the same hardlimit config there if you want to change the default, I didn't change the default there.

@tmds
Member

tmds commented Sep 13, 2021

I would expect the high memory load threshold to be less than the available memory, so it has a chance to kick in and run the GC more aggressively prior to going OOM.

TotalAvailableMemoryBytes is the size of the .NET managed heap (defaults to 75% of physical memory).

.NET will GC to stay below this limit in a container. It won't wait till physical memory reaches HighMemoryLoadThreshold.

For #50414 there seems to be an issue as we're approaching TotalAvailableMemoryBytes.

Are you seeing an OOM in your containers?

@mangod9 removed the untriaged label Sep 13, 2021
@mangod9 added this to the Future milestone Sep 13, 2021
@Bio2hazard
Author

Hi @Maoni0 and @tmds - I realize that TotalAvailableMemoryBytes, i.e. the .NET managed heap limit, defaults to 75% of physical memory when physical memory is restricted (physical memory here being the container's hard limit, not the host's memory).

That's not the problem; the problem is that HighMemoryLoadThresholdBytes does not take that 75% into consideration and is always 90%.

.NET will GC to stay below this limit in a container. It won't wait till physical memory reaches HighMemoryLoadThreshold.

From reading https://raw.githubusercontent.com/dotnet/runtime/main/src/coreclr/gc/gc.cpp I see a few checks for heap_hard_limit which change behavior, but searching for high_memory_load shows 10 or so locations where extra steps are taken to turn background collections into blocking collections, wait for background collections, proactively reclaim space, and immediately trigger a gen 2 collection. Those will currently not work unless the defaults are manually overridden.

It seems most of the code that detects memory pressure is based not on heap_hard_limit but on high_memory_load_th, which defaults to a value that can never be reached in a container that has a hard limit set.

Am I not interpreting the c++ code correctly?

@tmds yes, I'm seeing some OOMs that I can't quite explain and have been investigating.

In .NET Core 3.1 this job had a 5% chance to crash. Of those that crashed, I'd estimate that 75% were shut down by the OOM killer for exceeding their memory hard limit and 25% crashed with a System.OutOfMemoryException.

After upgrading to .NET 6 preview, the same job now has a 95% chance to crash, and 100% of those crash from throwing a System.OutOfMemoryException.

@Maoni0
Member

Maoni0 commented Sep 14, 2021

That's not the problem, the problem is that the HighMemoryLoadThreshold does not take that 75% into consideration and is always 90%.

it's always 90% because we assume there's other memory usage. so we'd only want to start being aggressive if the total memory load (including both GC usage and other memory usage) is 90%, not when it's reached the 90% of the hard limit. again, this is the default behavior.

@Maoni0
Member

Maoni0 commented Sep 14, 2021

another thing I should mention, if it's not obvious, is that of course we would do a full compacting GC if we cannot commit memory based on the hardlimit.

@tmds
Member

tmds commented Sep 14, 2021

It seems most of the code to detect memory pressure is not based on heap_hard_limit but on high_memory_load_th which defaults to a value that can never be reached on a container that has a hard limit set.
Am I not interpreting the c++ code correctly?

My understanding (which may be wrong) is that the GC does accounting on how much of the managed heap is used, and uses that to trigger GCs.
HighMemoryLoadThresholdBytes memory pressure is not needed for the GC to get into action.

Of those that would crash, I'd estimate that 75% were shut down by the memory killer for exceeding their memory hard limit and 25% crashed with a System.OutOfMemoryException.

OutOfMemoryException happens when the GC accounting says the managed heap is full.

The OOM kill happens by the OS based on total memory used.

After upgrading to .NET 6 preview, the same job now has a 95% chance to crash

That is a major regression.

Can you update the issue title to something like .NET 6 container throws OutOfMemoryException?

#50414 is also about OutOfMemoryException in a container, but that workload is artificial. You are hitting this in a real application?

How does it behave with .NET 5?

cc @janvorli

@Bio2hazard changed the title from "Garbage Collector High Memory Load value is higher than available memory when running inside a container with cgroup memory limit" to ".NET 6 container throws OutOfMemoryException" Sep 14, 2021
@Bio2hazard
Author

Bio2hazard commented Sep 14, 2021

@tmds

Hi, yes this is a real application. It's a console app running in production that fetches gzipped CSV files and converts them to the Parquet format. The container instances have a tight memory limit so I can run multiple container instances per host - when I first created the app it had to process about half an exabyte's worth of data, so scaling horizontally in an efficient manner was vital.

The way I created the application has some similarities with the artificial workload in #50414. Details on the app are available below.

Findings Summary

.NET 6 = runtime:6.0-bullseye-slim
.NET 5 = runtime:5.0-bullseye-slim
.NET Core 3.1 = runtime:3.1-buster-slim

Container ran with --memory=3100m. Same input data was used for all runs and the input data has just over 3 million rows.

| .NET | Workstation GC | Server GC |
| ---- | -------------- | --------- |
| 6    | All Pass       | All Fail  |
| 5    | All Pass       | All Fail  |
| 3.1  | All Pass       | All Pass  |

Detailed Results

Every 50_000 rows processed, memory metrics are emitted.

FullGcMemoryInfoReport = Data from GC.GetGCMemoryInfo(). After running it on .NET Core 3.1 I realized it's missing a lot of the info, so I added GC.CollectionCount(gen) for all 3 generations.

CGroupMemoryReport = Data from /sys/fs/cgroup/memory/, namely memory.limit_in_bytes, memory.usage_in_bytes and memory.max_usage_in_bytes

I excluded data from /sys/fs/cgroup/memory/memory.memsw because the usage values are the same as the regular memory values.
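For context, here is a rough sketch of how reports like these could be produced (an illustration only, not the app's actual reporting code; the field set is a subset and the helper names are made up):

using System;
using System.IO;
using System.Text.Json;

ReportMemory(rowNum: 0); // example call; the real app emits this every 50_000 rows

static long ReadCgroup(string file) =>
    long.Parse(File.ReadAllText("/sys/fs/cgroup/memory/" + file).Trim());

static void ReportMemory(long rowNum)
{
    GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();

    // Subset of the FullGcMemoryInfoReport fields shown in the logs below.
    Console.WriteLine(JsonSerializer.Serialize(new
    {
        eventType = "FullGcMemoryInfoReport",
        rowNum,
        gen0CollectionCount = GC.CollectionCount(0),
        gen1CollectionCount = GC.CollectionCount(1),
        gen2CollectionCount = GC.CollectionCount(2),
        heapSizeBytes = gcInfo.HeapSizeBytes,
        memoryLoadBytes = gcInfo.MemoryLoadBytes,
        totalAvailableMemoryBytes = gcInfo.TotalAvailableMemoryBytes,
        highMemoryLoadThresholdBytes = gcInfo.HighMemoryLoadThresholdBytes
    }));

    // CGroupMemoryReport fields, read from the cgroup v1 files.
    Console.WriteLine(JsonSerializer.Serialize(new
    {
        eventType = "CGroupMemoryReport",
        rowNum,
        limitInBytes = ReadCgroup("memory.limit_in_bytes"),
        usageInBytes = ReadCgroup("memory.usage_in_bytes"),
        maxUsageInBytes = ReadCgroup("memory.max_usage_in_bytes")
    }));
}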

.NET 6 Server GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":300000,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":834576,"promotedBytes":2206629464,"finalizationPendingCount":0,"heapSizeBytes":2208390664,"memoryLoadBytes":357564416,"pauseTimePercentage":1.25,"pinnedObjectsCount":0,"totalCommittedBytes":2267475968,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 300000, "limitInBytes": 3250585600, "usageInBytes": 473812992, "maxUsageInBytes": 779841536}
Aborted

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":350000,"compacted":true,"concurrent":false,"generation":1,"index":4,"fragmentedBytes":2925304,"promotedBytes":5485704,"finalizationPendingCount":24,"heapSizeBytes":2213386120,"memoryLoadBytes":455081984,"pauseTimePercentage":0.84,"pinnedObjectsCount":2,"totalCommittedBytes":2292273152,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 350000, "limitInBytes": 3250585600, "usageInBytes": 486424576, "maxUsageInBytes": 779841536}
Aborted

Run 3:
{"eventType":"FullGcMemoryInfoReport","rowNum":600000,"compacted":false,"concurrent":true,"generation":2,"index":5,"fragmentedBytes":4517376,"promotedBytes":2210647344,"finalizationPendingCount":5,"heapSizeBytes":2215297320,"memoryLoadBytes":552599552,"pauseTimePercentage":0.82,"pinnedObjectsCount":0,"totalCommittedBytes":2292232192,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 600000, "limitInBytes": 3250585600, "usageInBytes": 823992320, "maxUsageInBytes": 1865146368}
Aborted

Run 4:
{"eventType":"FullGcMemoryInfoReport","rowNum":600000,"compacted":false,"concurrent":true,"generation":2,"index":5,"fragmentedBytes":7562848,"promotedBytes":2210161944,"finalizationPendingCount":5,"heapSizeBytes":2217721232,"memoryLoadBytes":552599552,"pauseTimePercentage":0.43,"pinnedObjectsCount":0,"totalCommittedBytes":2292281344,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 600000, "limitInBytes": 3250585600, "usageInBytes": 591986688, "maxUsageInBytes": 591986688}
Aborted

.NET 6 Workstation GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"compacted":true,"concurrent":false,"generation":1,"index":176,"fragmentedBytes":7768976,"promotedBytes":2066664,"finalizationPendingCount":0,"heapSizeBytes":2232898296,"memoryLoadBytes":1462763520,"pauseTimePercentage":0.53,"pinnedObjectsCount":0,"totalCommittedBytes":2238799872,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1486970880, "maxUsageInBytes": 1486970880}
{"eventType": "Completion", "durationSeconds": 51.1620156}

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":155,"gen1CollectionCount":91,"gen2CollectionCount":7,"compacted":true,"concurrent":false,"generation":1,"index":155,"fragmentedBytes":9087016,"promotedBytes":751976,"finalizationPendingCount":0,"heapSizeBytes":2231368744,"memoryLoadBytes":1462763520,"pauseTimePercentage":0.43,"pinnedObjectsCount":0,"totalCommittedBytes":2238574592,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1488908288, "maxUsageInBytes": 1488908288}
{"eventType": "Completion", "durationSeconds": 49.2860017}

.NET 5 Server GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":400000,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":1265848,"promotedBytes":2206923760,"finalizationPendingCount":0,"heapSizeBytes":2209247376,"memoryLoadBytes":325058560,"pauseTimePercentage":0.78,"pinnedObjectsCount":2,"totalCommittedBytes":2265993216,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 400000, "limitInBytes": 3250585600, "usageInBytes": 722436096, "maxUsageInBytes": 1860648960}
Aborted

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":350000,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":971296,"promotedBytes":2206711304,"finalizationPendingCount":0,"heapSizeBytes":2208390104,"memoryLoadBytes":325058560,"pauseTimePercentage":1.08,"pinnedObjectsCount":1,"totalCommittedBytes":2267799552,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 350000, "limitInBytes": 3250585600, "usageInBytes": 718237696, "maxUsageInBytes": 1860648960}
Aborted

.NET 5 Workstation GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"compacted":true,"concurrent":false,"generation":1,"index":177,"fragmentedBytes":8735344,"promotedBytes":1019784,"finalizationPendingCount":0,"heapSizeBytes":2232115952,"memoryLoadBytes":1462763520,"pauseTimePercentage":0.4,"pinnedObjectsCount":0,"totalCommittedBytes":2239213568,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1484357632, "maxUsageInBytes": 1484357632}
{"eventType": "Completion", "durationSeconds": 53.162754}

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"compacted":true,"concurrent":false,"generation":1,"index":179,"fragmentedBytes":11400800,"promotedBytes":1652632,"finalizationPendingCount":0,"heapSizeBytes":2234520536,"memoryLoadBytes":1462763520,"pauseTimePercentage":0.46,"pinnedObjectsCount":0,"totalCommittedBytes":2240872448,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1714769920, "maxUsageInBytes": 1847549952}
{"eventType": "Completion", "durationSeconds": 51.051156}

.NET Core 3.1 Server GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"fragmentedBytes":1536136,"heapSizeBytes":2233704392,"memoryLoadBytes":1852833791,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1856679936, "maxUsageInBytes": 1857949696}
{"eventType": "Completion", "durationSeconds": 45.156954}

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"fragmentedBytes":6385968,"heapSizeBytes":2238567392,"memoryLoadBytes":1787822080,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1842814976, "maxUsageInBytes": 1987911680}
{"eventType": "Completion", "durationSeconds": 46.5500175}

-- Added GC Gen Collection Counts  --

Run 3:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":14,"gen1CollectionCount":12,"gen2CollectionCount":3,"fragmentedBytes":451648,"heapSizeBytes":2232756168,"memoryLoadBytes":1527775232,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1617502208, "maxUsageInBytes": 1617502208}
{"eventType": "Completion", "durationSeconds": 46.5642079}

Run 4:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":17,"gen1CollectionCount":15,"gen2CollectionCount":3,"fragmentedBytes":393496,"heapSizeBytes":2233633800,"memoryLoadBytes":1820327936,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1862709248, "maxUsageInBytes": 1987842048}
{"eventType": "Completion", "durationSeconds": 48.5279659}

.NET Core 3.1 Workstation GC

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"fragmentedBytes":6352552,"heapSizeBytes":2230767832,"memoryLoadBytes":1462763520,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1470181376, "maxUsageInBytes": 1470181376}
{"eventType": "Completion", "durationSeconds": 51.7918388}

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"fragmentedBytes":10571248,"heapSizeBytes":2233205032,"memoryLoadBytes":1690304512,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1702096896, "maxUsageInBytes": 1835315200}
{"eventType": "Completion", "durationSeconds": 45.7423049}

-- Added Collection Counts --

Run 3:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":230,"gen1CollectionCount":116,"gen2CollectionCount":10,"fragmentedBytes":7088832,"heapSizeBytes":2231524336,"memoryLoadBytes":1690304512,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1701007360, "maxUsageInBytes": 1999908864}
{"eventType": "Completion", "durationSeconds": 45.1154457}

Run 4:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":221,"gen1CollectionCount":110,"gen2CollectionCount":11,"fragmentedBytes":9017064,"heapSizeBytes":2233361312,"memoryLoadBytes":1690304512,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1702215680, "maxUsageInBytes": 1999908864}
{"eventType": "Completion", "durationSeconds": 47.9280517}

.NET 6 Server GC with extra GC Settings

Setting DOTNET_GCHighMemPercent to 0x41 ( 65% ) = Crash
Setting DOTNET_GCHighMemPercent to 0x28 ( 40% ) = Crash
Setting DOTNET_GCHeapHardLimitPercent to 0x5f ( 95% ) = Pass
Setting DOTNET_GCHeapHardLimitPercent to 0x55 ( 85% ) = Pass
Setting DOTNET_GCHeapHardLimitPercent to 0x50 ( 80% ) = Pass
Setting DOTNET_GCHeapHardLimitPercent to 0x4b ( 75% - As a control ) = Crash

Setting GCHighMemPercent does not seem to make a difference, but increasing GCHeapHardLimitPercent does.

App Details

I pre-define buffers for each column and create them on application start, because the default behavior of creating a new array that's 50% larger and copying the data over was both slow and wasteful. The buffers themselves use ArrayBufferWriter<T> and I pass the buffer size in as the initialCapacity on the constructor, so similar to the artificial app, I have a bunch of big arrays.

The app reads the remote data into a stream, unzips it and pushes it into a Pipe that has pause thresholds set. I manually parse the CSV line by line using SequenceReader, and write the data into their respective buffers backed by ArrayBufferWriter<T>.

When the app reaches the desired number of rows, a buffer is almost full, or there is no more input data, I fetch the ReadOnlyMemory<T> from ArrayBufferWriter<T>.WrittenMemory, pin it, and pass the pointer via an interop call to Apache Parquet.

That method uses unsafe for pinning and grabbing the pointer, but all the observed crashes happen long before it tries to write any files.
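Purely as an illustration of the pattern described above (hypothetical column, names, and sizes - not the actual production code):

using System;
using System.Buffers;

// One pre-sized buffer per column, created once at startup so it never needs to grow.
const int rowCapacity = 3_200_000; // hypothetical capacity, slightly above the expected row count
var timestampColumn = new ArrayBufferWriter<long>(rowCapacity);

// The CSV parsing loop writes parsed values straight into the column buffer.
Span<long> slot = timestampColumn.GetSpan(1);
slot[0] = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();
timestampColumn.Advance(1);

// Once a chunk is complete, pin the written memory and hand a pointer to native code.
ReadOnlyMemory<long> written = timestampColumn.WrittenMemory;
using (MemoryHandle handle = written.Pin())
{
    // handle.Pointer (a void*) is what the real app passes to the Apache Parquet
    // interop call inside an unsafe block; omitted here to keep the sketch safe code.
}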

@tmds
Member

tmds commented Sep 15, 2021

Thanks for characterizing your workload and providing metrics.

The way I created the application has some similarities with the artifical workload in #50414

@Maoni0 @janvorli the referenced issue has a reproducer that leads to OutOfMemoryException for an application that should fit within the managed heap. It may have the same root cause as this issue. Can you please take a look?

.NET 5 = runtime:5.0-bullseye-slim

@Bio2hazard can you also try using the first 5.0 image (mcr.microsoft.com/dotnet/runtime:5.0.0)?

.NET 6 Server GC with extra GC Settings

I think reducing DOTNET_GCHeapHardLimitPercent leads to Pass because more GCs are triggered due to the memory load.

Though I don't think those should be required. GCs should already have occurred when the managed heap usage becomes high.

throws OutOfMemoryException

That the app throws OutOfMemoryException means that we've run out of managed heap.
totalAvailableMemoryBytes has the same value for .NET 3.1 and 6.0, so the heap size is the same.

I assume your app uses the same allocation APIs regardless of whether it targets .NET 3.1, 5.0 or 6.0?
So, we can assume that your app doesn't have memory leaks because it doesn't run out of memory on 3.1.

I wonder if something changed in .NET that would cause your application to require a larger heap.
If you increase the container memory limit, what happens?
Does it still throw OutOfMemoryException, but just takes longer to get to that point?

@janvorli
Member

@tmds I was just going to start investigating this problem (and #50414) today.

@Maoni0
Member

Maoni0 commented Sep 15, 2021

I see this issue has changed to an OOM issue. yeah, if it doesn't get OOM with workstation GC but does with Server, or if it doesn't get OOM with a larger limit but with a smaller limit, that's clearly an issue that should be investigated. I'll see what @janvorli finds out.

@Bio2hazard
Author

@tmds

@Bio2hazard can you also try using the first 5.0 image (mcr.microsoft.com/dotnet/runtime:5.0.0)?

.NET 5 Server GC (mcr.microsoft.com/dotnet/runtime:5.0.0)

This one had unusual behavior that I did not observe on the -slim version, in that it appears to occasionally deadlock 🤔 - 3 out of 5 runs just stalled and I had to Ctrl+C out.

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":400000,"gen0CollectionCount":3,"gen1CollectionCount":2,"gen2CollectionCount":2,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":2091968,"promotedBytes":2208499688,"finalizationPendingCount":0,"heapSizeBytes":2211286808,"memoryLoadBytes":422576128,"pauseTimePercentage":0.65,"pinnedObjectsCount":1,"totalCommittedBytes":2276786176,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 400000, "limitInBytes": 3250585600, "usageInBytes": 513638400, "maxUsageInBytes": 513638400}
< Deadlocked >

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":500000,"gen0CollectionCount":3,"gen1CollectionCount":2,"gen2CollectionCount":2,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":3822008,"promotedBytes":2207981080,"finalizationPendingCount":0,"heapSizeBytes":2212661152,"memoryLoadBytes":487587840,"pauseTimePercentage":0.22,"pinnedObjectsCount":2,"totalCommittedBytes":2285789184,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 500000, "limitInBytes": 3250585600, "usageInBytes": 550965248, "maxUsageInBytes": 589717504}
< Deadlocked >

-- Restarted Container --

Run 3:
{"eventType":"FullGcMemoryInfoReport","rowNum":350000,"gen0CollectionCount":3,"gen1CollectionCount":3,"gen2CollectionCount":2,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":2396512,"promotedBytes":2207731480,"finalizationPendingCount":0,"heapSizeBytes":2211772672,"memoryLoadBytes":455081984,"pauseTimePercentage":0.28,"pinnedObjectsCount":2,"totalCommittedBytes":2292285440,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 350000, "limitInBytes": 3250585600, "usageInBytes": 476250112, "maxUsageInBytes": 476250112}
Aborted

Run 4:
{"eventType":"FullGcMemoryInfoReport","rowNum":300000,"gen0CollectionCount":3,"gen1CollectionCount":2,"gen2CollectionCount":2,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":1479120,"promotedBytes":2206950152,"finalizationPendingCount":0,"heapSizeBytes":2209512376,"memoryLoadBytes":357564416,"pauseTimePercentage":0.5,"pinnedObjectsCount":2,"totalCommittedBytes":2287513600,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 300000, "limitInBytes": 3250585600, "usageInBytes": 467820544, "maxUsageInBytes": 491061248}
< Deadlocked >

Run 5:
{"eventType":"FullGcMemoryInfoReport","rowNum":300000,"gen0CollectionCount":3,"gen1CollectionCount":2,"gen2CollectionCount":2,"compacted":false,"concurrent":true,"generation":2,"index":2,"fragmentedBytes":1239576,"promotedBytes":2207744856,"finalizationPendingCount":0,"heapSizeBytes":2209212352,"memoryLoadBytes":357564416,"pauseTimePercentage":0.89,"pinnedObjectsCount":1,"totalCommittedBytes":2287591424,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 300000, "limitInBytes": 3250585600, "usageInBytes": 467705856, "maxUsageInBytes": 544829440}
Aborted

.NET 5 Workstation GC (mcr.microsoft.com/dotnet/runtime:5.0.0)

Workstation GC continues to work without any observed issues

Run 1:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":180,"gen1CollectionCount":123,"gen2CollectionCount":8,"compacted":true,"concurrent":false,"generation":1,"index":180,"fragmentedBytes":9647864,"promotedBytes":1396832,"finalizationPendingCount":0,"heapSizeBytes":2233583520,"memoryLoadBytes":1462763520,"pauseTimePercentage":0.42,"pinnedObjectsCount":0,"totalCommittedBytes":2240290816,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1481097216, "maxUsageInBytes": 1481097216}
{"eventType": "Completion", "durationSeconds": 47.1836118}

Run 2:
{"eventType":"FullGcMemoryInfoReport","rowNum":3050000,"gen0CollectionCount":196,"gen1CollectionCount":116,"gen2CollectionCount":8,"compacted":true,"concurrent":false,"generation":1,"index":196,"fragmentedBytes":8866304,"promotedBytes":1404608,"finalizationPendingCount":0,"heapSizeBytes":2231906616,"memoryLoadBytes":1690304512,"pauseTimePercentage":0.55,"pinnedObjectsCount":0,"totalCommittedBytes":2238410752,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 3050000, "limitInBytes": 3250585600, "usageInBytes": 1710972928, "maxUsageInBytes": 1839542272}
{"eventType": "Completion", "durationSeconds": 40.2547436}

I think reducing DOTNET_GCHeapHardLimitPercent leads to Pass because more GCs are triggered due to the memory load.

Though I don't think those should be required. GCs should already have occurred when the managed heap usage becomes high.

To clarify, the default GCHeapHardLimitPercent for containers with a hard memory limit is 75%, so in those tests I'm increasing the value, not reducing it. 👍 So that test shows that increasing the limit from 75% -> 80% is enough to make it pass.


I assume your app uses same allocation APIs independent of it targeting .NET 3.1, 5.0 or 6.0?
So, we can assume that your app doesn't have memory leaks because it doesn't run out of memory on 3.1.

Yes, the only preprocessor directive in place is for reporting metrics, to account for differences in 3.1's GCMemoryInfo.
I don't believe my app has any memory leaks. To make sure, I just ran a profiling session with JetBrains dotMemory.

Important: !! I don't know how to profile a running Linux container, so this profiling information is from the app running directly on Windows, where it doesn't OOM !!

Here is a graphical representation of the first 12 seconds of the app running on .net5.0 with ServerGC and Heap Limit set:
image

This pattern remains steady for the remainder of the app's lifetime:
image

For comparison, here is a graphical representation of the first 12 seconds of the app running on .net3.1 with ServerGC and Heap Limit set:
image

And the remainder of the app's lifetime:
image

In this one comparison, it seems the .NET 3.1 GC is ever so slightly more aggressive, running just a tad sooner and running 20 times overall vs. .NET 5 where it only runs 17 times. Again, take these measurements with a grain of salt since this is on Windows without a climit. It should help visualize the memory movement within the app though. I'm happy to provide more detailed profiling information if there is any area you are curious about 👍

I wonder if something changed in .NET that would cause your application to require a larger heap.
If you increase the container memory limit, what happens?
Does it still throw OutOfMemoryException, but just takes longer to get to that point?

As the memory profiling information hopefully shows, memory usage doesn't grow much while the application is running - intentionally so, as I tried my hardest to front-load allocations and then re-use buffers for both speed and predictability. So for the most part it seems the app will either consistently OOM while processing the first 20% of rows, or it will always succeed. The only time I've observed crashes later during processing is when giving it an even tighter memory limit, presumably due to the ~120 MB of gen 0 allocations per GC cycle.

Anyhow, I bumped the limit up by just 100 MB, from 3100 MB to 3200 MB, and it now passes on .NET 5 with Server GC.
3200MB = Pass
3175MB = Pass
3150MB = Fails
3000MB = Fails
2900MB = Fails ( before it starts processing )

Next, I wanted to see what happens if I reduce the memory limit on .net 3.1 with Server GC
3100MB ( control ) = Pass
3075MB = Fails
3050MB = Fails
3025MB = Fails
3000MB = Fails
2975MB = Fails
2950MB = Fails
2900MB = Fails ( before it starts processing )

Next I wanted to see what the least amount of memory required is for .net 3.1 with Workstation GC
3000MB = Pass ( 3 runs, all pass )
2975MB = Fails

Next I wanted to see if 3000MB is sufficient for .net 5 with Workstation GC
3100MB ( control ) = Pass
3075MB = Fails
3050MB = Fails ( before it starts processing )
3025MB = Fails ( before it starts processing )
3000MB = Fails ( before it starts processing )

Note: I only set the container memory limit via --memory= and the COMPlus_gcServer env value. I did not touch the heap limit, so keep in mind that 3000MB means a container climit of 3000MB but a hard heap limit of 2250MB (75% of the climit)!

Conclusions

| Version/GC Mode      | 2975MB | 3000MB | 3100MB | 3175MB |
| -------------------- | ------ | ------ | ------ | ------ |
| .NET 3.1 Workstation | Fail   | Pass   | Pass   | Pass   |
| .NET 5.0 Workstation | Fail   | Fail   | Pass   | Pass   |
| .NET 3.1 Server      | Fail   | Fail   | Pass   | Pass   |
| .NET 5.0 Server      | Fail   | Fail   | Fail   | Pass   |

So it seems .NET 5.0 needs an extra 100MB of climit (= 75MB of heap limit) to be happy, compared to .NET 3.1. It isn't clear to me yet why - I memory profiled .NET 5 Workstation GC vs. .NET 3.1 Workstation GC and the allocations appear largely identical.

Top is 3.1, Bottom is 5.0
image

@janvorli
Member

I was testing the test app from #50414 locally today and while I can repro it on .NET 6 preview 6, the test runs just fine on .NET 6 RC1. @Maoni0 said there was a change that came in preview 7 and that can likely fix the issue, so I gave it a try and it seems to be the case.

@Bio2hazard
Author

Bio2hazard commented Sep 15, 2021

I was testing the test app from #50414 locally today and while I can repro it on .NET 6 preview 6, the test runs just fine on .NET 6 RC1. @Maoni0 said there was a change that came in preview 7 and that can likely fix the issue, so I gave it a try and it seems to be the case.

I just updated my dockerfile with

FROM mcr.microsoft.com/dotnet/runtime:6.0.0-rc.1-bullseye-slim AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0.100-rc.1-bullseye-slim AS build
...

and wanted to report back that for my app nothing has changed.

{"eventType":"FullGcMemoryInfoReport","rowNum":350000,"gen0CollectionCount":4,"gen1CollectionCount":3,"gen2CollectionCount":2,"compacted":true,"concurrent":false,"generation":1,"index":4,"fragmentedBytes":300936,"promotedBytes":4860848,"finalizationPendingCount":24,"heapSizeBytes":2210557344,"memoryLoadBytes":455081984,"pauseTimePercentage":0.95,"pinnedObjectsCount":1,"totalCommittedBytes":2292232192,"totalAvailableMemoryBytes":2437939200,"highMemoryLoadThresholdBytes":2925527040}
{"eventType": "CGroupMemoryReport", "rowNum": 350000, "limitInBytes": 3250585600, "usageInBytes": 487907328, "maxUsageInBytes": 487907328}
Aborted

@janvorli
Member

Hmm, it is possible that this issue and the one in #50414 are different then.

@Maoni0
Member

Maoni0 commented Sep 15, 2021

@janvorli, it's also possible that you were hitting that issue (from what you showed me it definitely looked like you were hitting it) but @Bio2hazard was not.

@tmds
Member

tmds commented Sep 16, 2021

To clarify, the default GCHeapHardLimitPercent for containers with a hard memory limit is 75%, so in those tests I'm increasing the value, not reducing it. +1 So that test shows that increasing the limit from 75% -> 80% is enough to make it pass.

Ah yes. Same as with increasing --memory, this shows that increasing the heap size a little makes the OutOfMemoryException no longer occur.

Using GCHeapHardLimitPercent is more efficient because it allows keeping the same container memory limit.

Conclusions

From the table it's clear your limits are on the edge. Reducing memory by 100 MB makes it fail on .NET Core 3.1. Adding 75 MB makes it pass on .NET 5+.

.NET 5+ needs a little more heap to run your application. It's a regression, though not a major one.

@janvorli
Member

@Bio2hazard would you mind trying the experiment with modifying the limits with 6.0 RC1? I am interested in seeing if the behavior persists or if the increment needed to make it pass is smaller.

@Bio2hazard
Author

Bio2hazard commented Sep 16, 2021

Sure thing @janvorli

I used the -rc.1-bullseye-slim docker images for both runtime and sdk and set <TargetFramework>net6.0</TargetFramework><LangVersion>10</LangVersion> in my csproj.

First I tested .NET 6.0 RC1 Workstation GC
3100MB: Pass
3075MB: Fail

Those results are the same as .NET 5

Next I tested .NET 6.0 RC1 Server GC
3200MB: Pass (3x)
3175MB: Fail (3x)

Next I re-tested .NET 5.0 Server GC, since I previously had only run it once with 3175MB, where it had passed.
3175MB: 2 Fails, 1 Pass

So it seems when I had tested .NET 5.0 Server GC before it just got lucky.

I then wanted to determine the lowest amount of memory ( in 5MB increments ) where .NET 5.0 Server GC can pass 5 runs in a row. This turned out to be 3190MB.

I repeated that test for .NET 6.0 RC1, and it ended up being 3190MB as well. In other words, I was unable to observe any differences between .NET 5 and .NET 6 RC1

| Version/GC Mode          | 2975MB | 3000MB | 3100MB | 3190MB |
| ------------------------ | ------ | ------ | ------ | ------ |
| .NET 3.1 Workstation     | Fail   | Pass   | Pass   | Pass   |
| .NET 5.0 Workstation     | Fail   | Fail   | Pass   | Pass   |
| .NET 6.0 RC1 Workstation | Fail   | Fail   | Pass   | Pass   |
| .NET 3.1 Server          | Fail   | Fail   | Pass   | Pass   |
| .NET 5.0 Server          | Fail   | Fail   | Fail   | Pass   |
| .NET 6.0 RC1 Server      | Fail   | Fail   | Fail   | Pass   |

As a side note, I looked into what is causing the ~120MB worth of gen 0 garbage per GC cycle and it could be related to #27748 - not quite sure - but all the allocations are coming from renting byte arrays from the ArrayPool. Or perhaps it chooses to release them instead of re-using them due to memory pressure? Not sure.

@Bio2hazard
Author

I replaced the default MemoryPool<T>.Shared used by System.IO.Pipelines.Pipe with a custom one based on ConfigurableArrayPool, to allow for more arrays per bucket and facilitate re-use.

This significantly dropped allocations and allows Server GC to run successfully on less memory. This + the fact that it also runs successfully on less memory with Workstation GC points towards Server GC choosing to go OOM instead of running a GC cycle.
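As a sketch of that kind of replacement (hypothetical code; the real pool was based on ConfigurableArrayPool, approximated here with ArrayPool<byte>.Create, and the bucket sizes and pipe thresholds are made up):

using System;
using System.Buffers;
using System.IO.Pipelines;

// Wire the custom pool into the pipe instead of the default MemoryPool<byte>.Shared.
var pipe = new Pipe(new PipeOptions(pool: new PooledArrayMemoryPool(),
                                    pauseWriterThreshold: 1 << 20,
                                    resumeWriterThreshold: 1 << 19));

// MemoryPool<byte> that hands out arrays from a configurable ArrayPool, allowing far
// more arrays per bucket than the shared pool keeps.
sealed class PooledArrayMemoryPool : MemoryPool<byte>
{
    private readonly ArrayPool<byte> _pool =
        ArrayPool<byte>.Create(maxArrayLength: 1 << 20, maxArraysPerBucket: 256);

    public override int MaxBufferSize => 1 << 20;

    public override IMemoryOwner<byte> Rent(int minBufferSize = -1) =>
        new Owner(_pool, _pool.Rent(minBufferSize < 0 ? 4096 : minBufferSize));

    protected override void Dispose(bool disposing) { }

    private sealed class Owner : IMemoryOwner<byte>
    {
        private readonly ArrayPool<byte> _pool;
        private byte[]? _array;

        public Owner(ArrayPool<byte> pool, byte[] array) { _pool = pool; _array = array; }

        public Memory<byte> Memory =>
            _array is { } a ? a.AsMemory() : throw new ObjectDisposedException(nameof(Owner));

        public void Dispose()
        {
            // Return the array to the configurable pool exactly once.
            if (_array is { } array) { _array = null; _pool.Return(array); }
        }
    }
}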

I think there are 2 problems at play here:

  1. As noted above: ".NET 5+ needs a little more heap to run your application. It's a regression, though not a major one."
  2. Server GC mode does not work well when the application is running close to its heap limit. It will go OOM without trying to run a GC cycle that would have prevented it.

With this information I might be able to set something up to reproduce the issue that I can share.

@tmds
Member

tmds commented Sep 17, 2021

I replaced the default MemoryPool.Shared used by System.IO.Pipelines.Pipe with a custom one based on ConfigurableArrayPool, to allow for more arrays per bucket and facilitate re-use. This significantly dropped allocations and allows Server GC to run successfully on less memory.

cc @davidfowl

This significantly dropped allocations and allows Server GC to run successfully on less memory. This + the fact that it also runs successfully on less memory with Workstation GC points towards Server GC choosing to go OOM instead of running a GC cycle.

Or that using Server GC is causing additional allocations on the heap that can't be reclaimed.

Server GC mode does not work well when the application is running close to its heap limit. It will go OOM without trying to run a GC cycle that would have prevented it. With this information I might be able to set something up to reproduce the issue that I can share.

It is an interesting hypothesis, and it was definitely the case for #50414.
That was fixed in preview 7, so you'd need to reproduce it using the latest .NET 6 preview/RC.

@Bio2hazard
Author

Ok @tmds @janvorli

I think I was able to create some sample code that reproduces the issue: https://gist.github.com/Bio2hazard/ee353c1042ee56a97c0d0b3d62c590bc

With Workstation GC this runs reliably; at least I didn't get any OOM despite running it for 5 minutes. With Server GC it will eventually go OOM, sometimes quickly, sometimes after a few minutes.

The memory requirements of the app should be static, which is why it works on workstation gc. Each cycle adds the same amount of garbage to the heap, so the fact that server GC sometimes goes OOM after 1 second, and sometimes goes OOM after 60 seconds should indicate that the OOM is purely driven by GC behavior, not app requirements.

The app itself just allocates a large amount of memory on the LOH and then endlessly thrashes a small amount of memory. It also shows the annoying issue with the array pool and async that was leading to unexpected allocations in my production app.
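For readers without the gist open, a rough sketch of the shape described above (my own approximation; the 85% figure, buffer sizes, and delay are guesses rather than the gist's actual numbers):

using System;
using System.Buffers;
using System.Collections.Generic;
using System.Threading.Tasks;

const int largeBufferSize = 64 * 1024 * 1024; // well above the 85,000-byte LOH threshold
const int allocationSize = 4096;

// Fill a large, fixed share of the heap with long-lived LOH arrays, touching every
// page so the memory is actually resident (see the TouchPage discussion further down).
long target = (long)(GC.GetGCMemoryInfo().TotalAvailableMemoryBytes * 0.85);
var largeBuffers = new List<byte[]>();
for (long total = 0; total < target; total += largeBufferSize)
{
    var buffer = new byte[largeBufferSize];
    for (int i = 0; i < buffer.Length; i += 4096) buffer[i] = 1;
    largeBuffers.Add(buffer);
}

// Then endlessly thrash a small amount of memory. The await hops threads, so the rent
// and the return often hit different ArrayPool<byte>.Shared thread-local buckets.
while (true)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(allocationSize);
    rented[0] = largeBuffers[0][0]; // touch both, and keep the large buffers rooted
    await Task.Delay(3);
    ArrayPool<byte>.Shared.Return(rented);
    // alternative mentioned above: GC.AllocateUninitializedArray<byte>(allocationSize)
}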

Please let me know if the repro sample works for you.

@Bio2hazard
Author

Hi @tmds,

just wanted to follow up and ask if the reproduction code worked. It uses the RC runtime:6.0.0-rc.1-bullseye-slim.

I'd love to see this fixed so I'd like to know if there's anything else I can do to help on this.

@janvorli
Member

@Bio2hazard I am sorry, I've completely missed your previous comment with a link to the repro. Thank you a lot for that! I'll give it a try. I suspect that the issue might be related to the fact that on Unix, the mmap allocated memory doesn't actually get reflected in any cgroups values that report memory in use until the pages are touched. In fact, they don't contribute to the resident set at all until they are touched. The discrepancy between what GC sees as a committed memory and the memory load reported by cgroups might throw it over the edge in some specific cases. But that's just my theory.

@Bio2hazard
Author

Hi @janvorli ,

I suspect that the issue might be related to the fact that on Unix, the mmap allocated memory doesn't actually get reflected in any cgroups values that report memory in use until the pages are touched.

That's a reasonable assumption, but I've actually accounted for that. 👍

In my reproduction code I make sure to touch every buffer right away.

When I first started investigating this issue, I stumbled across #55126 (comment) which recommends using TouchPage to ensure the pages are touched, so I've been making sure to account for that throughout my work to root cause this problem.

@janvorli
Member

@Bio2hazard while you have accounted for the buffers you allocate, you cannot do the same for allocations made by the managed and native parts of the runtime. So there will always be some discrepancy depending on what your app does. I have noticed in the past that e.g. OpenSSL allocates quite a lot of native memory, and I am not sure what the usage patterns around that memory are. But as for your repro, I think the difference will be quite small.
Anyway, I was able to reliably reproduce the problem locally using your repro and I am investigating it. So far I can see that when it OOMs, an allocation that overflows the GC budget is requested. Your code requests allocation of a 4 kB array, and the GC needs to allocate more space for the current heap's allocation context. So it requests 2621440 bytes from the small object heap, and that overflows the budget.

@tmds
Member

tmds commented Sep 24, 2021

I think I was able to create some sample code that reproduces the issue: https://gist.github.com/Bio2hazard/ee353c1042ee56a97c0d0b3d62c590bc

I'm looking at the example.

Because of the await Task.Delay(3) you're jumping threads, which causes ArrayPool<byte>.Shared to use extra memory due to its thread-local buffers:

/// <summary>A per-thread array of arrays, to cache one array per array size per thread.</summary>
[ThreadStatic]
private static ThreadLocalArray[]? t_tlsBuckets;
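A tiny illustration of that thread jump (hypothetical snippet, not taken from the repro):

using System;
using System.Buffers;
using System.Threading.Tasks;

byte[] rented = ArrayPool<byte>.Shared.Rent(4096);
Console.WriteLine($"Rented on thread {Environment.CurrentManagedThreadId}");

await Task.Delay(3); // the continuation typically resumes on a different thread pool thread

Console.WriteLine($"Returned on thread {Environment.CurrentManagedThreadId}");
ArrayPool<byte>.Shared.Return(rented); // cached by the returning thread, not the renting one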

That extra memory needed as the app runs could be enough for the OOM.

Or, there is still reclaimable memory but the GC doesn't try to collect (or is limited in what it can collect) when an allocation fails. This may be a bug, or it can be by design. @janvorli @Maoni0 what is the expected behavior?

@Bio2hazard
Author

@janvorli hah good catch, yeah I can't account for the managed and native runtime parts. Glad to hear the repro is working!

@tmds yes, I'm aware of the implications of the await Task.Delay(3) - in fact, I purposely placed it there because I wanted to draw attention to why my production app had unexpected allocations. Below it, I have a commented-out line for GC.AllocateUninitializedArray<byte>(allocationSize); which also worked to reproduce the issue.

I think the issue is a bit more severe than saying it uses extra memory. While technically correct, it consistently uses extra memory to the point where the pooling provides almost no benefit at all, and I would argue that most developers would not expect that behavior from ArrayPool.Shared. Since the number of buffers per pool size is limited, and each thread has its own pool, it appears that any async usage / thread jump between renting and returning can easily lead to one thread exhausting its available buffers while the other thread discards returned buffers due to being at the limit.

There are 2 problems with this.

  1. The documentation for ArrayPool<T>.Shared makes no mention of this behavior. I feel it should caution against renting / returning on different threads. (It would also be wonderful if a separate default pool could be offered that forgoes the per-core CPU advantages in favor of reliably pooling memory in an async setting.)

  2. Look at the usage examples for System.IO.Pipelines, which is the first search result. The recommended pattern is to concurrently read and process on 2 separate threads. In my production app this led to repeated buffer allocations, which is one of the things that - as per the docs - the pipelines should prevent.

It's of course possible that I misinterpreted the behavior, but between #27748 and the behavior observed in both my production app and the reproduction sample, it definitely seems easy to run into unexpected allocations.

That extra memory needed as the app runs could be enough for the OOM.

The absolute memory requirements of the reproduction do not exceed the available memory, so it runs without issues on Workstation GC. Furthermore, on server GC it runs for a variable amount of time, so some number of iterations pass. It just seems like the GC tries to squeeze in a bit more than is available and fails out without running a GC cycle that could've prevented it.

@jkotas
Member

jkotas commented Sep 24, 2021

Related: #52098

@tmds
Member

tmds commented Sep 25, 2021

Below it, I have a commented out line for GC.AllocateUninitializedArray(allocationSize); which also worked to reproduce the issue.

Are you also uncommenting the ArrayPool<byte>.Shared.Return?

it consistently uses extra memory to the point where the pooling provides almost no benefit at all, and I would argue that most developers would not expect that behavior from ArrayPool.Shared.

It uses the TLS storage for extra performance at the cost of some memory.

Also, because your app runs so close to the memory limit, it will trim buffers on gen 2 GCs, also undoing the performance benefit of pooling:

Gen2GcCallback.Register(s => ((TlsOverPerCoreLockedStacksArrayPool<T>)s).Trim(), this);

So for this to work well, you need to pay with some memory.

The absolute memory requirements of the reproduction do not exceed the available memory, so it runs without issues on Workstation GC.

Workstation GC will be more deterministic. Server GC will reduce GC pause time to improve overall response time but that comes at the cost of extra resources, which may include memory.

Because an app runs under a limit with workstation GC does not mean the app will not go OOM under the same limit with server GC.

without running a GC cycle

I'm curious to learn if a GC is performed on the path that leads to the OOM exception, and whether it may be limited in some way.

@Bio2hazard
Author

Bio2hazard commented Sep 25, 2021

Are you also uncommenting the ArrayPool.Shared.Return?

Yes, though the numbers at the top may need to be tweaked for optimal reproduction of the issue with GC.AllocateUninitializedArray(allocationSize). After I made the decision to go with ArrayPool.Shared, I tweaked things to find the optimal reproduction case for that, so I don't know exactly how it'll behave with the current settings.

Also, because your app runs so close to the memory limit, it will trim buffers on gen2 GC also undoing the performance benefit of pooling:

If you look at the Trim code, it uses Utilities.GetMemoryPressure()

Utilities.MemoryPressure pressure = Utilities.GetMemoryPressure();

Which in turn uses HighMemoryLoadThresholdBytes to determine memory load.

const double HighPressureThreshold = .90; // Percent of GC memory pressure threshold we consider "high"
const double MediumPressureThreshold = .70; // Percent of GC memory pressure threshold we consider "medium"
GCMemoryInfo memoryInfo = GC.GetGCMemoryInfo();
if (memoryInfo.MemoryLoadBytes >= memoryInfo.HighMemoryLoadThresholdBytes * HighPressureThreshold)
{
    return MemoryPressure.High;
}

This actually takes us full circle back to the original topic of this issue: HighMemoryLoadThresholdBytes is higher than TotalAvailableMemoryBytes, which previous responses stated is the default and by design, and the dialog then shifted to the OOMs I've been encountering.

This means that for a container with a 3 GB limit, the ArrayPool.Shared implementation will only start aggressively trimming if the memory load exceeds 2,609,192,631 bytes - however, the TotalAvailableMemoryBytes on a 3 GB container is just 2,415,919,104. So when run in a container with a memory limit, ArrayPool.Shared is unaware that the app is running close to the memory limit and will never be able to aggressively trim.

Because an app runs under a limit with workstation GC does not mean the app will not go OOM under the same limit with server GC.

Based on previous comments I was led to believe that an OOM situation on Server GC that doesn't occur on Workstation GC was an anomaly, and that Server GC should attempt a full GC prior to going OOM:

another thing I should mention, if it's not obvious, is that of course we would do a full compacting GC if we cannot commit memory based on the hardlimit.

yeah, if it doesn't get OOM with workstation GC but does with Server, or if it doesn't get OOM with a larger limit but with a smaller limit, that's clearly an issue that should be investigated.

So at this point I don't know what the expected behavior here is. I'm very curious to see what @janvorli's investigation reveals.

@tmds
Member

tmds commented Sep 27, 2021

ArrayPool.Shared is unaware that the app is running close to the memory limit and will never be able to aggressively trim.

I agree. This should also consider the load of the managed heap in a container, and that should trigger aggressive trimming in this example.

Looking at the reproducer from a performance perspective, you don't want this trimming to occur.
So if this is your expected workload, you should increase your available memory. Or, you should use a pool that requires less memory (e.g. one not keeping per-thread/core buckets).

@BMartin345

Any updates on this issue?

@saber-wang

This problem is also encountered in a container:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.GC.AllocateNewArray(IntPtr typeHandle, Int32 length, GC_ALLOC_FLAGS flags)
   at System.GC.<AllocateUninitializedArray>g__AllocateNewUninitializedArray|66_0[T](Int32 length, Boolean pinned)
   at System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1.Rent(Int32 minimumLength)
   at System.Text.Json.Utf8JsonWriter.WriteStringEscapeValue(ReadOnlySpan`1 value, Int32 firstEscapeIndexVal)
   at System.Text.Json.Serialization.Converters.StringConverter.Write(Utf8JsonWriter writer, String value, JsonSerializerOptions options)
   at System.Text.Json.Nodes.JsonObject.WriteTo(Utf8JsonWriter writer, JsonSerializerOptions options)
   at System.Text.Json.Nodes.JsonObject.WriteTo(Utf8JsonWriter writer, JsonSerializerOptions options)
   at System.Text.Json.Nodes.JsonNode.ToJsonString(JsonSerializerOptions options)

@mangod9
Member

mangod9 commented Apr 7, 2022

@saber-wang your issue seems to be unrelated to this one. Could you please create a separate bug with a repro or dumps? Usually OOMs are not GC related but are caused by the application not freeing memory appropriately.

Also, for this particular issue, it looks like something to be fixed in ArrayPool?

@steelheaddigital

steelheaddigital commented Aug 31, 2023

I just want to confirm that we are also seeing this issue with .NET 6 and Docker Swarm. We were seeing a steady stream of System.OutOfMemoryException. Disabling Docker memory limits stopped the errors. The hosts have plenty of free memory and memory limits were set sufficiently above what the containers were actually using. This is with Docker version 24.0.2.

@davgia

davgia commented Oct 20, 2023

I want to add that the issue is present even with .NET 6 and Kubernetes (containerd). What I've noticed is that the process does not use more than 50% of the available memory (as limited by K8s). Is there a way to advise the process of the memory limit?

@groone

groone commented Oct 20, 2023

A workaround is to define the DOTNET_GCHighMemPercent environment variable (in hex): https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector#high-memory-percent
The defaults for a k8s container with memory limits are insane. The high memory threshold is defined higher than the total limit: high memory is by default 90% of the physical limit, while the max heap size is 75% of the physical limit. The heap size limit is reached first and an OutOfMemoryException is thrown, as containers don't use swap memory.
So for my ~1 GB container I went with 72% (DOTNET_GCHighMemPercent = 48) and it solved the out-of-memory exception problems for me.
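For example (assuming Docker), the variable can be passed with docker run -e DOTNET_GCHighMemPercent=48 --memory=1g <image>; in Kubernetes it goes in the container's env section of the pod spec.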
