runtime: ill-fated GC outcome in spiky/bursty scenarios #42805

raulk · 2020-11-24T14:33:46Z

Motivation

The Go GC machinery leaves the door open to several ill-fated scenarios, that have been reported in other issues. The one closest to what I'm describing here (and providing a reproduction for) is #10064.

Imagine the Go program has 64GiB available memory. The last garbage collection resulted in an live set of 54GiB. With the default GOGC=100 value, the pacer will schedule a collection once another 54GiB have been allocated, or in the next 2 minutes, whichever happens first.

If there's a rapid spike of heap allocation due to program mode change (e.g. a database compaction, which is how my team discovered this), such that it amounts to more than 10GiB, an OOM panic will occur.

And that appears to be reasonable behaviour, if those 10GiB are effectively retained / reachable / in scope.

However, what's not reasonable is that the same will occur even if 9.90GiB of those 10GiB have been released / become unreachable. For example, the program underwent 99 iterations of this logic, in under 2 minutes from the last GC:

allocate a 100MiB slice, populate it.
flush it to disk.
syntactically the slice goes out of scope (becomes unreachable).

The next iteration (100th) will cause the go runtime to expand the heap beyond its available memory, and that will cause an OOM panic. Instead, I would've expected the runtime to detect the impending OOM, and instead choose to run a forced GC.

The above scenario is greatly simplifying things, of course.

Reproduction harness

I built a reproduction harness here: https://github.com/raulk/trampoline/.

This program creates a cgroup and enforces the memory limit indicated by the -limit parameter (default: 32MiB). The cgroup's swap memory value is set to the same value, to prevent the program from using any swap. (IMPORTANT: make sure the right cgroup options are enabled to enforce this caging; check README for more info).

The program will then allocate a byte slice of size 90% of the configured limit (+ slice overhead). This will simulate a spike in heap usage, and will very likely induce GC at around 30MiB (with the default limit value).

Of course, the exact numbers are dependent on many conditions, and thus non-deterministic. Could be less or more in your setup, and you may need to tweak the limit parameter.

Given the default value of GOGC=100, the GC pacer will schedule to run when the allocated heap amounts to 2x of the live set at GC mark phase end. In my setup, this clocks in at 60MiB. Of course, that's beyond our 32MiB limit.

Next, the program releases the 90% byte slab, and allocates the remaining 10%. With the default limit value, it releases 30198988 bytes to allocate 3355443 bytes (obviating slice headers).

At that point, the program has enough unused heap space that it could reclaim and assign to the new allocation. But unfortunately, GC is scheduled too far out, and the Go runtime does not run GC as a last resource before going above its limit. Therefore, instead of reusing vacant, resident memory, it decides to expand the heap and goes beyond its cgroup limit, thus triggering the OOM killer.

The gist here is that the Go runtime had 9x times (roughly) as much memory free as it needed to allocate, but it was not capable of reclaiming it in time.

Discussion & ideas

I'm not sure if it's at all possible to implement a reactive GC trigger, i.e. trap a failure in malloc and then force a GC run before trying to allocate again. I don't think it is, because most OS will use virtual memory with overcommit modes, so the malloc won't fail. I guess it's not possible either to trap the SIGKILL sent by the OOM killer; it induces immediate program termination.
All JVM GC algorithms call GC reactively, just like I would've expected Go to do. But the JVM also has a configurable maximum heap size, which Go doesn't (see runtime: make max heap size configurable #9849) That's probably what makes it possible to implement such a mechanism.
For large heaps, I think the go runtime should automatically regulate the GC percentage, reducing it as it the heap increases in size. The doubling effect of GOGC=100 (default) is ineffective with large heap sizes, as it allows too much garbage creation before a GC is triggered. Maybe also regulate the forced GCs frequency accordingly, to make it adaptable.

Workaround

We'll probably end up setting up a memory watchdog, initialized with a user-configured memory limit (à la JVM -Xmx). As the heap grows, we'll probably reduce GOGC dynamically by calling debug.SetGCPercent. As the heap approaches the limit, we'll trigger GC more aggressively and manually.

Related issues

#42430 (gc pacer problems meta-issue)
#14735 (throttling heap growth)
#16843 (mechanism for monitoring heap size)
#10064 (GC behavior in non-steady mode)
#9849 (make max heap size configurable)

The text was updated successfully, but these errors were encountered:

mvdan · 2020-11-24T14:43:19Z

cc @aclements @mknyszek @dvyukov

dvyukov · 2020-11-24T14:47:58Z

FWIW I still think that growing heap more carefully is a good idea. Number of times heap grows is limited, so it can't incur permanent negative performance effect. #10064 #14735

randall77 · 2020-11-24T16:29:52Z

This sounds like exactly the problem SetMaxHeap is intended to solve. See #42430 #29696.

dvyukov · 2020-11-24T16:40:39Z

I would say it's related, but also somewhat orthogonal. I can see how SetMaxHeap helps in server environment. But, say, I am compiling lots of beefy Go code on my not super beefy laptop, there are lots of parallelism and lots of compiler/linker/vet/test invocations. Frequently it badly freezes my machine, sometimes I need to hard reboot even. I am not sure who/how will set SetMaxHeap for all these subprocesses and what will be the limits. But if Go processes would be overall more careful consuming large amounts of memory, it may help.

randall77 · 2020-11-24T16:51:23Z

@dvyukov Sure that's a problem as well, but not the OP's problem. He has and knows a hard limit.

The difficulty I see in your scenario is how we would know. I don't see how a Go process can reliably tell that it is "using too much memory" until mmaping new memory fails. We could run a GC at that point, sure. But we never get a mmap failure in your scenario: the OS is stealing all the available memory, paging out your window manager, etc., in order to satisfy our requests. How do we know the OS is "trying too hard" and we should back off?

mknyszek · 2020-11-24T16:59:22Z

@raulk Thanks for the report! From my perspective (and I agree with Keith), this boils down to more evidence suggesting we should have a configurable maximum heap. You mention that you think GOGC should go down when the heap size goes up; I want to note that this is effectively what a maximum heap size achieves. In a world of containers, I think this makes a lot of sense.

@dvyukov We're reasonably careful about heap growths nowadays since we'll eagerly return memory to the OS in that case, but you still run into trouble with the amount of memory needed doubling (which is independent of "heap growth" as far as the runtime's meaning (i.e. (*mheap).grow), so I don't think just being careful on heap growths is enough.

Your example of lots of Go processes is a real issue, but I think it's also somewhat orthogonal to this issue which seems to be focused on a server application (@raulk correct me if I'm wrong). Unfortunately, Go generally doesn't play well with co-tenants (ironically it's usually worse if they're all Go code too, because the idle GC will try to eat up GOMAXPROCS CPUs). I'm think optimistically limiting GOGC during a live heap increase (like you suggest in #10064 and #14735) might help that a bit (maybe even more than a bit in some cases, and could be worth doing; not trying to make a judgement about that here), but it doesn't directly address the co-tenancy problem.

dvyukov · 2020-11-24T17:01:48Z

The overall idea is to grow heap slower. #10064 describes scenario where we can non-deterministically grow heap 2x of what it could be otherwise.

raulk · 2020-11-24T17:34:52Z

@mknyszek @randall77 I think I agree with your assessments here. I believe mmap would succeed when expanding the heap beyond the hard limit. But when the process effectively writes to the mapped memory, that's when it would become backed by physical memory and it would summon the OOMKiller. There might be ways that one can interrogate the OS about the limits in force, but this would be hugely platform dependent.

I think the most deterministic way of achieving sympathy here is through SetMaxHeap, although I do want to note that Go still has a fragmentation problem, as it does not perform compaction in its effort to be zero copy. So one could still get into fragmentation-based ill-fated scenarios, even if memory was freed in a timely manner.

The reason why I suggested decreasing GOGC as we approach the max heap is that if one doesn't do that, GC pacing becomes entirely reactive instead of proactive. For example, if I have a 64GiB max heap, my previous live set was 32GiB, with the default GOGC we wouldn't run GC proactively until another 32GiB have been allocated, which is obviously too late (or until the 2min timer fires).

That said, SetMaxHeap would trigger GC as the allocated heap approaches the 64GiB cap. I'm not familiar with the details on how that's implemented, but I'm assuming it will do so closer to the last mile (e.g. 90% of max heap?).

If that's the case, we would have had this timeline, potentially:

gc @ 32GiB
gc @ 62GiB (triggered by max heap supervisor)
gc @ 63GiB (triggered by max heap supervisor)
gc @ 64GiB (triggered by max heap supervisor)

Instead, if one decreases GOGC inversely proportional to the remaining until max heap, you could get a much balanced pattern, for example:

gc @ 32GiB (GOGC=>50)
gc @ 48GiB (GOGC=>25)
gc @ 60GiB (GOGC=> 12.5)
gc @ 62GiB (triggered by max heap supervisor)
gc @ 63GiB (triggered by max heap supervisor)
gc @ 64GiB (triggered by max heap supervisor)

raulk · 2020-11-24T17:36:03Z

In case it's useful, I tried https://go-review.googlesource.com/c/go/+/227767 with my program, and it did not help. I received the notification, but for some reason it still OOM'ed, even if I forced a manual GC inside the notification consumer.

mknyszek · 2020-11-24T18:50:11Z

I think the most deterministic way of achieving sympathy here is through SetMaxHeap, although I do want to note that Go still has a fragmentation problem, as it does not perform compaction in its effort to be zero copy. So one could still get into fragmentation-based ill-fated scenarios, even if memory was freed in a timely manner.

Nice observation about fragmentation. This is something that has come up already recently, and going forward I think anything like SetMaxHeap that lands should:

Account for fragmentation by making the limit an actual memory limit and not a heap usage limit.
Eagerly return memory to the OS to stay under that limit.

The reason why I suggested decreasing GOGC as we approach the max heap is that if one doesn't do that, GC pacing becomes entirely reactive instead of proactive.

The way SetMaxHeap currently works is closer to the former timeline, but the exact GC trigger point is set according to a feedback loop as usual. Personally, I'm not sure I see the value in the latter timeline, since you're not really gaining anything, you're just GCing more frequently. Plus, when you start bumping up against the limit, you want to give the GC as much runway as possible, especially if you're allocating heavily, to avoid actually passing the maximum (the max is actually a soft maximum because otherwise you end up with GC death spirals).

In case it's useful, I tried https://go-review.googlesource.com/c/go/+/227767 with my program, and it did not help. I received the notification, but for some reason it still OOM'ed, even if I forced a manual GC inside the notification consumer.

All feedback in this space is useful. :) The notification system for SetMaxHeap isn't supposed to be used for triggering a GC manually; the notification suggests the runtime is already aware that it needs to do something. It's supposed to be for having your application drop work (like a web server dropping requests or client connections). In practice, it's hard to use unless it's built deeply into your web framework already (be that net/http or something else), and even then it's not trivial to implement either. I don't think the notification system is something that we're going to carry forward at this time (at least not tied to any kind of heap limit).

mknyszek · 2020-11-24T18:51:38Z

The overall idea is to grow heap slower. #10064 describes scenario where we can non-deterministically grow heap 2x of what it could be otherwise.

@dvyukov Yeah, I think I get it. Almost like making the heap goal itself an EWMA or something. I think I can see the value in that. The cliff that's dependent on timing is a bit of a problem and it would be nice to smooth that out.

thepudds · 2022-11-04T14:22:39Z

@raulk It sounds like you might have introduced a workaround initially, but I'm curious if the Go 1.19 GOMEMLIMIT soft memory limit was able to address the problem you originally reported here?

mvdan added GarbageCollector NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 24, 2020

raulk mentioned this issue Nov 30, 2020

implement a memory watchdog filecoin-project/lotus#5058

Closed

jordanlewis mentioned this issue May 8, 2021

sql: can OOM a cluster with concurrent TPC-H load cockroachdb/cockroach#64906

Closed

raulk mentioned this issue Sep 27, 2021

runtime/debug: soft memory limit #48409

Closed

mur-me mentioned this issue Nov 23, 2021

[placeholder] - for check of the LOTUS_MAX_HEAP variable in Lotus protofire/Filecoin-node-hosting-management#154

Open

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022

seankhliao added this to the Unplanned milestone Aug 27, 2022

ganigeorgiev mentioned this issue Oct 2, 2023

Possible memory leak when writing a lot of data pocketbase/pocketbase#969

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: ill-fated GC outcome in spiky/bursty scenarios #42805

runtime: ill-fated GC outcome in spiky/bursty scenarios #42805

raulk commented Nov 24, 2020 •

edited

Loading

mvdan commented Nov 24, 2020

dvyukov commented Nov 24, 2020

randall77 commented Nov 24, 2020

dvyukov commented Nov 24, 2020

randall77 commented Nov 24, 2020

mknyszek commented Nov 24, 2020

dvyukov commented Nov 24, 2020

raulk commented Nov 24, 2020

raulk commented Nov 24, 2020 •

edited

Loading

mknyszek commented Nov 24, 2020

mknyszek commented Nov 24, 2020

thepudds commented Nov 4, 2022 •

edited

Loading

runtime: ill-fated GC outcome in spiky/bursty scenarios #42805

runtime: ill-fated GC outcome in spiky/bursty scenarios #42805

Comments

raulk commented Nov 24, 2020 • edited Loading

Motivation

Reproduction harness

Discussion & ideas

Workaround

Related issues

mvdan commented Nov 24, 2020

dvyukov commented Nov 24, 2020

randall77 commented Nov 24, 2020

dvyukov commented Nov 24, 2020

randall77 commented Nov 24, 2020

mknyszek commented Nov 24, 2020

dvyukov commented Nov 24, 2020

raulk commented Nov 24, 2020

raulk commented Nov 24, 2020 • edited Loading

mknyszek commented Nov 24, 2020

mknyszek commented Nov 24, 2020

thepudds commented Nov 4, 2022 • edited Loading

raulk commented Nov 24, 2020 •

edited

Loading

raulk commented Nov 24, 2020 •

edited

Loading

thepudds commented Nov 4, 2022 •

edited

Loading