runtime: "out of memory" in heapsampling.go on 386 and wasm since 2021-11-05 #49564
We haven't seen any more of these failures. Our working hypothesis is that these were related to the new pacer. That landed 2021-11-04. In all of these failures, the heap is around 1.7 GB when it fails, which is way bigger than we'd expect from this test. Based on printing MemStats.Sys at the end of this test, the largest memory footprint I've seen on my linux/amd64 laptop is 120 MB. |
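For anyone wanting to repeat the footprint measurement mentioned above, here is a minimal sketch of reading MemStats.Sys at the end of a run. The allocation loop is a hypothetical stand-in and not the body of heapsampling.go; the helper name sysMiB and the loop sizes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the most recent allocation reachable so the compiler
// cannot optimize the allocations away.
var sink []byte

// sysMiB returns runtime.MemStats.Sys (total bytes obtained from the OS)
// converted to MiB. The helper name is hypothetical.
func sysMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Sys) / (1 << 20)
}

func main() {
	// Hypothetical workload: allocate 64 KiB buffers in a tight loop.
	for i := 0; i < 10000; i++ {
		sink = make([]byte, 64<<10)
	}
	// Report the footprint once the workload is done.
	fmt.Printf("MemStats.Sys = %.1f MiB\n", sysMiB())
}
```

The reported value depends on platform, GOGC, and scheduling, so it is only useful as a rough comparison between runs.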
I want to amend @aclements' comment a little bit just to say that we've seen these large memory footprints (I think around 113 MiB?) with the old pacer too. What's strange is that the failures showed up twice, then for a period of about 8 days, then disappeared. |
Checking on this as a release blocker. Is this still occurring @bufflig? |
Still no more failures in the logs. Maybe this was a bad interaction with some other
|
Seems still no new failure since 2021-11-12. |
It's unfortunate that we don't know what caused this, but at this point I think we can close this issue. We'll reopen it if the issue returns. |
2022-02-15T21:18:59-08ed488/windows-386-2008
Given the timing (and the relative rarity since the last failure), retargeting to Go 1.19. |
Another one this week:
|
@golang/runtime, this release-blocking issue needs an owner now that it is no longer assigned to Jeremy. (This is a recurring test regression on a first class port.) |
I've been investigating heap overrun issues in a test I wrote for #48409 that appear to be similar to #37331. So far, I've discovered that any test that allocates in a loop can somewhat easily overrun the heap in an unbounded manner due to an OS scheduler preemption (even on undersubscribed systems) during mark termination. I believe I have sufficient evidence to narrow it down to an OS scheduler preemption at this point, too. I've been trying to obtain a scheduler trace and prove it for certain. It reproduces more readily on VMs, but those traces seem to be busted. I can reproduce on a dedicated machine as well, but not while tracing is active. (I ran it continuously over the weekend trying to obtain a trace -- no dice... on Linux, anyway.) This bug frankly looks a lot like those cases, and is consistent with previous behavior I've observed when investigating this bug (with smaller spikes). |
FWIW, the best mitigation I have come up with now is to slow down how fast the program is allocating by inserting |
Would it be OK to reduce the size of allocations in that test (maybe just on 32-bit systems)? Based on my back-of-the-envelope calculation, each |
There is new code that allows you to create Windows dumps when you Alex |
Curiously, also on
|
It's not so surprising to me. :) I encountered this failure on js/wasm when landing the SetMemoryLimit changes. Basically, js/wasm is very slow and single-threaded, so it can trigger the GC CPU limiter if it gets particularly unlucky, so it's especially sensitive to bugs in the GC CPU limiter, at least when a test allocates in a loop like this one. I suspect https://go.dev/cl/404304 will help on that front, since there are still bugs. :) |
A different
|
Looking back over this, I think the only thing to do here is to slow down the allocation rate of this test. If it remains an issue after that, we should consider just skipping the test on some platforms. I hope (🤞 ) that the js/wasm issues are resolved by the fixes to the GC CPU limiter. |
Change https://go.dev/cl/408825 mentions this issue: |
greplogs --dashboard -md -l -e FAIL\\s+heapsampling\\.go --since=2021-01-01
2021-11-12T18:57:22-b1b6d92/windows-386-2008
2021-11-12T16:58:34-95d0657/windows-386-2008
2021-11-05T20:59:43-dbd3cf8/windows-386-2008
2021-11-05T18:20:07-53bab19/windows-386-2008
2021-11-05T17:46:59-e48e4b4/windows-386-2008
2021-11-05T15:52:40-a0d661a/windows-386-2008
(CC @bufflig)