AArch64 CI tests: qemu hits memory limit and fails with SIMD tests enabled #1893
Comments
I suspect a qemu issue, as @alexcrichton said earlier; it's too bad that upgrading to 5.0.0 didn't fix it. I wonder if we could transition to running CI jobs on our native aarch64 machine, now that we have one -- @alexcrichton, thoughts? (I think GitHub has a native-CI-runner feature.)
Locally I ran the test suite in qemu 5.0.0 and I saw the peak memory usage jump by ~1GB after applying #1871. This is the peak memory usage of QEMU itself when running the test suite. 10GB is already pretty huge; for comparison, it takes 200MB on native to run the I ran a small test on GitHub Actions CI and found that a program could allocate a 10687086592-byte (9.95 GiB) vector but would fail to allocate 10791944192 bytes (10.05 GiB). Similarly in local testing (according to Given that, this doesn't feel like a bug in QEMU other than "maybe too much memory is used?"; it seems like we're just hitting OOM on CI. It appears that if we cross the 10GiB threshold for allocated memory we get OOM-killed. That would also explain why it's not an issue locally, because we presumably have lots more RAM and/or less aggressive OOM killers. In terms of fixing this, that may be a bit harder. Some options include:
None of these AFAIK are easy-ish things to do, unfortunately... I suppose there's the option of writing fewer tests :)
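The allocation probe described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not the program that was actually run on CI: the helper names `to_gib`, `can_allocate`, and `find_limit` are hypothetical, and it assumes that a failed allocation surfaces as `MemoryError`. Under an aggressive OOM killer the process may simply be killed instead, so a real probe is best run in a disposable environment.

```python
GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def can_allocate(n_bytes):
    """Return True if a zero-filled buffer of n_bytes can be allocated."""
    try:
        buf = bytearray(n_bytes)
    except MemoryError:
        return False
    del buf  # release the buffer before the next probe
    return True


def find_limit(lo, hi, step, probe=can_allocate):
    """Bisect for the largest size the probe accepts, to within `step` bytes.

    Assumes probe(lo) succeeds and probe(hi) fails.
    """
    while hi - lo > step:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # The two sizes reported in the comment above; no real probing is done here.
    for size in (10687086592, 10791944192):
        print(f"{size} bytes = {to_gib(size):.2f} GiB")
```

Calling `find_limit(0, 16 * GIB, 1 << 20)` on a runner would narrow the threshold to within 1 MiB, assuming the runner reports the failure rather than killing the process.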
Hmm. Just now I went down a small rabbit-hole trying to work out if there's a way to reduce the translation cache size for qemu's JIT, in case that's the issue. Unfortunately it seems there's only Another option to add to the above list would be "fix qemu's memory blowup". Unfortunately that doesn't seem a whole lot easier than the other options, but who knows, maybe it's a quick fix once found. @akirilov-arm: for now, while we develop aarch64 SIMD support, I think it's reasonable to keep the SIMD tests specifically disabled in-tree, in the absence of better options. (We should be careful to run tests locally on a native aarch64 machine, of course.) We'll have to find a better solution before declaring SIMD "done", though. I'll go ahead and rename this issue to track the qemu memory blowup (which is the root problem), if you don't mind. Sorry again about our CI wonkiness!
@cfallin What is your preference with respect to opening PRs implementing AArch64 functionality - don't enable any relevant tests, but document their names in the description, so that people may run them manually, or enable all relevant tests, but disable them afterwards in case of CI failures (whose cause seems to be running out of memory)? I like the second option more - we have already merged a couple of changes after I had tried to push the first iteration of #1871, so evidently it works. Honestly, it's a little bit bizarre that the
On the other hand, I have the feeling that we may run out of luck soon and start seeing consistent failures with any test. cc @jgouly
Yes, I think this is the best option -- let's do this for now, and reference this issue when we have to disable a test to get a green CI to merge.
This commit disables the usage of "static" memory on CI and instead forces all memories to be "dynamic" meaning that they reserve much smaller chunks of memory. This causes the QEMU process's memory to drastically drop (10GiB -> 600MiB) and should allow us to keep enabling tests without hitting the OOM killer on CI. Closes bytecodealliance#1871 (includes that) Closes bytecodealliance#1893
Whoa, nice find! That gave me an idea, and in local testing it drastically reduces the memory usage of qemu (10GB -> 600MB). I think that means we can fix our CI quite easily actually!
* Enable the spec::simd::simd_align test for AArch64

  Copyright (c) 2020, Arm Limited.

* Disable static memory under QEMU on CI

  This commit disables the usage of "static" memory on CI and instead forces all memories to be "dynamic" meaning that they reserve much smaller chunks of memory. This causes the QEMU process's memory to drastically drop (10GiB -> 600MiB) and should allow us to keep enabling tests without hitting the OOM killer on CI.

  Closes #1871 (includes that)
  Closes #1893

* Fix typo

Co-authored-by: Anton Kirilov <anton.kirilov@arm.com>
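To make the "static" versus "dynamic" distinction in the commit message concrete, here is a rough sketch of the idea in Python. It is not the project's code: the environment variable `EXAMPLE_CI_UNDER_QEMU` and both function names are hypothetical, and the 4 GiB maximum plus 2 GiB guard used for a static reservation are assumptions chosen for illustration. The point is only that a static memory reserves a large fixed region up front, while a dynamic one reserves roughly what the module initially asks for.

```python
import os

WASM_PAGE = 64 * 1024  # size of one WebAssembly page in bytes
GIB = 1024 ** 3


def use_static_memory(env=os.environ):
    """Pick the memory style; the variable name is a made-up example."""
    return "EXAMPLE_CI_UNDER_QEMU" not in env


def reservation_bytes(initial_pages, static,
                      static_max=4 * GIB, guard=2 * GIB):
    """Estimate the address space reserved for one linear memory.

    static_max and guard are illustrative assumptions, not measured values.
    """
    if static:
        # Fixed up-front reservation regardless of the initial size.
        return static_max + guard
    # Dynamic: reserve only what the memory initially needs.
    return initial_pages * WASM_PAGE


if __name__ == "__main__":
    static = use_static_memory()
    print("static" if static else "dynamic",
          reservation_bytes(initial_pages=1, static=static))
```

Under these assumptions a one-page memory reserves 6 GiB of address space when static and 64 KiB when dynamic, which is the kind of gap that would matter if the emulator's own bookkeeping grows with the size of the reserved region.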
The AArch64 CI test that runs using QEMU fails consistently for PR #1871 and the reasons are not clear - here's the relevant excerpt from the log:
I have reproduced the test environment locally using the following commands:
However, I don't experience any test failures. In addition to that, I don't see any issues either when I run the test natively in an AArch64 environment. In that case the list of commands can be simplified to:
Note that the --features test-programs/test_programs parameter is omitted because it requires rust-lld, which appears not to be a part of the native AArch64 toolchain.
This issue has also been discussed in PR #1802.
cc @cfallin