Address Sanitizer ASLR Bug #184

Closed
jamesjuett opened this issue Aug 22, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@jamesjuett
Contributor

jamesjuett commented Aug 22, 2024

Over the past month or so, I've encountered a "heisenbug" where the address sanitizer would mysteriously fail on the autograder, with stderr something like:

AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL

I mentioned this in https://github.com/eecs280staff/p2-cv/pull/227 and it also happened when I was testing for https://github.com/eecs280staff/p3-euchre/pull/249. See the submission at Aug 21, '24, 02:11 PM EDT for an example: some of the Euchre Public Tests (UB Checks) fail in this way. (Note that these links will not be publicly available.)

The issue manifested only occasionally and always disappeared on a rerun.

We also ran into this with CI failures in our project repos, because this bug affected GitHub Actions runners for a short while:
actions/runner-images#9491

Problem

It seems this is related to a bug mentioned here:
https://stackoverflow.com/questions/77894856/possible-bug-in-gcc-sanitizers

Essentially, it was somewhat recently discovered that the Linux kernel configuration used too few bits of entropy for ASLR. Some distributions, including Ubuntu, now include a patch. In particular, Ubuntu Linux kernel versions 6.5.0 and newer use 32 bits of entropy instead of 28. See https://launchpad.net/ubuntu/+source/linux/6.5.0-25.25 and search for "ARCH_MMAP_RND_", or see this commit.

Unfortunately, this change aggravates a bug in Address Sanitizer, which has since been patched, but the patch is only available in fairly new versions of each compiler. The fix appears to have made its way into Ubuntu gcc 13.2:
https://git.launchpad.net/ubuntu/+source/gcc-13/commit/?id=6c5be2a496335c513dbe6fa85df2402cfc0f0a8b
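The interaction described above (more ASLR entropy combined with an unpatched ASan) can be sketched as a small check. This is only a minimal sketch under the assumptions stated in this comment: the 28-bit threshold and the gcc 13.2 cutoff come from the figures above, not from an official compatibility table. The function names are hypothetical, and clang or distro backports are not covered.

```python
import re
from pathlib import Path

# Assumptions taken from this thread, not from official documentation:
SAFE_MMAP_RND_BITS = 28   # entropy setting at which the bug was not observed
FIXED_GCC = (13, 2)       # Ubuntu gcc version reported to contain the ASan fix


def parse_gcc_version(text):
    """Extract (major, minor) from text such as '13.2.0' or `g++ --version` output."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def likely_affected(mmap_rnd_bits, gcc_version):
    """Heuristic: entropy above the old default AND a compiler older than the fix."""
    return mmap_rnd_bits > SAFE_MMAP_RND_BITS and gcc_version < FIXED_GCC


def read_mmap_rnd_bits(path="/proc/sys/vm/mmap_rnd_bits"):
    """Return the current setting, or None if it cannot be read.

    The file may be missing (non-Linux) or unreadable without root.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

For example, `likely_affected(32, parse_gcc_version("8.5.0"))` evaluates to `True` under these assumptions, while the same compiler with 28 bits does not.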

Affected Platforms

CAEN Linux

$ uname -a
Linux caen-vnc-mi12.engin.umich.edu 4.18.0-553.16.1.el8_10.x86_64 #1 SMP Thu Aug 1 04:16:12 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
$ g++ --version
g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)

I am not able to reproduce the issue on CAEN Linux. I can't find much online, but I'm guessing the kernel is defaulting to only 28 bits of entropy, so the bug in ASan doesn't manifest.

Student WSL Ubuntu

I am not able to reproduce this on my machine, unless I manually set sudo sysctl vm.mmap_rnd_bits=32, in which case the bug readily shows up (maybe 25% of runs on any trivial program with the address sanitizer enabled).
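Since the failure is intermittent, one way to estimate how often it occurs is to run the same ASan-instrumented binary many times and count the runs that print the marker. The following is a rough sketch, not the script actually used here: the function name and the timeout are hypothetical, and a run that exceeds the timeout is counted as a failure on the assumption that a hang is also a symptom.

```python
import subprocess

MARKER = "AddressSanitizer:DEADLYSIGNAL"


def count_asan_crashes(cmd, runs=100, timeout=30):
    """Run `cmd` repeatedly and count the runs that show the ASan failure.

    A run counts as a failure if its stderr contains MARKER, or if it does
    not finish within `timeout` seconds (assumption: a hang is a symptom too).
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            failures += 1
            continue
        if MARKER in result.stderr:
            failures += 1
    return failures
```

Calling `count_asan_crashes(["./a.out"], runs=100)` on a binary built with `-fsanitize=address` gives a failure count that can be compared before and after changing `vm.mmap_rnd_bits` or the compiler version.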

It looks like the WSL kernel may eventually include an increase to use more (32) bits of entropy:
microsoft/WSL2-Linux-Kernel@856cf33

But that commit is currently only for 6.X.X versions of the kernel, and WSL currently ships with 5.X.X by default. Only if someone manually upgraded it, which is pretty involved, would it be a potential issue:
https://learn.microsoft.com/en-us/community/content/wsl-user-msft-kernel-v6

Ubuntu 24.04 is available for WSL via the Microsoft Store, but it isn't the default yet. When it becomes the default, our students would get gcc 13.2 or newer, which contains the fix for the ASan bug. Hopefully that happens before kernel 6.X.X becomes the default.

Student Mac

Not affected AFAIK.

Autograder

This is the most interesting case. Clearly the AG is affected, since I've been seeing the issue. But it only manifests rarely - it turns out that some of the AG grader machines have different kernel versions. For example:
https://autograder.io/web/project/2666?current_tab=student_lookup&current_student_lookup=495003
Particularly the submissions from:

  1. Aug 22, '24, 07:38 AM EDT
  2. Aug 22, '24, 07:35 AM EDT

Submission 1 does not fail the ASan test. Its uname -r output shows kernel version 5.4.0-182-generic.
Submission 2 does fail the ASan test. Its uname -r output shows kernel version 6.5.0-35-generic.

So the failure depends on the particular machine that the grading job gets dispatched to. Even though grading runs in a Docker container, ASLR is still handled by the host kernel.
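Because a container reports the host's kernel release, a test case can flag which kind of machine it landed on by parsing the `uname -r` string. A minimal sketch follows, assuming the 6.5.0 cutoff mentioned earlier for Ubuntu kernels; other distributions may differ, and the function names are hypothetical.

```python
import re

# Assumption from this thread: Ubuntu kernels 6.5.0 and newer default to
# 32 bits of mmap ASLR entropy. This is not a general rule for all distros.
HIGH_ENTROPY_KERNEL = (6, 5, 0)


def kernel_version(release):
    """Parse the leading 'X.Y.Z' of a `uname -r` string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_high_entropy_default(release):
    """True if the release is at or above the assumed cutoff."""
    return kernel_version(release) >= HIGH_ENTROPY_KERNEL
```

Passing `platform.release()` from the standard library gives the same string as `uname -r`, so `has_high_entropy_default(platform.release())` can be logged alongside the ASan test result to correlate failures with the host kernel.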

The ideal fix is likely to upgrade our AG image to Ubuntu 24.04 so we can use gcc 13.2 which has the ASan fix. I plan to try this out and make a PR with the appropriate changes if it seems to resolve the issue.

jamesjuett added the bug label Aug 22, 2024
@jamesjuett
Contributor Author

I've confirmed that upgrading to gcc 13.2 seems to fix the issue.

See the Aug 23, '24, 12:19 PM EDT submission at:
https://autograder.io/web/project/2666?current_tab=student_lookup&current_student_lookup=495003

The uname -r (actually uname -a, just realized I mistitled the test case) is:

Linux 6671b93c2f37 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

with the 6.5.0 kernel that has more aggressive ASLR and for which things were failing earlier.

The ASan test on that submission shows 100 successful runs with no error.

@awdeorio
Contributor

I'm looking into Ubuntu versions concurrently in EECS 485. I don't have a Windows machine. These Microsoft docs seem to imply that Ubuntu 24.04 is the recommended (default?) version.

@jamesjuett
Contributor Author

I'm looking into Ubuntu versions concurrently in EECS 485. I don't have a Windows machine. These Microsoft docs seem to imply that Ubuntu 24.04 is the recommended (default?) version.

That tutorial from Canonical suggests searching for and installing Ubuntu 24.04. But, it looks like the default in the Microsoft store for just plain "Ubuntu" is still Ubuntu 22.04: https://apps.microsoft.com/detail/9pdxgncfsczv?hl=en-us&gl=US

So, if students install "Ubuntu" via the Microsoft Store, it's 22.04 for now. I assume this is also what they get with the wsl --install command from PowerShell, which the MS docs say defaults to "Ubuntu" (https://learn.microsoft.com/en-us/windows/wsl/install#install-wsl-command).

But, I haven't found anything that indicates how the pointer from "Ubuntu" to "Ubuntu 22.04" is maintained or when it might switch over to "Ubuntu 24.04". It wouldn't be surprising if it happens soon.

@awdeorio
Contributor

awdeorio commented Aug 23, 2024

Do you think we (anyone who uses the EECS 280 tutorial, which includes EECS 485 students) should suggest wsl --install Ubuntu-24.04?

EDIT: wsl --install -d Ubuntu-24.04
EDIT 2: A second motivation would be to make the install instructions deterministic where everyone would get the same version, regardless of precisely when MS updates their "Ubuntu" pointer.
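If the install instructions do pin a version, a setup check could confirm that a student's environment matches it by reading `/etc/os-release`. This is a hypothetical sketch, not part of the existing tutorial: the function names and the default expected version are assumptions for illustration.

```python
def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_expected_ubuntu(text, expected_version="24.04"):
    """True if the os-release text describes Ubuntu at the expected version."""
    fields = parse_os_release(text)
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == expected_version
```

On a Linux or WSL system, the input would be the contents of `/etc/os-release`, for example `is_expected_ubuntu(open("/etc/os-release").read())`.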

@jamesjuett
Contributor Author

Do you think we (anyone who uses the EECS 280 tutorial, which includes EECS 485 students) should suggest wsl --install Ubuntu-24.04?

Definitely worth considering! I like the idea of consistency. Want to make a new issue for that specifically?

@jamesjuett
Contributor Author

Closing this - the AG was the only platform that seemed to be currently affected, and that's resolved with https://github.com/eecs280staff/p1-stats/pull/373.
