-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std library thread.rs: undo attempt at NetBSD "find parallelism" #116665
Conversation
…ism". First off, I have it on good authority this code is Just Wrong: by default a given thread does not have affinity to any specific set of CPUs. This particular change came with this pull request: rust-lang#112226 However, even worse, this code causes a segmentation fault for certain NetBSD target architectures in the "bootstrap" program when building rust natively on those platforms. So far armv7/9.0, powerpc/10.0_BETA, and i386/9.3 all crash with a segmentation fault. However, for some strange reason, this isn't consistent across the board: riscv64/current, amd64/10.0_BETA, aarch64/9.0 and sparc64/10.0_BETA all pass this hurdle. A trivial C reimplementation also doesn't crash on any of these systems, ref. the thread which starts at https://mail-index.netbsd.org/current-users/2023/10/10/msg044510.html but also always prints 0. However, if we get a SEGV running this code, the entire build fails, of course. So ... while I do not have a full explanation for the SEGVs, this undoes the addition from pull request 112226, and restores the ability to build rust natively on the above flagged-as-problematical platforms.
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override) |
Thinking a bit more about this, all those targets which pass this hurdle are 64-bit platforms, while all which crash are 32-bit platforms. I'll bet that is at least part of the explanation. |
This mirrors the upstream pull request I just filed at rust-lang/rust#116665 I have applied this to a 1.72.1 rust checkout, and have re-built the bootstrap kits for our NetBSD ports with that change applied, and this commit bumps the bootstrap requirement for 1.73.0 from 1.72.0 to use the newly generated 1.72.1 bootstrap kits for all the NetBSD targets. This gets us building again on NetBSD/evbarmv7hf 9.0, NetBSD/macppc 10.0_BETA, and most probably NetBSD/i386 9.3 as well, whereas with the 1.72.0 bootstrap code, the "bootstrap" rust program would crash with SEGV on those particular platforms.
It looks like we found another part of this problem. The problem is that the definition of is wrong for 32-bit ports. 32-bit ports are typically ILP32, i.e. both "int", "long" and "pointer" are 32 bits. And the C definition is
So explicitly setting this to a 64-bit type for all platforms is wrong. Will return with a separate request for that later. |
...in particular for 32-bit CPUs / ports, such as 32-bit arm, i386, and powerpc. In the C header files on NetBSD, this is defined as typedef unsigned long cpuid_t; and on ILP32 CPUs, that ends up being a 32-bit quantity. Defining this as a 64-bit type wrecks havoc on our 32-bit ports when e.g. _cpuset_isset() is used (as was introduced with 1.72.0), causing immediate SEGV due to NULL pointer de-reference, as observed in rust-lang/rust#116665 So, instead, define it as ::c_ulong, and let the CPU-specific type definitions take care of the sizing.
NetBSD's mod.rs: fix cpuid_t definition. ...in particular for 32-bit CPUs / ports, such as 32-bit arm, i386, and powerpc. In the C header files on NetBSD, this is defined as typedef unsigned long cpuid_t; and on ILP32 CPUs, that ends up being a 32-bit quantity. Defining this as a 64-bit type wrecks havoc on our 32-bit ports when e.g. _cpuset_isset() is used (as was introduced with rust 1.72.0), causing immediate SEGV due to NULL pointer de-reference, as observed in rust-lang/rust#116665 So, instead, define it as ::c_ulong, and let the CPU-specific type definitions take care of the sizing.
Hm, on second consideration, perhaps ripping this out is ... not the right thing. While it is true that by default a thread does not have affinity to a particular set of CPUs, the process may have a non-null set. However, following the application of rust-lang/libc#3386 this code will now most probably get a type error on at least some targets, in particular the 32-bit targets, so that part needs to be fixed if this code is to stay. |
...in particular for 32-bit CPUs / ports, such as 32-bit arm, i386, and powerpc. In the C header files on NetBSD, this is defined as typedef unsigned long cpuid_t; and on ILP32 CPUs, that ends up being a 32-bit quantity. Defining this as a 64-bit type wrecks havoc on our 32-bit ports when e.g. _cpuset_isset() is used (as was introduced with 1.72.0), causing immediate SEGV due to NULL pointer de-reference, as observed in rust-lang/rust#116665 So, instead, define it as ::c_ulong, and let the CPU-specific type definitions take care of the sizing.
...in particular for 32-bit CPUs / ports, such as 32-bit arm, i386, and powerpc. In the C header files on NetBSD, this is defined as typedef unsigned long cpuid_t; and on ILP32 CPUs, that ends up being a 32-bit quantity. Defining this as a 64-bit type wrecks havoc on our 32-bit ports when e.g. _cpuset_isset() is used (as was introduced with 1.72.0), causing immediate SEGV due to NULL pointer de-reference, as observed in rust-lang/rust#116665 So, instead, define it as ::c_ulong, and let the CPU-specific type definitions take care of the sizing.
@he32 What's the status of this? Do you still want this reviewed and merged? |
In NetBSD you might administratively run with a smaller-than-max affinity set, although the default is that the affinity set is empty. So ... in principle the code itself isn't wrong, as long as the type for So ... this patch exists in our packaging system, but will be removed once the correct definition of So ... I think I'll just abandon / cancel this pull request as an upstream request, and we'll deal with the issues in patches in our packaging system as long as is necessary. |
☔ The latest upstream changes (presumably #117285) made this pull request unmergeable. Please resolve the merge conflicts. |
As mentioned above, on second (third, fourth) take, the code isn't wrong, but the sibling fix to So with that, this is pull request is abandoned / canceled / closed. |
First off, I have it on good authority this code is Just Wrong: on NetBSD, by default a given thread does not have affinity to any specific set of CPUs.
This particular change came with this pull request:
Rust pull request 112226
However, even worse, this code causes a segmentation fault for certain NetBSD target architectures in the "bootstrap" program when building rust natively on those platforms. So far armv7/9.0, powerpc/10.0_BETA, and i386/9.3 all crash with a segmentation fault. However, for some strange reason, this isn't consistent across the board: riscv64/current, amd64/10.0_BETA, aarch64/9.0 and sparc64/10.0_BETA all pass this hurdle. A trivial C reimplementation also doesn't crash on any of these systems, ref. the thread which starts at
NetBSD current-users postings about this issue
but also always prints 0. However, if we get a SEGV running this code, the entire build fails, of course.
So ... while I do not have a full explanation for the SEGVs, this undoes the addition from pull request 112226, and restores the ability to build rust natively on the above flagged-as-problematical platforms.