-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: c-shared builds fail with musllibc #13492
Comments
musl does expose getauxval() for querying the auxiliary vector but does not provide any way to get the main program's argc/argv. The musl maintainers argue that this isn't something a shared library should have access to anyway. |
Somewhat relatedly, C99 says [section 5.1.2.2.1]:
So in the shared library cases where a C main function runs, the Go runtime should conservatively assume the C program may legitimately mutate the argv strings. In particular, it can't use gostringnocopy on them like it currently does in goargs. |
So I think in the cases where we need to play nicely with arbitrary C code:
That would just leave needing to figure out a solution for os.Args. It would kinda suck, but maybe we could just leave it nil in the case where Go isn't acting as the program's entry point? |
If we can't get os.Args, then we have to leave it nil. But I think we should only do that for environments where we can't get it. When we can get it, as for glibc, we should. |
I just looked at glibc, Bionic, uClibc, musl, and dietlibc, as well as the dynamic linkers for FreeBSD, NetBSD, and OpenBSD. It looks like only glibc passes (argc, argv, envp) to the DSO init functions. If we want to detect glibc at build time, it seems like we can include To detect at runtime, we could use a weak reference to a glibc-only symbol like gnu_get_libc_version and see if it resolves. I'm worried about making sure we pick a symbol that won't later be implemented by other C libraries though. (E.g., I found a thread where someone suggested adding a gnu_get_libc_version function to musl to make it compatible with Nvidia's binary drivers.) Any other suggestions/ideas for detecting glibc? |
For what it's worth, I tracked down that glibc started passing (argc, argv, envp) to the DSO init functions in 1996: https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=elf/dl-open.c;h=76f6329762308de4ba1620c50ff32d2c02359766;hp=40b52247253cf045498761342afd09ba3c7e1187;hb=dcf0671d905200c449f92ead6cf43c184637a0d5;hpb=4884d0f03c5a3b3d2459655e76fa2d0684d389dc So at least we don't need to worry about glibc version detection. |
As explained nicely in golang/go#13492, since 1996 glibc's dlopen() passes argv and a bunch of other stuff to the initialization functions of shared objects, while we don't pass any argument (see object::run_init_funcs()). Golang code (ref cloudius-systems#522), in particular, assumes it gets argv, auxv, etc., this way. Fixes: cloudius-systems#795 Signed-off-by: Benoît Canet <benoit@scylladb.com>
I just ran into this issue while trying to help my partner debug a crash when using a golang plugin they'd written for fluentbit (https://github.com/fluent/fluent-bit). When running on Alpine Linux which uses musl, I get a segfault inside of After finding this issue, it seems that the way forward is to:
Can anyone comment if testing for glibc would still be desired or not? |
The same problem appears when using c-archive build mode with musl. |
If there is a reliable way to test for glibc, then I think it would be perfectly reasonable to do that. @mdempsky 's comment above suggests a way. |
Unassigning because I'm not planning to work on this, but still happy to review if anyone has suggestions. |
@mdempsky I started work on the x64 / x86 / arm / arm64 versions the other night, but it was becoming quite time consuming to test. I have docker images running qemu, which are bootstrapped with a cross-compiled toolchain from my host using bootstrap.sh, but then recompile the toolchain locally inside of qemu for CGO support. Is there a better / faster way to test across the various targets? Also, do you have any recommendations for getting access to the above pre-processor defines from the platform-specific assembly files? Is it possible, or would they need to call into some C function to do the work, e.g.: void override_args(int argc, char *argv, int *argc_out, char *argv_out) {
#if GLIBC
*argc_out = argc;
*argv_out = argv;
#endif
} |
@inolen I think calling out to a C function (or Go function using cgo) in runtime/cgo is probably simplest/cleanest. Even if POSIX guaranteed there was a system header that could be safely #include'd into non-C files, cmd/asm doesn't support the full C preprocessor language. We could potentially have cmd/dist detect glibc and generate cmd/asm-compatible .h files, but then we need to re-run make.bash depending on target libc, which seems unfortunate. Lastly, sorry, I don't have any good solution to efficiently testing either. That's a contributing factor to why I haven't gotten around to it yet. :/ |
As explained nicely in golang/go#13492, since 1996 glibc's dlopen() passes argv and a bunch of other stuff to the initialization functions of shared objects, while we don't pass any argument (see object::run_init_funcs()). Golang code (ref cloudius-systems#522), in particular, assumes it gets argv, auxv, etc., this way. Fixes: cloudius-systems#795 Signed-off-by: Benoît Canet <benoit@scylladb.com>
As explained nicely in golang/go#13492, since 1996 glibc's dlopen() passes argv and a bunch of other stuff to the initialization functions of shared objects, while we don't pass any argument (see object::run_init_funcs()). Golang code (ref cloudius-systems#522), in particular, assumes it gets argv, auxv, etc., this way. Fixes: cloudius-systems#795 Signed-off-by: Benoît Canet <benoit@scylladb.com>
As explained nicely in golang/go#13492, since 1996 glibc's dlopen() passes argv and a bunch of other stuff to the initialization functions of shared objects, while we don't pass any argument (see object::run_init_funcs()). Golang code (ref cloudius-systems#522), in particular, assumes it gets argv, auxv, etc., this way. Fixes: cloudius-systems#795 Signed-off-by: Benoît Canet <benoit@scylladb.com>
As explained nicely in golang/go#13492, since 1996 glibc's dlopen() passes argv and a bunch of other stuff to the initialization functions of shared objects, while we don't pass any argument (see object::run_init_funcs()). Golang code (ref cloudius-systems#522), in particular, assumes it gets argv, auxv, etc., this way. Fixes: cloudius-systems#795 Signed-off-by: Benoît Canet <benoit@scylladb.com>
CL https://golang.org/cl/37868 mentions this issue. |
@mdempsky pushed review to https://go-review.googlesource.com/c/37868/ Initially, I tried the approach I mentioned above, but setting up the default arguments / calling into the cgo function in each platform-specific assembler function became a lot of code. After digging around the runtime code more, I found the I'm not sure if the code allocating the default arguments is correct. I read through https://github.com/golang/go/blob/master/src/runtime/HACKING.md and I think using No new tests were added, as the old testcarchive / testcshared tests failed when using musl. However I did setup a few scripts which ran the golang library tests for various os / arch combinations through docker / binfmt_misc / qemu here: |
Hi, I just created a patch for gcc-go that avoids this problem just skipping goargs() and goenv() when build as c-[shared|archive]. Essentially:
It works properly with libs that don't rely/need this data like https://github.com/hoehermann/purple-gowhatsapp/ but probably will affect others that use it in their logic. In the aim of get a better integration of musl and go, would you be willing to change this glibc-ism and enforce that when building on c-archive or c-shared args and env are not accesible? There are plenty ways for passing the needed info. If yes I can submit a PR, currently it's based on gcc-go so probably it needs some changes. |
@donob4n That approach means that |
The current situation is that it crashes dereferencing a pointer that came from interpreting garbage on the stack or in a register as a pointer, so anything that doesn't crash from that seems like an improvement. If there are libraries written in Go that are trying to interpret the main program's initial arguments, or other random data left there after the main program overwrote that storage, this is surely a bug that needs to be identified and fixed. It's very intentional that musl does not provide these arguments to ctors because (1) it's nonstandard functionality with no means to detect its presence, meaning the only way you can use it is by writing nonportable code that commits UB when the functionality isn't available, and (2) it's functionality whose only purpose is to write library-unsafe library code that peeks (or worse, pokes) at data that doesn't belong to it. |
From my perspective, the current situation is that these libraries work as expected when using glibc, which is the majority of Linux systems. I respect that musl has adopted a different approach, and I certainly think we should support that if we can figure out how. But while I don't know of any good answer, I don't think the approach of breaking existing code that works when using glibc is the best one. |
If the problem is breaking working code for glibc it could only skip 'goargs()' when no-glibc lib is detected (or at least musl). This way the affected apps will only fail when someone tries to build them with a non-glibc and hopefully he will found the problem and report to upstrream. Since it needs some hooks on sysargs() and other funcs, it could also write some debug warning like "Trying to read args but built as c-xxxx". |
Is there a reasonable way for a Go archive to know whether it is being run on a glibc or a musl system? |
Do you mean in runtime? There is a GLIBC macro that can be used when building. |
The runtime is not written in C. Building a pure Go program must not require a C compiler. |
I don't understand the question. If you're building a Go program which is pure Go interfacing with the underlying target's syscall layer only, with no C libs, and providing its own entry point, then of course you can do whatever you want. My understanding is that we're talking about build modes where code is being built to be linked with C library code, or as a library that's loadable into C programs. (Here by "C" I'm being a little bit sloppy, but the meaning is programs built on the underlying host C implementation and that need to be compatible with it.) In that case, the only place the arguments are available is via a local pointer object belonging to The question is not whether you're running on a musl-based host but whether you're linked to a host libc environment that provides these nonstandard interfaces. |
@richfelker You probably know this, but just for clarity for all. We are talking about I agree that what really matters is some way for the shared library to determine whether the dynamic linker is passing the argc/argv values to the It is not my place to suggest musl changes, but given the existing glibc behavior, behavior that some projects other than Go also expect, would it be completely unreasonable for musl to clear the first three arguments passed to |
Yes. Specifically, it would invoke UB. The specification for the ctor functions in the init array is that they take no arguments. If we called them with arguments, that's a call where the signature of the callee mismatches the function type used to call it, and the behavior is undefined. This actually matters if you want to someday support fancy ABIs that can do runtime CFI-like checks. Establishing a norm that we support this usage, even just by passing zeroed args, precludes the possibility to do anything like that. |
Given the popularity and benefits of Alpine Linux in containerized applications, the dependency on non-standard / glibc-specific behaviors makes advocating for the use of Golang difficult. If we're to describe these issues in the terms of "first class ports" versus non-first class ports, it might make more sense to rename the ports |
The ports page https://go.dev/wiki/PortingPolicy does already say that only glibc ports are considered first class. |
I think @jgowdy's subtext was: please reconsider that musl is not first class. It's kind of ironic that the main drivers for container deployments and the popularity of Go (docker, kubernetes, ...) are written in Go, but the the majority(?) of their payloads run on a "non-first-class" operating system (alpine linux). |
Fair enough, sorry for the misunderstanding. This issue is not the place for that discussion, though. |
Running into this issue as well when trying to build a Go library that's statically linked. It'd be nice if musl could be considered a first-class platform because Alpine Linux is an extremely common OS for cloud purposes. |
Because of golang/go#13492, we cannot use Go as a shared library that we link to Rust. Instead we are now building the Go code as a separate binary that is executed by the Rust one. Co-authored-by: Greg Soltis <greg.soltis@vercel.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com> Co-authored-by: Greg Soltis <greg@goldfiglabs.com> Co-authored-by: Thomas Knickman <tom.knickman@vercel.com>
Randomly came across this. In 2015 @mdempsky suggested attempting to resolve |
…rchive This change fixes a segmentation fault that occurs on musl-based systems when using c-archive (or c-shared with LD_PRELOAD). The issue was caused by uninitialized argv and envp, which led to crashes at runtime. This fix ensures that both argv and envp are correctly initialized, preventing the segmentation fault. While this fix addresses crashes due to missing argv and envp, it does not address the error described in issue golang#54805, where a c-shared library loaded with dlopen triggers an error related initial-exec access to TLS objects [1] in a dynamically loaded library. [1]: http://git.musl-libc.org/cgit/musl/commit/?id=5c2f46a214fceeee3c3e41700c51415e0a4f1acd Fixes golang#13492
This change fixes a segmentation fault that occurs on musl-based systems when using c-archive (or c-shared with LD_PRELOAD). The issue was caused by uninitialized argv and envp, which led to crashes at runtime. This fix ensures that both argv and envp are correctly initialized, preventing the segmentation fault. While this fix addresses crashes due to missing argv and envp, it does not address the error described in issue golang#54805, where a c-shared library loaded with dlopen triggers an error related initial-exec access to TLS objects [1] in a dynamically loaded library. [1]: http://git.musl-libc.org/cgit/musl/commit/?id=5c2f46a214fceeee3c3e41700c51415e0a4f1acd Fixes golang#13492
Change https://go.dev/cl/610837 mentions this issue: |
Currently, some of the init_array functions provided by a c-shared build expect to be called with (argc, argv, envp) arguments as is done by glibc, but this isn't specified by the ELF gABI for DT_INIT_ARRAY (http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#init_fini), and isn't done with other libc implementations like musllibc.
CC @ianlancetaylor @mwhudson
The text was updated successfully, but these errors were encountered: