Linking musl with mold causes issues with global variables from libc #1071
Comments
I couldn't reproduce the issue on my machine, so I need your input files. Can you run the last link command with |
Attached is the tarball for the program which segfaults for me. Would you rather have the tarball from linking musl itself? Statically linked executables don't have this problem. |
I built your program from the given tarball, and the resulting executable ran without crashing in my Alpine/musl Docker container. The executable itself is probably not broken. You wrote that you built musl yourself. Are you sure your musl is fine? |
If you provided a different libc.so, then yes it would have worked. Here is the tarball of the link step for musl: Yes, my musl is fine when linked with other linkers. |
It seems your reproducer fails really only when it was loaded by your musl libc.so. I built musl 1.2.4 myself and tried to run your program under my musl (i.e. run the program as The fact that your program didn't crash with other linkers doesn't immediately mean that your musl is fine; it might happen to work for some program (think C's undefined behavior). How did you build your musl? What is your distro? How can I reproduce your binaries from scratch? I also want to make sure you didn't apply your local patch to your musl. |
To clarify, were you able to use my It's not just this one-off; it's a large number of programs which crash or have bugs. I have not patched musl; it is built normally ( |
I could reproduce the issue with the musl built from your object files, but that's not really debuggable because it's just .o files. It's not that different from If KISS Linux provides an official docker image, I can fire it up and try it myself. |
We don't have an official docker image but I've created one. I think it should work if you run
(the image is here). I'm not particularly familiar with Docker but I have tested it and can still reproduce the issue. When you are in the image, you will have to do the following:
$ cat >test.c <<EOF
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
int main(void) {
puts(program_invocation_short_name);
return 0;
}
EOF
$ cc test.c
$ ./a.out
Segmentation fault (core dumped) |
Thanks for the info. How can I build musl with debug info? |
Sure. You need to go into the repository for musl and edit its
cd ~/repos/repo/core/musl/
vi build
Uncomment the
If you want to be able to step through the source while debugging, you'll need to add something like this to the top of the build file:
export CFLAGS="$CFLAGS -fdebug-prefix-map=$PWD=/usr/src/musl-1.2.4"
and then put the musl source in
mkdir -p /usr/src
cd /usr/src
kiss d musl
tar xzf ~/.cache/kiss/sources/musl/musl-1.2.4.tar.gz
Finally you can |
Mimalloc pointers (see microsoft/mimalloc#360 (comment) and https://bugs.gentoo.org/917089) are somehow pointing to the wrong heap space after linking
I can reproduce it with a Gentoo stage3 tarball:
CMake Error at /usr/share/cmake/Modules/CMakeTestCCompiler.cmake:67 (message):
The C compiler
"/usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC
Run Build Command(s):/usr/bin/ninja -v cmTC_391b2 && [1/2] /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -MD -MT CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -MF CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o.d -o CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -c /var/tmp/portage/sys-libs/libcxx-16.0.6/work/runtimes_build-abi_x86_64.amd64/CMakeFiles/CMakeScratch/TryCompile-ankcJC/testCCompiler.c
[2/2] : && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2 && :
FAILED: cmTC_391b2
: && /usr/lib/llvm/16/bin/x86_64-gentoo-linux-musl-clang -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto -O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind CMakeFiles/cmTC_391b2.dir/testCCompiler.c.o -o cmTC_391b2 && :
mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7f1fbad089b0
clang-16: error: unable to execute command: Segmentation fault (core dumped)
clang-16: error: linker command failed due to signal (use -v to see invocation)
ninja: build stopped: subcommand failed. |
I built mold in the |
@rui314 Should be reproducible in a stage3-musl-llvm chroot after recompiling llvm with
emerge --sync
echo "sys-devel/llvm binutils-plugin" > /etc/portage/package.use/custom
emerge -1 =sys-devel/llvm-16.0.6 --exclude=llvm:17 && emerge sys-libs/mold
COMMON_FLAGS="-O2 -pipe -march=native -mtune=native -D_FORTIFY_SOURCE=3 -g0 -flto"
CC="clang"
CXX="clang++"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS} -stdlib=libc++"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
LDFLAGS="${COMMON_FLAGS} ${LDLIBS} -Wl,-O3 -Wl,--as-needed -Wl,--strip-debug -Wl,--undefined-version -Wl,--icf=safe -Wl,--threads=4 -Wl,--compress-debug-sections=none -fuse-ld=mold -rtlib=compiler-rt -unwindlib=libunwind"
CHOST="x86_64-gentoo-linux-musl"
ACCEPT_KEYWORDS="amd64 ~amd64"
LD="ld.mold"
LC_MESSAGES=C
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS}"
MAKEOPTS="-j4" emerge -1 =sys-libs/musl-1.2.3* sys-libs/libcxx --exclude=sys-devel/llvm |
@LinuxUserGD isn't it mold segfaulting in your case, not a program linked to musl built with mold? |
Yes, mold segfaults with
Starting program: /usr/bin/ld.mold -pie --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib/ld-musl-x86_64.so.1 -o a.out /lib/Scrt1.o /lib/crti.o /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtbegin-x86_64.o -L/lib -L/usr/lib -plugin /usr/lib/llvm/16/bin/../lib/LLVMgold.so -plugin-opt=mcpu=skylake -plugin-opt=O2 -z relro -z now -O3 --as-needed --strip-debug --undefined-version --icf=safe --threads=4 --compress-debug-sections=none /tmp/check_cxx11-b34c02.o -lc++ -lm /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed -lc /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/libclang_rt.builtins-x86_64.a --as-needed -lunwind --no-as-needed /usr/lib/llvm/16/bin/../../../../lib/clang/16/lib/linux/clang_rt.crtend-x86_64.o /lib/crtn.o
[Detaching after fork from child process 232385]
mimalloc: error: mi_free: pointer does not point to a valid heap space: 0x7ffff7e36c50
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fd7de7 in setjmp () from /lib/ld-musl-x86_64.so.1 |
@rui314 I made a docker image with the above commands run, so that it contains the buggy musl. Just
to reproduce. |
Thank you, everyone. I successfully reproduced the issue following your instructions. It's a challenging issue to debug, but it appears to be related to a subtle bug in weak symbol handling. I will prepare a fix. |
This was a bad bug, thank you again for reporting. I believe the above commit fixed the issue. Can you try again with the git head? |
It seems to be fixed, thank you! |
The mimalloc segfault is fixed by da3f5dd as well, thanks! |
--dynamic-list, --export-dynamic-symbol and --export-dynamic-symbol-list have different semantics for executables and DSOs. If the output is an executable, they specify a list of symbols that are to be exported. If the output is a shared object, they specify the list of symbols that are to be interposable. mold hasn't implemented the latter semantics. This commit fixes that issue. Fixes rui314#1071
mold version: 2.0.0
musl version: 1.2.4
I recently rebuilt musl and used mold to link it, and subsequently experienced segfaults and bugs in a lot of random programs. After some digging, I found that the problems were all from globals from musl (program_invocation_short_name and optind in particular). Using a different linker to link musl fixed the problems.
Interestingly, programs built with clang didn't have these problems. Consider this C program:
This program, built with GCC against musl linked with mold, segfaults when puts tries to dereference a NULL pointer.
The difference between clang and GCC is how the global is accessed. GCC does this:
but clang does this:
I have confirmed that the use of @GOTPCREL fixes the GCC program.
The first version always gets NULL, the second gets the correct value initialised by musl.
Similarly with optind, in GCC programs, optind is always 1 even after calling getopt, but clang programs can read the updated value.
Now, this is quickly approaching the limits of my understanding. Please let me know if I can help with more testing.
This happened to me once before a few months ago, but since then I had forgotten how I fixed it.