mbuffer: cross-compiled libgcc_s.so.1 must be built with glibc headers for pthread_cleanup_push() to work #213453

Closed
Majiir opened this issue Jan 29, 2023 · 12 comments · Fixed by #247900
Labels
0.kind: bug Something is broken 6.topic: cross-compilation Building packages on a different platform than they will be used on

@Majiir
Contributor

Majiir commented Jan 29, 2023

Describe the bug

On armv7l-linux or aarch64-linux cross-compiled from x86_64-linux, mbuffer breaks:

$ echo "hello" | mbuffer
hello
libgcc_s.so.1 must be installed for pthread_exit to work
Aborted (core dumped)

From armv7l-linux:

Module linux-vdso.so.1 with build-id 04745a31932e964ec4cd2f2af22bff73c959bc5d
Module libatomic.so.1 without build-id.
Module libdl.so.2 with build-id e21ee3b08115e4ff693e068a5ae47e30a6eed56e
Module ld-linux-armhf.so.3 with build-id 5450bffccfed072763370bd64a69b6ea758babb1
Module libc.so.6 with build-id cc4343a8abc2fde08e30176f06b9f4385f438ef5
Module libpthread.so.0 with build-id 6b805729d47513ad23db66eeac106a69998cb666
Module libm.so.6 with build-id 9350661951669899d3d720161626db749d157003
Module libcrypto.so.3 with build-id f4a6e190dcfd9af5793d9afd5cd89ef82559a3e4
Module mbuffer without build-id.
Stack trace of thread 2019:
#0  0x00000000b6a7b2f8 __pthread_kill_implementation (libc.so.6 + 0x7b2f8)
ELF object binary architecture: ARM

Steps To Reproduce

You can run either of the following on x86_64-linux by emulating the target architecture with boot.binfmt.emulatedSystems = [ "armv7l-linux" "aarch64-linux" ].

$ nix build nixpkgs#pkgsCross.aarch64-multiplatform.mbuffer
$ echo "hello" | result/bin/mbuffer

or

$ nix build nixpkgs#pkgsCross.armv7l-hf-multiplatform.mbuffer
$ echo "hello" | result/bin/mbuffer

Expected behavior

mbuffer should not crash:

$ echo "hello" | mbuffer
hello
summary:  0.0 kiByte in  0.0sec - average of  0.0 kiB/s

Additional context

This was responsible for syncoid failing to replicate datasets to my ARM NAS.

The issue does not occur for natively compiled aarch64-linux builds. It appears to be a cross-compilation issue only.

Notify maintainers

@tokudan @skeuchel

Metadata

 - system: `"armv7l-linux"`
 - host os: `Linux 6.1.8, NixOS, 22.11 (Raccoon), 22.11.20230127.cc4bb87`
@Majiir Majiir added 0.kind: bug Something is broken 6.topic: cross-compilation Building packages on a different platform than they will be used on labels Jan 29, 2023
@wegank
Member

wegank commented Jan 29, 2023

@trofi

Majiir added a commit to Majiir/nixpkgs that referenced this issue Jan 29, 2023
@trofi
Contributor

trofi commented Jan 30, 2023

It is a bug in the glibc/gcc derivations. In the non-cross case, libgcc_s.so.1 gets copied from the gcc used to build glibc into the glibc output: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/glibc/default.nix#L67-L86

    # When building glibc from bootstrap-tools, we need libgcc_s at RPATH for
    # any program we run, because the gcc will have been placed at a new
    # store path than that determined when built (as a source for the
    # bootstrap-tools tarball)
    # Building from a proper gcc staying in the path where it was installed,
    # libgcc_s will now be at {gcc}/lib, and gcc's libgcc will be found without
    # any special hack.
    # TODO: remove this hack. Things that rely on this hack today:
    # - dejagnu: during linux bootstrap tcl SIGSEGVs
    # - clang-wrapper in cross-compilation
    # Last attempt: https://github.com/NixOS/nixpkgs/pull/36948
    preInstall = lib.optionalString (stdenv.hostPlatform == stdenv.buildPlatform) ''
      if [ -f ${stdenv.cc.cc}/lib/libgcc_s.so.1 ]; then
          mkdir -p $out/lib
          cp ${stdenv.cc.cc}/lib/libgcc_s.so.1 $out/lib/libgcc_s.so.1
          # the .so It used to be a symlink, but now it is a script
          cp -a ${stdenv.cc.cc}/lib/libgcc_s.so $out/lib/libgcc_s.so
      fi
    '';

But in the cross case, the gcc used to build glibc does not yet provide libgcc_s.so, as it needs bits of glibc for it (and the library path is slightly incorrect). One possible workaround is to add -lgcc_s to NIX_LDFLAGS so that it gets pulled in via RPATH from the expected location.

@vcunat
Member

vcunat commented May 4, 2023

We don't do the copying to glibc anymore, but this error still occurs as described on current nixpkgs master.

@trofi
Contributor

trofi commented May 4, 2023

Yup. cross-compilation never propagated libgcc_s.so.1 to glibc in any form (and does not today).

@ghost

ghost commented Jun 29, 2023

Could you please try this PR? It should fix the problem.

#238154

@Majiir
Contributor Author

Majiir commented Jun 30, 2023

With that PR, I'm seeing this on both aarch64-linux (a Raspberry Pi CM4) and with QEMU on x86_64-linux:

$ nix build .#pkgsCross.aarch64-multiplatform.mbuffer

$ echo "hello" | result/bin/mbuffer
hello
Trace/breakpoint trap (core dumped)

$ echo $?
133

Log from aarch64 machine:

Process 8391 (mbuffer) of user 1000 dumped core.

Module libgcc_s.so.1 without build-id.
Module mbuffer without build-id.
Stack trace of thread 8393:
#0  0x0000007f91cddf74 uw_init_context_1 (libgcc_s.so.1 + 0xdf74)
#1  0x0000007f91cde5e4 _Unwind_ForcedUnwind (libgcc_s.so.1 + 0xe5e4)
#2  0x0000007f952a7278 __pthread_unwind (libc.so.6 + 0x87278)
#3  0x0000007f9529f508 pthread_exit (libc.so.6 + 0x7f508)
#4  0x000000000040ab44 readBlock (mbuffer + 0xab44)
#5  0x000000000040ad5c inputThread (mbuffer + 0xad5c)
#6  0x0000007f9529e630 start_thread (libc.so.6 + 0x7e630)
#7  0x0000007f95306e9c thread_start (libc.so.6 + 0xe6e9c)

Stack trace of thread 8392:
#0  0x0000007f958d3a50 n/a (n/a + 0x0)
#1  0x0000007f958c05a4 n/a (n/a + 0x0)
#2  0x0000007f958c09f8 n/a (n/a + 0x0)
#3  0x0000007f958c11d4 n/a (n/a + 0x0)
#4  0x0000007f9534ddc4 do_dlsym (libc.so.6 + 0x12ddc4)
#5  0x0000007f958b943c n/a (n/a + 0x0)
#6  0x0000007f958b943c n/a (n/a + 0x0)
ELF object binary architecture: AARCH64

Log from x86_64-linux machine with QEMU:

Process 2387662 (qemu-aarch64) of user 1000 dumped core.
                                                   
Module /nix/store/gyg6zqfnx4250nkshaxsnp4x73hciwxa-mbuffer-aarch64-unknown-linux-gnu-20230301/bin/mbuffer without build-id.
Module /nix/store/ifwgjiqa6rwdw8r1y4qimmc53p79709h-aarch64-unknown-linux-gnu-stage-static-gcc-12.3.0-libgcc/lib/libgcc_s.so.1 without build-id.
Module libpcre2-8.so.0 without build-id.
Module libgcc_s.so.1 without build-id.
Module libnuma.so.1 without build-id.
Module libcapstone.so.4 without build-id.
Module libz.so.1 without build-id.
Stack trace of thread 2387664:
#0  0x00007fad663d9049 __sigsuspend (libc.so.6 + 0x39049)
#1  0x000055af21f761f0 dump_core_and_abort (qemu-aarch64 + 0x2991f0)
#2  0x000055af21f76575 handle_pending_signal (qemu-aarch64 + 0x299575)
#3  0x000055af21f78235 process_pending_signals (qemu-aarch64 + 0x29b235)
#4  0x000055af21d479fa cpu_loop (qemu-aarch64 + 0x6a9fa)
#5  0x000055af21f804a1 clone_func (qemu-aarch64 + 0x2a34a1)
#6  0x00007fad66425e24 start_thread (libc.so.6 + 0x85e24)
#7  0x00007fad664a79b0 __clone3 (libc.so.6 + 0x1079b0)

Stack trace of thread 2387663:
#0  0x00007fad6649fd8d syscall (libc.so.6 + 0xffd8d)
#1  0x000055af21fbbeda qemu_event_wait (qemu-aarch64 + 0x2deeda)
#2  0x000055af21fc3ea2 call_rcu_thread (qemu-aarch64 + 0x2e6ea2)
#3  0x000055af21fbad58 qemu_thread_start (qemu-aarch64 + 0x2ddd58)
#4  0x00007fad66425e24 start_thread (libc.so.6 + 0x85e24)
#5  0x00007fad664a79b0 __clone3 (libc.so.6 + 0x1079b0)

Stack trace of thread 2387665:
#0  0x00007fad66422a36 __futex_abstimed_wait_common (libc.so.6 + 0x82a36)
#1  0x00007fad66425228 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x85228)
#2  0x000055af21fbb86b qemu_cond_wait_impl (qemu-aarch64 + 0x2de86b)
#3  0x000055af21d459f7 cpu_exec_start (qemu-aarch64 + 0x689f7)
#4  0x000055af21d47908 cpu_loop (qemu-aarch64 + 0x6a908)
#5  0x000055af21f804a1 clone_func (qemu-aarch64 + 0x2a34a1)
#6  0x00007fad66425e24 start_thread (libc.so.6 + 0x85e24)
#7  0x00007fad664a79b0 __clone3 (libc.so.6 + 0x1079b0)

Stack trace of thread 2387662:
#0  0x000055af21d469d6 safe_syscall_base (qemu-aarch64 + 0x699d6)
#1  0x000055af21f827fd do_futex.constprop.0 (qemu-aarch64 + 0x2a57fd)
#2  0x000055af21f8cbba do_syscall1.constprop.0 (qemu-aarch64 + 0x2afbba)
#3  0x000055af21f9195e do_syscall (qemu-aarch64 + 0x2b495e)
#4  0x000055af21d479c7 cpu_loop (qemu-aarch64 + 0x6a9c7)
#5  0x000055af21d43736 main (qemu-aarch64 + 0x66736)
#6  0x00007fad663c3ace __libc_start_call_main (libc.so.6 + 0x23ace)
#7  0x00007fad663c3b89 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
#8  0x000055af21d43e75 _start (qemu-aarch64 + 0x66e75)
ELF object binary architecture: AMD x86-64

@ghost

ghost commented Jul 2, 2023

Well, it's finding libgcc_s.so now, so there's clearly some other problem with mbuffer in addition to that.

With that PR, I'm seeing this on both aarch64-linux (a Raspberry Pi CM4) and with QEMU on x86_64-linux:

Module libgcc_s.so.1 without build-id.

Module /nix/store/ifwgjiqa6rwdw8r1y4qimmc53p79709h-aarch64-unknown-linux-gnu-stage-static-gcc-12.3.0-libgcc/lib/libgcc_s.so.1 without build-id.

When I run it with dontStrip=true I even get line numbers from inside libgcc:

Thread 3 "mbuffer" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0xfffff255d1a0 (LWP 11182)]
0x0000fffff1d1df74 in uw_init_context_1 (context=context@entry=0xfffff255bea0, outer_cfa=outer_cfa@entry=0xfffff255c620, outer_ra=0xfffff79a77a4 <__pthread_unwind+52>) at ../../../gcc-12.3.0/libgcc/unwind-dw2.c:1593

That line (unwind-dw2.c:1593) is the last line in the following snippet of code:

static void __attribute__((noinline))
uw_init_context_1 (struct _Unwind_Context *context,
		   void *outer_cfa, void *outer_ra)
{
  void *ra = __builtin_extract_return_addr (__builtin_return_address (0));
  _Unwind_FrameState fs;
  _Unwind_SpTmp sp_slot;
  _Unwind_Reason_Code code;

  memset (context, 0, sizeof (struct _Unwind_Context));
  context->ra = ra;
  if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
    context->flags = EXTENDED_CONTEXT_BIT;

  code = uw_frame_state_for (context, &fs);
  gcc_assert (code == _URC_NO_REASON);

... so libgcc is deliberately asserting here because uw_frame_state_for() returned something other than _URC_NO_REASON.

Grepping through the gcc docs turns up:

If the frame can be decoded, the register save addresses should be updated in @var{fs} and the macro should evaluate to @code{_URC_NO_REASON}.

This is happening because mbuffer uses pthread_cleanup_push(). Once you do that, glibc needs to unwind the stack frame of any thread that calls pthread_exit(). And unwinders are a giant headache.

I'm afraid that's all I've got here. I've removed the closed-by from #238154 since it was only part of the problem here.

@ghost

ghost commented Jul 2, 2023

Ah, I think I know what it is.

We can compile a libgcc without glibc, but it won't have an unwinder. We need to pass the glibc headers to the gcc build in order to get one of those.

@ghost

ghost commented Jul 3, 2023

@Majiir would you please try #241208?

I believe it fixes one of the two problems which caused this issue, and it includes #238154 which fixes the other one. It works for me:

$ echo hello | /nix/store/h2z60m6gz0qmbch0dh9wjnmr2a66nmll-mbuffer-aarch64-unknown-linux-gnu-20230301/bin/mbuffer
hello
summary:  0.0 kiByte in  0.0sec - average of  0.0 kiB/s

BTW, this issue (cross-compile mbuffer, then run it under qemu) is a really valuable test case. We really ought to add it to pkgs/tests/cross so Hydra will catch any regressions automatically. Especially since the failure mode gives the user no error message to help them troubleshoot the problem! mbuffer is a great candidate for this since it's one of those very rare non-C++ programs that uses advanced stack-unwinding features from pthreads.

@ghost ghost changed the title mbuffer: libgcc_s.so.1 must be installed for pthread_exit to work mbuffer: cross-compiled libgcc_s.so.1 must be built with --enable-threads for pthread_cleanup_push() to work Jul 3, 2023
@ghost ghost changed the title mbuffer: cross-compiled libgcc_s.so.1 must be built with --enable-threads for pthread_cleanup_push() to work mbuffer: cross-compiled libgcc_s.so.1 must be built with glibc headers for pthread_cleanup_push() to work Jul 3, 2023
@Majiir
Contributor Author

Majiir commented Jul 3, 2023

@amjoseph-nixpkgs Confirming #241208 fixes the issue. Tested with both armv7l-linux and aarch64-linux on real hardware and with QEMU on x86_64.

@ghost ghost mentioned this issue Jul 13, 2023
12 tasks
@ghost

ghost commented Jul 13, 2023

I think this is fixed, and more cleanly, by #243230. Edit: it is not. ☹️ I will keep trying.

Test is in #243248

Currently building.

@ghost

ghost commented Jul 13, 2023

Added this to pkgs.tests.cross.sanity, but commented out since it is not fixed yet.

@ghost ghost self-assigned this Aug 8, 2023
@ghost ghost linked a pull request Aug 8, 2023 that will close this issue
@ghost ghost closed this as completed in #247900 Aug 15, 2023
This issue was closed.