[experiment] Try building Rust for AArch64 #13278

Closed
wants to merge 7 commits into from


mati865
Collaborator

@mati865 mati865 commented Sep 28, 2022

Can somebody with AArch64 hardware try to build it and report the error that will inevitably occur?

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3147837656/jobs/5117738600 - this is running on a KVM virtual machine on a Raspberry Pi 4B 8GB, so whatever it does isn't going to happen fast 😁 - hopefully it produces something useful and not just a network issue

@jeremyd2019
Member

It seems to have failed on dependencies, though I didn't see a download error. My guess is depending on the group mingw-w64-clang-x86_64-toolchain didn't work out.

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3153177053/jobs/5130433521 managed to get past the dependencies and sources download, hopefully will come up with a useful error this time.

@jeremyd2019
Member

==> Starting build()...
  /C/_/mingw-w64-rust/PKGBUILD: line 92: The: command not found
  ==> ERROR: A failure occurred in build().

Wha?

@mati865
Collaborator Author

mati865 commented Sep 29, 2022

Shoot, uncommented one line too many.

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3154365326/jobs/5132415331

  RuntimeError: src/stage0.json doesn't contain a checksum for dist/2022-08-11/rust-std-1.63.0-aarch64-pc-windows-gnu.tar.xz. Pre-built artifacts might not be available for this target at this time, see https://doc.rust-lang.org/nightly/rustc/platform-support.html for more information.
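
The error above says that bootstrap looked up the artifact path in src/stage0.json and found no entry for it. A minimal sketch of that kind of lookup follows. It assumes the checksums sit under a top-level "checksums_sha256" object keyed by the dist/... path, which is an assumption about the file layout and not something shown in this thread; has_checksum and load_stage0 are hypothetical helper names.

```python
import json


def load_stage0(path: str) -> dict:
    """Load a stage0.json-style manifest from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def has_checksum(stage0: dict, artifact: str) -> bool:
    """Return True if the manifest lists a checksum for the artifact path."""
    # Assumption: checksums are stored under "checksums_sha256",
    # keyed by the dist/... path of each tarball.
    return artifact in stage0.get("checksums_sha256", {})
```

With the real file, has_checksum(load_stage0("src/stage0.json"), "dist/2022-08-11/rust-std-1.63.0-aarch64-pc-windows-gnu.tar.xz") would be expected to return False, matching the RuntimeError in the log.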

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3155479209/jobs/5135142992

extracting C:/_/mingw-w64-rust/src/CLANGARM64/build/cache/2022-08-11/cargo-1.63.0-x86_64-pc-windows-gnu.tar.xz
Building rustbuild
error: Unable to update registry `crates-io`
Caused by:
  attempting to make an HTTP request, but --frozen was specified
failed to run: C:/_/mingw-w64-rust/src/CLANGARM64/build/x86_64-pc-windows-gnu/stage0/bin/cargo.exe build --manifest-path C:/_/mingw-w64-rust/src/rustc-1.64.0-src/src/bootstrap/Cargo.toml --frozen
Build completed unsuccessfully in 0:05:14

@mati865
Collaborator Author

mati865 commented Sep 30, 2022

I have no idea why this error was thrown on AArch64; it didn't happen for me when compiling x86_64 -> x86_64.

@jeremyd2019
Member

jeremyd2019 commented Sep 30, 2022

https://github.com/msys2-arm/MINGW-packages/actions/runs/3160479614/jobs/5144981155

  error: linker `x86_64-w64-mingw32-gcc` not found
    |
    = note: program not found
  
  error: could not compile `winapi-x86_64-pc-windows-gnu` due to previous error
  warning: build failed, waiting for other jobs to finish...
  error: could not compile `proc-macro2` due to previous error
  error: could not compile `winapi` due to previous error
  failed to run: C:/_/mingw-w64-rust/src/CLANGARM64/build/x86_64-pc-windows-gnu/stage0/bin/cargo.exe build --manifest-path C:/_/mingw-w64-rust/src/rustc-1.64.0-src/src/bootstrap/Cargo.toml
  Build completed unsuccessfully in 0:13:27

Probably needs mingw-w64-clang-x86_64-gcc-compat, or more configuration options to override the 'gcc' defaults. Strangely, the exact command reported as not found is not listed in the configure output:

  configure: target.x86_64-pc-windows-gnu.llvm-config := C:/msys64/clangarm64/bin/l ...
  configure: build.python         := C:/msys64/clangarm64/bin/python
  configure: target.x86_64-pc-windows-gnu.linker := clang
  configure: target.x86_64-pc-windows-gnu.cc := C:/msys64/clang64/bin/gcc.exe
  configure: target.x86_64-pc-windows-gnu.cxx := C:/msys64/clang64/bin/g++.exe
  configure: target.x86_64-pc-windows-gnu.linker := C:/msys64/clang64/bin/gcc.exe
  configure: target.x86_64-pc-windows-gnu.llvm-config := C:/msys64/clang64/bin/llvm ...
  configure: target.aarch64-pc-windows-gnullvm.cc := C:/msys64/clangarm64/bin/clang ...
  configure: target.aarch64-pc-windows-gnullvm.cxx := C:/msys64/clangarm64/bin/clan ...
  configure: target.aarch64-pc-windows-gnullvm.linker := C:/msys64/clangarm64/bin/c ...
  configure: target.aarch64-pc-windows-gnullvm.llvm-config := C:/msys64/clangarm64/ ...
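
One quick way to see which linker names actually resolve in the build environment is to probe PATH before starting the build. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the tool name in the usage note is the one from the error above.

```python
import shutil


def missing_tools(names, path=None):
    """Return the tool names that cannot be resolved on PATH.

    `path` optionally overrides the search path (same meaning as in
    shutil.which); by default the PATH environment variable is used.
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

For example, missing_tools(["x86_64-w64-mingw32-gcc", "clang"]) run from the same shell as makepkg would list whichever of the two the build cannot find.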

@mati865
Collaborator Author

mati865 commented Sep 30, 2022

The "error: linker `x86_64-w64-mingw32-gcc` not found" message is unexpected.

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3161324190/jobs/5147172653

error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
    |
    = note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\_\\mingw-w64-rust\\src\\CLANGARM64\\build\\x86_64-pc-windows-gnu\\stage0\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\rustceS91ZH\\symbols.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\build\\winapi-x86_64-pc-windows-gnu-bb63608d47ad563d\\build_script_build-bb63608d47ad563d.build_script_build.640f57f7-cgu.0.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\build\\winapi-x86_64-pc-windows-gnu-bb63608d47ad563d\\build_script_build-bb63608d47ad563d.build_script_build.640f57f7-cgu.1.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\build\\winapi-x86_64-pc-windows-gnu-bb63608d47ad563d\\build_script_build-bb63608d47ad563d.build_script_build.640f57f7-cgu.10.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\build\\winapi-x86_64-pc-windows-gnu-bb63608d...
    = note: x86_64-w64-mingw32-gcc: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
            lld: error: unable to find library -lgcc_eh
            lld: error: unable to find library -lgcc
            x86_64-w64-mingw32-gcc: error: linker command failed with exit code 1 (use -v to see invocation)

and more similar

@mati865
Collaborator Author

mati865 commented Sep 30, 2022

No idea why it worked entirely differently for me on local x86_64...

@jeremyd2019
Member

https://github.com/msys2-arm/MINGW-packages/actions/runs/3162085756/jobs/5148624572

Seems to be building this time. Will update with outcome.

@jeremyd2019
Member

error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
  |
  = note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\_\\mingw-w64-rust\\src\\CLANGARM64\\build\\x86_64-pc-windows-gnu\\stage0\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\rustcOFIBls\\symbols.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\llvm_config_wrapper-a98a2aff861e4907.13nrj6ndrng8aq64.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\llvm_config_wrapper-a98a2aff861e4907.15bz6oqmkaq6w41y.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\llvm_config_wrapper-a98a2aff861e4907.1amx6xg1wbr6zqgb.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\llvm_config_wrapper-a98a2aff861e4907.1at4f0gwa9r2ovrt.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\llvm_config_wrapper-a98a2aff861e4907.1bopjdy9zjfqypmy.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build...
  = note: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a(ADVAPI32.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a(ADVAPI32.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a: error adding symbols: file format not recognized
          collect2.exe: error: ld returned 1 exit status
          
error: could not compile `bootstrap` due to previous error
warning: build failed, waiting for other jobs to finish...
error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
  |
  = note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\_\\mingw-w64-rust\\src\\CLANGARM64\\build\\x86_64-pc-windows-gnu\\stage0\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\rustcExjRuN\\symbols.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09b.1b4b9qsugwnvsh0s.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09b.1bfh0jrtigbtntwm.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09b.1fjbaeyityxhi1w0.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09b.1n0ow6vqnpp8z7qa.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09b.1vptmfwxu9erfpvz.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\bootstrap-165737a4af08d09...
  = note: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/liblzma.dll.a(liblzma-5.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/liblzma.dll.a(liblzma-5.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/liblzma.dll.a: error adding symbols: file format not recognized
          collect2.exe: error: ld returned 1 exit status
          
error: could not compile `bootstrap` due to previous error
error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
  |
  = note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\_\\mingw-w64-rust\\src\\CLANGARM64\\build\\x86_64-pc-windows-gnu\\stage0\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\rustcHOrwuf\\symbols.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.10hhd78ak8t0iys6.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.10oobkfleezby6f7.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.11c6749op3n2wuba.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.12crv7ro3jfhgawl.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.17x7ovbmcrje1608.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustdoc-72ce5fb9d9d19cab.18ccwon4by...
  = note: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a(ADVAPI32.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a(ADVAPI32.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libadvapi32.a: error adding symbols: file format not recognized
          collect2.exe: error: ld returned 1 exit status
          
error: could not compile `bootstrap` due to previous error
error: linking with `x86_64-w64-mingw32-gcc` failed: exit code: 1
  |
  = note: "x86_64-w64-mingw32-gcc" "-fno-use-linker-plugin" "-Wl,--dynamicbase" "-Wl,--disable-auto-image-base" "-m64" "-Wl,--high-entropy-va" "C:\\_\\mingw-w64-rust\\src\\CLANGARM64\\build\\x86_64-pc-windows-gnu\\stage0\\lib\\rustlib\\x86_64-pc-windows-gnu\\lib\\rsbegin.o" "C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\rustcXn1cMO\\symbols.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.10598wrr5zohe0ol.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.108o5tleea91wjsx.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.15jmbbrl1hvqput3.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.18sc53dswybd55i9.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.196dl92tel16qp2c.rcgu.o" "C:/_/mingw-w64-rust/src/CLANGARM64/build/bootstrap\\debug\\deps\\rustc-96cfce7ed8dc2581.1afi0n05mw6zt4sv.rcgu....
  = note: C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libuserenv.a(USERENV.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libuserenv.a(USERENV.dll): recognised but unhandled machine type (0xaa64) in Import Library Format archive
          C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:/msys64/clangarm64/lib/libuserenv.a: error adding symbols: file format not recognized
          collect2.exe: error: ld returned 1 exit status
          
error: could not compile `bootstrap` due to previous error
failed to run: C:/_/mingw-w64-rust/src/CLANGARM64/build/x86_64-pc-windows-gnu/stage0/bin/cargo.exe build --manifest-path C:/_/mingw-w64-rust/src/rustc-1.64.0-src/src/bootstrap/Cargo.toml
Build completed unsuccessfully in 0:18:27
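
The "unhandled machine type (0xaa64)" messages mean the x86_64 ld was handed ARM64 import libraries from /clangarm64/lib. As a rough illustration of how that machine field can be read, the sketch below inspects the first bytes of a single archive member. The machine constants and the short import header layout (Sig1 = 0, Sig2 = 0xFFFF, Version, Machine) come from the PE/COFF specification; member_machine and describe are hypothetical helper names, and extracting members from the .a archive is left out.

```python
import struct

# Machine values from the PE/COFF specification.
MACHINES = {0x014C: "i386", 0x8664: "x86_64", 0xAA64: "aarch64"}


def member_machine(data: bytes) -> int:
    """Return the COFF machine field of one archive member."""
    sig1, sig2, _version, machine = struct.unpack_from("<HHHH", data)
    if sig1 == 0 and sig2 == 0xFFFF:
        # Short import header: Machine is the fourth 16-bit field.
        return machine
    # Regular COFF object: Machine is the very first field.
    return sig1


def describe(data: bytes) -> str:
    """Return a readable architecture name, or the raw hex value."""
    code = member_machine(data)
    return MACHINES.get(code, hex(code))
```

A member from /clangarm64/lib/libadvapi32.a would be expected to report "aarch64", which is exactly what an x86_64-only linker cannot consume.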

@mati865
Collaborator Author

mati865 commented Oct 1, 2022

This is not going to work. Thank you for bearing with me, but I'll have to find a different solution.

@mati865 mati865 closed this Oct 1, 2022
@mati865 mati865 deleted the rust-aarch64 branch October 1, 2022 09:55
@jeremyd2019
Member

What does it need? A real cross-compiler, like llvm-mingw or maybe a hack like #8762 (but the target llvm-config.exe would still be arm64, so maybe that would not work from an x86_64 host)?

@mati865
Collaborator Author

mati865 commented Oct 1, 2022

TBH I have no idea; somehow cross compiling under MSYS2 doesn't work at all.
My current idea is to provide a prebuilt 1.63 AArch64 release built on Linux and plug it into the build system to do native compilation in MSYS2, but it will take some time.

@jeremyd2019
Member

jeremyd2019 commented Oct 1, 2022

The wrapper hack works for cross-compiling arm64 python from x86_64 host, and I think I was able to use them to build something using cmake. I don't know anything about rust's build system though, that seems to be the main problem because it appears to need some sorting out between libraries for build and host/target.

@jeremyd2019
Member

I screwed around with using my cross clang wrappers on x86_64 host, from clang64 MSYSTEM, and got further, but it did end up failing to run /clangarm64/bin/llvm-config.exe as expected. Will have to try on arm64 box.

@jeremyd2019
Member

jeremyd2019 commented Oct 2, 2022

Hmm, it seems running in clangarm64 MSYSTEM makes it want to link arm64 libs into BUILD/x86_64 binaries... Will have to go from clang64 MSYSTEM

@jeremyd2019
Member

The cross build finished. But, trying to use the resulting install to build itself with _bootstrapping=no is giving the same behavior as trying your stage0 binaries.

Building rustbuild
running: C:/msys64/clangarm64/bin/cargo.exe build --manifest-path C:/M/mingw-w64-rust/src/rustc-1.64.0-src/src/bootstrap/Cargo.toml --verbose

hanging there, with a rustc.exe using 1 core and not much ram (like 2MB)

@jeremyd2019
Member

FWIW, packages uploaded to https://github.com/msys2-arm/MINGW-packages/releases/tag/rust-1.64.0-aarch64-1, that's tagged at the PKGBUILD used to make them.

The branch where I injected your stage0 binaries is master...jeremyd2019:ced44d965443b70f7f39901dde92c403f9e9bf95

@jeremyd2019
Member

Uh oh, unwinding again!

(this is from the rustc from the package I built, not the rustc from your binaries. I can check that one too but I bet it's similar)

STACK_TEXT:  
000000da`9a2fb490 00007ff9`8cf46648     : 60000040`0040000f 00000000`00000000 00000000`000000b8 00000000`0000001e : ntdll!KiUserExceptionDispatch+0x4
000000da`9a2fb8e0 00007ff9`8cf46250     : 000000da`9a2fb920 00007ff9`8cf46250 0000001e`00000001 00000000`000000b8 : ntdll!RtlpUnwindRestoreRegisterRange+0x98
000000da`9a2fb920 00007ff9`8cf45e4c     : 000000da`9a2fba20 00007ff9`8cf45e4c 000000da`9a2fb901 00007ff9`52d0a314 : ntdll!RtlpUnwindFunctionFull+0x310
000000da`9a2fb9b0 00007ff9`8cf8f928     : 000000da`9a2fb9c8 00007ff9`00000000 00007ff9`c0000005 00007ff9`52be65e4 : ntdll!RtlpxVirtualUnwind+0xac
000000da`9a2fba70 00007ff9`8cf8f5dc     : 00000000`00000000 00000000`00000000 000000da`9a2fbab0 00000000`00000000 : ntdll!RtlDispatchException+0x2d0
000000da`9a2fc0b0 00007ff9`887486c4     : 00000000`00000000 00000000`00000000 000000da`9a2fc0d0 00007ff9`00000000 : ntdll!RtlRaiseException+0xbc
000000da`9a2fc590 00007ff9`52c41a18     : 00000000`20474343 00000000`00000000 00007ff9`887486c4 00007ff9`00000001 : KERNELBASE!RaiseException+0x54
000000da`9a2fc640 00007ff9`52be65e8     : 00000000`00000001 0000023d`47e3ec90 00007ff9`45e5a3e0 00007ff9`52be65e8 : std_948032f1431020cd!ZN75_$LT$unwind..libunwind.._Unwind_Reason_Code$u20$as$u20$core..fmt..Debug$GT$3fmt17he361d65e81690710E+0x678
000000da`9a2fc660 00007ff9`52be6580     : 000000da`9a2fc720 00007ff9`52c8a450 00000000`00000000 00000000`00000000 : std_948032f1431020cd!rust_panic+0x18
000000da`9a2fc720 00007ff9`52beb75c     : 00000000`00000001 00007ff9`52c8a478 000000da`9a2fc7b8 00007ff9`45ded5b9 : std_948032f1431020cd!ZN3std9panicking20rust_panic_with_hook17h0afd4ae1cef53990E+0x334
000000da`9a2fc750 00007ff9`42a1d8f8     : 00007ff9`42a1d8f8 00000000`02000000 00007ff9`428d47a0 000000da`9a2fd4f8 : std_948032f1431020cd!ZN3std5panic13resume_unwind17hae3d4c3c71e07186E+0x8
000000da`9a2fc760 00007ff9`428d47a0     : 00007ff9`428d47a0 000000da`9a2fd4f8 00000000`00000030 00007ff9`45ded5b9 : rustc_driver_3057b6c2e27bfeb3!RNvMs_NtCsbGYn629KwXg_10rustc_span11fatal_errorNtB4_10FatalError5raise+0x14
000000da`9a2fc770 00007ff9`4289e180     : 00000000`00000030 00007ff9`45ded5b9 00007ff9`4289e180 00007ff9`45ded5b9 : rustc_driver_3057b6c2e27bfeb3!RNvXs3_NtCskV79zNA7O1Q_12rustc_errors18diagnostic_builderzNtB5_17EmissionGuarantee43diagnostic_builder_emit_producing_guarantee+0x4c
000000da`9a2fc790 00007ff9`4287f284     : 00007ff9`4287f284 00007ff9`4287f1dc 000000da`9a0d2000 00000000`00000000 : rustc_driver_3057b6c2e27bfeb3!RNvXs3P_NtCshxXGK4Nly0m_13rustc_session6configNtB6_26ProcMacroExecutionStrategyNtNtCsboQ9t7Pc7Cp_4core3fmt5Debug3fmt+0x287c
000000da`9a2fc7a0 00007ff9`42899914     : 000000da`9a0d2000 00000000`00000000 00000000`00000001 00000000`00000000 : rustc_driver_3057b6c2e27bfeb3!RNvNtCshxXGK4Nly0m_13rustc_session7session11early_error+0xfc
000000da`9a2fcb10 00007ff9`41059778     : 00000000`00000000 000000da`9a2fdad8 00000000`00000001 00000000`00000000 : rustc_driver_3057b6c2e27bfeb3!RNvNtCshxXGK4Nly0m_13rustc_session6config21build_session_options+0x2d60
000000da`9a2fdd20 00007ff9`40fee264     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : rustc_driver_3057b6c2e27bfeb3!RNvMs_Cs6jodYiCyqpf_12rustc_driverNtB4_11RunCompiler3run+0xd8
000000da`9a2ffbd0 00007ff9`41060824     : 00007ff9`45a6a5c0 0000023d`47e2ba10 00000000`00000016 00000000`00000016 : rustc_driver_3057b6c2e27bfeb3!Ordinal0+0x1e264
000000da`9a2ffc90 00007ff7`b0651500     : 00007ff9`52bfcf18 00000000`00002d62 000000da`01ceb3f4 0000023d`47e3e9c0 : rustc_driver_3057b6c2e27bfeb3!RNvCs6jodYiCyqpf_12rustc_driver4main+0x108
000000da`9a2ffd10 00007ff7`b065153c     : 00007ff7`b065153c 00007ff7`b0651f50 00007ff7`b0651554 00007ff7`b0653098 : rustc+0x1500
000000da`9a2ffd20 00007ff7`b0651554     : 00007ff7`b0651554 00007ff7`b0653098 00007ff9`52bddbdc 00007ff9`52bddbd0 : rustc+0x153c
000000da`9a2ffd30 00007ff9`52bddbdc     : 00007ff9`52bddbdc 00007ff9`52bddbd0 000000da`9a2ffd30 00007ff7`b0655000 : rustc+0x1554
000000da`9a2ffd40 00007ff7`b065152c     : 000000da`9a2ffd30 00007ff7`b0655000 0000023d`47e0bab0 00000000`00000019 : std_948032f1431020cd!ZN3std2rt19lang_start_internal17h9c3af3afacb33910E+0x2c
000000da`9a2ffd70 00007ff7`b065146c     : 00007ff7`b065146c 00007ff7`b06514f4 00000000`00000000 00000000`00000000 : rustc+0x152c
000000da`9a2ffd80 00007ff7`b06514c8     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : rustc+0x146c
000000da`9a2ffe40 00007ff9`8a612020     : 00007ff9`8a612020 00000000`00000000 000000da`9a2ffe90 00007ff9`8cf82d8c : rustc+0x14c8
000000da`9a2ffe50 00007ff9`8cf82d8c     : 000000da`9a2ffe90 00007ff9`8cf82d8c 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x30
000000da`9a2ffe90 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x3c


STACK_COMMAND:  ~0s; .ecxr ; kb

@mati865
Collaborator Author

mati865 commented Oct 3, 2022

Oof, I'll inspect x86_64 later this week (maybe even tomorrow).
So it has triggered one of the early_error calls in https://github.com/rust-lang/rust/blob/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/compiler/rustc_session/src/config.rs#L2202 and hung during unwinding?

@jeremyd2019
Member

similar for your rustc.exe

@jeremyd2019
Member

I realize now I probably noticed this accidentally too... I wanted to see if your rustc.exe was completely broken or not, so I tried rustc --version, except I typo-ed --version - it printed an error about an unknown option, and then hung. Properly spelled --version worked correctly.

@mati865
Collaborator Author

mati865 commented Oct 3, 2022

Some bad news: the x86_64 binary works fine even with misspelled --verison.
Some mixed news: I see a few places in Rust that could use some patching...

@jeremyd2019
Member

> Oof, I'll inspect x86_64 later this week (maybe even tomorrow).

x86_64 try:
https://github.com/jeremyd2019/MINGW-packages/actions/runs/3179059407/jobs/5181180253
lld: error: unable to find library -lwindows. No hang though

@mstorsjo
Contributor

mstorsjo commented Oct 4, 2022

If things are somewhat easy to test running without diving too deep into all of Rust, I can try to have a look at the unwinding issue. (Primarily, I could try to run the hanging executable in Wine on aarch64, where I can debug the unwinder and see what goes wrong. Ideally, if there's a linker map for the executable, or debug info/symbols/something, this can help pinpoint in which function the unwinding fails.)

While there might be unwinding issues on i686 due to mismatches in how the .eh_frame section is set up/registered for llvm libunwind vs libgcc, there shouldn't be any such issues on aarch64, where SEH is used just like on x86_64, with much less vendor specific weirdness.

@jeremyd2019
Member

Well, you can reproduce the hang by downloading and extracting https://github.com/mati865/rust-gnullvm-builds/releases/download/1.63.0/rustc-1.63.0-dev-aarch64-pc-windows-gnullvm.tar.xz and running rustc-1.63.0-dev-aarch64-pc-windows-gnullvm/rustc/bin/rustc.exe --verison (a misspelled --version; the correctly spelled option works properly, it's the error case that hangs).

You'd probably need @mati865 to get you any debug info though.

@mati865
Collaborator Author

mati865 commented Oct 4, 2022

> Some mixed news: I see a few places in Rust that could use some patching...

I have a follow-up: it's starting to make sense, and if I'm right, exceptions will either be broken the same way on x86_64 or (very unlikely) work on both. I need to do a few cross compilations though, so it'd be best for you to wait until I come back with more results.

@mati865
Collaborator Author

mati865 commented Oct 4, 2022

I have 2 AArch64 builds. When --version is misspelled, their x86_64 counterparts:

  1. crash on unwinding
  2. work fine

Testing should be as simple as running rustc-1.63.0-dev-aarch64-pc-windows-gnullvm/rustc/bin/rustc.exe --versiion on Windows with either llvm-mingw or MSYS2 CLANGARM64 libc++.dll present in PATH.

https://1drv.ms/u/s!AgMYIlqTF8b9gusFwK2BgXWIFYcTMw?e=0Qewvt
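
The manual test described above (run rustc.exe with a misspelled option and watch whether it exits, crashes, or hangs) can be scripted. The following is a sketch under stated assumptions: classify_run is a hypothetical helper, the 30-second timeout is arbitrary, and the crash heuristic (negative return code on POSIX, an NTSTATUS-style value of 0xC0000000 or above on Windows) is a simplification.

```python
import subprocess


def classify_run(cmd, timeout=30.0):
    """Run cmd and report 'ok', 'error', 'crash' or 'hang'."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child did not exit in time; subprocess.run has killed it.
        return "hang"
    code = proc.returncode
    if code == 0:
        return "ok"
    if code < 0 or code >= 0xC0000000:
        # Killed by a signal (POSIX) or exited with an error NTSTATUS (Windows).
        return "crash"
    return "error"
```

A healthy build would be expected to give "error" for classify_run(["rustc-1.63.0-dev-aarch64-pc-windows-gnullvm/rustc/bin/rustc.exe", "--versiion"]), since an unknown option should only produce a message and a non-zero exit code.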

@jeremyd2019
Member

jeremyd2019 commented Oct 4, 2022

  1. --version works, --versiion shows error then hangs
0:000> kb
 # RetAddr               : Args to Child                                                           : Call Site
00 00007ffc`01646250     : 000000c5`9c2fc700 00007ffc`01646250 0000001e`00000001 00000000`000000b8 : ntdll!RtlpUnwindRestoreRegisterRange+0x98
01 00007ffc`01645e4c     : 000000c5`9c2fc800 00007ffc`01645e4c 000000c5`9c2fc701 00007ffb`a4661fd4 : ntdll!RtlpUnwindFunctionFull+0x310
02 00007ffc`0168f928     : 000000c5`9c2fc7a8 00007ffb`00000000 00007ffc`c0000005 00007ffb`a4536870 : ntdll!RtlpxVirtualUnwind+0xac
03 00007ffc`0168f5dc     : 00000000`00000000 00000000`00000000 000000c5`9c2fc890 00000000`00000000 : ntdll!RtlDispatchException+0x2d0
04 00007ffb`fcea86c4     : 00000000`00000000 00000000`00000000 000000c5`9c2fceb0 00000245`00000000 : ntdll!RtlRaiseException+0xbc
05 00007ffb`a4599e5c     : 00000000`20474343 00000000`00000000 00007ffb`fcea86c4 00007ffb`00000001 : KERNELBASE!RaiseException+0x54
06 00007ffb`a4536874     : 00000000`00000001 00000245`b6040ba0 00007ffb`a21bc448 00007ffb`a4536874 : std_ade86df438fc736e!Unwind_SetIP+0x5e8
07 00007ffb`a45367fc     : 000000c5`9c2fd510 00007ffb`a45e4758 00000000`00000000 000000c5`9c2fd660 : std_ade86df438fc736e!rust_panic+0x18
08 00007ffb`a4535080     : 00000000`00000001 00007ffb`a21bc448 00000000`00000001 00007ffb`a45e4780 : std_ade86df438fc736e!ZN3std9panicking20rust_panic_with_hook17hfea87a6c9e3d76edE+0x3a0
09 00007ffb`9f1176fc     : 00007ffb`9f1176fc 000000c5`9c2ffc40 00007ffb`9efc6b1c 00000000`00000000 : std_ade86df438fc736e!ZN3std5panic13resume_unwind17hfa4c22b9092ca5b9E+0x8
0a 00007ffb`9efc6b1c     : 00007ffb`9efc6b1c 00000000`00000000 00000000`0000001f 00000245`b603fd00 : rustc_driver_291edf5d8fb73b07!RNvMs_NtCseh4TkM661Vr_10rustc_span11fatal_errorNtB4_10FatalError5raise+0x14
0b 00007ffb`9ef6ddf8     : 00000000`0000001f 00000245`b603fd00 00007ffb`9ef6ddf8 00000245`b603fd00 : rustc_driver_291edf5d8fb73b07!RNvXs3_NtCsdn4N8Pq9oWL_12rustc_errors18diagnostic_builderzNtB5_17EmissionGuarantee43diagnostic_builder_emit_producing_guarantee+0x4c
0c 00007ffb`9ef925fc     : 00007ffb`9ef925fc 00007ffb`9ef92554 00007ffc`01955000 00000000`00000000 : rustc_driver_291edf5d8fb73b07!RNvXsD_NtNtCsgfAjdAv69wV_13rustc_session6config12dep_trackingINtNtCs85lM9HBhICr_4core6option6OptionNtNtB9_7options6LdImplENtB5_15DepTrackingHash4hash+0x2ff0
0d 00007ffb`9d8a52e4     : 00007ffc`01955000 00000000`00000000 00000000`00000001 00000000`00000000 : rustc_driver_291edf5d8fb73b07!RNvNtCsgfAjdAv69wV_13rustc_session7session11early_error+0xfc
0e 00007ffb`9d89f4c0     : 00007ffc`0195a2a0 00000000`00000001 00007ffc`018e9be0 00000000`00000000 : rustc_driver_291edf5d8fb73b07!RNvCslNE9xzJZX8c_12rustc_driver14handle_options+0xde8
0f 00007ffb`9d83c778     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : rustc_driver_291edf5d8fb73b07!RNvMs_CslNE9xzJZX8c_12rustc_driverNtB4_11RunCompiler3run+0x88
10 00007ffb`9d8a685c     : 000000c5`9c2ffac0 00000245`b6009cd0 00000000`00000004 00000000`00000002 : rustc_driver_291edf5d8fb73b07!Ordinal0+0x1c778
11 00007ff7`689d1544     : 00007ffb`a4550134 00000000`0001690b 000000c5`30b8001a 00000245`b603fe50 : rustc_driver_291edf5d8fb73b07!RNvCslNE9xzJZX8c_12rustc_driver4main+0xf8
12 00007ff7`689d14fc     : 00007ff7`689d14fc 00007ff7`689d1f50 00007ff7`689d1514 00007ff7`689d3098 : rustc+0x1544
13 00007ff7`689d1514     : 00007ff7`689d1514 00007ff7`689d3098 00007ffb`a452d42c 00007ffb`a452d420 : rustc+0x14fc
14 00007ffb`a452d42c     : 00007ffb`a452d42c 00007ffb`a452d420 000000c5`9c2ffb20 00007ff7`689d5000 : rustc+0x1514
15 00007ff7`689d1570     : 000000c5`9c2ffb20 00007ff7`689d5000 00000245`b607ec20 00000000`0000000b : std_ade86df438fc736e!ZN3std2rt19lang_start_internal17h85e0958b4c9d739fE+0x2c
16 00007ff7`689d146c     : 00007ff7`689d146c 00007ff7`689d1538 00000000`00000000 00000000`00000000 : rustc+0x1570
17 00007ff7`689d14c8     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : rustc+0x146c
18 00007ffb`feaa2020     : 00007ffb`feaa2020 00000000`00000000 000000c5`9c2ffc80 00007ffc`01682d8c : rustc+0x14c8
19 00007ffc`01682d8c     : 000000c5`9c2ffc80 00007ffc`01682d8c 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x30
1a 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x3c
  1. --version works, --versiion shows error then Segmentation fault
(lldb) bt
* thread #1, stop reason = Exception 0xc00000ff encountered at address 0x7ffc01739920
  * frame #0: 0x00007ffc01739920 ntdll.dll`RtlRaiseStatus + 32
    frame #1: 0x00007ffc01645820 ntdll.dll`RtlUnwindEx + 1136
    frame #2: 0x00007ffbc67c96cc std-ade86df438fc736e.dll`_$LT$unwind..libunwind.._Unwind_Reason_Code$u20$as$u20$core..fmt..Debug$GT$::fmt::h77b7431d80108aa8 + 1308
    frame #3: 0x00007ffc016227a4 ntdll.dll`__chkstk + 132
    frame #4: 0x00007ffc0168f858 ntdll.dll`RtlRaiseException + 824
    frame #5: 0x00007ffc01622674 ntdll.dll`KiUserExceptionDispatcher + 36
ExceptionAddress: 00007ffb9dec68c8 (rustc_driver_291edf5d8fb73b07!RNvCslNE9xzJZX8c_12rustc_driver4main+0x0000000000000164)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000000000000018
Attempt to read from address 0000000000000018
STACK_TEXT:  
000000f2`ea4ffc50 00007ff6`10301544     : 00007ffb`c61b0134 00000000`000169b8 000000f2`1dc0c9c9 0000023b`fc83fc70 : rustc_driver_291edf5d8fb73b07!RNvCslNE9xzJZX8c_12rustc_driver4main+0x164
000000f2`ea4ffcd0 00007ff6`103014fc     : 00007ff6`103014fc 00007ff6`10301f50 00007ff6`10301514 00007ff6`10303098 : rustc+0x1544
000000f2`ea4ffce0 00007ff6`10301514     : 00007ff6`10301514 00007ff6`10303098 00007ffb`c618d42c 00007ffb`c618d420 : rustc+0x14fc
000000f2`ea4ffcf0 00007ffb`c618d42c     : 00007ffb`c618d42c 00007ffb`c618d420 000000f2`ea4ffcf0 00007ff6`10305000 : rustc+0x1514
000000f2`ea4ffd00 00007ff6`10301570     : 000000f2`ea4ffcf0 00007ff6`10305000 0000023b`fc87ec00 00000000`0000000b : std_ade86df438fc736e!ZN3std2rt19lang_start_internal17h85e0958b4c9d739fE+0x2c
000000f2`ea4ffd30 00007ff6`1030146c     : 00007ff6`1030146c 00007ff6`10301538 00000000`00000000 00000000`00000000 : rustc+0x1570
000000f2`ea4ffd40 00007ff6`103014c8     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : rustc+0x146c
000000f2`ea4ffe00 00007ffb`feaa2020     : 00007ffb`feaa2020 00000000`00000000 000000f2`ea4ffe50 00007ffc`01682d8c : rustc+0x14c8
000000f2`ea4ffe10 00007ffc`01682d8c     : 000000f2`ea4ffe50 00007ffc`01682d8c 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x30
000000f2`ea4ffe50 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 0

@mati865
Collaborator Author

mati865 commented Oct 4, 2022

Cannot even tell if it's good or bad.
Uploaded build 2 as the last attempt at somewhat blind patching, if it doesn't work I'll try to find time to look into libunwind code.

@mstorsjo
Contributor

mstorsjo commented Oct 4, 2022

Cannot even tell if it's good or bad.
Uploaded build 2 as the last attempt at somewhat blind patching, if it doesn't work I'll try to find time to look into libunwind code.

Can you describe what's different between the builds, to see how it correlates with the actual behaviour?

Unfortunately, I don't think the libunwind sources help much for these kinds of issues, since most of the actual unwinding is done by the RtlUnwindEx call - that's why Wine is a great resource for debugging such cases, when you can poke into the actual unwinder.

I can maybe try to have a look tomorrow.

@jeremyd2019
Member

Build 2 prints the error, then segfaults. WinDbg shows the same as with build 1.

@mati865
Collaborator Author

mati865 commented Oct 5, 2022

Can you describe what's different between the builds, to see how it correlates with the actual behaviour?

Basically, Rust has to plug into libgcc/libunwind somehow to catch panics; on x86_64 I think that's done through _GCC_specific_handler, which maps SEH to GCC?
What I tried was basically to push various knobs (without knowing the details of how this works): build 0 tried to use the Linux path (I was curious whether it would replicate the hang on x86_64), and the next iterations tried to mimic on AArch64 what already works on x86_64 (I think that, code-wise, build 2 for AArch64 should call the same functions as on x86_64).

@mati865
Collaborator Author

mati865 commented Oct 5, 2022

Uploaded build 3 that differs only in debuginfo and build 4 that has debuginfo and a small change that is unlikely to fix the problem.

Unfortunately, I couldn't figure out anything from inspecting the LLVM repo; it looks like the Rust code that links to and calls libunwind on x86_64 should also work on AArch64.

@jeremyd2019
Member

Build 3 crashes the same way as 1 and 2. Since you mentioned debug info, I caught the crash in WinDbg, created a minidump, and opened that in lldb (running rustc in lldb directly resulted in a different, less helpful backtrace):

(lldb) target create --core "C:/rustc.dmp"
Core file 'C:\rustc.dmp' (aarch64) was loaded
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x7ffe3e2169e8
  * frame #0: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] <dyn core::any::Any>::is::<rustc_span::fatal_error::FatalErrorMarker> at any.rs:263:24
    frame #1: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] <dyn core::any::Any + core::marker::Send>::is::<rustc_span::fatal_error::FatalErrorMarker> at any.rs:418:9
    frame #2: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] rustc_driver::catch_fatal_errors::<rustc_driver::main::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#0} at lib.rs:1126:12
    frame #3: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] <core::result::Result<core::result::Result<(), rustc_errors::ErrorGuaranteed>, alloc::boxed::Box<dyn core::any::Any + core::marker::Send>>>::map_err::<rustc_errors::ErrorGuaranteed, rustc_driver::catch_fatal_errors<rustc_driver::main::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#0}> at result.rs:855:27
    frame #4: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] rustc_driver::catch_fatal_errors::<rustc_driver::main::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>> at lib.rs:1125:5
    frame #5: 0x00007ffe3e2169dc rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] rustc_driver::catch_with_exit_code::<rustc_driver::main::{closure#0}> at lib.rs:1137:18
    frame #6: 0x00007ffe3e2169dc rustc_driver-929564227a0ca87e.dll`rustc_driver::main at lib.rs:1316:21
    frame #7: 0x00007ff611d41534 rustc.exe`rustc_main::main at main.rs:62:5
    frame #8: 0x00007ff611d41570 rustc.exe`std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> [inlined] <fn() as core::ops::function::FnOnce<()>>::call_once at function.rs:248:5
    frame #9: 0x00007ff611d4156c rustc.exe`std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()> at backtrace.rs:122:18
    frame #10: 0x00007ff611d41504 rustc.exe`std::rt::lang_start::<()>::{closure#0} at rt.rs:145:18
    frame #11: 0x00007ffe6bc1d558 std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h550e38530c433952(self=&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe) @ 0x0000020ce57ffa40, args=<unavailable>) at function.rs:280:13
    frame #12: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf at panicking.rs:492:40
    frame #13: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf at panicking.rs:456:19
    frame #14: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf [inlined] std::panic::catch_unwind::h3a649e653df8901d(f=&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe) @ 0x0000020ce57ff920) at panic.rs:137:14
    frame #15: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::he16ae33b413ef39e at rt.rs:128:48
    frame #16: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf [inlined] std::panicking::try::do_call::h812c079faedc97e2(data=<unavailable>) at panicking.rs:492:40
    frame #17: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf at panicking.rs:456:19
    frame #18: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf [inlined] std::panic::catch_unwind::h0dc70af1a9b3ec48(f={closure_env#2} @ 0x0000020ce57ff920) at panic.rs:137:14
    frame #19: 0x00007ffe6bc1d54c std-ccff071d958fbfba.dll`std::rt::lang_start_internal::had6e73dda9e4cfbf(main=&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe) @ 0x0000020ce3f61dc0, argc=<unavailable>, argv=<unavailable>) at rt.rs:128:20
    frame #20: 0x00007ff611d41560 rustc.exe`main + 40
    frame #21: 0x00007ff611d4146c rustc.exe`__tmainCRTStartup at crtexe.c:329:15
    frame #22: 0x00007ff611d414c8 rustc.exe`mainCRTStartup at crtexe.c:206:9
    frame #23: 0x00007ffecf342020 kernel32.dll`BaseThreadInitThunk + 48
    frame #24: 0x00007ffed2a62d8c ntdll.dll`RtlUserThreadStart + 60

Will try 4 later

@jeremyd2019
Member

Build 4 behaves the same as 3.

@mstorsjo
Contributor

mstorsjo commented Oct 6, 2022

Basically, Rust has to plug into libgcc/libunwind somehow to catch panics; on x86_64 I think that's done through _GCC_specific_handler, which maps SEH to GCC? What I tried was basically to push various knobs (without knowing the details of how this works): build 0 tried to use the Linux path (I was curious whether it would replicate the hang on x86_64), and the next iterations tried to mimic on AArch64 what already works on x86_64 (I think that, code-wise, build 2 for AArch64 should call the same functions as on x86_64).

Ok - how does it plug into _GCC_specific_handler?

Build 0 ends up hanging, unwinding without making any real progress. From WINEDEBUG=+seh wine .../rustc.exe --verison:

0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 1218d0 func 8002685c
0170:trace:seh:unwind_full_data function 8002685c-80026950: len=0x3d ver=0 X=1 E=0 epilogs=0 codes=8
0170:trace:seh:RtlVirtualUnwind ret: lr=0 sp=121990 handler=000000018005CE54 
0170:trace:seh:call_handler calling handler 000000018005CE54 (rec=000000000011D608, frame=0x11d790 context=000000000011CD20, dispatch=000000000011D0B0)
0170:trace:seh:call_handler handler at 000000018005CE54 returned 3 
0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 121990 func 8002685c
0170:trace:seh:unwind_full_data function 8002685c-80026950: len=0x3d ver=0 X=1 E=0 epilogs=0 codes=8
0170:trace:seh:RtlVirtualUnwind ret: lr=0 sp=121a50 handler=000000018005CE54 
0170:trace:seh:call_handler calling handler 000000018005CE54 (rec=000000000011D608, frame=0x11d790 context=000000000011CD20, dispatch=000000000011D0B0)
0170:trace:seh:call_handler handler at 000000018005CE54 returned 3 
0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 121a50 func 8002685c

So here it gets to pc=80026874, which is a valid function with SEH unwind info. After one unwind step, it has incremented sp by 0xc0 but is otherwise stuck in the same place. I guess this would eventually crash too, when sp runs out of allocated address space - but it just takes a long time before it does that.
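
To make the "same pc, growing sp" pattern easier to spot in a long trace, here is a minimal sketch that scans Wine `+seh` output for consecutive `RtlVirtualUnwind` steps whose pc does not change. The helper names (`parse_unwind_step`, `stuck_steps`) are made up for this illustration, and the parsing assumes the exact line shape shown in the log above.

```rust
/// Parse a hexadecimal value that follows a "pc" or "sp" keyword.
fn hex(value: Option<&str>) -> Option<u64> {
    value.and_then(|s| u64::from_str_radix(s, 16).ok())
}

/// Extract (pc, sp) from a line shaped like
/// "...RtlVirtualUnwind type 0 pc <hex> sp <hex> func <hex>".
/// Other lines (including the "RtlVirtualUnwind ret:" lines) yield None.
fn parse_unwind_step(line: &str) -> Option<(u64, u64)> {
    if !line.contains("RtlVirtualUnwind type") {
        return None;
    }
    let mut pc = None;
    let mut sp = None;
    let mut words = line.split_whitespace();
    while let Some(word) = words.next() {
        match word {
            "pc" => pc = hex(words.next()),
            "sp" => sp = hex(words.next()),
            _ => {}
        }
    }
    Some((pc?, sp?))
}

/// Return (pc, sp increase) for every pair of consecutive unwind steps
/// that stay at the same pc while sp keeps growing.
fn stuck_steps(log: &str) -> Vec<(u64, u64)> {
    let steps: Vec<(u64, u64)> = log.lines().filter_map(parse_unwind_step).collect();
    steps
        .windows(2)
        .filter(|w| w[0].0 == w[1].0 && w[1].1 > w[0].1)
        .map(|w| (w[0].0, w[1].1 - w[0].1))
        .collect()
}

// Lines copied from the trace above.
const LOG: &str = "\
0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 1218d0 func 8002685c
0170:trace:seh:RtlVirtualUnwind ret: lr=0 sp=121990 handler=000000018005CE54
0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 121990 func 8002685c
0170:trace:seh:RtlVirtualUnwind type 0 pc 80026874 sp 121a50 func 8002685c
";

fn main() {
    for (pc, delta) in stuck_steps(LOG) {
        println!("no progress at pc={pc:x}, sp grew by {delta:#x}");
    }
}
```

On the excerpt above, both consecutive pairs stay at pc 80026874 with sp growing by 0xc0, matching what the trace shows by eye.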

Build 1 does seem to fare better; it first executes KiUserExceptionDispatcher from the first RaiseException call, then it later gets to calling RtlUnwindEx (which gets called by _GCC_specific_handler) - build 0 didn't get this far. It then does a bunch of consecutive RtlUnwindEx calls, which looks typical for how the libunwind/libgcc/itanium style unwinding is implemented on top of SEH. Then it finally fails like this:

02bc:trace:seh:RtlUnwindEx found builtin frame 000000000011F6A8 handler 000000007BC53E5C
02bc:trace:seh:call_teb_unwind_handler calling TEB handler 000000007BC53E5C (rec=000000000011FB98, frame=000000000011F6A8 context=000000000011E8D0, dispatch=000000000011E768)
02bc:trace:seh:unwind_exception_handler detected collided unwind
02bc:trace:seh:call_teb_unwind_handler handler at 000000007BC53E5C returned 3
02bc:trace:seh:RtlVirtualUnwind type 0 pc df685c sp 11fd10 func df6764
02bc:trace:seh:unwind_full_data function df6764-df6964: len=0x80 ver=0 X=1 E=0 epilogs=0 codes=8
02bc:trace:seh:RtlVirtualUnwind ret: lr=40001544 sp=11fd90 handler=0000000005330CC4
02bc:trace:seh:call_unwind_handler calling handler 0000000005330CC4 (rec=000000000011FB98, frame=0x11fd90 context=000000000011E8D0, dispatch=000000000011E768)
02bc:trace:seh:call_unwind_handler handler 0000000005330CC4 returned 1
02bc:trace:seh:RtlRestoreContext returning to df68bc stack 11fd10 
02bc:trace:seh:KiUserExceptionDispatcher code=c0000005 flags=0 addr=0000000000DF68C8 pc=df68c8 tid=02bc
02bc:trace:seh:KiUserExceptionDispatcher  info[0]=0000000000000000 
02bc:trace:seh:KiUserExceptionDispatcher  info[1]=0000000000000018 
02bc:warn:seh:KiUserExceptionDispatcher EXCEPTION_ACCESS_VIOLATION exception (code=c0000005) raised 

The tricky thing here is that when you get a crash during unwinding, this crash is converted into an NT exception which also gets passed via the same unwinding mechanism, even though you're in a place where unwinding seems to have broken down. So after this, the output has a lot more unwinding where it tries to handle the EXCEPTION_ACCESS_VIOLATION - but this is the spot we're interested in.

When looking at the big picture, further up in the log, we'd have this:

0330:trace:seh:RtlUnwindEx code=20474343 flags=2 end_frame=000000000011FD90 target_ip=0000000000DF697C pc=000000007bc526b0

So the target of the whole unwind is to unwind the stack up to 000000000011FD90. After this it will do a couple more consecutive RtlUnwindEx, every second one will be with this same target, and every second one will be to an intermediate point that needs to execute destructors or similar. At the end, it has reached the target frame (with frame pointer 11fd90, and it tries to resume execution there):

02bc:trace:seh:RtlRestoreContext returning to df68bc stack 11fd10 

(Here, 11fd10 is the stack pointer, where the frame pointer was 11fd90.) It starts executing at df68bc and crashes at df68c8, which is just a couple of instructions later.

TODO for next steps: Disassemble df68bc to df68c8 and see what's causing the crash there, and what might have gone wrong along the unwind that causes it to crash.

@jeremyd2019
Member

Gut feeling, it is trying to access an offset of 0x18 from a NULL object pointer. I was hoping the line numbers in the backtrace I was able to get with debug info would indicate which pointer is NULL (and then can reason about how it ended up NULL). Maybe the exception object got lost somehow.

@mstorsjo
Contributor

mstorsjo commented Oct 6, 2022

Basically, Rust has to plug into libgcc/libunwind somehow to catch panics; on x86_64 I think that's done through _GCC_specific_handler, which maps SEH to GCC? What I tried was basically to push various knobs (without knowing the details of how this works): build 0 tried to use the Linux path (I was curious whether it would replicate the hang on x86_64), and the next iterations tried to mimic on AArch64 what already works on x86_64 (I think that, code-wise, build 2 for AArch64 should call the same functions as on x86_64).

Ok - how does it plug into _GCC_specific_handler?

I presume that Rust on aarch64 in MSVC mode does work, so the most probable clue for what is broken would be this libunwind integration. Can you point towards the code where this is done for x86_64 today, and include what you have for aarch64 so far? In particular, what do the differences between binary 0 and 1 look like in the code?

@mati865
Collaborator Author

mati865 commented Oct 6, 2022

Ok - how does it plug into _GCC_specific_handler?

Can you point towards the code where this is done for x86_64 today, and include what you have for aarch64 so far?

Here is C FFI declaration: https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/unwind/src/libunwind.rs#L272
Those opaque types seem fishy but it works on x86_64...

It's called in https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/std/src/personality/gcc.rs#L222 with rust_eh_personality_impl defined above as the personality.

and include what you have for aarch64 so far?

I have changed a few cfgs (sort of like #if in C) to make AArch64 follow the same path as the x86_64 Windows target, and also changed a few cfgs that are enabled on Linux when linking to libunwind; IIUC they only reference the symbols to make sure the linker pulls them in.
I can create a commit with those changes on some branch if you want.
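
For readers less familiar with Rust, the kind of cfg change described here can be sketched roughly as follows. This is a hypothetical illustration only: the function name and strings are invented and this is not the actual std source. The idea is that adding `aarch64` to the `any(...)` list routes that architecture down the code path that previously only x86_64 Windows (GNU) took.

```rust
// Hypothetical illustration of cfg-gated code paths (not the real std code).
// The first definition is compiled only for Windows GNU targets on the
// listed architectures; every other target gets the second definition.
#[cfg(all(
    windows,
    target_env = "gnu",
    any(target_arch = "x86_64", target_arch = "aarch64")
))]
fn personality_path() -> &'static str {
    "SEH-based path (via _GCC_specific_handler)"
}

#[cfg(not(all(
    windows,
    target_env = "gnu",
    any(target_arch = "x86_64", target_arch = "aarch64")
)))]
fn personality_path() -> &'static str {
    "other path for this target"
}

fn main() {
    // Exactly one of the two definitions exists in any given build.
    println!("selected at compile time: {}", personality_path());
}
```

Unlike a runtime `if`, the branch that does not match is not compiled at all, which is why a single missed cfg (as in build 0) silently selects a different implementation.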

In particular, what do the differences between binary 0 and 1 look like in the code?

I missed one cfg in build 0, so it was using the same path as Linux with libunwind (instead of the default Linux path with libgcc). With either build 1 or 2 (should have written it down...) AArch64 should use the same functions with the same arguments as x86_64.

This is interesting:

  * frame #0: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] <dyn core::any::Any>::is::<rustc_span::fatal_error::FatalErrorMarker> at any.rs:263:24
    frame #1: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] <dyn core::any::Any + core::marker::Send>::is::<rustc_span::fatal_error::FatalErrorMarker> at any.rs:418:9
    frame #2: 0x00007ffe3e2169e8 rustc_driver-929564227a0ca87e.dll`rustc_driver::main [inlined] rustc_driver::catch_fatal_errors::<rustc_driver::main::{closure#0}, core::result::Result<(), rustc_errors::ErrorGuaranteed>>::{closure#0} at lib.rs:1126:12

I don't really have a clue what happened here but it appears that unwinding caught the exception but something is corrupted? Relevant code: https://github.com/rust-lang/rust/blob/e1d7dec558d863fb76f98453088b36cb1a926d48/compiler/rustc_driver/src/lib.rs#L1160

I presume that Rust on aarch64 in MSVC mode does work, so the most probable clue for what is broken would be this libunwind integration.

I haven't checked myself, but since it was added quite a long time ago (by Microsoft folks, IIRC), I guess it's fine.
MSVC SEH is special cased all over the place though (including codegen).

@mstorsjo
Contributor

mstorsjo commented Oct 7, 2022

Ok - how does it plug into _GCC_specific_handler?

Can you point towards the code where this is done for x86_64 today, and include what you have for aarch64 so far?

Here is C FFI declaration: https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/unwind/src/libunwind.rs#L272 Those opaque types seem fishy but it works on x86_64...

It's called in https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/std/src/personality/gcc.rs#L222 with rust_eh_personality_impl defined above as the personality.

and include what you have for aarch64 so far?

I have changed a few cfgs (sort of like #if in C) to make AArch64 follow the same path as the x86_64 Windows target, and also changed a few cfgs that are enabled on Linux when linking to libunwind; IIUC they only reference the symbols to make sure the linker pulls them in. I can create a commit with those changes on some branch if you want.

Thanks for these pointers - that was useful to look at. This looks very much similar to how it is hooked up in libcxxabi and similar.

TODO for next steps: Disassemble df68bc to df68c8 and see what's causing the crash there, and what might have gone wrong along the unwind that causes it to crash.

When the unwinding finishes and it resumes execution, it picks up here:

1800869dc: 9514e942     bl      0x1845c0ee4 <_ZN3std9panicking3try7cleanup17h636ba3a1c03ba0ccE>
1800869e0: aa0003f5     mov     x21, x0
1800869e4: aa0103f6     mov     x22, x1
1800869e8: f9400c28     ldr     x8, [x1, #24]

First it calls _ZN3std9panicking3try7cleanup17h636ba3a1c03ba0ccE and then it reads from [x1, #24]. Presumably, x0 and x1 are return values set up by that function. Looking at _ZN3std9panicking3try7cleanup17h636ba3a1c03ba0ccE, it does this:

0000000180026358 <_ZN3std9panicking3try7cleanup17h636ba3a1c03ba0ccE>:
180026358: d100c3ff     sub     sp, sp, #48
18002635c: a90153f3     stp     x19, x20, [sp, #16]
180026360: f90013fe     str     x30, [sp, #32]
180026364: 9400dc0f     bl      0x18005d3a0 <__rust_panic_cleanup> 
180026368: f0000568     adrp    x8, 0x1800d5000 <_ZN93_$LT$std..panicking..begin_panic_handler..StrPanicPayload$u20$as$u20$core..panic..BoxMeUp$GT$3get17hda0859bd6e556b82E>
18002636c: aa0003f3     mov     x19, x0
180026370: aa0103f4     mov     x20, x1
[...]
1800263a0: aa1303e0     mov     x0, x19
1800263a4: aa1403e1     mov     x1, x20
[...]
1800263bc: d65f03c0     ret 

So here, x0/x1 are forwarded as return values from __rust_panic_cleanup. That function does this:

000000018005d3a0 <__rust_panic_cleanup>:
18005d3a0: a9be53f3     stp     x19, x20, [sp, #-32]!
18005d3a4: f9000bfe     str     x30, [sp, #16]
18005d3a8: d28a6a89     mov     x9, #21332
18005d3ac: f9400008     ldr     x8, [x0]
18005d3b0: f2aa4aa9     movk    x9, #21077, lsl #16
18005d3b4: f2cb4009     movk    x9, #23040, lsl #32
18005d3b8: f2e9a9e9     movk    x9, #19791, lsl #48
18005d3bc: eb09011f     cmp     x8, x9
18005d3c0: 54000141     b.ne    0x18005d3e8 <__rust_panic_cleanup+0x48>
18005d3c4: a9425013     ldp     x19, x20, [x0, #32]                     <---- Important
18005d3c8: 52800601     mov     w1, #48
18005d3cc: 52800102     mov     w2, #8
18005d3d0: 97ffff19     bl      0x18005d034 <__rust_dealloc>
18005d3d4: aa1303e0     mov     x0, x19                   <---- Also important
18005d3d8: aa1403e1     mov     x1, x20
18005d3dc: f9400bfe     ldr     x30, [sp, #16]
18005d3e0: a8c253f3     ldp     x19, x20, [sp], #32
18005d3e4: d65f03c0     ret
18005d3e8: 9400b162     bl      0x180089970 <_Unwind_DeleteException>
18005d3ec: 97ff21ba     bl      0x180025ad4 <__rust_foreign_exception>
18005d3f0: d4200020     brk     #0x1

In short, this function reads out two values from [x0, #32] into x19 and x20, and then moves them over to x0/x1 to be returned there, and then passed on to the caller and used there later.
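
As a side note on the disassembly above: the `mov`/`movk` sequence builds a 64-bit constant in x9 that is compared against the first word of the exception object before the Rust-specific path is taken. A minimal sketch that reassembles the constant from the four immediates (the helper name `assemble_movk` is made up for this illustration):

```rust
/// Rebuild a 64-bit constant from the 16-bit immediates of a `mov`/`movk`
/// sequence, given as (immediate, left shift) pairs in execution order.
fn assemble_movk(parts: &[(u16, u32)]) -> u64 {
    parts.iter().fold(0u64, |acc, &(imm, shift)| {
        // Each step replaces one 16-bit lane and keeps the other lanes.
        (acc & !(0xffffu64 << shift)) | (u64::from(imm) << shift)
    })
}

fn main() {
    // Immediates taken from the __rust_panic_cleanup disassembly above.
    let class = assemble_movk(&[(21332, 0), (21077, 16), (23040, 32), (19791, 48)]);
    println!("{class:#018x}");
    // Interpreting the value as big-endian bytes gives a readable tag.
    println!("{:?}", String::from_utf8_lossy(&class.to_be_bytes()));
}
```

The result is 0x4d4f5a0052555354, whose bytes spell "MOZ\0RUST". That looks like Rust's exception class, so the `cmp`/`b.ne` is presumably the check that sends foreign exceptions to `_Unwind_DeleteException` and `__rust_foreign_exception` instead.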

At this point, I cloned the rust source and sat down to look at this in the form of the rust source. In https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/panic_unwind/src/lib.rs#L95, __rust_panic_cleanup calls something which seems to be this: https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/panic_unwind/src/gcc.rs#L73-L82 - this implementation looks very much like this disassembly. And at https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/panic_unwind/src/gcc.rs#L79-L80, it casts a generic _Unwind_Exception into a rust-specific Exception and looks at the cause field. The rust specific Exception, https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/panic_unwind/src/gcc.rs#L45-L48, is an _Unwind_Exception with some extra data tacked on at the end.

So the two pieces of data read from [x0, #32] end up in x0 and x1; the caller later tries to read from [x1, #24], and that fails since x1 is null. Why is x1 null?

At this point, I instrumented Wine's unwinder to let it look at the pieces of data I want to, before it returns the control over to the rust code to do the cleanup.

When Wine has finished unwinding, the pointer to the _Unwind_Exception is what is passed in as x0. If I dump the contents of [x0, #32] and [x0, #40] at that point, the latter is null - that's what ends up as x1 in the end, which is getting dereferenced.

Now something might clobber these fields during the unwind, but I instrumented Wine at the very start of the unwind too, when rust/libunwind hands control over to the Windows unwinder (via _Unwind_RaiseException, which calls Windows RaiseException, which is visible as KiUserExceptionDispatcher in the Wine log). There, I can see the same pointer to _Unwind_Exception, and [x0, #32] and [x0, #40] are both null already at this point. But that's where the rust specific extra fields at the end of _Unwind_Exception are supposed to be.

At the start of _Unwind_RaiseException, it clears the private_ fields of _Unwind_Exception: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0/libunwind/src/Unwind-seh.cpp#L341
However, the _Unwind_Exception private_ fields cover bytes 16-64 of the struct: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0/libunwind/include/unwind_itanium.h#L21-L27

So whatever rust specific private data that rust set in _Unwind_Exception + 40 gets wiped out by libunwind. Note how the SEH definition of _Unwind_Exception has got 6 entries in private_. Then looking at Rust's definition of the struct: https://github.com/rust-lang/rust/blob/91f128baf7704a477ab7c499143a160fb069b3ad/library/unwind/src/libunwind.rs#L27-L77

For x86_64, it has got unwinder_private_data_size: usize = 6, while for aarch64 it is unwinder_private_data_size: usize = 2. So that's the bug. Whenever the libgcc/libunwind/itanium style unwinding uses SEH, unwinder_private_data_size must be set to >= 6. (It is benign to overallocate this struct for usecases that don't use SEH, it just allocates a few more unused bytes in the middle.) This matches with how this value was increased from 2 to 6 for x86_64 when unwinding was implemented on Windows: rust-lang/rust@5a24ee8#diff-698c629995c6d73852b1f89091513361bf9cd26d6d3e54e4937a2fcb66075e37

Thus, TL;DR: Bump unwinder_private_data_size from 2 to 6 for aarch64, and I believe this particular issue should be fixed.
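
To make the layout argument concrete, here is a minimal sketch with simplified stand-ins for the two structs. The type names, the const generic parameter, and the `[u64; 2]` stand-in for the boxed `cause` are assumptions made for this illustration; every field is a `u64` so that the layout matches a 64-bit target regardless of the host.

```rust
use std::mem::size_of;
use std::ops::Range;

// Simplified stand-in for _Unwind_Exception with PRIVATE private words.
#[allow(dead_code)]
#[repr(C)]
struct UnwindException<const PRIVATE: usize> {
    exception_class: u64,
    exception_cleanup: u64, // a function pointer in the real struct
    private: [u64; PRIVATE],
}

// Simplified stand-in for Rust's Exception: the header plus the payload.
#[allow(dead_code)]
#[repr(C)]
struct Exception<const PRIVATE: usize> {
    uwe: UnwindException<PRIVATE>,
    cause: [u64; 2], // stand-in for Box<dyn Any + Send>: data pointer + vtable pointer
}

// Bytes zeroed at the start of the SEH _Unwind_RaiseException:
// six 8-byte private_ words following the 16-byte header (bytes 16-64).
const SEH_CLEARED: Range<usize> = 16..16 + 6 * 8;

/// Offset of `cause`, which directly follows the header.
fn cause_offset<const PRIVATE: usize>() -> usize {
    size_of::<UnwindException<PRIVATE>>()
}

/// True if the cleared byte range overlaps the `cause` field.
fn cause_is_clobbered<const PRIVATE: usize>() -> bool {
    let start = cause_offset::<PRIVATE>();
    let end = size_of::<Exception<PRIVATE>>();
    start < SEH_CLEARED.end && SEH_CLEARED.start < end
}

fn main() {
    for (n, off, bad) in [
        (2, cause_offset::<2>(), cause_is_clobbered::<2>()),
        (6, cause_offset::<6>(), cause_is_clobbered::<6>()),
    ] {
        println!("private words = {n}: cause at byte {off}, clobbered = {bad}");
    }
}
```

With 2 private words, `cause` sits at byte 32 and the whole struct is 48 bytes, which lines up with the `ldp x19, x20, [x0, #32]` and the `mov w1, #48` passed to `__rust_dealloc` in the disassembly; that range falls inside the cleared bytes. With 6 private words, `cause` starts at byte 64, just past the cleared range.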

@mati865
Collaborator Author

mati865 commented Oct 7, 2022

So that is the answer for:

I don't really have a clue what happened here but it appears that unwinding caught the exception but something is corrupted?

Great job and what a fantastic writeup!
I think you could turn it into a blog-like post if you are into it 😃

I have uploaded build 5 to the same directory (direct link here).
If it works, I'll upload it to GH and return to working on building this in MSYS2.

@mstorsjo
Contributor

mstorsjo commented Oct 7, 2022

So that is the answer for:

I don't really have a clue what happened here but it appears that unwinding caught the exception but something is corrupted?

Great job and what a fantastic writeup! I think you could turn it into a blog-like post if you are into it 😃

I don't really blog, I just write long replies on github :P But thanks - it's nice to hear that it's not all wasted words :-)

I have uploaded build 5 to the same directory (direct link here). If it works, I'll upload it to GH and return to working on building this in MSYS2.

This version does indeed seem to run correctly - rustc.exe --version runs correctly and prints the version as it did before, and rustc.exe --verison prints the error nicely and exits with error code 1. And when running it in wine with WINEDEBUG=+seh, it looks like the unwinding goes well and it manages to continue as it should. Good stuff!

@mati865
Collaborator Author

mati865 commented Oct 10, 2022

Superseded by #13513
