[darwin aarch64 cgo] regression in 0.8.0-1366-gfc302f00a #10299

Closed
motiejus opened this issue Dec 8, 2021 · 16 comments · Fixed by #10301
Labels
arch-aarch64 64-bit ARM bug Observed behavior contradicts documented or intended behavior os-macos regression It worked in a previous version of Zig, but stopped working.

@motiejus
Contributor

motiejus commented Dec 8, 2021

Zig Version

0.8.0-1921-g83a668195

Steps to Reproduce

Identical to #10297, but substitute:

  • x86_64 with aarch64.
  • amd64 with arm64.

Step-by-step:

hello.go

package main

// #include <stdio.h>
// void helloworld() { printf("hello, world\n"); }
import "C"

func main() {
	C.helloworld()
}

~/test

#!/bin/bash
set -xe

pushd $HOME/dev/zig/build
cmake ..
make -j$(nproc)
popd
GOOS=darwin GOARCH=arm64 CC=$HOME/zcc CGO_ENABLED=1 go build -a -buildmode=pie -ldflags "-s -w" $HOME/hello.go

~/zcc

user@motiejus:~/dev/zig$ cat ~/zcc 
#!/bin/bash
zig=$HOME/dev/zig/build/zig
exec "$zig" cc -target aarch64-macos-gnu "$@"

Bisect Result

fc302f0 is the first bad commit, which doesn't revert cleanly.

$ git bisect good dde0adcb363f3a3f306c0fc9eaec511cc3b74965
$ git bisect bad master
$ git bisect run ~/test
<...>
fc302f00a9de5de0490f4a66720e75946763c695 is the first bad commit
commit fc302f00a9de5de0490f4a66720e75946763c695
Author: Jakub Konka <kubkon@jakubkonka.com>
Date:   Sun Oct 10 10:33:15 2021 +0200

    macho: redo relocation handling and lazy bind globals

    * apply late symbol resolution for globals - instead of resolving
      the exact location of a symbol in locals, globals or undefs,
      we postpone the exact resolution until we have a full picture
      for relocation resolution.
    * fixup stubs to defined symbols - this is currently a hack rather
      than a final solution. I'll need to work out the details to make
      it more approachable. Currently, we preemptively create a stub
      for a lazy bound global and fix up stub offsets in stub helper
      routine if the global turns out to be undefined only. This is quite
      wasteful in terms of space as we create stub, stub helper and lazy ptr
      atoms but don't use them for defined globals.
    * change log scope to .link for macho.
    * remove redundant code paths from Object and Atom.
    * drastically simplify the contents of Relocation struct (i.e., it is
      now a simple superset of macho.relocation_info), clean up relocation
      parsing and resolution logic.

 src/codegen.zig            |   45 +-
 src/link/MachO.zig         |  478 +++++++-------
 src/link/MachO/Archive.zig |    2 +-
 src/link/MachO/Atom.zig    | 1538 +++++++++++++++++---------------------------
 src/link/MachO/Dylib.zig   |    2 +-
 src/link/MachO/Object.zig  |   11 +-
 6 files changed, 857 insertions(+), 1219 deletions(-)
bisect run success
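
A side note on the bisect script: with set -e, a failing cmake or make step also exits non-zero, and git bisect run treats any status from 1 to 127 (except 125) as "bad". Status 125 tells it to skip a commit that cannot be tested. A minimal sketch of that distinction follows; bisect_step and its two command-string arguments are hypothetical and stand in for the build and go build steps of ~/test.

```bash
#!/bin/bash
# Sketch of a `git bisect run` step that separates "cannot build" from "bad".
# git bisect run reads the exit status: 0 = good, 125 = skip this commit,
# any other status from 1 to 127 = bad.
# bisect_step is a hypothetical helper; both arguments are command strings.
bisect_step() {
  local build_cmd=$1 check_cmd=$2
  # A commit that does not compile says nothing about the regression.
  if ! bash -c "$build_cmd"; then
    return 125
  fi
  # The build succeeded, so the check alone decides good (0) or bad (1).
  if bash -c "$check_cmd"; then
    return 0
  fi
  return 1
}
```

In a script like ~/test, the first argument would be the cmake/make invocation and the second the go build command, with the script exiting with the status that bisect_step returns.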

Expected Behavior

~/test should produce a working darwin aarch64 executable.

Actual Behavior

Running ~/test on master fails to produce an executable:

$ GOOS=darwin GOARCH=arm64 CC=$HOME/zcc CGO_ENABLED=1 go build -a -buildmode=pie -ldflags "-s -w" hello.go
# runtime/cgo
warning(link): framework not found for '-framework CoreFoundation'
warning(link): Framework search paths:
# runtime/cgo
/tmp/go-build2151189485/b003/_cgo_import.go:2:3: usage: //go:cgo_import_dynamic local [remote ["library"]]
/tmp/go-build2151189485/b003/_cgo_import.go:4:3: usage: //go:cgo_import_dynamic local [remote ["library"]]
/tmp/go-build2151189485/b003/_cgo_import.go:6:3: usage: //go:cgo_import_dynamic local [remote ["library"]]

cc @kubkon

@motiejus motiejus added the bug Observed behavior contradicts documented or intended behavior label Dec 8, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Dec 8, 2021
@andrewrk andrewrk added the regression It worked in a previous version of Zig, but stopped working. label Dec 8, 2021
@motiejus motiejus changed the title [darwin aarch64] regression in 0.8.0-1366-gfc302f00a [darwin aarch64 cgo] regression in 0.8.0-1366-gfc302f00a Dec 8, 2021
@kubkon
Member

kubkon commented Dec 8, 2021

@motiejus thanks! Tracking this down took a little longer than expected, but got the fix in #10301.

@kubkon
Member

kubkon commented Dec 8, 2021

Oh and FYI, I'm not sure if you noticed, but the linker is unable to find the CoreFoundation framework. This is an intended change in the way zig automatically detects an SDK if available. Currently, explicitly specifying the compilation target, here aarch64-macos-gnu, means zig enters cross-compilation mode, and it is the user's responsibility to specify the path to the sysroot via zig cc --sysroot <path>. However, if you leave out -target aarch64-macos-gnu and build natively on an arm64 macOS, zig will autodetect the SDK (if available).

@motiejus
Contributor Author

motiejus commented Dec 8, 2021

Oh and FYI, I'm not sure if you noticed, but the linker is unable to find the CoreFoundation framework. This is an intended change in the way zig automatically detects an SDK if available. Currently, explicitly specifying the compilation target, here aarch64-macos-gnu, means zig enters cross-compilation mode, and it is the user's responsibility to specify the path to the sysroot via zig cc --sysroot <path>. However, if you leave out -target aarch64-macos-gnu and build natively on an arm64 macOS, zig will autodetect the SDK (if available).

I did notice the warning, but it cross-compiled the executable successfully (i.e. produced a binary that file thinks is an arm64 macho executable, weighing about 1MB) without the sysroot. I don't have an M1 to test this with (this should come in 2022), so I assumed it "would work".

A few follow-up questions:

  • What is the status of the executable if it's not compiled with a --sysroot? Would it even run?
  • How do we acquire a sysroot? Is there "a tarball somewhere at apple.com", or should I copy from an existing host? Any pointers to legal implications, if you know?
  • [curiosity] what is the sysroot used for during the compilation phase?

@kubkon
Member

kubkon commented Dec 8, 2021

Oh and FYI, I'm not sure if you noticed, but the linker is unable to find the CoreFoundation framework. This is an intended change in the way zig automatically detects an SDK if available. Currently, explicitly specifying the compilation target, here aarch64-macos-gnu, means zig enters cross-compilation mode, and it is the user's responsibility to specify the path to the sysroot via zig cc --sysroot <path>. However, if you leave out -target aarch64-macos-gnu and build natively on an arm64 macOS, zig will autodetect the SDK (if available).

I did notice the warning, but it cross-compiled the executable successfully (i.e. produced a binary that file thinks is an arm64 macho executable, weighing about 1MB) without the sysroot. I don't have an M1 to test this with (this should come in 2022), so I assumed it "would work".

A few follow-up questions:

* What is the status of the executable if it's not compiled with a `--sysroot`? Would it even run?

This depends. By default, on macOS the linker is supposed to always link every shared object from the linker line. While it may not be strictly necessary since for instance you don't resolve any symbols in the said shared object, some runtime mechanism might depend on it, for instance you could use ObjC runtime to send a message as a string to poll the runtime address of the loaded Foundation framework. In this case, if you didn't link any symbols from Foundation and your linker stripped this dylib from the binary (treating it as unused), ObjC will return address 0x0 of Foundation since it wasn't loaded at all.

This situation reminds me of this in that, while this particular Hello World example didn't require the CoreFoundation framework, some inner Go mechanism might, which would manifest as a really nasty and difficult-to-track runtime exception. The question now is: should zig warn the user or perhaps throw an error instead? Initially I assumed the former is enough, but if zig is used as a drop-in C/C++ compiler replacement and is used inside another build tool (like go build) then perhaps an error would be a better choice?

More on this topic: #10192

* How do we acquire a sysroot? Is there "a tarball somewhere at apple.com", or should I copy from an existing host? Any pointers to legal implications, if you know?

That's an interesting one, and I don't really have an answer to that. Some folks have packaged a sysroot for a version of macOS already, for instance https://github.com/hexops/sdk-macos-11.3. Legal implications - I have no clue, however, I'd recommend packaging only tbd files which are text-based descriptors for actual shared libraries (in fact, there might not even be any other option these days).

* [curiosity] what is the sysroot used for during the compilation phase?

We use the sysroot in a few places mainly when building natively (we then pull the libc headers from there together with framework stubs), but it's also required when linking in frameworks on macOS (or targeting macOS). When linking in dylibs we do a recursive linking of any dylib that the original dylib re-exported, and to resolve that we often require a sysroot to prepend to the path. Otherwise the resolution might fail.
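
To illustrate the "prepend the sysroot to the path" step, here is a minimal sketch. It is not how zig's linker is implemented. resolve_in_sysroot is a hypothetical helper that assumes the usual SDK layout, where a dylib such as libfoo.dylib is represented by a libfoo.tbd stub and a framework binary by a stub named after the binary plus .tbd.

```bash
#!/bin/bash
# Hypothetical helper, not zig's implementation: map a dylib install name
# to a file inside a sysroot by prepending the sysroot to the path.
# Prefers the text-based stub (.tbd) and falls back to the binary itself.
resolve_in_sysroot() {
  local sysroot=$1 install_name=$2 candidate
  # libfoo.dylib is stubbed as libfoo.tbd; framework binaries have no
  # extension, so their stub is simply <name>.tbd.
  local stem=${install_name%.dylib}
  for candidate in "$sysroot$stem.tbd" "$sysroot$install_name"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "error: $install_name not found under '$sysroot'" >&2
  return 1
}
```

With an empty sysroot the lookup falls through to the host's own paths, which do not exist on a non-macOS machine. That is the failure mode described above.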

@motiejus
Contributor Author

motiejus commented Dec 9, 2021

  • What is the status of the executable if it's not compiled with a --sysroot? Would it even run?

This depends. By default, on macOS the linker is supposed to always link every shared object from the linker line. While it may not be strictly necessary since for instance you don't resolve any symbols in the said shared object, some runtime mechanism might depend on it, for instance you could use ObjC runtime to send a message as a string to poll the runtime address of the loaded Foundation framework. In this case, if you didn't link any symbols from Foundation and your linker stripped this dylib from the binary (treating it as unused), ObjC will return address 0x0 of Foundation since it wasn't loaded at all.

This situation reminds me of this in that, while this particular Hello World example didn't require the CoreFoundation framework, some inner Go mechanism might, which would manifest as a really nasty and difficult-to-track runtime exception.

Let me try to re-phrase to see if I understood this correctly: Foundation needs to be linked in a Mach-O binary; but, if the SDK is not specified, zld will set its address to 0x0, which is problematic as described in #9542. And there is no way to detect whether that Foundation will ever be resolved (and thus cause a mysterious crash).

The question now is: should zig warn the user or perhaps throw an error instead? Initially I assumed the former is enough, but if zig is used as a drop-in C/C++ compiler replacement and is used inside another build tool (like go build) then perhaps an error would be a better choice?

If my re-phrasing above is correct, then I definitely agree we should be more vocal:

  1. Always require sysroot when compiling for darwin (or darwin/aarch64? see below). If sysroot is not specified, bail.
  2. Add an extra --darwin-no-foundation, which keeps the current behavior: i.e. link Foundation to 0x0. The dangerous/possibly incomplete binary should be opt-in.

In our case, zig cc is even deeper: bazel -> rules_go -> go build -> $CXX -> bazel-zig-cc -> zig cc. The user normally interacts with the first two only; the rest are very much behind the scenes. Failing early is much more preferred if we are doing something wrong.
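
As a sketch of what "failing early" could look like in a wrapper such as ~/zcc, independent of any change in zig itself: check_framework_sysroot below is a hypothetical helper. It only recognizes the -framework and --sysroot spellings used in this thread, plus --sysroot=<path> as an assumption.

```bash
#!/bin/bash
# Hypothetical fail-early guard for a `zig cc` wrapper such as ~/zcc.
# Returns 0 when the arguments are safe to forward and 1 when a framework
# is requested without a sysroot.
check_framework_sysroot() {
  local has_framework=0 has_sysroot=0 arg
  for arg in "$@"; do
    case "$arg" in
      -framework) has_framework=1 ;;
      # --sysroot=<path> is an assumed alternative spelling.
      --sysroot|--sysroot=*) has_sysroot=1 ;;
    esac
  done
  if [ "$has_framework" -eq 1 ] && [ "$has_sysroot" -eq 0 ]; then
    echo "error: -framework requested but no --sysroot given" >&2
    return 1
  fi
  return 0
}
```

In the wrapper this would run as check_framework_sysroot "$@" || exit 1 just before the exec line, so the failure surfaces in the go build output instead of at runtime.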

* How do we acquire a sysroot? Is there "a tarball somewhere at apple.com", or should I copy from an existing host? Any pointers to legal implications, if you know?

That's an interesting one, and I don't really have an answer to that. Some folks have packaged a sysroot for a version of macOS already, for instance https://github.com/hexops/sdk-macos-11.3. Legal implications - I have no clue, however, I'd recommend packaging only tbd files which are text-based descriptors for actual shared libraries (in fact, there might not even be any other option these days).

To be clear: is this required for x86_64 too, not only aarch64? I am clarifying, because compiling x86_64 does not produce warning(link): framework not found for '-framework CoreFoundation', only aarch64 does.

The *.tbd files alone weigh ~1.3MB, which is good. I will figure out the legal/licensing part.

* [curiosity] what is the sysroot used for during the compilation phase?

We use the sysroot in a few places mainly when building natively (we then pull the libc headers from there together with framework stubs)

For the record, our use case (bazel-zig-cc) is always "cross-compilation" mode, that's how we get hermetic builds. So our builds don't have to rely on anything in the system (and, by extension, produced binaries are the same, regardless where they are compiled).

@uhthomas

uhthomas commented Dec 9, 2021

Hey @motiejus it looks like I'm seeing some similar messages for macOS x86-64 -> macOS x86-64. I would imagine it's not just an aarch64 thing in this case?

#10158 (comment)

@motiejus
Contributor Author

motiejus commented Dec 9, 2021

Hey @motiejus it looks like I'm seeing some similar messages for macOS x86-64 -> macOS x86-64. I would imagine it's not just an aarch64 thing in this case?

#10158 (comment)

Yes, sounds about right. I guess the SDK is necessary for darwin-any then.

@kubkon
Member

kubkon commented Dec 9, 2021

@uhthomas so here's the rule of thumb to remember and follow:

  1. if you simply invoke zig cc (no -target flag) then the compiler assumes native to native and if you happen to be on macOS, zig will try autodetecting the SDK (if it's required, e.g., when linking frameworks)
  2. if you set the compiler for cross-compilation like zig cc -target x86_64-macos, then you will need to pass the sysroot yourself as part of the build command; e.g., zig cc -target x86_64-macos --sysroot <path>

@kubkon
Member

kubkon commented Dec 9, 2021

  • What is the status of the executable if it's not compiled with a --sysroot? Would it even run?

This depends. By default, on macOS the linker is supposed to always link every shared object from the linker line. While it may not be strictly necessary since for instance you don't resolve any symbols in the said shared object, some runtime mechanism might depend on it, for instance you could use ObjC runtime to send a message as a string to poll the runtime address of the loaded Foundation framework. In this case, if you didn't link any symbols from Foundation and your linker stripped this dylib from the binary (treating it as unused), ObjC will return address 0x0 of Foundation since it wasn't loaded at all.
This situation reminds me of this in that, while this particular Hello World example didn't require the CoreFoundation framework, some inner Go mechanism might, which would manifest as a really nasty and difficult-to-track runtime exception.

Let me try to re-phrase to see if I understood this correctly: Foundation needs to be linked in a Mach-O binary; but, if the SDK is not specified, zld will set its address to 0x0, which is problematic as described in #9542. And there is no way to detect whether that Foundation will ever be resolved (and thus cause a mysterious crash).

Not quite. When you specify -framework Foundation on the linker line, regardless of whether you actually import any symbol from it, it will always get a load command in the final MachO binary. If we don't do that, then the loader (dyld) will not load the framework at runtime, which breaks code like this:

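#include <assert.h>
#include <stddef.h>
#include <objc/runtime.h>

int main(void) {
  // Classes are looked up by name at runtime, so no symbol is imported
  // from the framework that defines them.
  assert(objc_getClass("NSObject") != NULL);
  assert(objc_getClass("NSApplication") != NULL);
  return 0;
}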

If you don't link Foundation, then the loader will not load it, and the second assert will fail. Note that we use plain strings for class names, meaning we don't actually try to import the NSApplication symbol.

The question now is: should zig warn the user or perhaps throw an error instead? Initially I assumed the former is enough, but if zig is used as a drop-in C/C++ compiler replacement and is used inside another build tool (like go build) then perhaps an error would be a better choice?

If my re-phrasing above is correct, then I definitely agree we should be more vocal:

1. Always require sysroot when compiling for darwin (or darwin/aarch64? see below). If sysroot is not specified, bail.

But that's the thing. It's not always required. For example, if you don't link in any frameworks, Zig's got you covered as it includes libSystem.tbd stub and libc headers.

2. Add an extra `--darwin-no-foundation`, which keeps the current behavior: i.e. link Foundation to `0x0`. The dangerous/possibly incomplete binary should be opt-in.

In our case, zig cc is even deeper: bazel -> rules_go -> go build -> $CXX -> bazel-zig-cc -> zig cc. The user normally interacts with the first two only; the rest are very much behind the scenes. Failing early is much more preferred if we are doing something wrong.

In that case, we might make it a hard error if you want to link a framework and zig cannot find it. I'll need to check what Apple's ld64 does in this case, but I think it's fair to just hard error - @andrewrk thoughts?

* How do we acquire a sysroot? Is there "a tarball somewhere at apple.com", or should I copy from an existing host? Any pointers to legal implications, if you know?

That's an interesting one, and I don't really have an answer to that. Some folks have packaged a sysroot for a version of macOS already, for instance https://github.com/hexops/sdk-macos-11.3. Legal implications - I have no clue, however, I'd recommend packaging only tbd files which are text-based descriptors for actual shared libraries (in fact, there might not even be any other option these days).

To be clear: is this required for x86_64 too, not only aarch64? I am clarifying, because compiling x86_64 does not produce warning(link): framework not found for '-framework CoreFoundation', only aarch64 does.

No, the behaviour should be common on any architecture targeting macOS - we should only look for the SDK if compiling natively. Lemme check quickly locally on my Intel MBP why there's no warning.

The *.tbd files alone weigh ~1.3MB, which is good. I will figure out the legal/licensing part.

* [curiosity] what is the sysroot used for during the compilation phase?

We use the sysroot in a few places mainly when building natively (we then pull the libc headers from there together with framework stubs)

For the record, our use case (bazel-zig-cc) is always "cross-compilation" mode, that's how we get hermetic builds. So our builds don't have to rely on anything in the system (and, by extension, produced binaries are the same, regardless where they are compiled).

@kubkon
Member

kubkon commented Dec 9, 2021

@motiejus I just checked and Go doesn't link in the framework on x86_64 macOS, but does on aarch64 macOS. Fancy that!

EDIT:

  • verbose out x86_64-macos (no framework flag):
zig ld -dynamic $WORK/b003/_cgo_main.o $WORK/b003/_x001.o $WORK/b003/_x002.o $WORK/b003/_x003.o $WORK/b003/_x004.o $WORK/b003/_x005.o $WORK/b003/_x006.o $WORK/b003/_x007.o $WORK/b003/_x008.o $WORK/b003/_x009.o /Users/jakubkonka/.cache/zig/o/572f8929e536c7d45cc10990145776b5/libcompiler_rt.a -o /Users/jakubkonka/.cache/zig/o/c91e408491bbfd1274cd51f679cc9dea/_cgo_.o -lSystem -lc
  • verbose out aarch64-macos (framework CoreFoundation required):
zig ld -dynamic $WORK/b003/_cgo_main.o $WORK/b003/_x001.o $WORK/b003/_x002.o $WORK/b003/_x003.o $WORK/b003/_x004.o $WORK/b003/_x005.o $WORK/b003/_x006.o $WORK/b003/_x007.o $WORK/b003/_x008.o $WORK/b003/_x009.o /Users/jakubkonka/.cache/zig/o/c8d02e5cfb2617226cb04b7d32ca5ce0/libcompiler_rt.a -o /Users/jakubkonka/.cache/zig/o/d50879c2d00812cecb9e2d159893722d/_cgo_.o -lSystem -lc -framework CoreFoundation

I don't use Go nor know enough about it, but it might be worth checking with Go devs where the difference comes from between the two targets.

@uhthomas

uhthomas commented Dec 9, 2021

@uhthomas so here's the rule of thumb to remember and follow:

  1. if you simply invoke zig cc (no -target flag) then the compiler assumes native to native and if you happen to be on macOS, zig will try autodetecting the SDK (if it's required, e.g., when linking frameworks)
  2. if you set the compiler for cross-compilation like zig cc -target x86_64-macos, then you will need to pass the sysroot yourself as part of the build command; e.g., zig cc -target x86_64-macos --sysroot <path>

I see, thank you. Apologies if you're having to repeat yourself. Do you have any advice on where to get a sysroot when targeting macOS? It sounds to me like there's no standard way of doing this.

@kubkon
Member

kubkon commented Dec 9, 2021

@uhthomas so here's the rule of thumb to remember and follow:

  1. if you simply invoke zig cc (no -target flag) then the compiler assumes native to native and if you happen to be on macOS, zig will try autodetecting the SDK (if it's required, e.g., when linking frameworks)
  2. if you set the compiler for cross-compilation like zig cc -target x86_64-macos, then you will need to pass the sysroot yourself as part of the build command; e.g., zig cc -target x86_64-macos --sysroot <path>

I see, thank you. Apologies if you're having to repeat yourself. Do you have any advice on where to get a sysroot when targeting macOS? It sounds to me like there's no standard way of doing this.

No problem at all. So if you're on macOS, there are two ways: 1) install CLT (I bet you already did), 2) install Xcode. Either way, you can then use the xcrun tool to get the path to the sysroot/SDK like so:

zig cc --sysroot $(xcrun --show-sdk-path) -target aarch64-macos-gnu -framework Foundation

If you're on a non-macOS host, then you will need to supply the sysroot yourself. Something like this for instance https://github.com/hexops/sdk-macos-11.3.

Finally, note that if you don't need to link against frameworks, you don't need to specify the sysroot, as Zig ships both libSystem.tbd stubs and libc headers (if some headers are missing though, please feel free to submit an issue).
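
The two cases (macOS host with xcrun, non-macOS host with a vendored SDK) can be folded into one lookup. The following is a minimal sketch; find_macos_sysroot and the MACOS_SDK_PATH environment variable are assumptions made for illustration, not something zig reads or provides.

```bash
#!/bin/bash
# Hypothetical helper for picking a macOS sysroot to pass to
# `zig cc --sysroot`. MACOS_SDK_PATH is an assumed environment variable.
find_macos_sysroot() {
  if [ -n "${MACOS_SDK_PATH:-}" ]; then
    # An explicit path wins, e.g. a checked-out copy of a packaged SDK.
    printf '%s\n' "$MACOS_SDK_PATH"
  elif command -v xcrun >/dev/null 2>&1; then
    # On a macOS host with CLT or Xcode installed, ask xcrun.
    xcrun --show-sdk-path
  else
    echo "error: no macOS SDK: set MACOS_SDK_PATH or install CLT/Xcode" >&2
    return 1
  fi
}
```

A build command would then read zig cc --sysroot "$(find_macos_sysroot)" -target aarch64-macos-gnu -framework Foundation, which is the same invocation as above with the sysroot lookup factored out.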

@uhthomas

uhthomas commented Dec 9, 2021

It looks like Apple is putting more effort into open source contribution. I wonder if it's available as part of their releases somewhere.

https://opensource.apple.com/releases/
https://github.com/apple-oss-distributions
https://github.com/apple-oss-distributions/distribution-macOS
https://github.com/apple-oss-distributions/distribution-Developer_Tools

@andrewrk
Member

andrewrk commented Dec 9, 2021

In that case, we might make it a hard error if you want to link a framework and zig cannot find it. I'll need to check what Apple's ld64 does in this case, but I think it's fair to just hard error - @andrewrk thoughts?

Yes if you pass -framework foo and framework foo cannot be found, that should be a hard error.

@slimsag
Contributor

slimsag commented Dec 10, 2021

If you're on a non-macOS host, then you will need to supply the sysroot yourself. Something like this for instance https://github.com/hexops/sdk-macos-11.3.

Since it was mentioned, I should note there's a much nicer Zig file to consume these: https://github.com/hexops/mach/blob/main/glfw/system_sdk.zig

It lets you just write system_sdk.include(b, step, .{}) and it'll automatically git clone the SDK for you, check out a specific revision of the SDK, handle git fetching updates, etc. and does the same for Linux + different macOS SDK versions when using versioned mac target triples.

By default it also won't set the b.sysroot, but rather just adds to the include paths so that if someone isn't cross-compiling they can still include native libs on their system. You can pass .{.set_sysroot = true} as the last options param to set the sysroot, which is useful when checking if you depend on any libs not in the native SDKs.

motiejus added a commit to motiejus/zig that referenced this issue Dec 11, 2021
If `-framework` is requested, but not found, the linker will err
instead of creating a strange executable.

ziglang#10299 (comment)

Refs ziglang#9542
Refs ziglang#10299
Refs ziglang#10158
@steeve

steeve commented Apr 9, 2024

That's an interesting one, and I don't really have an answer to that. Some folks have packaged a sysroot for a version of macOS already, for instance https://github.com/hexops/sdk-macos-11.3. Legal implications - I have no clue, however, I'd recommend packaging only tbd files which are text-based descriptors for actual shared libraries (in fact, there might not even be any other option these days).

It is possible to download the SDKs directly from Apple:

I found them by searching for .pkg in the macOS console when performing an xcode-select --install.

To extract them:
$ pkgutil --expand-full CLTools_macOSNMOS_SDK.pkg <tmpdirthatdoesntexistyet>
