Build failures at HEAD on Apple ARM64 #10132
Comments
To clarify: does it build with a newer TF tree from the WORKSPACE? i.e., is this just a case of us using a newer TF snapshot? |
Just pulled in. I think I am being bitten by two different things here. First, Bazel 5.1.0 is downloaded locally and fails to resolve toolchains properly (whether the two problems are related, I cannot really say, given my limited Bazel experience). Second, with my Bazel 5.0.0 on PATH, I get errors from TensorFlow's BUILD. |
5.1.0 is now required because of other changes in the TF tree's BUILD files. So we need to figure out what to do to get Bazel 5.1.0 building successfully on ARM. (I don't have access to such a machine myself.) |
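For anyone who wants to confirm which Bazel they are actually running before digging further, here is a minimal sketch of a version check. It assumes only that `bazel --version` prints a line like `bazel 5.1.0`; the helper names are made up for illustration and this is not the check that JAX's `build.py` performs.

```python
import re
import subprocess

def parse_bazel_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'bazel 5.1.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return tuple(int(part) for part in match.groups())

def meets_minimum(version_output: str, minimum=(5, 1, 0)) -> bool:
    """Return True if the reported version is at least `minimum`."""
    return parse_bazel_version(version_output) >= minimum

def installed_bazel_version_output(bazel: str = "bazel") -> str:
    """Hypothetical helper: ask the Bazel binary on PATH for its version."""
    result = subprocess.run(
        [bazel, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Calling `meets_minimum(installed_bazel_version_output())` would then tell you whether the Bazel on PATH satisfies the 5.1.0 requirement mentioned above.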
FYI, the
|
I wonder whether bazelbuild/bazel#14995 (which is mentioned in the Bazel 5.1.0 release notes) is relevant. |
Okay, I can take this on if you want, as it is my personal machine. Do you have some insights on toolchain resolution off the top of your head that may be useful for debugging? I assume this will be resolved once Homebrew updates the Bazel bottle. Still it might be worth investigating how to enable macOS builds with downloaded in-tree Bazel snapshots. (Otherwise, this functionality can't really be considered working on macOS ARM64 in my opinion) |
No, I don't think Homebrew updates will fix this. My guess is that this is a legitimate Mac/ARM/Bazel 5.1.0 problem. The released Bazel should work without any Homebrew involvement. |
Fair point. I went digging into the Bazel cache and found that apparently the host was autodetected to have constraints
which would explain all of the toolchain mismatches. Something apparently went wrong when setting the host platform constraints, can I specifically ask Bazel to build for a |
I'd guess that comes from: https://github.com/bazelbuild/bazel/blob/e2853223f429ee30731c1015f83baed1570fcbe6/src/main/java/com/google/devtools/build/lib/bazel/repository/LocalConfigPlatformFunction.java#L117 I'm wondering if we should just report this to upstream Bazel and ask them. |
More problems: After removing the entire cache folder (to rule out cache conflicts between 5.0.0 and 5.1.0) and running with Bazel 5.1.0, I get a different error this time:
This occurs inside the (I would have posted you the new constraints file, but Bazel crashed with the above before writing it out.) |
I filed bazelbuild/bazel#15175, and referenced this thread. I hope I presented the issue in a reasonable way given our conversation - if you feel like anything is wrong about what I wrote there, please let me know, so I can correct it. Thanks! |
Hey @hawkinsp, your first suggestion was absolutely correct. Reverting bazelbuild/bazel#14995 in a local Bazel source build makes |
By the way, the local repo approach finally worked when I checked out the same commit SHA as the HTTP repo snapshot above it in the WORKSPACE. How can I check whether a given TensorFlow commit will work for a jaxlib build? I ran into numerous errors with Bazel and LLVM dependencies yesterday evening along the way. |
@nicholasjng Normally, most TF versions work. I'm wondering if something went wrong with cached state on your machine. |
Could be. TensorFlow master@HEAD does not build. I manually removed the whole bazel repository cache, but that didn't change anything.
EDIT: Pulled again, this time it builds. No idea what's going on, but in case someone on Apple ARM64 wants to build jaxlib, TensorFlow @ ed9b4612299de07881ede418fc7fbf809e6911e8 apparently works. |
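For reference, a rough sketch of how a TensorFlow commit SHA maps onto the archive URL and top-level directory name that a WORKSPACE `http_archive` pin would need. It assumes GitHub's usual `archive/<sha>.tar.gz` layout, where the tarball unpacks into `tensorflow-<sha>`; the function name is hypothetical and the archive checksum is not covered here.

```python
import re

def tensorflow_archive(commit: str) -> dict:
    """Build the URL and strip_prefix for a pinned TensorFlow commit.

    Assumes GitHub's archive layout; `commit` must be a full 40-character SHA.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError(f"expected a full 40-character hex SHA, got {commit!r}")
    return {
        "url": f"https://github.com/tensorflow/tensorflow/archive/{commit}.tar.gz",
        "strip_prefix": f"tensorflow-{commit}",
    }

# The commit reported as working in this thread.
pin = tensorflow_archive("ed9b4612299de07881ede418fc7fbf809e6911e8")
print(pin["url"])
print(pin["strip_prefix"])
```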
Quick update @hawkinsp: Thanks to the platforms update in TFRT, jaxlib with the new pinned TensorFlow archive builds fine even without the platforms repo pin. I also retested with my local fork, and TensorFlow @ 3f4c773f94a4936c5749764b7bfcc2a567deef0e (even newer than the currently pinned TF) builds without the platforms pin, too. So you can drop the explicit platforms import in JAX's workspace again, if you prefer. |
@nicholasjng Thanks, I'll revert that change. |
Current top of main does not build on my machine (Apple M1 Pro, macOS 12.3.1):
Some notes / observations:
- [...] .bazelversion (and, as of recently, build.py) to use Bazel 5.0.0 also fails, with a different error (it complains about a win_ver_file setting in TensorFlow's BUILD file, which I guess was not supported with v5.0.0).
- [...] --toolchain_resolution_debug=@bazel_tools//tools/cpp:toolchain_type flag into the build args. Somehow, all candidate toolchains are rejected, but I don't know why.
- [...] master for me, at or near current HEAD). So I pulled in some of the recent changes in TensorFlow's build config.
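To reproduce the toolchain resolution output mentioned in the notes above, something like the following sketch could assemble the Bazel invocation. The flag and the toolchain type label are the ones quoted in the notes; the helper name and the `//some:target` label are placeholders, not the actual jaxlib build target.

```python
import shlex

# Toolchain type label quoted in the notes above.
CPP_TOOLCHAIN_TYPE = "@bazel_tools//tools/cpp:toolchain_type"

def toolchain_debug_command(target, toolchain_type=CPP_TOOLCHAIN_TYPE, bazel="bazel"):
    """Return an argv list for a Bazel build with toolchain resolution debugging.

    `target` is whatever label you are building; it is a placeholder here.
    """
    return [
        bazel,
        "build",
        f"--toolchain_resolution_debug={toolchain_type}",
        target,
    ]

cmd = toolchain_debug_command("//some:target")
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Bazel then logs, for each candidate toolchain, why it was accepted or rejected, which is where mismatched host platform constraints would show up.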