ARM64 support #13514

Open
jwnimmer-tri opened this issue Jun 8, 2020 · 50 comments
Labels: component: build system (Bazel, CMake, dependencies, memory checkers, linters); priority: backlog; type: feature request

Comments

@jwnimmer-tri
Collaborator

jwnimmer-tri commented Jun 8, 2020

We should consider supporting ARM64 / AArch64 for a subset of Drake, and decide what timeline might make sense for offering such support.

See #10435 and #20075 for sample requests, though I've also received a couple of offline request pings as well.

Given that the core Drake developer team does not currently use this architecture, I'd expect such support to be facilitated by the core team (e.g., code review and Jenkins), but spearheaded via PRs from community contributors.

Please feel free to add additional thoughts below.


Edited to add:

For a while, we advocated running Drake on macOS ARM hardware via Rosetta 2 while native ARM64 support matured. As of today, macOS is supported natively on ARM64; see https://drake.mit.edu/installation.html for details.

The remaining focus of this issue is ARM64 for Ubuntu platforms.

One particularly helpful outcome would be to publish Linux ARM64 wheel files to PyPI.

@jwnimmer-tri
Collaborator Author

Related to bazelbuild/bazel#8833.

@jamiesnape
Contributor

FTR https://www.apple.com/newsroom/2020/06/apple-announces-mac-transition-to-apple-silicon/.

The Apple developer documentation also has various information on the differences between the arm64 and x86_64 architectures that are applicable to Linux too, e.g., float-to-int conversions.
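
To make the float-to-int point concrete: the scalar conversion instructions on the two architectures disagree on out-of-range and NaN inputs. x86-64's CVTTSD2SI returns the "integer indefinite" value (0x80000000 for a 32-bit result), while AArch64's FCVTZS saturates and maps NaN to 0. The sketch below is only a Python model of those two instructions (the function names are made up for illustration). It says nothing about C or C++ semantics, where an out-of-range conversion is undefined behavior.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def x86_cvttsd2si(x):
    """Model of x86-64 CVTTSD2SI with a 32-bit result.

    NaN, infinities, and out-of-range values all yield INT32_MIN.
    """
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    t = math.trunc(x)  # round toward zero
    return t if INT32_MIN <= t <= INT32_MAX else INT32_MIN


def arm64_fcvtzs(x):
    """Model of AArch64 FCVTZS with a 32-bit result.

    Out-of-range values saturate; NaN yields 0.
    """
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))
```

Code that relies on either behavior (for example, casting a large double and then clamping) can produce different numbers on the two platforms.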

@joshuagruenstein

I think this would be especially useful for people running lots of cloud experiments (e.g., RL people), given AWS EC2 now offers low-cost ARM compute.

@aiyer-commits

This would enable running Drake on Android as well, I imagine? I would be interested in contributing, but I'm not sure where to start.

@jamiesnape
Contributor

This would enable running Drake on Android as well, I imagine?

It is necessary, but not necessarily sufficient. Difficult to say what else would be needed at this stage.

Would be interested in contributing, but not sure where to start.

Looking at some of the prerequisites that Drake builds would be a start. I know FCL has issues: flexible-collision-library/fcl#474

@aiyer-commits

aiyer-commits commented Dec 21, 2020 via email

@gizatt
Contributor

gizatt commented Jun 9, 2021

As part of a side project, I just tried building Drake (arbitrarily, sha 9d5b2e from yesterday) on a Raspberry Pi 4B, which has a pretty dinky 1.5 GHz 4-core ARM64 CPU. I have the version with 4GB of RAM + added 8GB of swap. I'm running Ubuntu 20.04.2 on it.

Kudos to the build folks, it built with relatively superficial modifications! I just tested a 3D sim (simulating a simple point-foot quadruped with collision) with meshcat viz and some interactive sliders, so a lot of the moving parts seem intact and pleasantly performant (i.e. that sim was near-real-time) given that I'm running Drake on a potato. I'll poke at IK and optimization code when I put more hours on that project.

The specific diffs are here. The changes I needed to make were:

  • Installing dependencies:
    • libquadmath0 doesn't exist on Ubuntu 20.04 --> can't build SNOPT Fortran. I hacked around this dependency install and didn't ask for SNOPT support to be built.
    • I needed to point to arm64 rather than amd64 bazel during the bazel install step.
  • Build:
    • I needed to rebuild the vtk binary archive. The provided Docker workflow for doing so worked out of the box.
    • drake-visualizer needed the same, but I had issues running the Docker workflow, so I disabled the installation step for it.
    • dreal didn't build correctly -- it tried to access some x86_64 instructions during build -- so I disabled it.
    • I needed to disable -march broadwell to build the math folder.
    • tools/LCM needed -pthread forcefully specified, for some reason.
    • pybind had a hard x86_64 libclang reference that needed to be corrected.
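
The amd64-versus-arm64 choice in the Bazel install step can be made from the machine name instead of being hard-coded. A minimal sketch, assuming the download artifact uses Debian-style architecture names; the function and the mapping are illustrative and are not taken from Drake's install scripts.

```python
import platform

# Kernel-reported machine names mapped to Debian-style architecture names.
_DEB_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def deb_arch(machine=None):
    """Return 'amd64' or 'arm64' for the given (or the current) machine."""
    machine = (machine or platform.machine()).lower()
    try:
        return _DEB_ARCH[machine]
    except KeyError:
        # Fail loudly rather than silently downloading the wrong binary.
        raise ValueError(f"unsupported architecture: {machine}")
```

On the Pi, `deb_arch()` should see `aarch64` and return `arm64`.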

Notes:

  • The build took a couple of hours, with the worst offender being the Python bindings: tree_py.cc alone took a bit over 2 hours.
  • I haven't done any homework w.r.t. running tests -- e.g. wouldn't be surprised if those FCL issues reared their head here -- and given how slow the Pi has been to finish a single build, I'm not going to try for now.

@jamiesnape
Contributor

You can open an issue about drake-visualizer if the Dockerfile was not working out for you, whether it is ARM-specific or not. My hope is that each Dockerfile is agnostic to the build architecture and will work on pretty much anything 64-bit that Ubuntu supports.

@jamiesnape
Contributor

jamiesnape commented Jun 10, 2021

  • I needed to disable -march broadwell to build the math folder.

Not great that we pass that unconditionally. Bazel does support passing flags specific to processor architecture. Probably worthy of an issue.

@jwnimmer-tri
Collaborator Author

Thanks @gizatt for your experience report and posting the branch. Feedback like this is super helpful for us to understand users' experiences.

@mwoehlke-kitware
Contributor

So, I gave this a spin recently. Other than having to disable ibex (which required removing some test stuff; --define=NO_DREAL=ON was not sufficient) due to x86 assembly, and bludgeoning a whole lot of /usr/local paths that need to be /opt/homebrew, it mostly worked. There are 24 test failures, however. At least one of those I believe is a formatting issue and thus probably uninteresting, but it looked like a lot of numeric tests were returning slightly different results.

@RussTedrake
Contributor

FTR -- My now extremely small patch to enable bazel build //tools/install/libdrake:libdrake.so //bindings/pydrake and install on apple silicon is here.

@jakewelde

Hi folks. I've been trying to build Drake in Ubuntu 20.04.03 on a Raspberry Pi Compute Module 4 with 4GB ram + 8GB swap, using @gizatt's diff above. The only differences I'm aware of between their setup and my own are that I'm in 20.04.03 (instead of 20.04.02), and I'm on a Compute Module 4 instead of 4b, but the underlying hardware is the same to my knowledge.

When I run bazel build //..., I get a whole bunch of "conflicting action" errors like the below:

ERROR: file 'systems/framework/_objs/diagram_context_test/diagram_context_test.pic.o' is generated by these conflicting actions:
Label: //systems/framework:diagram_context_test
RuleClass: cc_test rule
Configuration: 7da1b208fd0017bf8859b340625d8b4847fa88eed586ecfbb47bbaa2d0112fcd, 09f6f628bf7b1967fa0878bd8781bf7134141996d34ea5c493b3f294007260c9
Mnemonic: CppCompile
Action key: e0ee33cd4e44065dd0124f31e1b041bf698192be8daad9fc7db6b9657c38f3df
Progress message: Compiling systems/framework/test/diagram_context_test.cc
PrimaryInput: File:[/home/ubuntu/drake[source]]systems/framework/test/diagram_context_test.cc
PrimaryOutput: File:[[<execution_root>]bazel-out/aarch64-opt/bin]systems/framework/_objs/diagram_context_test/diagram_context_test.pic.o
Owner information: ConfiguredTargetKey{label=//systems/framework:diagram_context_test, config=BuildConfigurationValue.Key[7da1b208fd0017bf8859b340625d8b4847fa88eed586ecfbb47bbaa2d0112fcd]}, ConfiguredTargetKey{label=//systems/framework:diagram_context_test, config=BuildConfigurationValue.Key[09f6f628bf7b1967fa0878bd8781bf7134141996d34ea5c493b3f294007260c9]}
MandatoryInputs: are equal
Outputs: are equal

and when I run bazel build //tools/install/libdrake:libdrake.so, after a while I run into the following errors (is this second one possibly related to the x86 assembly issues @mwoehlke-kitware encountered?):

INFO: Analyzed target //tools/install/libdrake:libdrake.so (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/0e5a44dfdb0fa037a3af759ace4c4232/external/dreal/dreal/solver/BUILD.bazel:35:17: Compiling dreal/solver/brancher.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 83 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
/tmp/ccbci1Fk.s: Assembler messages:
/tmp/ccbci1Fk.s:2964: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x22]'
/tmp/ccbci1Fk.s:2965: Error: unknown mnemonic `subsd' -- `subsd v0,v1'
/tmp/ccbci1Fk.s:2966: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x21]'
Target //tools/install/libdrake:libdrake.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2527.997s, Critical Path: 809.85s
INFO: 409 processes: 12 internal, 397 linux-sandbox.
FAILED: Build did NOT complete successfully

Has anyone else encountered similar problems? I know that Drake isn't officially supported on arm64, but it sounds like other folks are having more success with this off-label use than I am. I also wanted to add my support for future official arm64 support if possible - in my lab we are hoping to use Drake on board small aerial vehicles, and that will be very difficult if limited to x86 arch.

Thanks in advance for any suggestions you might be able to offer, and sorry if this is too "off-label" to ask for support!
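
For reference, the `unknown mnemonic` lines in the second log are the GNU assembler rejecting x86 SSE instructions (`ldmxcsr`, `subsd`), which points at inline x86 assembly in the dReal build. A small sketch for tallying such errors from a saved build log follows; the function name is illustrative, and the pattern assumes the message format shown in the log above.

```python
import re

# Matches GNU assembler messages such as:
#   /tmp/ccbci1Fk.s:2964: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x22]'
_UNKNOWN_MNEMONIC = re.compile(r"Error: unknown mnemonic `([^']+)'")


def unknown_mnemonics(log_text):
    """Count how often each rejected mnemonic appears in a build log."""
    counts = {}
    for name in _UNKNOWN_MNEMONIC.findall(log_text):
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Run on the three assembler lines above, this reports `ldmxcsr` twice and `subsd` once.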

@jwnimmer-tri
Collaborator Author

I think @adeeb10abbas could give a more definitive answer on the current state of affairs, but as I understand it, as of #18264 or later, installing from source using cmake && make will succeed on Ubuntu 22 on arm64, except that:

(1) You need to install Bazel by hand first; the setup/ubuntu/install_prereqs doesn't install it automatically.
(2) You need to pass --define NO_DREAL=ON to Bazel somehow, maybe a user.bazelrc file in the Drake source tree.

Writing that out now, I realize we can probably fix (2) to happen automatically. I'll see what I can do.
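
In the meantime, the user.bazelrc route for (2) can be scripted. A minimal sketch, assuming Drake's .bazelrc picks up a user.bazelrc at the root of the source tree; the helper name is hypothetical.

```python
from pathlib import Path


def ensure_bazelrc_line(source_tree, line="build --define NO_DREAL=ON"):
    """Add `line` to <source_tree>/user.bazelrc unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    rc = Path(source_tree) / "user.bazelrc"
    text = rc.read_text() if rc.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    sep = "" if text == "" or text.endswith("\n") else "\n"
    rc.write_text(text + sep + line + "\n")
    return True
```

Calling it twice on the same tree leaves a single copy of the line, so it is safe to run from a setup script.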

If you have a build error, I think it's fine to post it in this thread and the people involved can try to help. (Be sure to include a complete recipe to reproduce the problem.)


As for the support roadmap, there's nothing set in stone. We're willing to accept PRs that make off-label usage work better (assuming they don't regress the on-label uses), as in #18261, #18264, #18221. Given that we have macOS arm64 under official support, it seems likely that Ubuntu arm64 will remain in pretty good shape.

The question of official support is really just about CI costs, since anything official requires regular CI coverage, nightly builds, etc. I haven't yet started to create a budget for those costs, but I can have the team start investigating that soon.

@jakewelde

Thank you very much @jwnimmer-tri for the detailed and helpful response. I will give this another shot with these new tips and report back if I have issues. Really appreciate your candor and assistance!

@adeeb10abbas
Member

Hi everyone, so as Jeremy mentioned, you should be able to build from source with NO_DREAL=ON as of #18264. Apart from that, everything works as it would on an x86_64 machine.

  1. For Bazel, I used bazelisk to install it, which worked great for me!
  2. One more thing @jakewelde, I'd recommend compiling on an Apple Silicon machine if possible unless you know your ARM64 machine is more powerful than that because it's gonna take a loooong time to compile :) My Parallels VM with single core M1 pro with 8 gigs of RAM took 2 hours :D

I can also share the binaries with ya Jack if you really need to hack something soon :)

  3. @jwnimmer-tri regarding the support roadmap (please correct me if I am wrong) - instead of provisioning an ARM64 machine, maybe we can look into cross-compiling on the instances we already have for x86 systems? I would imagine it shouldn't cost significantly more than what it already does?

@jwnimmer-tri
Collaborator Author

(3) Our Ubuntu CI is pay-by-the-minute (cloud), not dedicated machines, so it will almost certainly be cheaper to rent EC2 machines with arm64 hardware than to cross-compile. It's also not only a question of hardware cost, but also the staff cost to set up, babysit, and debug the CI over time. We might be able to minimize that by only supporting a narrower subset of CI builds at the start.

@jakewelde

Thank you very much for the additional insights @adeeb10abbas! I don't currently have access to an arm64 mac, but I'm not in a huge rush either. I'll give this a shot compiling on my embedded hardware and see where it gets me. Thanks for the tips! Assuming this works out, I think it's very likely we'll go this route.

@jwnimmer-tri
Collaborator Author

One quick update:

Writing that out now, I realize we can probably fix --define NO_DREAL=ON to happen automatically. I'll see what I can do.

In a couple of weeks, fiddling with dReal will no longer be necessary; we've deprecated it for removal on 2023-02-01 (#18156).

@LordAcrobaticTurtle

Hi everyone

Just wanted clarification on the --define NO_DREAL=ON argument.
I'm attempting to build Drake on an Nvidia Jetson TX2 in an Ubuntu 22.04 Docker container.
I am using cmake and make to compile the project.
The .bazelrc imports the user.bazelrc file, which contains the following two lines.

build --define NO_DREAL=ON
build --jobs=1

However, I'm still getting these errors during compilation

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/dev/drake/solvers/BUILD.bazel:758:25: Compiling solvers/no_dreal.cc failed: (Exit 1): cc failed: error executing command (from target //solvers:_dreal_solver_compiled_cc_impl) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 102 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
solvers/no_dreal.cc:10:28: error: 'DrealSolver' is deprecated: \nDRAKE DEPRECATED: dReal support is being withdrawn from Drake; for details, see https://github.com/RobotLocomotion/drake/pull/18156\nThe deprecated code will be removed from Drake on or after 2023-02-01. [-Werror=deprecated-declarations]
   10 | std::optional<DrealSolver::IntervalBox> DrealSolver::CheckSatisfiability(
      |                            ^~~~~~~~~~~
In file included from solvers/no_dreal.cc:2:
bazel-out/aarch64-opt/bin/solvers/_virtual_includes/_dreal_solver_compiled_cc_impl/drake/solvers/dreal_solver.h:25:1: note: declared here
   25 | DrealSolver final : public SolverBase {
      | ^~~~~~~~~~~
solvers/no_dreal.cc:17:28: error: 'DrealSolver' is deprecated: \nDRAKE DEPRECATED: dReal support is being withdrawn from Drake; for details, see https://github.com/RobotLocomotion/drake/pull/18156\nThe deprecated code will be removed from Drake on or after 2023-02-01. [-Werror=deprecated-declarations]
   17 | std::optional<DrealSolver::IntervalBox> DrealSolver::Minimize(
      |                            ^~~~~~~~~~~
In file included from solvers/no_dreal.cc:2:
bazel-out/aarch64-opt/bin/solvers/_virtual_includes/_dreal_solver_compiled_cc_impl/drake/solvers/dreal_solver.h:25:1: note: declared here
   25 | DrealSolver final : public SolverBase {
      | ^~~~~~~~~~~
cc1plus: some warnings being treated as errors
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2333.051s, Critical Path: 355.40s
INFO: 103 processes: 10 internal, 92 linux-sandbox, 1 worker.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

I'm pretty new to bazel so I assume that I'm passing the commands incorrectly. Does anyone have any advice?

@jwnimmer-tri
Collaborator Author

jwnimmer-tri commented Jan 30, 2023

Apologies, you've hit two new bugs.

(1) Drake's CMake build logic ended up using -Werror=deprecated-declarations outside of Drake CI. Ideally, we would not promote warnings to errors in your case. It would have warned, but ideally the build would have proceeded. That's logged as #18691 now.

(2) The "no dReal" config switch is not tested in CI when using GCC on arm64, so I missed some spurious deprecation warnings that it emits.

Easiest is probably if you could cherry-pick #18686 in your local build. It's scheduled to merge in ~48 hours, and will remove dReal entirely.

Alternatively, something along the lines of this patch should suppress the warnings:

--- a/solvers/BUILD.bazel
+++ b/solvers/BUILD.bazel
@@ -762,6 +762,7 @@ drake_cc_variant_library(
     srcs_enabled = ["dreal_solver.cc"],
     srcs_disabled = ["no_dreal.cc"],
     hdrs = ["dreal_solver.h"],
+    copts = ["-Wno-deprecated-declarations"],
     interface_deps = [
         ":solver_base",
         "//common:essential",

@LordAcrobaticTurtle

LordAcrobaticTurtle commented Jan 30, 2023

Hi Jeremy

Thanks for the response.
I think there is a bit more work to do :(
Do you have any more suggestions?
This compile error was thrown after applying that cherry-picked commit:

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Analyzed target //:install (1 packages loaded, 627 targets configured).
INFO: Found 1 target...
INFO: From Compiling multibody/benchmarks/free_body/free_body.cc:
In file included from multibody/benchmarks/free_body/free_body.cc:1:
bazel-out/aarch64-opt/bin/multibody/benchmarks/free_body/_virtual_includes/free_body_only/drake/multibody/benchmarks/free_body/free_body.h: In member function 'std::pair<double, double> drake::multibody::benchmarks::free_body::FreeBody::CalcAngularRates_s_p() const':
bazel-out/aarch64-opt/bin/multibody/benchmarks/free_body/_virtual_includes/free_body_only/drake/multibody/benchmarks/free_body/free_body.h:182:58: note: parameter passing for argument of type 'std::pair<double, double>' when C++17 is enabled changed to match C++14 in GCC 10.1
  182 |   std::pair<double, double> CalcAngularRates_s_p() const {
      |                                                          ^
ERROR: /root/.cache/bazel/_bazel_/58387062a7d0fbb04042d57e5e514448/external/ibex/BUILD.bazel:200:11: Compiling external/ibex/filibsrc-3.0.2.2/fp_traits/fp_traits_sse_const.cpp failed: (Exit 1): cc failed: error executing command (from target @ibex//:filib) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 30 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/tmp/ccblV1KD.s: Assembler messages:
/tmp/ccblV1KD.s:20: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:34: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:45: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
/tmp/ccblV1KD.s:56: Error: unknown mnemonic `ldmxcsr' -- `ldmxcsr [x0]'
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1880.896s, Critical Path: 747.88s
INFO: 394 processes: 9 internal, 385 linux-sandbox.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

@adeeb10abbas
Member

adeeb10abbas commented Jan 30, 2023

@LordAcrobaticTurtle Do you have a small repro that I could give a shot? On my Parallels Focal Jammy VM on M1 Pro (Ubuntu aarch64), I am able to build and run drake-external-examples/drake_cmake_external just fine. I can give Jammy a try later

EDIT - I am actually on a Jammy VM, not focal.

@jwnimmer-tri
Collaborator Author

@LordAcrobaticTurtle that error message seems impossible after the cherry-pick, because the external/ibex/BUILD.bazel file it's complaining about was deleted in #18686.

@LordAcrobaticTurtle

@adeeb10abbas I'm trying to build the Drake library so it can be used in an external project. Unfortunately, I don't have any code I can share. I have not modified the build at all.

@jwnimmer-tri Yes, my apologies, I made a mistake when cherry-picking that commit.
I'm fairly certain I cherry-picked correctly this time; however, I am still receiving this error:

[ 12%] Performing build step for 'drake_cxx_python'
INFO: Build option --define has changed, discarding analysis cache.
INFO: Analyzed target //:install (0 packages loaded, 14561 targets configured).
INFO: Found 1 target...
ERROR: /home/dev/drake.og/bindings/pydrake/systems/BUILD.bazel:88:21: Compiling bindings/pydrake/systems/primitives_py.cc failed: (Exit 1): cc failed: error executing command (from target //bindings/pydrake/systems:primitives.cpython-310-aarch64-linux-gnu.so) /usr/bin/cc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 228 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
cc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //:install failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 218.381s, Critical Path: 189.47s
INFO: 7 processes: 7 internal.
FAILED: Build did NOT complete successfully
make[2]: *** [CMakeFiles/drake_cxx_python.dir/build.make:86: drake_cxx_python-prefix/src/drake_cxx_python-stamp/drake_cxx_python-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:839: CMakeFiles/drake_cxx_python.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
 

Happy to provide clarification/more info if needed 🙂

@jwnimmer-tri
Collaborator Author

@LordAcrobaticTurtle see the Out Of Memory tip.
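
For context, "Killed signal terminated program cc1plus" usually means the kernel's out-of-memory killer stopped the compiler. The linked tip is not reproduced here. One common mitigation is to cap Bazel's parallelism with `--jobs`; if a single compile still exceeds RAM, as it may with the larger pydrake binding files, adding swap is the other lever. A rough sketch for picking a `--jobs` value follows. The 4 GiB-per-job budget is an assumption for illustration, not a measured Drake figure.

```python
import os


def suggest_jobs(mem_gib, cpus=None, gib_per_job=4.0):
    """Suggest a Bazel --jobs value so concurrent compiles fit in memory.

    mem_gib: RAM (plus any swap you are willing to use), in GiB.
    gib_per_job: assumed peak memory of one compile action (a guess).
    """
    cpus = cpus or os.cpu_count() or 1
    by_memory = int(mem_gib // gib_per_job)
    # Never go below one job, and never above the CPU count.
    return max(1, min(cpus, by_memory))
```

The result would go into user.bazelrc as, e.g., `build --jobs=2`.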

@jwnimmer-tri
Collaborator Author

Updates: The build system work at #17231 will be helpful here, and is nearing completion. In particular, the VTK change in #16502 will remove the use of precompiled custom binaries as part of a Drake install. The only dependencies necessary to build, install, and run Drake will be things already in Ubuntu.

@alsocapprin
Contributor

Latest in this saga, I'm trying to install Drake on a Jetson Orin. After following the guidance above, building almost succeeds, but I run into a linking error. I'm not tremendously experienced with this process, so any help would be appreciated. If this could be made easier with #17231 as commented above, I could potentially try a pre-release.

I'm eventually interested in getting the python bindings to build, but I figured I would try first with the base project.

Error Log
cbass@amp-jetson-01:~/Projects/drake$ bazel build //... --define NO_DREAL=ON
INFO: Analyzed 11957 targets (0 packages loaded, 0 targets configured).
INFO: Found 11957 targets...
ERROR: /home/cbass/Projects/drake/common/BUILD.bazel:1079:20: Linking common/text_logging_no_spdlog_test failed: (Exit 1): gcc failed: error executing command (from target //common:text_logging_no_spdlog_test) /usr/bin/gcc @bazel-out/aarch64-opt/bin/common/text_logging_no_spdlog_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal<testing::TestPartResultReporterInterface*>::~ThreadLocal(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal<testing::TestPartResultReporterInterface*>::~ThreadLocal(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal<std::vector<testing::internal::TraceInfo, std::allocator<testing::internal::TraceInfo> > >::~ThreadLocal(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal<std::vector<testing::internal::TraceInfo, std::allocator<testing::internal::TraceInfo> > >::~ThreadLocal(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_getspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::~UnitTestImpl(): error: undefined reference to 'pthread_key_delete'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::ThreadLocal<std::vector<testing::internal::TraceInfo, std::allocator<testing::internal::TraceInfo> > >::GetOrCreateValue() const: error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::UnitTestImpl(testing::UnitTest*): error: undefined reference to 'pthread_key_create'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::internal::UnitTestImpl::GetTestPartResultReporterForCurrentThread(): error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&): error: undefined reference to 'pthread_setspecific'
bazel-out/aarch64-opt/bin/external/gtest/_objs/without_main/gtest.pic.o:gtest.cc:function testing::UnitTest::AddTestPartResult(testing::TestPartResult::Type, char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&): error: undefined reference to 'pthread_setspecific'
collect2: error: ld returned 1 exit status
INFO: Elapsed time: 1722.646s, Critical Path: 303.34s
INFO: 2421 processes: 878 internal, 1543 linux-sandbox.
FAILED: Build did NOT complete successfully

@alsocapprin
Contributor

alsocapprin commented Dec 14, 2023

Update: I managed to get a build working by adding the -pthread linker option to failing components. At that point, running a bazel build //... succeeds.

When I run bazel test //..., all but ~300 tests succeed; the failures appear related to the Python bindings and VTK. This comment above appears to have the tools to address at least the VTK issues, but I won't replicate them given that #17231 stands a chance of doing so for me. I've tested several of the included examples as well, which makes me think that "normal" use of Drake works, as long as I'm not using things like VTK.
(edit: Some of these failures are due to exec format errors; others have to do with the fact that I'm running tests on a headless machine and there is no display)

One final note: When I use the CMake workflow to build the Python bindings, the build succeeds, but install fails because strip can't read one of the .so files related to transformations - there may be more. I assume that this is an x86 binary that isn't being built locally, like VTK.

That's all I've done up until now. I'm happy to try suggestions to get all the tests passing or the python bindings installing correctly; otherwise, this is probably all I'll pull on this thread for now.

@jwnimmer-tri
Collaborator Author

jwnimmer-tri commented Dec 14, 2023

When building Drake >= v1.21.0, VTK is compiled from source (and so no longer uses the pre-compiled x86-specific binaries). The only downloaded object code that's not built from source are some opt-in commercial solvers (mosek, gurobi), a docs-only tool (doxygen), and a test-only tool (buildifier; see #20638 #20579).

@alsocapprin
Contributor

Thanks for the response - I think then my VTK test failures are related to the lack of display, and the other test failures are related to buildifier.

@jwnimmer-tri
Collaborator Author

When building Drake >= v1.21.0, VTK is compiled from source (and so no longer uses the pre-compiled x86-specific binaries). The only downloaded object code that's not built from source are some opt-in commercial solvers (mosek, gurobi), a docs-only tool (doxygen), and a test-only tool (buildifier).

As of #20579, Drake's @buildifier repository rule now supports arm64 natively, so this is no longer a problem.

I managed to get a build working by adding the -pthread linker option to failing components.

We'd welcome pull request(s) with fixes like that.

@alsocapprin
Contributor

@jwnimmer-tri Apologies for the delay, but I've opened a PR (above) with our additional linker options. I haven't had the time yet to run extensive testing (apart from our own day-to-day use and an initial set of bazel tests), but I'll do some more wringout in the next day or two.

@jwnimmer-tri
Collaborator Author

Per #21030, we have some CI jobs defined now for arm64 here.

Unsurprisingly, they indicate not everything is working yet. In particular, install_bazel.sh is a no-op on arm64 per #18261. Solving that is the next step here.

@jwnimmer-tri
Collaborator Author

jwnimmer-tri commented Apr 2, 2024

The CI jobs are up and running, with Bazel installed properly.

To see the current status, check linux-arm-jammy-unprovisioned-gcc-bazel-experimental-release for the most recent build log.

As of today, there are around 40 failing tests. The next job for our arm64 user community is to open pull requests with fixes for those tests. It's fine (and preferable) to chip away at them one or a few at a time, not all at once. As long as a PR makes at least one test better, and no tests worse, we're still happy to merge the PR even with overall failing CI.

@jwnimmer-tri
Collaborator Author

As of the most recent build, there are now linker/loader errors. It looks to me like maybe some quadmath library is no longer being linked properly? Not sure.
