Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] [python-package] add arm64 macOS wheels #6391

Merged
merged 25 commits into from
Jun 14, 2024
Merged

Conversation

jameslamb
Copy link
Collaborator

@jameslamb jameslamb commented Mar 29, 2024

Contributes to #5328
Contributes to #4229
Might help with #5764

  • adds arm64 macOS wheels for the Python package (for the M1/M2/M3 Macs)
  • patches OpenMP linking details in lib_lightgbm.dylib so that a pip-installed lightgbm won't segfault when used in a conda environment

Notes for Reviewers

Proposed approach for building arm64 wheels

It seems that Azure DevOps does not support arm64 macOS runners... only GitHub Actions: actions/runner-images#8971.

So I'm proposing using GitHub Actions new macos-14 runners, which are arm64 (#5328 (comment)).

Proposed approach for fixing loading errors

The problem "using a pip-installed lightgbm inside a conda environment segfaults" is not specific to arm64. It's been around in lightgbm's macOS support for a few years. I decided to try to fix it here to be able to test the new arm64 wheels.

I'm proposing the following approach:

  • continue dynamically linking to OpenMP
  • do not vendor a copy of libomp
  • patch the search paths in lib_lightgbm.dylib to be more forgiving about where libomp.dylib is found

In short:

  • the hard-coded path like /opt/homebrew/opt/libomp/lib/libomp.dylib currently stored in lib_lightgb.dylib forces the macOS loader to load a library from that exact path, even if there is already a libomp.dylib loaded from some other source (like the llvm-openmp shipped y conda)
  • this PR replaces that with @rpath/libomp.dylib, which means that at runtime of lightgbm, the loader will do the following (in order)
    1. if there is a libomp.dylib already loaded, just use that
    2. if not, first search for one in /opt/homebrew/opt/libomp/lib/
    3. if there isn't one there, search wherever one was found at build time (which might also be /opt/homebrew/opt/libomp/lib/)
    4. if one still hasn't been found, search standard paths like /usr/local/lib and /usr/lib

See "References" section for the MANY resources I consulted to get to this point. Particularly:

How I tested this

downloaded the wheel built in CI (click me)

Got a run ID from CI, looking at the URL.

e.g. for this build: https://github.com/microsoft/LightGBM/actions/runs/9476495757/job/26109485388?pr=6391

the run id is 9476495757.

Then, following https://docs.github.com/en/actions/managing-workflow-runs/downloading-workflow-artifacts?tool=cli, used the GitHub CLI to download the wheels. A fine-grained access token with just read-only permission on Actions should be enough.

From the root of the repo, on my M2 Mac laptop:

rm -rf ./whl-tmp
mkdir -p ./whl-tmp

RUN_ID=9476495757
gh run download \
    --repo microsoft/LightGBM \
    "${RUN_ID}" \
    -n macosx-arm64-wheel \
    --dir ./whl-tmp
created a new conda environment (click me)

Created a conda environment.

conda create \
  --name delete-me-lgb-dev \
  --yes \
  --file ./.ci/conda-envs/ci-core.txt \
  python=3.10 \
  pydistcheck

source activate delete-me-lgb-dev
Inspected the wheel with `pydistcheck` (click me)
pydistcheck \
    --inspect ./whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl
==================== running pydistcheck ====================                                                                                                  
                      
checking './whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl'
----- package inspection summary -----
file size
  * compressed size: 1.5M
  * uncompressed size: 6.5M
  * compression space saving: 76.7%
contents
  * directories: 0
  * files: 16 (1 compiled)
size by extension
  * .dylib - 6.0M (93.2%)
  * .py - 0.4M (6.4%)
  * no-extension - 21.7K (0.3%)
  * .txt - 9.0B (0.0%)
  * .typed - 0.0B (0.0%)
largest files
  * (6.0M) lightgbm/lib/lib_lightgbm.dylib
  * (0.2M) lightgbm/basic.py
  * (65.4K) lightgbm/dask.py
  * (63.6K) lightgbm/sklearn.py
  * (35.7K) lightgbm/engine.py
------------ check results -----------
errors found while checking: 0

Installed from the wheel (on my M2 mac, using macOS 14.4.1).

pip install \
    --no-deps \
    ./whl-tmp/lightgbm-4.3.0.99-py3-none-macosx_14_0_arm64.whl

Ran the tests

pytest tests/python_package_test

Those all passed! Removed the brew-installed libomp on my system and ran that again.

brew uninstall libomp
pytest tests/python_package_test

They passed again 🎉

Can lightgbm really safely link to any libomp.dylib?

This project's macOS wheels have already been doing that for years, by shipping with an absolute path that evaluates to "whatever brew install libomp installs", and no limits comparing the version linked against at build time to the one used at runtime.

The changes in this PR are actually safer than the existing behavior of lightgbm's macOS wheels because they avoid a well-known source of runtime issue (loading a libomp.dylib that conflicts with the one loaded by scikit-learn).

Why not just vendor libomp.dylib like other libraries do?

Other projects like lightgbm bundle a copy of LLVM OpenMP (libomp.dylib) in their wheels.

That comes with a significantly reduced risk of "cannot find libomp.dylib" types of runtime issues, but in exchange for:

  • risk of conflicts caused by multiple copies of the same library being loaded
  • larger wheels (and therefore environments)
  • lack of coordination across libraries (e.g. one's omp_set_num_threads() does not affect all the loaded OpenMP libraries, and so projects' parallelism can fight each other and lead to oversubscription)
  • greater risk of ABI incompatibilities (e.g. a version loaded at runtime having a different interface than the one that was built against)

Since lightgbm has already been using dynamic linking + not vendoring for several years and since "you have to brew install libomp" has generally not been a problem for macOS users, I'm proposing continuing those practices in this release.

Will these CMake changes break the lightgbm conda package?

They shouldn't.

conda-build rewrites paths in binaries it produces to correctly link between its packages. See "Making packages relocatable" (conda docs)

References

@jameslamb jameslamb mentioned this pull request May 1, 2024
33 tasks
@jameslamb jameslamb changed the title WIP: [ci] [python-package] add arm64 macOS wheels [ci] [python-package] add arm64 macOS wheels Jun 12, 2024
@jameslamb jameslamb marked this pull request as ready for review June 12, 2024 05:05
@jameslamb
Copy link
Collaborator Author

tagging some others who I think might be interested in this approach / topic:

@hcho3 @trivialfis @thomasjpfan

Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, LGTM!

@hcho3
Copy link
Contributor

hcho3 commented Jun 13, 2024

Thanks for pinging me. I would like to adopt a similar approach for XGBoost so that we don't have to vendor libomp.

@jameslamb
Copy link
Collaborator Author

Looks like this needs a bit more work on the R side.

[100%] Linking CXX shared library /private/var/folders/vy/h7r6h43j203gstfh6fj9_tyh0000gn/T/RtmpuVEgw6/Rbuild34b875aa9c05/lightgbm/src/lightgbm.so
Replacing hard-coded OpenMP install_name with '@rpath/libgomp.dylib'...
error: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool: can't open file: /private/var/folders/vy/h7r6h43j203gstfh6fj9_tyh0000gn/T/RtmpuVEgw6/Rbuild34b875aa9c05/lightgbm/src/lib_lightgbm.dylib (No such file or directory)
make[3]: *** [/private/var/folders/vy/h7r6h43j203gstfh6fj9_tyh0000gn/T/RtmpuVEgw6/Rbuild34b875aa9c05/lightgbm/src/lightgbm.so] Error 1
make[3]: *** Deleting file `/private/var/folders/vy/h7r6h43j203gstfh6fj9_tyh0000gn/T/RtmpuVEgw6/Rbuild34b875aa9c05/lightgbm/src/lightgbm.so'

https://github.com/microsoft/LightGBM/actions/runs/9499965097/job/26181969335?pr=6391

I'll look later, but I suspect the issue is my unconditional use of .dylib here:

"${PROJECT_SOURCE_DIR}/lib_lightgbm.dylib"

It should probably be templating through CMAKE_SHARED_LIBRARY_SUFFIX, to account for this:

shlib_ext_arg <- sprintf("-DCMAKE_SHARED_LIBRARY_SUFFIX_CXX='%s'", SHLIB_EXT)

I really like that those tests caught the use of gcc + libgomp on macOS! Thankful every day that @StrikerRUS prioritized compiler coverage so much when setting up all the testing in this repo.

# echo "done running pre-commit checks"
echo "running pre-commit checks"
pre-commit run --all-files || exit 1
echo "done running pre-commit checks"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had commented this out in #6377 (comment), to keep merging things in preparation for the release.

Looks like the issues with filelock have been resolved by a new release of that project, and this now runs successfully again 🎉

@jameslamb
Copy link
Collaborator Author

Given @guolinke's approval + @jmoralez 's 🚀 , I'm going to merge this once CI passes.

Then we'll just be onto the final release tasks over in #6439, like checking the docs and adding verisonadded markers (I'll do those).

Getting close!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants