Add arm64 wheel to travis-ci #3421

Closed
wants to merge 1 commit into from

Conversation


@mattsplats mattsplats commented Sep 29, 2020

This PR adds builds and Python wheels for arm64 ("aarch64"). Here's an example build with these changes:
https://travis-ci.com/github/mattsplats/LightGBM/builds/187693808

It appears test.sh is creating wheels and uploading them to an Azure location: $BUILD_ARTIFACTSTAGINGDIRECTORY.
Some internal changes may be required to get arm64 wheels uploading to PyPI. Would appreciate any help/guidance here.

Other details:

  • Some dependencies for arm64 require Python >=3.6, so the build / wheel for arm64 is marked as Python 3 only.
  • No attempt was made to build with GPU compatibility for aarch64. The libraries used for this appear to be specific to x86. Can be addressed in a future PR.

@jameslamb
Collaborator

Hi @mattsplats , thanks for your interest in LightGBM!

Can you explain more about the value of this change? Was there an issue in this project where you've previously discussed this?

I think we need more context to review this.

@mattsplats
Author

Thanks for looking @jameslamb - the value is adding a Python wheel for Arm users:

With the increase of Arm CPUs in datacenters and the upcoming Apple migration to Arm, the use of Python on these platforms is growing. However, installing a Python package without a wheel often fails or is very slow. The error messages users see do not clearly identify the problem as a missing build dependency. Publishing an Arm wheel is typically low effort, just a few lines in the build script. For users this saves significant time by avoiding troubleshooting and not having to wait for build processes to finish.

LightGBM currently only publishes wheels for x86 platforms. Be glad to open an issue to this effect if it is helpful.

@jameslamb
Collaborator

Thanks for that.

Won't users hit the issues you are describing if lightgbm's dependencies don't publish Arm wheels? I don't see any on the most recent releases of pandas or numpy, for example.

@mattsplats
Author

Actually, if you take another look at numpy you'll see releases for Arm (called "aarch64"): https://files.pythonhosted.org/packages/8a/78/22ab67c0cf07301be5433903c3ca865dd2af16a73784a1028fcf3646d1ee/numpy-1.19.2-cp36-cp36m-manylinux2014_aarch64.whl

Pandas is currently migrating to travis-ci.com and should be available soon.
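
To make the tags in that filename easier to read, here is a minimal sketch that splits a wheel filename into its PEP 427 components and checks which architecture it targets. The helper names `parse_wheel_filename` and `targets_arch` are hypothetical and exist only for this illustration; they are not part of pip, wheel, or LightGBM.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(dist, version, build, py, abi, plat)


def targets_arch(filename: str, arch: str) -> bool:
    """True if any platform tag in the filename ends with the given architecture."""
    platform_tags = parse_wheel_filename(filename).platform_tag.split(".")
    return any(tag.endswith("_" + arch) for tag in platform_tags)


if __name__ == "__main__":
    for name in (
        "numpy-1.19.2-cp36-cp36m-manylinux2014_aarch64.whl",
        "lightgbm-3.0.0-py2.py3-none-manylinux1_x86_64.whl",
    ):
        print(parse_wheel_filename(name), targets_arch(name, "aarch64"))
```

The numpy file carries the `manylinux2014_aarch64` platform tag, while the published LightGBM 3.0.0 wheel is tagged for `x86_64` only, which is the gap this PR is about.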

@ghost

ghost commented Oct 1, 2020

CLA assistant check
All CLA requirements met.

@mattsplats mattsplats marked this pull request as ready for review October 5, 2020 16:07
@mattsplats
Author

CLA signed, commit and description have been updated.

@jameslamb
Collaborator

> Actually, if you take another look at numpy you'll see releases for Arm (called "aarch64"): https://files.pythonhosted.org/packages/8a/78/22ab67c0cf07301be5433903c3ca865dd2af16a73784a1028fcf3646d1ee/numpy-1.19.2-cp36-cp36m-manylinux2014_aarch64.whl
>
> Pandas is currently migrating to travis-ci.com and should be available soon.

oh ok cool, thanks!

This is a bit outside of my experience, so I'd like to hear from other maintainers. Thanks for taking the time to contribute!

@mattsplats
Author

> This is a bit outside of my experience, so I'd like to hear from other maintainers. Thanks for taking the time to contribute!

No worries, I suspect some changes will be desired anyway. We might want to throw an auditwheel repair in there, for example.

@guolinke
Collaborator

guolinke commented Oct 9, 2020

cc @StrikerRUS for the CI part.
BTW, all our binaries are built by Azure Pipelines; can this wheel be built by Azure Pipelines too?

@StrikerRUS
Collaborator

> BTW, all our binaries are built by Azure Pipelines; can this wheel be built by Azure Pipelines too?

Yeah, we used to use Travis and Appveyor to produce the build artifacts that we later upload to package managers. But then we switched to Azure Pipelines and set up a Docker image with the aim of producing consistent artifacts across all platforms. The main reason was that users were suffering from a too-recent glibc dependency in environments they could not upgrade. See some links in #2760 (comment), for example.

So I don't think we want to roll back to Travis with its uncontrolled environment. That would mean the artifacts for ARM and x86 platforms would be inconsistent in terms of supported OSes, would introduce diversity in reported bugs, and would increase the maintenance burden of tracking all (breaking) changes happening in the Travis environment.

For now, I can propose only two alternative ways:

  • (preferred) Build the ARM artifact on Azure Pipelines as we do for all our other artifacts, but it seems there are no free ARM agents there. Possibly they will introduce them soon given ARM's recent growth in popularity, and we should just wait. At the least, they will have to do something with their macOS agents when Apple switches to ARM. FYI, ARM support at Travis is just 1 year old and is still in beta.
  • Try to use the same Docker image at Travis that we use at Azure Pipelines.

@mattsplats
Author

mattsplats commented Oct 13, 2020

Apologies for the long post; there is a lot to unpack here.

First, issues with glibc and similar dependencies are likely to persist with the current Docker image. For example, the manylinux1 tag on the current x86 wheels isn't correct, since the wheel isn't PEP 513 compliant:

> auditwheel show lightgbm-3.0.0-py2.py3-none-manylinux1_x86_64.whl

lightgbm-3.0.0-py2.py3-none-manylinux1_x86_64.whl is consistent with
the following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libgcc_s.so.1 with versions {'GCC_3.0'},
libpthread.so.0 with versions {'GLIBC_2.2.5', 'GLIBC_2.3.4'},
libgomp.so.1 with versions {'OMP_1.0', 'GOMP_1.0'}, libc.so.6 with
versions {'GLIBC_2.14', 'GLIBC_2.17', 'GLIBC_2.3.4', 'GLIBC_2.3',
'GLIBC_2.2.5', 'GLIBC_2.16', 'GLIBC_2.4', 'GLIBC_2.6'}, libm.so.6 with
versions {'GLIBC_2.2.5'}, libstdc++.so.6 with versions
{'CXXABI_1.3.5', 'CXXABI_1.3.7', 'GLIBCXX_3.4.14', 'GLIBCXX_3.4.9',
'GLIBCXX_3.4.18', 'CXXABI_1.3.3', 'GLIBCXX_3.4.11', 'GLIBCXX_3.4',
'GLIBCXX_3.4.19', 'CXXABI_1.3'}, libdl.so.2 with versions
{'GLIBC_2.2.5'}

This constrains the platform tag to "manylinux2014_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.

(I can open an issue on this bug, if it's useful.)
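
As a rough illustration of how that report maps to a tag, here is a minimal sketch that pulls the `GLIBC_x.y` symbol versions out of an auditwheel-style report and finds the lowest legacy manylinux policy whose glibc ceiling covers them. It is a simplification under stated assumptions: it looks only at glibc (the real policies also cap GLIBCXX, CXXABI, and GCC symbol versions and restrict which libraries may be linked), and the function names are hypothetical, not part of auditwheel.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int]

# glibc ceilings of the legacy policies, from PEP 513, PEP 571, and PEP 599.
LEGACY_GLIBC_LIMITS: List[Tuple[str, Version]] = [
    ("manylinux1", (2, 5)),
    ("manylinux2010", (2, 12)),
    ("manylinux2014", (2, 17)),
]

# Matches GLIBC_2.17 or GLIBC_2.2.5 (patch level ignored), but not GLIBCXX_3.4.19.
_GLIBC_SYMBOL = re.compile(r"\bGLIBC_(\d+)\.(\d+)")


def max_glibc_version(report: str) -> Optional[Version]:
    """Return the highest glibc (major, minor) referenced in the report text."""
    found = [(int(major), int(minor)) for major, minor in _GLIBC_SYMBOL.findall(report)]
    return max(found) if found else None


def lowest_legacy_policy(required: Version) -> Optional[str]:
    """Return the oldest legacy policy whose glibc ceiling covers `required`."""
    for name, limit in LEGACY_GLIBC_LIMITS:
        if required <= limit:
            return name
    return None  # newer than manylinux2014; would need a PEP 600 style tag


if __name__ == "__main__":
    # Excerpt of the libc.so.6 line from the report above.
    sample = (
        "libc.so.6 with versions {'GLIBC_2.14', 'GLIBC_2.17', 'GLIBC_2.3.4', "
        "'GLIBC_2.3', 'GLIBC_2.2.5', 'GLIBC_2.16', 'GLIBC_2.4', 'GLIBC_2.6'}"
    )
    needed = max_glibc_version(sample)
    print(needed, lowest_legacy_policy(needed))
```

On the excerpt above, the highest referenced glibc version is 2.17, which is above the manylinux1 and manylinux2010 ceilings and is why the report settles on manylinux2014.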

I think the way to build consistent artifacts across platforms is to use the manylinux containers. With these images you can use any CI, but Travis additionally provides Arm instances. Using the multibuild library makes maintaining consistent and correct wheels across architectures and Python versions very easy.

A lot of projects use multibuild with Travis, like numpy, scipy, and pandas, and are now producing Arm wheels. Further examples: https://github.com/MacPython

If you're not open to the above, an alternative is to use QEMU via Docker and emulate other architectures on your x86 CI. Some projects are doing this now on Azure Pipelines. Multidict and yarl added Arm wheels this way via GitHub Actions.
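
For a sense of what the QEMU route involves, here is a minimal sketch that only assembles and prints the two Docker commands; it does not run anything. It assumes the `multiarch/qemu-user-static` image for registering the emulators and the `quay.io/pypa/manylinux2014_aarch64` build image. The mount point, the Python interpreter path, and the `python-package` build step are placeholders chosen for illustration, not LightGBM's actual build procedure.

```python
import shlex
from typing import List

QEMU_IMAGE = "multiarch/qemu-user-static"
MANYLINUX_IMAGE = "quay.io/pypa/manylinux2014_aarch64"


def qemu_register_cmd() -> List[str]:
    """Command that registers QEMU binfmt handlers on an x86 Docker host."""
    return ["docker", "run", "--rm", "--privileged", QEMU_IMAGE, "--reset", "-p", "yes"]


def build_wheel_cmd(
    repo_dir: str,
    python_bin: str = "/opt/python/cp38-cp38/bin/python",  # assumed interpreter
    package_subdir: str = "python-package",  # assumed package location
) -> List[str]:
    """Command that builds and repairs a wheel inside the emulated aarch64 image."""
    inner = (
        f"{python_bin} -m pip wheel /io/{package_subdir} --no-deps -w /tmp/wheels && "
        "auditwheel repair /tmp/wheels/*.whl -w /io/wheelhouse"
    )
    return [
        "docker", "run", "--rm",
        "-v", f"{repo_dir}:/io",  # mount the checkout at /io (placeholder path)
        MANYLINUX_IMAGE,
        "bash", "-c", inner,
    ]


if __name__ == "__main__":
    # Dry run: print the commands. To execute, pass each list to
    # subprocess.run(cmd, check=True) on a host with Docker installed.
    for cmd in (qemu_register_cmd(), build_wheel_cmd("/path/to/LightGBM")):
        print(" ".join(shlex.quote(part) for part in cmd))
```

The trade-off with this approach is speed: the compile step runs under emulation, so it is typically much slower than on native Arm hardware, but everything stays on one CI service.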

@mattsplats
Author

@StrikerRUS What's your preference to move forward? Happy to help add the manylinux containers to whatever CI you prefer.

@StrikerRUS
Collaborator

StrikerRUS commented Oct 16, 2020

@mattsplats Thank you very much for your detailed response!

Speaking of the Linux tag, right now we are using the following container based on Ubuntu 14.04
https://github.com/guolinke/lightgbm-ci-docker/blob/master/dockers/ubuntu-14.04/Dockerfile
as it allowed us to work around the most commonly reported compatibility issues. I remember I had a strong wish to migrate to a manylinux image, or at least to a CentOS 6 one, but haven't found the time yet :-( . We would really appreciate any help from you!
Or, in the worst case, we can re-tag our current wheels. BTW, do you know whether that means all users who are currently using our wheels in an "incompatible" environment will have to build the package from source (pip will not allow them to download a manylinux2014_x86_64 wheel)?

As for ARM, it looks like you are very experienced in this area! Could you please clarify whether there are any drawbacks to using QEMU compared to building on the native ARM machines provided by Travis? I'm asking because I think I'd prefer to go with QEMU and produce all artifacts from one CI service, since that means less maintenance burden.

> Multidict and yarl added Arm wheels this way via GitHub Actions.

Wow, nice! GitHub Actions and Azure Pipelines are very similar and, if I'm not mistaken, even share the same base images. I'll get acquainted with the CI pipelines of these projects and read more about QEMU this weekend. Thanks a lot for the info!

@StrikerRUS
Collaborator

@mattsplats Regarding the manylinux tag, let me share my initial thoughts. I've read some PEPs and it seems that we cannot use the manylinux2010 tag.

LightGBM requires glibc >= 2.14.

manylinux2010 allows only

GLIBC_2.12

https://www.python.org/dev/peps/pep-0571/#the-manylinux2010-policy.

The next manylinux tag is manylinux2014 and it allows

GLIBC_2.17

https://www.python.org/dev/peps/pep-0599/#the-manylinux2014-policy
This tag is not suitable either: its glibc version is quite high, so users with 2.14 <= glibc < 2.17 would be unable to install the wheel.

Right now we manually audit the library files in our artifacts with the following script: https://github.com/microsoft/LightGBM/blob/master/helpers/check_dynamic_dependencies.py.

The new iteration of the manylinux tag scheme allows specifying an arbitrary version of glibc: https://www.python.org/dev/peps/pep-0600/#core-definition. I believe it suits our case perfectly! However, it seems that PEP 600 is very young and the Python world hasn't adapted to this tagging scheme yet. For example, take a look at how many unchecked boxes the corresponding pip meta-issue has: pypa/manylinux#542. Even with pypi/warehouse#7853 merged, a lot of users will not be able to install wheels with the new scheme simply because their pip version is not the latest.
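
To make the glibc arithmetic concrete, here is a minimal sketch of the PEP 600 naming rule (`manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH}`) together with a simplified installability check. It treats the legacy tags as aliases for glibc 2.5, 2.12, and 2.17, as PEP 600 defines them. The function names are hypothetical, and the check deliberately ignores everything else an installer considers, including whether the installed pip version recognizes the new tags at all.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]

# Legacy tags expressed as the glibc versions they alias under PEP 600.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}

_PEP600_TAG = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")
_LEGACY_TAG = re.compile(r"^(manylinux1|manylinux2010|manylinux2014)_(.+)$")


def pep600_tag(glibc: Version, arch: str) -> str:
    """Build a PEP 600 platform tag for the given glibc version and architecture."""
    return f"manylinux_{glibc[0]}_{glibc[1]}_{arch}"


def tag_requirements(platform_tag: str) -> Optional[Tuple[Version, str]]:
    """Return (minimum glibc, architecture) for a manylinux tag, else None."""
    match = _PEP600_TAG.match(platform_tag)
    if match:
        return (int(match.group(1)), int(match.group(2))), match.group(3)
    match = _LEGACY_TAG.match(platform_tag)
    if match:
        return LEGACY_ALIASES[match.group(1)], match.group(2)
    return None


def is_installable(platform_tag: str, system_glibc: Version, system_arch: str) -> bool:
    """Simplified check: same architecture and a new-enough system glibc."""
    requirements = tag_requirements(platform_tag)
    if requirements is None:
        return False
    needed_glibc, arch = requirements
    return arch == system_arch and system_glibc >= needed_glibc


if __name__ == "__main__":
    tag = pep600_tag((2, 14), "x86_64")
    print(tag, is_installable(tag, (2, 14), "x86_64"))
    print("manylinux2014_x86_64", is_installable("manylinux2014_x86_64", (2, 14), "x86_64"))
```

Under this simplified model, a system with glibc 2.14 could install a `manylinux_2_14_x86_64` wheel but not a `manylinux2014_x86_64` one, which is the 2.14 <= glibc < 2.17 gap described above.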

I would really appreciate any thoughts you have about the current situation with the manylinux tag.

@mattsplats
Author

mattsplats commented Oct 19, 2020

@StrikerRUS I think you can safely use the manylinux2014 tag here.

CentOS 6 is reaching EOL, and the manylinux2010 tag will be deprecated in a few weeks:

> CentOS 6 is now the oldest supported CentOS release, and will receive maintenance updates through November 30th, 2020, [1] at which point it will reach end-of-life, and no further updates such as security patches will be made available. All wheels built under the manylinux2010 images will remain at obsolete versions after that point.

Also, glibc 2.17 was released only a year after 2.14: https://sourceware.org/glibc/wiki/Glibc%20Timeline

@StrikerRUS
Collaborator

@mattsplats Thanks a lot for your response!

> Also, glibc 2.17 was released only a year after 2.14

I remember there was strong demand for exactly version 2.14, so we cannot simply bump the version without a reason.

> CentOS 6 is reaching EOL

Unfortunately, a lot of our users are using even more outdated environments. FYI, Python 2 reached EOL about a year ago, but take a look at the following figure 😬 :

image

@seekingdeep

seekingdeep commented Dec 28, 2020

@StrikerRUS @guolinke hi there,
Clearly, running an Arm-based OS under QEMU should solve the build issue.
So what is the blocker? Why is arm64 support still not merged?
Note that Microsoft has already released a Windows 10 version that runs on Raspberry Pi, and Ubuntu is available for Arm as well.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023

5 participants