Supporting microarchitecture-specific builds #59

Open · chenghlee opened this issue Jul 15, 2023 · 9 comments

chenghlee commented Jul 15, 2023

In various cases, users would like to take advantage of CPU-specific features or instructions for improved performance, e.g., AVX* or AES/cryptography.

xref issues:

cc: @ltalirz

ltalirz commented Jul 15, 2023

Other relevant links:

isuruf commented Jul 15, 2023

With __archspec, you can create a meta-package like x86_64_feature_level that depends on specific __archspec values; downstream packages would then depend on x86_64_feature_level.
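
A minimal sketch of what such a meta-package could look like (hypothetical recipe, written inline here for brevity; it assumes a conda new enough to report the generic feature level, e.g. x86_64_v3, as the __archspec build string):

```sh
# Hypothetical meta-package recipe keyed off the __archspec virtual package.
mkdir -p x86_64_feature_level
cat > x86_64_feature_level/meta.yaml <<'EOF'
package:
  name: x86_64_feature_level
  version: "3"          # mirrors the x86-64 feature level it represents

build:
  number: 0

requirements:
  run:
    # __archspec is a virtual package: version is always 1 and the build
    # string carries the detected microarchitecture. A real recipe would
    # also need to match newer levels (e.g. x86_64_v4), since an exact
    # build-string pin only solves on machines reporting exactly this one.
    - __archspec 1 x86_64_v3
EOF

# Downstream recipes then just depend on the meta-package:
#   requirements:
#     run:
#       - x86_64_feature_level >=3
conda build x86_64_feature_level
```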

I think a CEP is needed on how to handle virtual packages in a conda-build sense. Currently, virtual packages are one of the main properties of a build machine that affect compilation.

For example, let's say A was built with AVX512f and B needs A built with AVX512f. Then B can be compiled only on a build machine with AVX512f. We can work around this by setting CONDA_OVERRIDE_ARCHSPEC, but then BUILD_PREFIX also gets that architecture, and if there is a build tool relying on AVX512f, that build tool cannot be run on a machine without AVX512f.
Therefore, CONDA_OVERRIDE_ARCHSPEC should affect only host and not build. (Even host is problematic when considering python etc.) This is also an issue with cross-compilation.
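
For concreteness, the override mentioned above is one of conda's CONDA_OVERRIDE_* environment variables for virtual packages; a sketch of the workaround and its caveat (the recipe path and feature level are placeholders):

```sh
# Pretend the machine supports x86_64_v4 for this one invocation, so the
# solver accepts packages that require a matching __archspec build string.
# Caveat (as noted above): the override applies to the whole solve, so
# tools installed into BUILD_PREFIX may also come out requiring
# instructions the build machine does not actually have.
CONDA_OVERRIDE_ARCHSPEC=x86_64_v4 conda build recipe/
```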

ltalirz commented Jul 15, 2023

Two questions/thoughts:

  1. What is the minimum microarchitecture of your current x86_64 CI build hardware (not the one you choose to build for)?

  2. Would it be acceptable to do cross-microarchitecture compiles (e.g., compiling for x86_64_v4 on a runner that is only x86_64_v3), and let the feedstock maintainers who want this feature make sure that any build tools continue to be compiled for the lowest-common-denominator microarchitecture? It would unfortunately mean that the tests for these builds cannot be run, but it may still be better to standardize in some way than to have people who want this feature run off in different directions.

isuruf commented Jul 16, 2023

  1. Ivybridge

  2. Python blurs the line with build tools. For example, numpy needs to be importable to get numpy.get_include(), which means numpy needs to be built with the lowest-common-denominator architecture (see the illustration below).
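
To illustrate: downstream builds typically import numpy on the build machine just to locate its headers, so that import itself must not execute unsupported instructions. A hedged example of the usual pattern:

```sh
# Typical header lookup in a downstream build script; if numpy itself were
# compiled for, say, AVX512f, this import could crash on an older CPU.
python -c "import numpy; print(numpy.get_include())"
```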

@h-vetinari

Very interested in this for arrow, faiss, pillow, onednn, etc.

Numpy actually has a very elaborate dispatch mechanism (for most relevant functions) based on run-time detection of the available CPU features, so we likely wouldn't have to worry about numpy specifically; i.e., we can keep building it for the lowest CPU feature level without losing much.

I guess for other feedstocks it should be based on an analysis of whether the impact is worth the blow-up in the build matrix, but it would be an amazing tool to have.

The one thing that this makes worse is the overloading of build string semantics (e.g. cpu vs. cuda, license family, openmp flavour, and now arch spec; see conda/conda#11053), which is even more relevant due to conda/conda#11612 still being open.

ltalirz commented Jul 16, 2023

> Numpy actually has a very elaborate dispatch mechanism (for most relevant functions) based on run-time detection of the available CPU features

Ah, thanks for pointing this out, I was not aware. So this is even before you hit backends like the MKL (which also do this), correct?

While adding this dispatch capability may not be achievable for some packages (due to the amount of work/knowledge required), technologically it would be the best solution in most cases, and I suspect that maintainers of smaller packages are often not aware of the possibilities here (I certainly was not). When documenting the feature we propose in this thread, it would probably be wise to also point maintainers to resources on selecting the best code path for the microarchitecture at runtime, so they can avoid having to create separate builds for each microarchitecture altogether (e.g., here is an article from the Guix blog on function multi-versioning that I found interesting).
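
For maintainers reading along, a tiny self-contained illustration of the function-multi-versioning approach that article covers, using GCC's target_clones attribute (file and function names are made up; needs gcc and glibc ifunc support on x86_64 Linux):

```sh
cat > fmv_demo.c <<'EOF'
#include <stdio.h>

/* The compiler emits one clone of this function per listed target plus a
 * resolver that picks the best clone for the running CPU at load time. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
long dot(const long *a, const long *b, long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

int main(void) {
    long a[] = {1, 2, 3, 4}, b[] = {5, 6, 7, 8};
    printf("dot = %ld\n", dot(a, b, 4));   /* prints dot = 70 */
    return 0;
}
EOF
gcc -O2 fmv_demo.c -o fmv_demo && ./fmv_demo
```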

@h-vetinari

I don't think the numpy dispatch mechanism is feasible for others to reimplement; it's a very elaborate piece of engineering by some domain experts (not sure how realistic it would be to library-ize it for wider use... 🤔).

> so this is even before you hit backends like the MKL (which also do this), correct?

It cannot affect external calls, e.g., to LAPACK; it just works for numpy-internal functions that get precompiled for various CPU features and then selected at runtime based on availability.

So this won't help with optimising BLAS/LAPACK etc. I think that for such compute-heavy libraries, we should just build v2/v3/v4 by default (once the infrastructure is in place, of course).

My point about doing this on an as-needed (resp. as-justified) basis is that not every package will have double-digit performance gains for higher CPU feature levels, and so we (in conda-forge) should not blindly multiply our build matrix by a factor of 3 on every feedstock.

isuruf commented Jul 17, 2023

If Linux and glibc are all you are targeting, rpath token expansion is the easiest way to get x86_64-feature-level-specific binaries. For example, https://github.com/conda-forge/gmp-feedstock/blob/main/recipe/build.sh#L28-L45 produces power8 and power9 binaries. It can be extended to x86_64 feature levels as well.
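
To sketch the shape of that trick (the flags and paths here are illustrative, not the actual gmp build.sh): glibc's dynamic loader expands the literal token $PLATFORM in an rpath to the running CPU's AT_PLATFORM string, so an optimized copy installed in a platform-named subdirectory wins when available:

```sh
# Baseline build that any target machine can run:
./configure --prefix="$PREFIX"
make -j"${CPU_COUNT}" && make install

# Optimized build, installed into a platform-named subdirectory:
make distclean
./configure --prefix="$PREFIX" --libdir="$PREFIX/lib/power9" \
    CFLAGS="${CFLAGS} -mcpu=power9"
make -j"${CPU_COUNT}" && make install

# Consumers get an rpath that searches the optimized directory first; the
# backslashes keep $ORIGIN and $PLATFORM literal so the dynamic loader
# expands them at run time ($PLATFORM -> e.g. "power9").
export LDFLAGS="${LDFLAGS} -Wl,-rpath,\$ORIGIN/../lib/\$PLATFORM -Wl,-rpath,\$ORIGIN/../lib"
```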

tnabtaf commented Jul 17, 2023

Longer term, if we want to support RISC-V, then we will probably want to support its extensions as well.
