Supporting microarchitecture-specific builds #59
Other relevant links:
|
With I think a CEP is needed for how to handle virtual packages in a conda-build sense. Currently, virtual packages are one of the main things from a build machine that affect compilation. For example, let's say A was built with AVX512f and B needs A built with AVX512f. Then |
Two questions/thoughts:
|
|
Very interested in this for arrow, faiss, pillow, onednn, etc. Numpy actually has a very elaborate dispatch mechanism (for most relevant functions) based on run-time detection of the available CPU features, so we likely wouldn't have to worry about numpy specifically, i.e. we can keep building it for the lowest CPU feature level without losing much. I guess for other feedstocks it should be based on an analysis of whether the impact is worth the blow-up in the build matrix, but it would be an amazing tool to have. The one thing that this makes worse is the overloading of build string semantics (e.g. cpu vs. cuda, license family, openmp flavour, now arch spec, see conda/conda#11053), which is even more relevant due to conda/conda#11612 still being open. |
Ah, thanks for pointing this out, I was not aware - so this is even before you hit backends like the MKL (which also do this), correct? While adding this dispatch capability may not be achievable for some packages (due to amount of work/knowledge required), technologically it would be the best solution in most cases, and I suspect that maintainers of smaller packages may often not be aware of the possibilities here (I certainly am/was not). When documenting the feature we propose in this thread, it would probably be wise to also point maintainers to resources on methods for selecting the best code for the microarchitecture at runtime, so they can avoid having to create separate builds for each microarchitecture altogether (e.g. here is an article from the Guix blog on function-multi-versioning that I found interesting). |
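To make the runtime-selection idea above concrete, here is a minimal sketch in Python. It is not how numpy or MKL implement dispatch; the function names, the `CANDIDATES` table and the flag sets are all hypothetical and only illustrate "pick the most demanding variant the CPU supports, fall back to a baseline otherwise".

```python
def sum_squares_generic(values):
    # Baseline variant that is assumed to run on any CPU.
    return sum(v * v for v in values)


def sum_squares_avx2(values):
    # Stand-in for the same routine compiled with AVX2/FMA enabled.
    # (Hypothetical: in a real package this would be a separately built kernel.)
    return sum(v * v for v in values)


# Hypothetical dispatch table, ordered from most to least demanding.
# Each entry is (required CPU flags, implementation).
CANDIDATES = [
    ({"avx2", "fma"}, sum_squares_avx2),
    (set(), sum_squares_generic),
]


def select_impl(candidates, available_flags):
    """Return the first implementation whose required flags are all available."""
    available = set(available_flags)
    for required, func in candidates:
        if required <= available:
            return func
    raise RuntimeError("no implementation matches this CPU")


# Selection would normally happen once, e.g. at import time.
impl = select_impl(CANDIDATES, {"sse2", "avx2", "fma"})
```

The selection cost is paid once, so a single package can serve all microarchitectures without multiplying the build matrix.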
I don't think the numpy dispatch mechanism is feasible for others to reimplement; it's a very elaborate piece of engineering by some domain experts (not sure how realistic it would be to library-ize it for wider use... 🤔).
It cannot affect external calls, e.g. to LAPACK; it only works for numpy-internal functions that get precompiled for various CPU features and then selected at runtime based on availability. So this won't help for optimising BLAS/LAPACK etc. I think that for such compute-heavy libraries, we should just build v2/v3/v4 by default (once the infrastructure is in place, of course). My point about doing this on an as-needed (resp. as-justified) basis is that not every package will have double-digit performance gains for higher CPU feature levels, and so we (in conda-forge) should not blindly multiply our build matrix by a factor of 3 on every feedstock. |
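For readers unfamiliar with the v2/v3/v4 terminology, the following sketch classifies a set of CPU flags into an x86-64 microarchitecture level. The flag names are the ones Linux prints in `/proc/cpuinfo` (e.g. `pni` for SSE3, `abm` for LZCNT), and the per-level sets follow the x86-64 psABI as far as I understand it; treat the table as an approximation to check against the psABI, and `parse_flags`/`x86_64_level` as hypothetical helper names.

```python
# Flags each level adds on top of the previous one (Linux /proc/cpuinfo names).
# Approximation: the psABI lists OSXSAVE for v3, which cpuinfo shows as "xsave".
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def parse_flags(cpuinfo_text):
    """Extract the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def x86_64_level(flags):
    """Return the highest level (1-4) whose requirements are all met."""
    available = set(flags)
    level = 1  # v1 is the x86-64 baseline
    for candidate in (2, 3, 4):
        if LEVEL_FLAGS[candidate] <= available:
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first gap
    return level
```

On a Linux machine one could call `x86_64_level(parse_flags(open("/proc/cpuinfo").read()))`; a solver-side mechanism would presumably expose the same information as a virtual package instead.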
If Linux and glibc are all you are targeting, rpath token expansion is the easiest way to have x86_64 feature-level-specific binaries. For example, https://github.com/conda-forge/gmp-feedstock/blob/main/recipe/build.sh#L28-L45 produces power8 and power9 binaries. It can be extended to x86_64 feature levels as well. |
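As a rough illustration of what rpath token expansion means, the sketch below mimics the substitution the glibc dynamic loader performs for the `$ORIGIN`, `$PLATFORM` and `$LIB` tokens documented in `ld.so(8)`. It only models the string substitution; the real value of `$PLATFORM` comes from the kernel/loader, and the paths and the `expand_rpath` helper are hypothetical.

```python
import re

# Matches $NAME or ${NAME} for the three tokens documented in ld.so(8).
_TOKEN = re.compile(r"\$(?:\{(ORIGIN|PLATFORM|LIB)\}|(ORIGIN|PLATFORM|LIB)\b)")


def expand_rpath(rpath, origin, platform, lib="lib"):
    """Expand loader tokens in a colon-separated rpath and return the entries.

    origin   -- directory containing the binary (value of $ORIGIN)
    platform -- platform string, e.g. "power9" (value of $PLATFORM)
    lib      -- "lib" or "lib64" (value of $LIB)
    """
    values = {"ORIGIN": origin, "PLATFORM": platform, "LIB": lib}

    def substitute(match):
        name = match.group(1) or match.group(2)
        return values[name]

    return [_TOKEN.sub(substitute, entry) for entry in rpath.split(":")]


# A platform-specific directory is searched first, then the generic one.
search_path = expand_rpath(
    "$ORIGIN/../lib/$PLATFORM:$ORIGIN/../lib", "/opt/env/bin", "power9"
)
```

With such an rpath, a package can ship several builds of the same library side by side and let the loader pick the directory matching the running machine.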
Longer term, if we want to support RISC-V then we will probably want to support extensions as well. |
In many cases, users would like to take advantage of CPU-specific features or instructions for improved performance, e.g., AVX* or AES/cryptography.
xref issues:
cc: @ltalirz