Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out better way for dealing with CPU arches #23

Open
4 tasks
h-vetinari opened this issue Dec 22, 2020 · 6 comments
Open
4 tasks

Figure out better way for dealing with CPU arches #23

h-vetinari opened this issue Dec 22, 2020 · 6 comments

Comments

@h-vetinari
Copy link
Member

While debugging #22, I found out that the upstream build-refactor that I tried to follow in #17 changed things so that upstream always compiles for both AVX2 & default profiles, and then switches at runtime using:
https://github.com/facebookresearch/faiss/blob/v1.6.5/faiss/python/loader.py#L27-L39

Of course we could mimick that, but given the infrastructure added in conda/conda#9930 by @isuruf, I'm thinking we could do better. Not sure if this is ready for primetime yet -- as in, does CF infra support this already?

I see the following todos (restricted to this feedstock):

  • Figure out if we want to build for AVX2, resp. other profiles
  • If yes, decide trade-off between packaging size impact for packaging several (probably negligible) and multiplication of CI jobs
  • Patch the upstream loader depending on outcome
    • If we keep the upstream approach, add windows detection (how?) and upstream the patch

CC @beckermr @mbargull @jakirkham

@h-vetinari h-vetinari mentioned this issue Dec 22, 2020
5 tasks
@isuruf
Copy link
Member

isuruf commented Dec 22, 2020

I'd suggest going with upstream and shipping both. archspec based packages are not ready yet.

@h-vetinari
Copy link
Member Author

Thanks @isuruf for that update! Is there any place to track the archspec-based approach so I can subscribe (or possibly help) and test/apply it once done?

facebook-github-bot pushed a commit to facebookresearch/faiss that referenced this issue Feb 3, 2021
Summary:
In the context of conda-forge/faiss-split-feedstock#23, I discussed with some of the conda-folks how we should support AVX2 (and potentially other builds) for faiss. In the meantime, we'd like to follow the model that faiss itself is using (i.e. build with AVX2 and without and then load the corresponding library at runtime depending on CPU capabilities).

Since windows support for this is missing (and the other stuff is also non-portable in `loader.py`), I chased down `numpy.distutils.cpuinfo`, which is pretty outdated, and opened: numpy/numpy#18058

While the [private API](numpy/numpy#18058 (comment)) is obviously something that _could_ change at any time, I still think it's better than platform-dependent shenanigans.

Opening this here to ideally upstream this right away, rather than carrying patches in the conda-forge feedstock.

TODO:
* [ ] adapt conda recipe for windows in this repo to also build avx2 version

Pull Request resolved: #1600

Reviewed By: beauby

Differential Revision: D25994705

Pulled By: mdouze

fbshipit-source-id: 9986bcfd4be0f232a57c0a844c72ec0e308fff19
@h-vetinari
Copy link
Member Author

Fortunately, there's now an issue to track progress on this: conda-forge/conda-forge.github.io#1261

@ngam
Copy link

ngam commented Apr 29, 2022

I'm thinking we could do better

In what way? Size and unnecessary complexity?

I am often tempted to think about this in general like the noarch situation --- one package for all, but maybe that's the naive way to think about this 😅

@h-vetinari
Copy link
Member Author

I'm thinking we could do better

In what way? Size and unnecessary complexity?

Maximize performance & minimize binary size, by building different package variants (e.g. for the x86-v2 - x86-v4 levels) and having each user automatically download the highest-supported binary according to their CPU, via the __archspec virtual package.

one package for all, but maybe that's the naive way to think about this 😅

Doing that doubles (or triples, if building for AVX512 as well) build time & binary size for no other reason other than that we cannot (currently) do better infrastructurally, and don't want to completely forgo the performance benefits that newer CPU arches have via-à-vis SSE4.

@h-vetinari
Copy link
Member Author

Now that the feedstock has been revived, this can be tackled. Should be easy to do (c.f. the docs), but need to figure out how to do this on windows. We'll also need to patch the upstream builds a bit, but we were already doing that anyway (and there's a bunch of patches to drop as well...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants