Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make AVX2-detection platform-independent #1600

Closed
wants to merge 6 commits into from

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Dec 30, 2020

In the context of conda-forge/faiss-split-feedstock#23, I discussed with some of the conda-folks how we should support AVX2 (and potentially other builds) for faiss. In the meantime, we'd like to follow the model that faiss itself is using (i.e. build with AVX2 and without and then load the corresponding library at runtime depending on CPU capabilities).

Since windows support for this is missing (and the other stuff is also non-portable in loader.py), I chased down numpy.distutils.cpuinfo, which is pretty outdated, and opened: numpy/numpy#18058

While the private API is obviously something that could change at any time, I still think it's better than platform-dependent shenanigans.

Opening this here to ideally upstream this right away, rather than carrying patches in the conda-forge feedstock.

TODO:

  • adapt conda recipe for windows in this repo to also build avx2 version

@facebook-github-bot
Copy link
Contributor

Hi @h-vetinari!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@h-vetinari
Copy link
Contributor Author

@wickedfoo @mdouze
I'd like to build avx2 for windows too in conda-forge (see OP), and having your approval on this approach would make me more confident that I'm not going against your ideas somehow. PTAL

@mdouze
Copy link
Contributor

mdouze commented Jan 12, 2021

That would be great! Adding @beauby for the status of AVX2 on Windows (not sure if it is tested at all).
NB that in the Faiss Circle CI we do not test AVX2 because not all CI machines support it.
This means that a lot of codepaths are not exercised. For AVX2 on Linux we rely on the FB internal tests.

@h-vetinari
Copy link
Contributor Author

This means that a lot of codepaths are not exercised.

I was under the impression that it's just a compiler flag that's changing between building for avx2 or not...?



logger = logging.getLogger(__name__)

try:
instr_set = instruction_set()
if instr_set == "AVX2":
has_AVX2 = supports_AVX2()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the instruction_set name here generic since Faiss also compiles on ARM and PPC.

@h-vetinari
Copy link
Contributor Author

Responded to the feedback @mdouze, PTAL :)

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@mdouze mdouze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment.

{"VSX": True, "VSX2": False, ...}
>>> instruction_set() # for ARM
{"NEON": True, "ASIMD": False, ...}
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is constrained on the numpy side but what about returning a set with the flags set to True?

For example, below the has_AVX2 check will fail on ARM with KeyError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the key error. I think the cleanest fix would just be to use .get("AVX2", False) on the output, which is fine even if some keys are not there.

Alternatively, we could "pad" the dictionary with the keys from the other architectures with value False. I don't like this idea though, because we'd be duplicating the list of instruction sets compared to numpy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return a set in every case (in the case of numpy, selecting the keys for which the value is True).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's a good idea as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for using the set. Will get it through the FB internal checks.

It is a set of instruction sets, and "instruction_set" is not great for that
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mdouze merged this pull request in 16b4e88.

facebook-github-bot pushed a commit that referenced this pull request Feb 12, 2021
Summary:
This patch is to provide log information in case of failing to load the AVX2-enabled lib,
and also to log the successful loading of the libraries.

Otherwise, the programmatic detection of which library was loaded also needs to take
into account which other log statements were executed, if the original import failed due to an exception,
cf. the tests in conda-forge/faiss-split-feedstock#27

Finally, this reduces the number of import statements by one - one non-AVX2 fallback is enough.

Related to #1600, #1680, #1681, #1682.

Pull Request resolved: #1683

Reviewed By: mdouze

Differential Revision: D26424333

Pulled By: beauby

fbshipit-source-id: 16beddec7e0c098b913a7f5420cbb02d1cf515ad
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2021
Summary:
Upstreaming patches from conda-forge/faiss-split-feedstock#27, follow-up (sorta) to #1600.

All these should be fairly uncontroversial, I think (mostly just oversights or stuff that never got
triggered on windows due to not having #1680 so far).

Things work without `#include <faiss/impl/platform_macros.h>`, but I preferred to be explicit here,
because the `__SSEx__` macros (that are used in the affected files) are only defined there for windows.

Pull Request resolved: #1681

Reviewed By: beauby

Differential Revision: D26454427

Pulled By: mdouze

fbshipit-source-id: 345e0ef45888f338e71bba004454a701572f9afb
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2021
Summary:
Upstreaming patches from conda-forge/faiss-split-feedstock#27, follow-up (sorta) to #1600.

Not sure if there are more CMake-native tricks to use here, but given that the flags don't have
an equivalent on the MSVC side, I think this approach is reasonable.

Without this patch, we would get:
```
cl : Command line warning D9002: ignoring unknown option '-mavx2'
cl : Command line warning D9002: ignoring unknown option '-mfma'
cl : Command line warning D9002: ignoring unknown option '-mf16c'
cl : Command line warning D9002: ignoring unknown option '-mpopcnt'
```

Pull Request resolved: #1680

Reviewed By: wickedfoo

Differential Revision: D26484347

Pulled By: beauby

fbshipit-source-id: 2803132f2d81fe37dc494fc4c824b6e240ae973b
facebook-github-bot pushed a commit that referenced this pull request Feb 17, 2021
Summary:
While working on conda-forge/faiss-split-feedstock#27, it turned out I needed
to patch `setup.py` anyway. In order to unify how the extension of the built lib is set, I fell back
to another patch that would/will become necessary if faiss ever wants to support PyPy
(see discussion in conda-forge/faiss-split-feedstock#22).

It would be nice if this was done natively by CMake, but as far as I can tell from
https://gitlab.kitware.com/cmake/cmake/-/issues/21070, cmake is not likely to do that right away.

I didn't particularly expect this patch to be upstreamed (especially if there is no interest for PyPy support, for example),
but beauby [invited](conda-forge/faiss-split-feedstock#27 (comment))
me to post it so here goes (plus necessary adaptations to the conda recipes)

Related to #1600, #1680, #1681

PS. I thought about using `logger.INFO` in case of an import failure for AVX2, but since it's a setup file,
I thought `print` would actually be more useful. Happy to change or remove if desired.

Pull Request resolved: #1682

Reviewed By: wickedfoo

Differential Revision: D26484393

Pulled By: beauby

fbshipit-source-id: 6cd2598838c4070dbf83d6f27ce15ce9faa6bf20
facebook-github-bot pushed a commit that referenced this pull request Feb 18, 2021
Summary:
Trying to compile windows for AVX2 in conda-forge/faiss-split-feedstock#27
(after #1600) surfaced a bunch of things (#1680, #1681, #1682), but the most voluminous problem
was MSVC being much worse at dealing with operator overloads and casts around `__m128` / `__m256`.

This lead to loads of errors that looked as follows:
```
[...]\faiss\utils\distances_simd.cpp(411): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(440): error C2676: binary '-': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(441): error C2676: binary '*': 'const __m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(446): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(451): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(452): error C2676: binary '*': 'const __m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(459): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(460): error C2676: binary '*': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(471): error C2440: '<function-style-cast>': cannot convert from '__m256i' to '__m256'
```

I've followed https://software.intel.com/sites/landingpage/IntrinsicsGuide/ to try to replace everything correctly,
but this will surely require close review, because I'm not sure how well these code-paths are checked by the
test suite.

In any case, with the commits from #1600 #1666 #1680 #1681 #1682, I was able to build `libfaiss` & `faiss`
for AVX2 on windows (while remaining "green" on linux/osx, both with & without AVX2).

Sidenote: the issues in the last commit (26fc7cf)
were uncovered by adding the `__SSE3__` compat macros in #1681.

Pull Request resolved: #1684

Test Plan: buck test //faiss/tests/...

Reviewed By: beauby

Differential Revision: D26454443

Pulled By: mdouze

fbshipit-source-id: 70df0818e357f1ecea6a056d619618df0236e0eb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants