-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make AVX2-detection platform-independent #1600
Conversation
Hi @h-vetinari! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
@wickedfoo @mdouze |
That would be great! Adding @beauby for the status of AVX2 on Windows (not sure if it is tested at all). |
I was under the impression that it's just a compiler flag that's changing between building for avx2 or not...? |
faiss/python/loader.py
Outdated
|
||
|
||
logger = logging.getLogger(__name__) | ||
|
||
try: | ||
instr_set = instruction_set() | ||
if instr_set == "AVX2": | ||
has_AVX2 = supports_AVX2() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to keep the instruction_set
name here generic since Faiss also compiles on ARM and PPC.
Responded to the feedback @mdouze, PTAL :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment.
{"VSX": True, "VSX2": False, ...} | ||
>>> instruction_set() # for ARM | ||
{"NEON": True, "ASIMD": False, ...} | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is constrained on the numpy side but what about returning a set with the flags set to True?
For example, below the has_AVX2 check will fail on ARM with KeyError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the key error. I think the cleanest fix would just be to use .get("AVX2", False)
on the output, which is fine even if some keys are not there.
Alternatively, we could "pad" the dictionary with the keys from the other architectures with value False
. I don't like this idea though, because we'd be duplicating the list of instruction sets compared to numpy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also return a set in every case (in the case of numpy, selecting the keys for which the value is True
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's a good idea as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for using the set. Will get it through the FB internal checks.
It is a set of instruction sets, and "instruction_set" is not great for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdouze has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This patch is to provide log information in case of failing to load the AVX2-enabled lib, and also to log the successful loading of the libraries. Otherwise, the programmatic detection of which library was loaded also needs to take into account which other log statements were executed, if the original import failed due to an exception, cf. the tests in conda-forge/faiss-split-feedstock#27 Finally, this reduces the number of import statements by one - one non-AVX2 fallback is enough. Related to #1600, #1680, #1681, #1682. Pull Request resolved: #1683 Reviewed By: mdouze Differential Revision: D26424333 Pulled By: beauby fbshipit-source-id: 16beddec7e0c098b913a7f5420cbb02d1cf515ad
Summary: Upstreaming patches from conda-forge/faiss-split-feedstock#27, follow-up (sorta) to #1600. All these should be fairly uncontroversial, I think (mostly just oversights or stuff that never got triggered on windows due to not having #1680 so far). Things work without `#include <faiss/impl/platform_macros.h>`, but I preferred to be explicit here, because the `__SSEx__` macros (that are used in the affected files) are only defined there for windows. Pull Request resolved: #1681 Reviewed By: beauby Differential Revision: D26454427 Pulled By: mdouze fbshipit-source-id: 345e0ef45888f338e71bba004454a701572f9afb
Summary: Upstreaming patches from conda-forge/faiss-split-feedstock#27, follow-up (sorta) to #1600. Not sure if there are more CMake-native tricks to use here, but given that the flags don't have an equivalent on the MSVC side, I think this approach is reasonable. Without this patch, we would get: ``` cl : Command line warning D9002: ignoring unknown option '-mavx2' cl : Command line warning D9002: ignoring unknown option '-mfma' cl : Command line warning D9002: ignoring unknown option '-mf16c' cl : Command line warning D9002: ignoring unknown option '-mpopcnt' ``` Pull Request resolved: #1680 Reviewed By: wickedfoo Differential Revision: D26484347 Pulled By: beauby fbshipit-source-id: 2803132f2d81fe37dc494fc4c824b6e240ae973b
Summary: While working on conda-forge/faiss-split-feedstock#27, it turned out I needed to patch `setup.py` anyway. In order to unify how the extension of the built lib is set, I fell back to another patch that would/will become necessary if faiss ever wants to support PyPy (see discussion in conda-forge/faiss-split-feedstock#22). It would be nice if this was done natively by CMake, but as far as I can tell from https://gitlab.kitware.com/cmake/cmake/-/issues/21070, cmake is not likely to do that right away. I didn't particularly expect this patch to be upstreamed (especially if there is no interest for PyPy support, for example), but beauby [invited](conda-forge/faiss-split-feedstock#27 (comment)) me to post it so here goes (plus necessary adaptations to the conda recipes) Related to #1600, #1680, #1681 PS. I thought about using `logger.INFO` in case of an import failure for AVX2, but since it's a setup file, I thought `print` would actually be more useful. Happy to change or remove if desired. Pull Request resolved: #1682 Reviewed By: wickedfoo Differential Revision: D26484393 Pulled By: beauby fbshipit-source-id: 6cd2598838c4070dbf83d6f27ce15ce9faa6bf20
Summary: Trying to compile windows for AVX2 in conda-forge/faiss-split-feedstock#27 (after #1600) surfaced a bunch of things (#1680, #1681, #1682), but the most voluminous problem was MSVC being much worse at dealing with operator overloads and casts around `__m128` / `__m256`. This lead to loads of errors that looked as follows: ``` [...]\faiss\utils\distances_simd.cpp(411): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(440): error C2676: binary '-': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(441): error C2676: binary '*': 'const __m256' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(446): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(451): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(452): error C2676: binary '*': 'const __m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(459): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(460): error C2676: binary '*': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator [...]\faiss\utils\distances_simd.cpp(471): error C2440: '<function-style-cast>': cannot convert from '__m256i' to '__m256' ``` I've followed https://software.intel.com/sites/landingpage/IntrinsicsGuide/ to try to replace everything correctly, but this will surely require close review, because I'm not sure how well these code-paths are checked by the test suite. In any case, with the commits from #1600 #1666 #1680 #1681 #1682, I was able to build `libfaiss` & `faiss` for AVX2 on windows (while remaining "green" on linux/osx, both with & without AVX2). Sidenote: the issues in the last commit (26fc7cf) were uncovered by adding the `__SSE3__` compat macros in #1681. Pull Request resolved: #1684 Test Plan: buck test //faiss/tests/... Reviewed By: beauby Differential Revision: D26454443 Pulled By: mdouze fbshipit-source-id: 70df0818e357f1ecea6a056d619618df0236e0eb
In the context of conda-forge/faiss-split-feedstock#23, I discussed with some of the conda-folks how we should support AVX2 (and potentially other builds) for faiss. In the meantime, we'd like to follow the model that faiss itself is using (i.e. build with AVX2 and without and then load the corresponding library at runtime depending on CPU capabilities).
Since windows support for this is missing (and the other stuff is also non-portable in
loader.py
), I chased downnumpy.distutils.cpuinfo
, which is pretty outdated, and opened: numpy/numpy#18058While the private API is obviously something that could change at any time, I still think it's better than platform-dependent shenanigans.
Opening this here to ideally upstream this right away, rather than carrying patches in the conda-forge feedstock.
TODO: