-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AVX512-IFMA intrinsics. #676
Conversation
Not sure if the appveyor failures are related to these changes; the |
Yeah I think AppVeyor can be ignored but looks like some instruction assertions are failing on Travis? |
It appears that a It appears to be a bug in the @hdevalence note that for the |
Ooops, thanks for pointing this out, I'll fix it! |
Fixed; tested locally with |
Progress! It seems that the build is mostly green now :) The verification of the intrinsics is failing:
It seems that in the intel-x86.xml file states that these intrinsics require the |
I added a fixup pass for the CPUID names, hopefully it's OK; I'm not sure what to do about the remaining failure re: |
@hdevalence wow nice work, only one failure to go !
You can compile for an I'm not 100% sure, but I think that error message is trying to tell us that |
Hmm, I don't think I really understand what's going on. From your comment I gather that the reason that AVX512, AVX2 etc. are in I noticed that there's a hardcoded check here: https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/stdsimd-verify/tests/x86-intel.rs#L362 which includes Sorry for all of the confusion on this; I'm happy to do whatever works or is correct. |
The main criteria here has always been empirical. When compiling for 32-bit
Ah yes, since the test pass on |
For x86/x86_64 the intention was that if you'd be able to compile code for i686-unknown-linux-gnu and run that with intrinsics and whatnot. It looks like avx/avx2/etc all work on i686-unknown-linux-gnu so long as the CPU does For intrinsics that take 64-bit types, though, they can't map to an instruction on i686-unknown-linux-gnu because there's no 64-bit registers, so they're moved into the x86_64 mod. It should be fine to define most avx-512 things in x86, but anything taking a u64/i64 argument should go into x86_64 |
@alexcrichton the weird thing here is that the |
Oh sure yeah LLVM "does the right thing" in that it gets the code to work on i686-unknown-linux-gnu, but it's sort of a lie and it exposes how LLVM has tons of polyfills for all sorts of SIMD operations. There's no native implementation of taking a 64-bit integer and broadcasting it to 512 bits on i686-unknown-linux-gnu, but LLVM can still codegen one. (we only want to stabilize official things, though, not things that LLVM happens to be able to do) |
So @hdevalence the safest thing is to just move that intrinsic to the |
Hmm, wouldn't the safest thing be to move all the AVX-512 intrinsics into the |
I don't really understand what this means; the "native" implementation according to https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_set1_epi64&expand=4927 is |
With We've had problems in the past with the code generated for functions using 64-bit integers in those targets, even though functions using SIMD registers are lowered just fine. My recommendation is to stick to the policy of moving functions using 64-bit integers to the |
Per rust-lang#676 (comment) , LLVM is able to generate code for this intrinsic on `x86` targets.
Updated so that |
Thank you so much! |
Awesome! I can try adding the mask variants later. Do you know what the average latency on "merged into stdsimd" -> "appears in nightly" is? |
Once this PR lands (rust-lang/rust#58373) the nightly afterwards should contain these changes. |
Cool, thanks for the information and the help with this PR! |
@hdevalence is any library using these already? We probably can start thinking about stabilizing these if that would help. |
I'm using them in |
I see, well let us know if that changes! |
Progress on #310, adding all the (unmasked) versions of the IFMA intrinsics.
Because AVX512VL extends the AVX512 intrinsics to
ymm
andxmm
operands, there's a question about where the 256-bit and 128-bit variants of the instructions should live. My feeling (mentioned in discussion in #310) is that the cleanest thing to do is to keep the length-extended intrinsics with their "parents", which is what I did here.I don't know what the CI situation is, but I can confirm that these tests do pass on my Cannonlake machine (and don't pass when the numbers are changed).