SIMD intrinsics often fail to inline #53069
Comments
As far as I can tell, all the intrinsics that fail to inline require target_features that aren't enabled by default (SSSE3 for abs_epi16, SSE4.1 for blendv_epi8). So the fix is to add the appropriate [...] Not inlining between functions with different target_feature sets is generally required for correctness, but in this case (as in many others) it's unfortunately a big performance footgun.
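A minimal sketch of one way to give the calling function a matching feature set, assuming the attribute meant above is `#[target_feature(enable = "...")]` (the original wording is missing from this thread, so that is an assumption). The function name `blend_sse41` and its array-based signature are hypothetical, chosen only for illustration:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical kernel: per-byte select between `a` and `b` using `mask`.
/// Because this function enables SSE4.1 itself, `_mm_blendv_epi8` has the
/// same feature set as its caller and becomes eligible for inlining.
///
/// Safety: the caller must ensure the CPU supports SSE4.1.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn blend_sse41(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    unsafe {
        // Unaligned loads, since plain arrays carry no 16-byte alignment guarantee.
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let vm = _mm_loadu_si128(mask.as_ptr() as *const __m128i);
        // Takes the byte from `vb` where the mask byte's high bit is set,
        // otherwise the byte from `va`.
        let r = _mm_blendv_epi8(va, vb, vm);
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        out
    }
}
```

Calling `blend_sse41` is `unsafe` because nothing in the type system proves the CPU has SSE4.1; that proof has to come from a run-time check or from compiling the whole crate with the feature enabled.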
This is working as intended, and the things that we could improve to make this better already have issues / RFCs filed, so we should probably close this as working as intended / duplicate.
Executing SSE3 and SSE4.1 intrinsics on hardware that does not support them is undefined behavior, and not inlining these intrinsics is required for correctness when, for example, doing run-time feature detection. Otherwise, code like this could fail:
// sse2 function
if has_sse3 { // detect sse3 at run-time
    // executing this on sse2 hardware is UB
    // so this cannot be speculatively executed, re-ordered out of the if, etc.
    sse3_intrinsics();
}
If you know your function will only be executed on SSE4.1 hardware, you can use [...] These solutions are far from perfect, but the top docs of [...] Also, we should obviously be warning about this, but it was decided that doing so would be the job of the portability lint, and warning about this is hard:
fn foo(x: bool) { // SSE2
    avx(); // WARNING
    if detect("avx") {
        avx(); // OK (NO WARNING)
    }
    let b = detect("avx");
    if b { avx(); } // OK (NO WARNING)
    if x { avx() } // ??? MIGHT BE OK
}
but that should probably be raised there: https://github.com/rust-lang-nursery/portability-wg (EDIT: reported this here: rust-lang-nursery/portability-wg#17 (comment))
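The run-time detection pattern discussed in this comment can be sketched with the standard `is_x86_feature_detected!` macro. This is only an illustration under assumptions: the function names (`abs_i16`, `abs_i16_ssse3`, `abs_i16_scalar`) are hypothetical, and the SIMD path is only compiled for `x86_64`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// SSSE3 path. `_mm_abs_epi16` can inline here because this function
/// enables the same target feature the intrinsic requires.
///
/// Safety: the caller must ensure the CPU supports SSSE3.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn abs_i16_ssse3(v: [i16; 8]) -> [i16; 8] {
    unsafe {
        let x = _mm_loadu_si128(v.as_ptr() as *const __m128i);
        let r = _mm_abs_epi16(x);
        let mut out = [0i16; 8];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        out
    }
}

/// Portable fallback. `wrapping_abs` leaves `i16::MIN` unchanged,
/// which matches what the SSSE3 instruction does for that input.
fn abs_i16_scalar(v: [i16; 8]) -> [i16; 8] {
    v.map(|x| x.wrapping_abs())
}

/// Safe entry point: the feature check and the call that depends on it
/// stay together, so the SSSE3 code is only reached on capable hardware.
pub fn abs_i16(v: [i16; 8]) -> [i16; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: guarded by the run-time check directly above.
            return unsafe { abs_i16_ssse3(v) };
        }
    }
    abs_i16_scalar(v)
}
```

The call across the `abs_i16` to `abs_i16_ssse3` boundary is not inlined, for the correctness reason given above, so it pays to put the whole hot loop inside the feature-enabled function rather than a single intrinsic call.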
This can be closed as working as intended.
Here's my code:
Note that mm_blendv_epi8 fails to inline, ruining performance. This happens a lot and it makes using SIMD intrinsics very annoying. I have to start using inline asm.