_mm_loadl_epi64 doesn't allow reads aligned to 8-byte boundaries #582
Comments
You can ask Intel for clarification here: https://software.intel.com/en-us/forums/intel-isa-extensions/topic/363747 It might be interesting to survey what GCC, MSVC, and the Intel compiler do here. If they all support unaligned loads, then doing the same here is the right call. |
On a related note, |
Might be worth it to file a separate issue to track that (or just send a PR). |
I went ahead and asked in that thread, and someone responded to point out that the manual says that in general, alignment isn't required unless otherwise specified. I'm not sure how much stock to put in that for this case, though. I've been poking at godbolt to try to come up with a convincing argument about what C does one way or the other, but I'm finding the output is hard to reason about, and one _mm_loadl_epi64 seems to yield on average 5 instructions. |
So IIUC, because
You mentioned before that clang supports unaligned loads via the intrinsic. Does GCC support unaligned loads as well? If so, we should probably support them too and you could just send a PR since that would be a backwards compatible change. |
Sure.
|
I'm looking at the code in sse2.rs but I don't think I actually get how this works. The body of these functions is the fallback code used if sse2 isn't supported, right? I'm not sure how to modify how the intrinsic itself works. |
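For illustration only, here is a minimal sketch of what an alignment-tolerant version of this load could look like. It is not claimed to be the code in sse2.rs; the names loadl_epi64_unaligned and low_lane_from_bytes are hypothetical, and the sketch assumes an x86_64 target (where SSE2 is always available).

```rust
use std::arch::x86_64::{__m128i, _mm_set_epi64x, _mm_storeu_si128};
use std::ptr;

/// Hypothetical helper, not part of std::arch: reads 64 bits from `mem_addr`
/// without assuming any alignment and zeroes the upper 64 bits of the result.
///
/// Safety: `mem_addr` must point to at least 8 readable bytes.
unsafe fn loadl_epi64_unaligned(mem_addr: *const __m128i) -> __m128i {
    unsafe {
        // read_unaligned places no alignment requirement on the pointer.
        let low = ptr::read_unaligned(mem_addr as *const i64);
        // _mm_set_epi64x takes the high lane first, then the low lane.
        _mm_set_epi64x(0, low)
    }
}

/// Hypothetical safe wrapper: loads 8 bytes starting at `offset` in `bytes`
/// and returns both 64-bit lanes of the resulting vector (low lane first).
fn low_lane_from_bytes(bytes: &[u8], offset: usize) -> [i64; 2] {
    assert!(offset + 8 <= bytes.len(), "need 8 readable bytes at offset");
    let mut out = [0i64; 2];
    unsafe {
        let v = loadl_epi64_unaligned(bytes.as_ptr().add(offset) as *const __m128i);
        // Store all 128 bits into `out`, which is exactly 16 bytes long.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
    }
    out
}
```

Calling low_lane_from_bytes with an odd offset exercises a pointer that is neither 8- nor 16-byte aligned, which is the case an unaligned read is meant to tolerate.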
This works now. Thank you! |
This PR (rust-lang/rust#55610) updates stdsimd in Rust to a version containing this fix. Once that is merged you should be able to use |
std::arch::x86_64::_mm_loadl_epi64 is kind of a weird case. Intel guides say it takes a *const __m128i, but the documentation is unclear on whether this needs to be aligned https://software.intel.com/en-us/node/524242 Note that load (must be aligned) and loadu (need not be aligned) are defined while loadl isn't. I believe the correct definition is that it need be aligned, but only to an 8-byte boundary (instead of the full 16), but I haven't been able to find documentation backing this up.
Clang's intrinsics header actually does go out of its way to allow this to be aligned to an 8-byte boundary, not a 16-byte boundary https://github.com/llvm-mirror/clang/blob/master/lib/Headers/emmintrin.h#L3587
This instruction produces the expected result on an 8-byte-aligned (but not 16-byte-aligned) pointer in clang, but yields a segmentation fault on the same class of pointer in Rust.
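A minimal sketch of the kind of call described above, assuming an x86_64 target. The type Aligned16 and the function load_from_offset_8 are hypothetical names made up for this example. The pointer passed to the intrinsic is 8-byte aligned but deliberately not 16-byte aligned; on a toolchain where the intrinsic tolerates such pointers, the low lane should hold the loaded value and the high lane should be zero.

```rust
use std::arch::x86_64::{__m128i, _mm_loadl_epi64, _mm_storeu_si128};

/// Hypothetical wrapper that forces its contents onto a 16-byte boundary,
/// so element 1 sits at an address that is 8-byte but not 16-byte aligned.
#[repr(C, align(16))]
struct Aligned16([u64; 4]);

/// Loads 64 bits from `data.0[1]` with _mm_loadl_epi64 and returns both
/// 64-bit lanes of the resulting vector (low lane first).
fn load_from_offset_8(data: &Aligned16) -> [u64; 2] {
    let ptr = &data.0[1] as *const u64;
    // Sanity check: the address is offset by 8 from a 16-byte boundary.
    debug_assert_eq!(ptr as usize % 16, 8);

    let mut out = [0u64; 2];
    unsafe {
        // The intrinsic's signature asks for *const __m128i, so cast the pointer.
        let v = _mm_loadl_epi64(ptr as *const __m128i);
        // Unaligned 128-bit store into `out`, which is exactly 16 bytes long.
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
    }
    out
}
```

For example, load_from_offset_8(&Aligned16([0x11, 0x22, 0x33, 0x44])) reads the element 0x22. A toolchain that treats the pointer as needing 16-byte alignment may instead fault on this call, which is the behavior reported here.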