simd intrinsics: add simd_shuffle_generic and other missing intrinsics #119213
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -190,14 +190,27 @@ extern "platform-intrinsic" { | |
/// | ||
/// `T` must be a vector. | ||
/// | ||
-/// `U` must be a const array of `i32`s. | ||
+/// `U` must be a **const** array of `i32`s. This means it must either refer to a named | ||
+/// const or be given as an inline const expression (`const { ... }`). | ||
/// | ||
/// `V` must be a vector with the same element type as `T` and the same length as `U`. | ||
/// | ||
-/// Concatenates `x` and `y`, then returns a new vector such that each element is selected from | ||
-/// the concatenation by the matching index in `idx`. | ||
+/// Returns a new vector such that element `i` is selected from `xy[idx[i]]`, where `xy` | ||
+/// is the concatenation of `x` and `y`. It is a compile-time error if `idx[i]` is out-of-bounds | ||
+/// of `xy`. | ||
pub fn simd_shuffle<T, U, V>(x: T, y: T, idx: U) -> V; | ||
|
||
/// Shuffle two vectors by const indices. | ||
/// | ||
/// `T` must be a vector. | ||
/// | ||
/// `U` must be a vector with the same element type as `T` and the same length as `IDX`. | ||
/// | ||
/// Returns a new vector such that element `i` is selected from `xy[IDX[i]]`, where `xy` | ||
/// is the concatenation of `x` and `y`. It is a compile-time error if `IDX[i]` is out-of-bounds | ||
/// of `xy`. | ||
workingjubilee marked this conversation as resolved.
|
||
pub fn simd_shuffle_generic<T, U, const IDX: &'static [u32]>(x: T, y: T) -> U; | ||
Review comments on `simd_shuffle_generic`:

I haven't seen this intrinsic before 🤔 this is what we need in std::simd

It's an experiment to see how far we can get with the current const-generic support. I don't think std::simd can use it yet, that would need generic_const_exprs which is still a highly experimental feature.

it would be nice if we could stabilize a small enough subset of GCE that this sort of thing becomes feasible.

From talking with @lcnr that seems pretty far out currently. And in particular reference types in const generics have pretty thorny unsolved theoretical questions as well.
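For readers following the thread, the documented shuffle semantics (element `i` is `xy[IDX[i]]`, with `xy` the concatenation of `x` and `y`) can be sketched as a scalar model on stable Rust. This is only an illustration under stated assumptions: `shuffle_model` is a hypothetical helper, it uses plain arrays instead of SIMD vector types, and it takes the indices as a runtime argument rather than as a `const IDX: &'static [u32]` parameter, so it can only reject an out-of-bounds index with a panic rather than at compile time.

```rust
/// Scalar reference model of the shuffle semantics.
/// `shuffle_model` is a hypothetical helper, not part of `core::intrinsics`.
fn shuffle_model<T: Copy, const N: usize, const M: usize>(
    x: [T; N],
    y: [T; N],
    idx: [u32; M],
) -> [T; M] {
    core::array::from_fn(|i| {
        let j = idx[i] as usize;
        // The real intrinsic rejects out-of-bounds indices at compile time;
        // this model can only panic at run time.
        assert!(j < 2 * N, "index {j} is out of bounds of the concatenation");
        // Indices below N select from `x`, the rest select from `y`.
        if j < N { x[j] } else { y[j - N] }
    })
}
```

With `x = [1, 2, 3, 4]`, `y = [5, 6, 7, 8]` and indices `[0, 4, 1, 5]`, the model interleaves the low halves and yields `[1, 5, 2, 6]`.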
||
|
||
/// Read a vector of pointers. | ||
/// | ||
/// `T` must be a vector. | ||
|
@@ -232,6 +245,9 @@ extern "platform-intrinsic" { | |
/// corresponding value in `val` to the pointer. | ||
/// Otherwise if the corresponding value in `mask` is `0`, do nothing. | ||
/// | ||
/// The stores happen in left-to-right order. | ||
/// (This is relevant in case two of the stores overlap.) | ||
Comment on lines +248 to +249
( probably not relevant, but fun facts to know and tell: if the machine instruction gets executed, this guarantee also affects the state of the CPU during exception handling. )

Wait, exception handlers can observe these instructions as non-atomic? Wow.

Yeah, on Intel CPUs at least, for most variants of these instructions, the operands (esp. the mask operand) get updated so that continuing with the same operands will complete the operation, completing a single store for each index.

What do you mean "get updated"? Does this change the contents of some other register, where the mask operand is stored? I hope it restores the original value when the instruction is done?

Yes, the mask register.
uhhh probably! 😁

nope, it's defined to clear the mask register: https://www.felixcloutier.com/x86/vscatterdps:vscatterdpd:vscatterqps:vscatterqpd

Okay... well as long as LLVM takes that into account it's all good.
||
/// | ||
/// # Safety | ||
/// Unmasked values in `T` must be writeable as if by `<ptr>::write` (e.g. aligned to the element | ||
/// type). | ||
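The left-to-right store order documented in this hunk only matters when two enabled lanes target the same location. A minimal scalar sketch of that behaviour follows. It rests on some assumptions: `scatter_model` is a hypothetical helper, it indexes into a slice instead of writing through a vector of raw pointers, and it uses `bool` lanes where the real intrinsic takes an integer mask vector in which `0` means "do nothing".

```rust
/// Scalar model of a masked scatter that stores left to right.
/// `scatter_model` is a hypothetical helper; slice indices stand in for the
/// raw pointers taken by the real intrinsic.
fn scatter_model<T: Copy, const N: usize>(
    memory: &mut [T],
    values: [T; N],
    indices: [usize; N],
    mask: [bool; N],
) {
    for lane in 0..N {
        if mask[lane] {
            // A later lane overwrites an earlier one that hit the same index.
            memory[indices[lane]] = values[lane];
        }
    }
}
```

Scattering `[10, 20, 30, 40]` to indices `[1, 1, 3, 3]` with the last lane masked off leaves a zeroed four-element buffer as `[0, 20, 0, 30]`: lane 1 wins over lane 0 at index 1, and lane 3 never stores.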
|
@@ -468,4 +484,36 @@ extern "platform-intrinsic" { | |
/// | ||
/// `T` must be a vector of integers. | ||
pub fn simd_cttz<T>(x: T) -> T; | ||
|
||
/// Round up each element to the next highest integer-valued float. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_ceil<T>(x: T) -> T; | ||
|
||
/// Round down each element to the next lowest integer-valued float. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_floor<T>(x: T) -> T; | ||
|
||
/// Round each element to the closest integer-valued float. | ||
/// Ties are resolved by rounding away from 0. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_round<T>(x: T) -> T; | ||
|
||
/// Return the integer part of each element as an integer-valued float. | ||
/// In other words, non-integer values are truncated towards zero. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_trunc<T>(x: T) -> T; | ||
|
||
/// Takes the square root of each element. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_fsqrt<T>(x: T) -> T; | ||
|
||
/// Computes `(x*y) + z` for each element, but without any intermediate rounding. | ||
/// | ||
/// `T` must be a vector of floats. | ||
pub fn simd_fma<T>(x: T, y: T, z: T) -> T; | ||
} |
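The element-wise float intrinsics added in this hunk are documented with the same rounding rules as the scalar `f32` methods in the standard library, so their per-lane behaviour can be sketched with those. This is a scalar model under assumptions, not the intrinsics themselves: `map_lanes` and `fma_lanes` are hypothetical helpers operating on plain `f32` arrays.

```rust
/// Applies a scalar function to every lane.
/// `map_lanes` is a hypothetical helper standing in for the unary intrinsics
/// (`simd_ceil`, `simd_floor`, `simd_round`, `simd_trunc`, `simd_fsqrt`).
fn map_lanes<const N: usize>(x: [f32; N], f: impl Fn(f32) -> f32) -> [f32; N] {
    x.map(f)
}

/// Per-lane `(x * y) + z` with a single rounding step, modelled with
/// `f32::mul_add`. `fma_lanes` is a hypothetical helper.
fn fma_lanes<const N: usize>(x: [f32; N], y: [f32; N], z: [f32; N]) -> [f32; N] {
    core::array::from_fn(|i| x[i].mul_add(y[i], z[i]))
}
```

For example, `map_lanes([-2.5, -0.5, 0.5, 2.5], f32::round)` gives `[-3.0, -1.0, 1.0, 3.0]`, which matches the "ties are resolved by rounding away from 0" wording, while `f32::trunc` on the same input gives `[-2.0, -0.0, 0.0, 2.0]`.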
ah, good catch. ^^;