Change Mask element to match Simd element #322
base: master
Conversation
This seems like a pretty big downside to this approach...

There is still the … Take a look at how …

this makes me think Rust needs …
the changes look mostly good, though imho these changes will make working with masks much more verbose, i'm not sure if this is a good idea...
```diff
@@ -313,11 +313,11 @@ where
     #[inline]
     pub fn gather_select(
         slice: &[T],
-        enable: Mask<isize, LANES>,
+        enable: Mask<usize, LANES>,
```
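For readers following along, here is a scalar sketch of the per-lane semantics of `gather_select` (my illustration, not the crate's actual implementation, which lives behind the nightly-only `std::simd` API): for each enabled, in-bounds lane, read from the slice; otherwise keep the fallback value for that lane.

```rust
// Scalar model of gather_select's per-lane behavior (a sketch only).
// `enable` stands in for the Mask argument, `idxs` for the index
// vector, and `or` for the fallback vector.
fn gather_select_scalar<T: Copy, const LANES: usize>(
    slice: &[T],
    enable: [bool; LANES],
    idxs: [usize; LANES],
    or: [T; LANES],
) -> [T; LANES] {
    let mut out = or;
    for lane in 0..LANES {
        if enable[lane] {
            // Disabled or out-of-bounds lanes keep the fallback value.
            if let Some(&v) = slice.get(idxs[lane]) {
                out[lane] = v;
            }
        }
    }
    out
}
```

For example, `gather_select_scalar(&[10, 20, 30], [true, false, true], [0, 1, 2], [0, 0, 0])` yields `[10, 0, 30]`: lane 1 is disabled, so it keeps its fallback.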
note x86 has mask sizes matching the data size, not the index/address size... we should probably match that:
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps#vgatherqps--vex-128-version-
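One reason mask width conventionally follows the data element's width: in a hardware-style bitwise select, each mask lane is all-ones or all-zeros at the element's own width, so blending is three bitwise operations with no width conversion. A one-lane sketch of that idea (my illustration, not x86 or portable-simd code):

```rust
// Sketch: blend one f32 lane using a 32-bit-wide mask lane, the way a
// bitwise select works. `mask` is expected to be all-ones (u32::MAX)
// or all-zeros, i.e. the same width as the f32 element it guards.
fn select_lane(mask: u32, if_true: f32, if_false: f32) -> f32 {
    let t = if_true.to_bits();
    let f = if_false.to_bits();
    f32::from_bits((mask & t) | (!mask & f))
}
```

With a same-width integer mask there is nothing to widen or narrow; if the mask were address-sized instead, a 64-bit mask lane would have to be truncated to blend 32-bit data.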
opened #323 to track this.
Good point. With this change it would be easy to make this take `Mask<T, LANES>` instead, but I'll leave that to a separate PR.
I don't think it will be "much" more verbose--in most situations the mask matches the vector type. In the cases where it doesn't, I think you're still typically going to do all of your operations with a single mask type, and have some …

It could definitely make some code more verbose, but I think this is an acceptable tradeoff because it doesn't have any performance implications, and I think it's significantly easier to explain …
Hmm. Having to drop …
In retrospect, I wouldn't even implement …

This change doesn't affect compilation, the layouts etc are still identical.

As far as simpler … I'm not aiming for "less verbose" but "less cognitive load". Imagine this as a new …:

```rust
fn foo(x: Simd<f32, 4>, p: Simd<*const u8, 4>) -> Simd<*const u8, 4> {
    let mask: Mask<f32, 4> = x.is_nan(); // would this make more sense as Mask<i32, 4> when there's no i32 anywhere?
    mask.cast::<*const u8>().select(Simd::splat(std::ptr::null()), p) // what about isize here?
}
```

IMO using signed integers also implies that the masks are just vectors (I know we document otherwise, but it's not helping). I think sometimes requiring a cast for …
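For comparison, the scalar equivalent of one lane of the `foo` example above is just a branch (my sketch, compiling on stable Rust since it doesn't need `std::simd`):

```rust
// One-lane scalar equivalent of foo(): a NaN lane gets a null pointer,
// any other lane keeps its pointer. Sketch for illustration only.
fn foo_lane(x: f32, p: *const u8) -> *const u8 {
    if x.is_nan() { std::ptr::null() } else { p }
}
```

The lanewise version has to name a mask type for the intermediate result of `is_nan()`, which is exactly where the `Mask<f32, 4>` vs `Mask<i32, 4>` question in the example comes from.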
Just a silly example of this being a flawed hint in …:

```rust
fn foo(x: f32x8, y: i32x8, z: i32x8) -> i32x8 {
    x.is_sign_positive().select(y, y + z)
}
```
i wouldn't blame masks for that, i'd instead blame 2 out of 3 of the operations you're trying to do not being supported by AVX, requiring AVX2: …

LLVM has therefore reasonably decided using SSE operations throughout is faster than using AVX load, conversion to SSE, SSE shift, conversion to AVX, AVX select, conversion to SSE, SSE int add, conversion to AVX, and finally AVX store. if you change it to the following, it uses AVX operations throughout because fp comparisons and fp/int bitwise logic are fully supported by AVX:

```rust
pub fn foo(x: f32x8, y: i32x8, z: i32x8) -> i32x8 {
    x.simd_gt(Simd::splat(0.0f32)).select(y, y ^ z)
}
```

note that the SSE <-> AVX moves are because of moving data from the high 128 bits to a separate register so it can be used for SSE ops, since SSE instructions can only read/write the lower 128 bits (they technically can write zeros to the high 128 bits if encoded using the AVX encoding); it has nothing to do with that data being a …

This mess is all caused by Intel deciding the first AVX and SSE extensions only need fp and bitwise ops, no integer ops until AVX2/SSE2. IMHO it's completely reasonable to not use AVX at all unless AVX2 is available.
I don't disagree with any of that--my point is that "f32 and i32 should use the same mask type because they are always compatible" isn't quite true. Any particular architecture will have varying support for different element types (altivec and v7 neon not supporting f64 is another example). I just don't think the API should be so opinionated as to particularly accommodate some architectures.
ah, your point that optimal mask types can vary based on element type, not just element size, is a good one...

we'll probably want to wait and see what others think of your point before merging
I'm still kinda mulling this over and I agree that what you cite is a bit of an oddity. I don't think the …

I agree that ideally we would have something that encourages either not caring much about the architecture's specifics, or being aware that the architecture's specifics are... well, specific. Hm.
No, I don't think it's a dealbreaker. Their absence will be badly missed, though...
I found a trick with macros to implement all of the scalars, but still ran into an issue with: …

because this still overlaps with …

How does everyone feel about merging this as-is, and hopefully getting …
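The kind of macro trick referred to above is presumably along these lines (a hypothetical sketch; the trait body and names here are mine, not the actual portable-simd code): one `macro_rules!` invocation stamps out an impl of the element trait for every scalar type, avoiding a dozen hand-written impls.

```rust
// Hypothetical sketch of implementing a per-scalar trait via a macro.
// The real MaskElement trait in portable-simd has different contents;
// BITS is just a stand-in to make the pattern concrete.
trait MaskElement {
    const BITS: u32;
}

macro_rules! impl_mask_element {
    ($($ty:ty),* $(,)?) => {
        $(
            impl MaskElement for $ty {
                const BITS: u32 = (core::mem::size_of::<$ty>() * 8) as u32;
            }
        )*
    };
}

impl_mask_element!(i8, i16, i32, i64, isize, u8, u16, u32, u64, usize, f32, f64);
```

The overlap problem mentioned above is the usual coherence issue: a blanket impl elsewhere (e.g. a generic `From`) can conflict with the per-type impls the macro generates, and the macro doesn't help with that.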
sounds ok to me, as long as others are fine with it.
Apologies for the delay in response. I have thought it over, and I think this points to a need to revise our approach to Masks on a more fundamental level (sadly) (again!), but I have no objection to this change as-is.
If Thom approves I will go ahead and merge this.
I'm not a fan. Don't get me wrong, I don't love the old design either... but I think this might be the wrong direction. Here are some specific concerns around it: …

A less specific (more vibes-based) concern is that each additional level of genericity and type-level computation we perform here complicates error messages, slows compiles, and adds monomorphizations users will have to suffer through. My feeling and experience is that highly generic APIs like where this is headed are very divisive in the community. Things like …

As for the argument that perhaps an ISA will have different mask types based on vector element type, I think it is uncompelling. IMO it's a bad pattern to try to design for every possible hypothetical, as it's an unbounded set. A new SIMD ISA could be released that behaves in any possible way, and even if it's narrowed down to APIs that seem plausible, you still risk making the code of every user of the API more complicated for the benefit of something which may never exist. IOW, at a certain point, I think it's fine to say "…"

That said, for this specific case I'm not convinced the change would make a difference even if such an ISA comes to pass. Concretely, compare this with mask representations that vary based on operation², rather than the more common things like elem size, lane count, and so on. We don't worry about this so much because we assume that most of the time the API usage should be inlined, and if things are inlined we're hoping that LLVM can handle it³. ISTM like the same logic should apply.

P.S. Very tired, apologies for any typos. Comment made anyway to avoid this landing without me saying something.

Footnotes: …
imho it's the other way around, currently we tend to require …
Mmk. I agree that we should go back to the drawing board on this part, honestly. I just was fine with trying to implement a redesign on top of a HEAD with this commit. I've been experimenting with bits of redesigns that use associated types, but in ways that more strongly bind the operations we are allowing to the types of …

I think we probably need more code examples in this repo of using our own API, so that the consequences of our diffs will be more immediately obvious on user code, to help settle this kind of concern in the future, rather than feeling like we have to hem and haw for weeks. It should be more transparent whether something feels like a good or bad idea based on those examples.
Regarding error messages etc, I can't see anything more straightforward than "the mask matches the vector element". We have already seen examples of people misunderstanding the masks, assuming the mask matches the vector, and seeing very strange messages like "expected …".

It would be nice if masks didn't depend on the vector element, but there's nothing we can do about that. If they must depend on the element type, this actually seems the least abstracted to me.

I agree with @programmerjake regarding generics. I'm in the process of implementing …
If the real splitting point is on error messages, maybe we need some UI test examples, then? We should do what we need to do to feel confident shipping stuff.
I wonder how difficult it would be to add support for …

AFAIK, in llvm codegen any masks are truncated anyway to …
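For context on the "truncated mask" view: at the LLVM level a mask is conceptually a vector of `i1` (one bit per lane), however wide the in-register representation is. A scalar sketch of that model, packing per-lane booleans into an integer bitmask (my illustration of the semantics, not the codegen):

```rust
// Sketch: pack per-lane booleans (the <N x i1> view of a mask) into a
// u64 bitmask, lane 0 in the least significant bit.
fn to_bitmask<const LANES: usize>(lanes: [bool; LANES]) -> u64 {
    lanes
        .iter()
        .enumerate()
        .fold(0u64, |acc, (i, &b)| acc | ((b as u64) << i))
}
```

In this view the element-width representation (all-ones vs all-zeros lanes) is just one possible lowering of the same one-bit-per-lane information.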
@jhorstmann I have been working on an RFC and implementation of generic integers directly into rustc and the language so that we can simply use that (but I have been frying a lot of fish, lately).
It's a little confusing for the mask element type to be generic over "equivalently sized" integers, rather than the actual element type. This PR changes e.g. `Mask<isize, N>` to `Mask<*const u8, N>`, or `Mask<usize, N>`, or whatever the element type actually is. Another PR in the future could probably remove the `MaskElement` trait entirely, but I think this is a good start.

Also, I removed the `From` implementation for converting masks, because with this many valid element types it's not really reasonable to implement.