Change Mask element to match Simd element #322

Open
calebzulawski wants to merge 3 commits into master

Conversation

calebzulawski
Member

It's a little confusing for the mask element type to be generic over "equivalently sized" integers, rather than the actual element type. This PR changes e.g. Mask<isize, N> to Mask<*const u8, N>, or Mask<usize, N>, or whatever the element type actually is. Another PR in the future could probably remove the MaskElement trait entirely, but I think this is a good start.

Also, I removed the From implementation for converting masks, because with this many valid element types it's not really reasonable to implement.
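A rough before/after sketch of what the change means for a comparison, assuming nightly std::simd with the trait names used in this repo at the time (the "after" return type and binding only type-check with this PR applied):

use std::simd::{Mask, Simd, SimdPartialOrd};

fn positive(x: Simd<f32, 4>) -> Mask<f32, 4> {
    // Before this PR: the comparison yields Mask<i32, 4>, the "equivalently
    // sized" integer mask for f32 lanes.
    // let m: Mask<i32, 4> = x.simd_gt(Simd::splat(0.0));

    // With this PR: the mask element matches the vector element.
    x.simd_gt(Simd::splat(0.0))
}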

@thomcc
Member

thomcc commented Dec 14, 2022

Also, I removed the From implementation for converting masks, because with this many valid element types it's not really reasonable to implement.

This seems like a pretty big downside to this approach...

@calebzulawski
Member Author

calebzulawski commented Dec 14, 2022

There is still the cast function, which is probably more ergonomic because you can do something like mask.cast::<u8>(). This corresponds to how Simd is converted.

Take a look at how From was previously implemented--it's manually implemented over every combination of types, so it explodes quadratically. This is only a problem because we can't write the single generic implementation impl<T, U, const N: usize> From<Mask<T, N>> for Mask<U, N>, since it conflicts with the blanket impl<T> From<T> for T.
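A minimal sketch of that coherence conflict, using a toy wrapper type (not the real Mask) purely for illustration:

use std::marker::PhantomData;

// Toy stand-in for Mask<T, N>.
struct MyMask<T, const N: usize>(PhantomData<T>);

// The impl we would like to write. Whenever T == U it overlaps with core's
// blanket `impl<T> From<T> for T`, so rustc rejects it with error[E0119]
// (conflicting implementations of `From<MyMask<_, _>>`):
//
// impl<T, U, const N: usize> From<MyMask<T, N>> for MyMask<U, N> {
//     fn from(_: MyMask<T, N>) -> Self {
//         MyMask(PhantomData)
//     }
// }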

@programmerjake
Member

This is only a problem because we can't write the single generic implementation impl<T, U, const N: usize> From<Mask<T, N>> for Mask<U, N>, since it conflicts with the blanket impl<T> From<T> for T.

this makes me think Rust needs where T != U bounds, so we can implement From for everything not covered by From<T> for T

Member

@programmerjake left a comment

the changes look mostly good, though imho these changes will make working with masks much more verbose, i'm not sure if this is a good idea...

@@ -313,11 +313,11 @@ where
     #[inline]
     pub fn gather_select(
         slice: &[T],
-        enable: Mask<isize, LANES>,
+        enable: Mask<usize, LANES>,
Member

note that x86 has mask sizes matching the data size, not the index/address size... we should probably match that:
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps#vgatherqps--vex-128-version-

Member

opened #323 to track this.

Member Author

Good point. With this change it would be easy to make this take Mask<T, LANES> instead, but I'll leave that to a separate PR.

@calebzulawski
Member Author

I don't think it will be "much" more verbose--in most situations the mask matches the vector type. In the cases where it doesn't, I think you're still typically going to do all of your operations with a single mask type, and have some casts before or after (much like into).

It could definitely make some code more verbose, but I think this is an acceptable tradeoff because it doesn't have any performance implications, and I think it's significantly easier to explain Mask<*const T, N> vs Mask<isize, N>, not to mention it carries much more semantic meaning. We've already seen some confusion as to what the mask element type is and why.

@workingjubilee
Member

Hmm. Having to drop From really hurts, still, and I'm a little worried about all the extra instances of .cast() that have to be thrown around. What are some examples of code that this makes simpler? Or is there any code that would be compiled better with this suite of implementations?

@calebzulawski
Member Author

calebzulawski commented Dec 19, 2022

In retrospect, I wouldn't even implement From with our current masks (the number of implementations doesn't sit right with me, and they just call cast anyway). If that's the dealbreaker I can still implement it for this PR.

This change doesn't affect compilation; the layouts etc. are still identical. As far as "simpler", I'm not aiming for "less verbose" but "less cognitive load". Imagine this as a new std::simd user who might not even be particularly well versed in SIMD:

fn foo(x: Simd<f32, 4>, p: Simd<*const u8, 4>) -> Simd<*const u8, 4> {
    let mask: Mask<f32, 4> = x.is_nan(); // would this make more sense as Mask<i32, 4> when there's no i32 anywhere?
    mask.cast::<*const u8>().select(Simd::splat(std::ptr::null()), p) // what about isize here?
}

IMO using signed integers also implies that the masks are just vectors (I know we document otherwise, but it's not helping). I think sometimes requiring a cast for select etc, but not always, cements in the API that cast is expensive, rather than target-specific. Considering all of the newer instruction sets seem to use bitmasks (where cast is always completely free), I don't think that's the right implication.

@calebzulawski
Member Author

calebzulawski commented Dec 19, 2022

Just a silly example of this being a flawed hint in std::simd today: this is not so great on AVX (not AVX2). The f32 section drops to SSE (despite no cast hinting that something funny might happen). With more complex code it might still use AVX and require an extra move (cheap, but not free like an AVX-512 cast) to create two SSE masks:

fn foo(x: f32x8, y: i32x8, z: i32x8) -> i32x8 {
    x.is_sign_positive().select(y, y + z)
}

@programmerjake
Member

programmerjake commented Dec 19, 2022

Just a silly example of this being a flawed hint in std::simd today: this is not so great on AVX (not AVX2).

i wouldn't blame masks for that, i'd instead blame 2 out of 3 of the operations you're trying to do not being supported by AVX, requiring AVX2:

  • is_sign_positive -- not actually an fp operation, is really transmute(v) >> 31 -- requires AVX2
  • i32x8::add -- not an fp operation -- requires AVX2

LLVM has therefore reasonably decided using SSE operations throughout is faster than using AVX load, conversion to SSE, SSE shift, conversion to AVX, AVX select, conversion to SSE, SSE int add, conversion to AVX, and finally AVX store.

if you change it to the following, it uses AVX operations throughout because fp comparisons and fp/int bitwise logic are fully supported by AVX:
https://rust.godbolt.org/z/nr484jdhv

pub fn foo(x: f32x8, y: i32x8, z: i32x8) -> i32x8 {
    x.simd_gt(Simd::splat(0.0f32)).select(y, y ^ z)
}

The f32 section drops to SSE (despite no cast hinting that something funny might happen). With more complex code it might still use AVX and require an extra move (cheap, but not free like an AVX-512 cast) to create two SSE masks:

note that the SSE <-> AVX moves are because of moving data from the high 128 bits to a separate register so it can be used for SSE ops, since SSE instructions can only read/write the lower 128 bits (they technically can write zeros to the high 128 bits if encoded using the AVX encoding); it has nothing to do with that data being a Mask or not.

This mess is all caused by Intel deciding the first AVX and SSE extensions only need fp and bitwise ops, no integer ops until AVX2/SSE2. IMHO it's completely reasonable to not use AVX at all unless AVX2 is available.

@calebzulawski
Member Author

I don't disagree with any of that--my point is that "f32 and i32 should use the same mask type because they are always compatible" isn't quite true. Any particular architecture will have varying support for different element types (altivec and v7 neon not supporting f64 is another example). I just don't think the API should be so opinionated as to particularly accommodate some architectures.

Member

@programmerjake left a comment

ah, your point that optimal mask types can vary based on element type, not just element size, is a good one...

@programmerjake
Member

we'll probably want to wait and see what others think of your point before merging

@workingjubilee
Member

I'm still kinda mulling this over and I agree that what you cite is a bit of an oddity.

I don't think the From implementations are a dealbreaker. Maybe Thom would?

I agree that ideally we would have something that encourages either not caring much about the architecture's specifics or being aware that the architecture's specifics are... well, specific. Hm.

@thomcc
Member

thomcc commented Dec 28, 2022

Maybe Thom would?

No, I don't think it's a dealbreaker. Their absence will be badly missed, though...

@calebzulawski
Member Author

calebzulawski commented Jan 21, 2023

I found a trick with macros to implement all of the scalars, but still ran into an issue with:

impl<T, U, const LANES: usize> From<Mask<*const T, LANES>> for Mask<*const U, LANES>

because this still overlaps with From<T> for T.

How does everyone feel about merging this as-is, and hopefully getting From in the future?

@programmerjake
Member

How does everyone feel about merging this as-is, and hopefully getting From in the future?

sounds ok to me, as long as others are fine with it.

@workingjubilee
Member

Apologies for the delay in response. I have thought it over and I think this points to a need to revise our approach to Masks on a more fundamental level (sadly) (again!) but I have no objection to this change as-is.

Member

@workingjubilee left a comment

If Thom approves I will go ahead and merge this.

@thomcc
Member

thomcc commented Mar 11, 2023

I'm not a fan. Don't get me wrong, I don't love the old design either... but I think this might be the wrong direction. Here are some specific concerns around it:

  • This change is already causing us friction in our APIs (e.g. From can't be implemented the way we would like), and I would be unsurprised if user code hit issues in its own traits after the change.

  • From the patch diff, it seems likely users of std::simd will end up needing to use complicated types and projections in their functions and/or signatures, like Simd<<T as SimdElement>::Mask, LANES> or <Simd<T, LANES> as SimdPartialEq>::Mask, in order to express the operations they want to perform. This is very complex, and IMO needs much stronger motivation than given.

  • Several cases where Mask::foo used to be inferred now fail, since the element type can no longer be inferred without additional turbofishing. This will probably result in confusion and bad error messages.

A less specific (more vibes-based) concern is that each additional level of genericity and type-level computation we perform here complicates error messages, slows compiles, and adds monomorphizations users will have to suffer through.

My feeling and experience is that highly generic APIs like where this is headed are very divisive in the community. Things like rand, diesel, etc. are great pieces of engineering. This isn't at those levels yet, for sure, but it is pushing us further beyond anything currently in the stdlib (with the possible exception of the std::ops machinery for ?, which is a smaller surface and has some tricky compatibility constraints).


As for the argument that perhaps an ISA will have different mask types based on vector element type, I think it is uncompelling. IMO it's a bad pattern to try to design for every possible hypothetical, as it's an unbounded set. A new SIMD ISA could be released that behaves in any possible way, and even if it's narrowed down to APIs that seem plausible, you still risk making the code of every user of the API more complicated for the benefit of something which may never exist. IOW, at a certain point, I think it's fine to say "std::arch is still available" [1].

That said, for this specific case I'm not convinced the change would make a difference even if such an ISA comes to pass. Concretely, compare this with mask representations that vary based on operation [2], rather than the more common things like elem size, lane count, and so on. We don't worry about this so much because we assume that most of the time the API usage should be inlined, and if things are inlined we're hoping that LLVM can handle it [3]. ISTM the same logic should apply.

P.S. Very tired, apologies for any typos. Comment made anyway to avoid this landing without me saying something.

Footnotes

  1. Hell, I already have to use std::arch just to get good performance out of std::simd in many cases...

  2. That is, the instruction used; some may use bitmasks, others full-size masks, etc.

  3. Whether it can remains to be seen in both cases, admittedly...

@programmerjake
Member

  • From the patch diff, it seems likely users of std::simd will end up needing to use complicated types and projections in their functions and/or signatures, like Simd<<T as SimdElement>::Mask, LANES> or <Simd<T, LANES> as SimdPartialEq>::Mask, in order to express the operations they want to perform. This is very complex, and IMO needs much stronger motivation than given.

imho it's the other way around: currently we tend to require Mask<T::Mask, N> to be able to mask Simd<T, N> operations; with this we can just use Mask<T, N> for Simd<T, N>, with no complex associated types needed by the users.
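A rough sketch of that contrast in generic user code (bounds approximate, based on the nightly std::simd API at the time; the eq_mask function is hypothetical and only for illustration):

use std::simd::{LaneCount, Mask, Simd, SimdElement, SimdPartialEq, SupportedLaneCount};

// Today: the mask element is reached through the SimdElement::Mask projection,
// and the signature and bounds have to spell that out.
fn eq_mask<T, const N: usize>(v: Simd<T, N>, needle: T) -> Mask<T::Mask, N>
where
    T: SimdElement,
    LaneCount<N>: SupportedLaneCount,
    Simd<T, N>: SimdPartialEq<Mask = Mask<T::Mask, N>>,
{
    v.simd_eq(Simd::splat(needle))
}

// With this PR, the projection disappears and the signature becomes roughly:
//
// fn eq_mask<T, const N: usize>(v: Simd<T, N>, needle: T) -> Mask<T, N> { ... }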

@workingjubilee
Member

workingjubilee commented Mar 11, 2023

Mmk.

I agree that we should go back to the drawing board on this part, honestly. I just was fine with trying to implement a redesign on top of a HEAD with this commit.

I've been experimenting with bits of redesigns that use associated types, but in ways that more strongly bind the operations we are allowing to the types of Simd without losing genericity over lanes, and with, yes, fewer imports and projections needed by users (the signature might look slightly jank in std but... uh, nothing Diesel-tier). We've gained some powerful tools recently; we should start using them and try to retackle the problems we kind of fudged earlier.

I think we probably need more code examples in this repo using our own API, so that the consequences of our diffs on user code are more immediately obvious and this kind of concern can be settled in the future, rather than feeling like we have to hem and haw for weeks. It should be more transparent whether something feels like a good or bad idea based on those examples.

@calebzulawski
Member Author

Regarding error messages etc., I can't see anything more straightforward than "the mask matches the vector element". We have already seen examples of people misunderstanding the masks: they assume the mask matches the vector, see a very strange message like "expected i64, got u64", and the thought process isn't "maybe masks have different element types" but instead "why does the compiler think my u64x4 is actually i64x4?".

It would be nice if masks didn't depend on the vector element, but there's nothing we can do about that. If they must depend on the element type, this actually seems the least abstracted to me.

I agree with @programmerjake regarding generics: I'm in the process of implementing num-traits for vectors and currently need Mask<<T as SimdElement>::Mask, N>, but with this change I would be able to write simply Mask<T, N>.

@workingjubilee
Member

If the real splitting point is error messages, then maybe we need some ui test examples? We should do what we need to do to feel confident shipping stuff.

@jhorstmann

As for the argument that perhaps an ISA will have different mask types based on vector element type, I think it is uncompelling.

I wonder how difficult it would be to add support for #[repr(simd)] for [bool; N] in rustc, and whether that would sidestep any issues with the mask element type. The array would have no layout guarantees, and llvm codegen should treat that as <N x i1>.

AFAIK, in llvm codegen any masks are truncated to i1 elements anyway before they can be used by llvm intrinsics. If the mask is generated programmatically, llvm should be free to use the best mask type depending on target and usage.

@workingjubilee
Member

I wonder how difficult it would be to add support for #[repr(simd)] for [bool; N] in rustc, and whether that would sidestep any issues with the mask element type. The array would have no layout guarantees, and llvm codegen should treat that as <N x i1>.

@jhorstmann I have been working on an RFC and implementation of generic integers directly into rustc and the language so that we can simply use that (but I have been frying a lot of fish, lately).
