Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combination of i8x16::move_mask() and .trailing_zeros() is a footgun #165

Open
whitequark opened this issue Jul 6, 2024 · 3 comments
Open

Comments

@whitequark
Copy link

For example, consider this code, extracted from one of my libraries:

for &group in samples.array_chunks::<16>() {
    let mask = predicate(i8x16::new(group));
    offset += mask.move_mask().trailing_zeros() as usize;
    if mask.any() { break }
}
offset

The intent here is to move offset by the fractional part of the group that matched the predicate. Unfortunately, if the predicate didn't match at all, the moved mask would have 32 zeroes, throwing off the pointer.

(I know about iter.remainder()--in this particular application it can't be used because by the time the remainder is examined, it is already advanced past the group being matched by the predicate.)

Making i8x16::move_mask() be an u16 would solve this problem, but won't work for i32x4. Adding leading_zeros() and trailing_zeros() to the SIMD types, or perhaps boolean mask types from #43, would also solve it.

@Lokathor
Copy link
Owner

Lokathor commented Jul 8, 2024

I wouldn't have come up with a loop like that, I think. Though, I see your point about how things can get mixed up if you write it that way.

Given the amount of time I have to work on wide lately (not much), I'd prefer to avoid solutions that are breaking changes in the library.

I'm not totally sure that I understand the proposed loop as well. If there's any elements matching the predicate, you add some to the offset and then break the loop immediately? So the only time you don't break the loop is if "no elements match", so wouldn't the move_mask of a "no elements match" always be a 0? So can't you make it an if-else with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?

@whitequark
Copy link
Author

So can't you make it an if-else with adding a constant 16 per loop that continues, and then add move_mask when things don't continue?

I can! I'm reporting this as an UI issue, not a correctness issue. Feel free to close if this isn't wanted.

@Lokathor
Copy link
Owner

It can stay open, but basically it's sorta "needs design" I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants