Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move some x86 intrinsics code to helper functions in shims::x86 #3214

Merged
merged 4 commits into from
Dec 8, 2023

Conversation

eduardosm
Copy link
Contributor

To make them reusable for intrinsics of other x86 features.

Splitted from #3192


/// Conditionally multiplies the packed floating-point elements in
/// `left` and `right` using the high 4 bits in `imm`, sums the four
/// products, and conditionally stores the sum in `dest` using the low
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is "conditional", I assume these are not actually four products but up to four products?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

let op = this.read_scalar(&op)?.to_uint(op.layout.size)?;
let mask = this.read_scalar(&mask)?.to_uint(mask.layout.size)?;
all_zero &= op & mask == 0;
masked_set &= op & mask == mask;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
masked_set &= op & mask == mask;
masked_set &= (op & mask) == mask;

I first read this as op & (mask == mask) and was quite confused...

Comment on lines 694 to 698
all_zero &= (op & mask) == 0;
masked_set &= (op & mask) == mask;
}

Ok(f(all_zero, masked_set))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's not what I meant... what I mean was to do acc = f(acc, op, mask) inside the loop. That way the caller then decides how the results get accumulated.

Basically, this is folding in some way over the two vectors, and so the natural general interface is to allow any kind of folding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's not what I meant... what I mean was to do acc = f(acc, op, mask) inside the loop. That way the caller then decides how the results get accumulated.

That would a fn(bool, Scalar, Scalar) -> bool, wouldn't it? Not a fn(bool, bool) -> bool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah -- I might have gotten the signature wrong.

Basically, whatever the type of f is in the current code. Or maybe something a bit more general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't working correctly for the (op & mask) != 0 && (op & mask) != mask case. We need to first calculate the all_zero and masked_set booleans (for the whole SIMD vector) and then !all_zero && !masked_set.

|acc, op, mask| {
let op = op.to_scalar().to_uint(op.layout.size)?;
let mask = mask.to_scalar().to_uint(mask.layout.size)?;
Ok(acc && (op & mask) != 0 && (op & mask) != mask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code will be overall shorter if you move the match down inside the closure; then the let op and let mask can be outside the match.

@RalfJung
Copy link
Member

RalfJung commented Dec 8, 2023

Thanks a lot. :)
@bors r+

@bors
Copy link
Collaborator

bors commented Dec 8, 2023

📌 Commit 44bf5fc has been approved by RalfJung

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Dec 8, 2023

⌛ Testing commit 44bf5fc with merge a5b9f54...

@bors
Copy link
Collaborator

bors commented Dec 8, 2023

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing a5b9f54 to master...

@bors bors merged commit a5b9f54 into rust-lang:master Dec 8, 2023
7 of 8 checks passed
bors added a commit that referenced this pull request Dec 9, 2023
Fix x86 SSE4.1 ptestnzc

Fixes ptestnzc by bringing back the original implementation of #3214.

`(op & mask) != 0 && (op & mask) == !ask` need to be calculated for the whole vector. It cannot be calculated for each element and then folded.

For example, given
* `op = [0b100, 0b010]`
* `mask = [0b100, 0b110]`

The correct result would be:
* `op & mask = [0b100, 0b010]`
Comparisons are done on the vector as a whole:
* `all_zero = (op & mask) == [0, 0] = false`
* `masked_set = (op & mask) == mask = false`
* `!all_zero && !masked_set = true` correct result

The previous method:
* `op & mask = [0b100, 0b010]`
Comparisons are done element-wise:
* `all_zero = (op & mask) == [0, 0] = [true, true]`
* `masked_set = (op & mask) == mask = [true, false]`
* `!all_zero && !masked_set = [true, false]`
 After folding with AND, the final result would be `false`, which is incorrect.
@eduardosm eduardosm deleted the move-x86-code branch December 9, 2023 11:55
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Dec 17, 2023
Fix x86 SSE4.1 ptestnzc

Fixes ptestnzc by bringing back the original implementation of rust-lang/miri#3214.

`(op & mask) != 0 && (op & mask) == !ask` need to be calculated for the whole vector. It cannot be calculated for each element and then folded.

For example, given
* `op = [0b100, 0b010]`
* `mask = [0b100, 0b110]`

The correct result would be:
* `op & mask = [0b100, 0b010]`
Comparisons are done on the vector as a whole:
* `all_zero = (op & mask) == [0, 0] = false`
* `masked_set = (op & mask) == mask = false`
* `!all_zero && !masked_set = true` correct result

The previous method:
* `op & mask = [0b100, 0b010]`
Comparisons are done element-wise:
* `all_zero = (op & mask) == [0, 0] = [true, true]`
* `masked_set = (op & mask) == mask = [true, false]`
* `!all_zero && !masked_set = [true, false]`
 After folding with AND, the final result would be `false`, which is incorrect.
bors added a commit that referenced this pull request Feb 16, 2024
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Feb 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants