-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wasm] Optimize Vector128<float>/<double>.Equals in interp/jiterp #88064
Conversation
Tagging subscribers to this area: @BrzVlad, @kotlarmilos Issue DetailsCalling Equals on vectors of floating point types produces extremely nasty interp code that compiles down to a huge blob of wasm, which makes it hard to write tests/measurements that actually fit into traces. This PR adds intrinsics for doing equality comparisons on R4/R8 vectors and then implements them in jiterp. The performance also improves slightly over what it was before since the generated code is much improved. The c interp implementation could probably be improved on, though - perhaps if clang has an all bits set intrinsic and an all_true intrinsic?
|
@tannergooding any concerns about this change to Equals? |
coreclr failures do not apply, @kg merge at will. @tannergooding we welcome feedback before or after merge |
internal static bool EqualsFloatingPoint (Vector128<T> lhs, Vector128<T> rhs) | ||
{ | ||
Vector128<T> result = Vector128.Equals(lhs, rhs) | ~(Vector128.Equals(lhs, lhs) | Vector128.Equals(rhs, rhs)); | ||
return result.AsInt32() == Vector128<int>.AllBitsSet; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just handle Equals
directly? There are several APIs on Vector2/3/4
and Vector<T>
that are [Intrinsic]
instance methods, so I would expect Mono has the support for handling that already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we already vectorize the underlying Equals operation, the problem is that this one desugars (?) to a bunch of individual SIMD operations that each get their own interp opcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, to be clear, if you call Vector128<float>.Equals(...)
the interp generates something like this right now:
v128_equals_r4 (lhs, rhs)
v128_equals_r4 (lhs, lhs)
v128_equals_r4 (rhs, rhs)
v128_or stack
v128_not stack
v128_or stack
v128_load_allbitsset
v128_equals_i4 stack
v128_all_true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant why not mark the instance Equals
method as [Intrinsic]
and have Mono treat that to be the same as operator ==
for integers and to be a single opcode representing this sequence for float
/double
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's what you prefer I can figure out how to do it. I wanted to keep this change as narrow as possible since the existing Equals method is fine for ints as-is. I'll see how much work it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, why split off a separate EqualsFloatingPoint
at all, rather than simply handling its only caller directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries if its overly complex to do. I just figured it would be better overall for both RyuJIT and Mono.
We actually used to do just what I've proposed in RyuJIT, but dropped that support a while back since the inliner was able to handle it and the instance equals calls were much rarer to encounter.
Add interp intrinsics for Vector128 float and double Equals methods Implement Vector128 float and double Equals methods in jiterp
Add validation to make sure we never appendSimd(0) by accident
Some of the PackedSimd changes on main broke this, so I had to update it. Good thing I ran my tests again :-) |
Calling Equals on vectors of floating point types produces extremely nasty interp code that compiles down to a huge blob of wasm, which makes it hard to write tests/measurements that actually fit into traces. This PR adds intrinsics for doing equality comparisons on R4/R8 vectors and then implements them in jiterp. The performance also improves slightly over what it was before since the generated code is much improved.
The c interp implementation could probably be improved on, though - perhaps if clang has an all bits set intrinsic and an all_true intrinsic?