-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Vector2/3/4, Quaternion, Plane, Vector<T>, and Vector64/128/256/512<T> to be implemented in managed where trivially possible #102301
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
…ented in managed where trivially possible
This comment was marked as outdated.
This comment was marked as outdated.
Things are already looking much better than before.
There is an assert around |
…PIs as intrinsic for the profitability boost
f7b27b2
to
1d6cafa
Compare
/// <returns><paramref name="value" /> reinterpreted as a new <see cref="Plane" />.</returns> | ||
internal static Plane AsPlane(this Vector4 value) | ||
{ | ||
#if MONO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this perf problem or functionality problem?
Either way, it is likely going to be fixed as side-effect of #102988 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a functionality problem that I've not dug into yet. The previous tested commit had seen some failures related to Quaternion/Vector4/Plane and their equality tests, but only on Mono.
My current guess is that there's something subtly incorrect in the Mono handling that breaks for SIMD here and it will need a fix before BitCast can be used, but I'd like to try and ensure CoreCLR is clean without regressions before I do any more in depth Mono changes.
1d6cafa
to
4dae1fc
Compare
One regression is in - vmovups xmm0, xmmword ptr [rdi]
+ vmovss xmm0, dword ptr [rdi]
+ vinsertps xmm0, xmm0, dword ptr [rdi+0x04], 16
+ vinsertps xmm0, xmm0, dword ptr [rdi+0x08], 32
+ vinsertps xmm0, xmm0, dword ptr [rdi+0x0C], 48 This one is because while we have There's a small regression in G_M1352_IG02:
mov rax, rsi
mov ecx, edx
- vmovups xmm0, xmmword ptr [rax]
cmp ecx, 128
jge SHORT G_M1352_IG04
- ;; size=17 bbWeight=1 PerfScore 5.75
+ ;; size=13 bbWeight=1 PerfScore 1.75
G_M1352_IG03:
- vmovd xmm1, rdi
- vpxor xmm0, xmm1, xmm0
+ vmovd xmm0, rdi
+ vpxor xmm0, xmm0, xmmword ptr [rax]
jmp G_M1352_IG07
align [0 bytes for IG05]
- ;; size=13 bbWeight=0.50 PerfScore 2.17
+ ;; size=13 bbWeight=0.50 PerfScore 3.50
G_M1352_IG04:
+ vmovups xmm0, xmmword ptr [rsi]
vmovups xmm1, xmmword ptr [rsi+0x10] Nothing major, just a place where we don't share a load anymore. In general the JIT ends up needing to create many additional Most of the corelib regressions are in the xplat APIs that don't have any accelerated implementation today, like All in all, I think we are in a position where taking this is feasible and it doesn't look to regress any meaningful scenarios, only the already unaccelerated edge cases. -- Mono is being left "as is" for now since their inliner isn't as robust as the RyuJIT one, but we should be able to independently investigate removing the Mono support for the xplat APIs as well so they only need to support the platform specific APIs. CC. @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up in jit! 👍
Likely regressions: cc: @tannergooding |
wasm regressions here dotnet/perf-autofiling-issues#36108 (aot) and improvements dotnet/perf-autofiling-issues#36093 (interp) cc @radekdoulik |
These should be resolved with #103177
Little bit surprised that
👍, fairly significant improvements at that. I know that |
I find it surprising too, @radekdoulik can you take a look at our codegen here please
Yes, nice improvements |
Possibly related noticeable size improvements on Mono iOS HelloWorld dotnet/perf-autofiling-issues#35768 |
This is a basic prototype for #102275 and covers the all vectorized types where the handling can be trivially done in managed (such as by deferring to another intrinsic API).
As part of that, it also removes
MethodImpl(MethodImplOptions.AggressiveInlining)
from various APIs where the method IL size is approx. less than the always inline threshold for RyuJIT (this was estimated by examining a few functions and finding that 2 or less calls with no complex logic typically fits the need).