-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up interface checking and casting #49257
Conversation
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Show resolved
Hide resolved
@AndyAyersMS btw, the CastHelpers (also SpanHelpers) are heavily optimized by hands for general cases - I wonder if we should disable PGO for them or even mark as AggressiveOptimization - otherwise we can re-order them because of some loop during start up and mess up other cases. |
fb8a3de
to
855430a
Compare
Updated summary |
Can you run a benchmark, e.g. something like this (for |
Have one but its been running for hours (too many variants); will have to try something shorter |
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
Looks good after fixing the condition @jkotas pointed out #49257 (comment) Interface cast benchmark gist
|
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. @VSadov Could you please take a look as well?
You can delete unnecessary usings in the System.Runtime.CompilerServices.CastHelpers class : |
Done |
if (interfaceMap[i + 3] == toTypeHnd) | ||
// If not enough for unrolled, jmp straight to small loop | ||
// as we already know there is one or more interfaces so don't need to check again. | ||
goto few; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If interface table is allocated as 4 elements aligned and zero-filled, we could remove the "few" case entirely. At least in Chk case which is expected to succeed.
That would come with size increase though, so does not seem worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Nice!!!
These methods get prejitted, right? Once we have PGO data for prejitted methods we should look and see if we're happy with the codegen we get. |
@AndyAyersMS - Yes, these methods are R2R (so no jit at startup) and may rejit later. They are not |
16% - 38% faster* for
IsInstanceOfInterface
&ChkCastInterface
[rdx + 8]
vs[rdx + r8*8 + 8]
; for lower latency and greater CPU port availability for more instruction parallelism (*see "Intel Optimization Reference Manual" sections "3.5.1.2 Using LEA" and "3.5.1.6 Address Calculations")dec
s exchanged for 1add
with less data depenency*Results #49257 (comment)
/cc @VSadov