-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inliner: next steps #59002
Comments
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis issue summarizes current problems which Inliner doesn't handle well currently. stfld is expensive for ref typesCurrently inliner doesn't realize class MyTest
{
private object _a;
private object _b;
public MyTest(object a, object b)
{
_a = a;
_b = b;
}
static MyTest Create(object a, object b) => new MyTest(a, b);
} Codegen for ; Method MyTest:Test(System.Object,System.Object):MyTest
G_M58625_IG01: ;; offset=0000H
57 push rdi
56 push rsi
53 push rbx
4883EC20 sub rsp, 32
488BF1 mov rsi, rcx
488BFA mov rdi, rdx
;; bbWeight=1 PerfScore 3.75
G_M58625_IG02: ;; offset=000DH
48B9D00288CDFB7F0000 mov rcx, 0x7FFBCD8802D0
E8B4C7A15F call CORINFO_HELP_NEWSFAST
488BD8 mov rbx, rax
488D4B08 lea rcx, bword ptr [rbx+8]
488BD6 mov rdx, rsi
E825C3A15F call CORINFO_HELP_ASSIGN_REF
488D4B10 lea rcx, bword ptr [rbx+16]
488BD7 mov rdx, rdi
E819C3A15F call CORINFO_HELP_ASSIGN_REF
488BC3 mov rax, rbx
;; bbWeight=1 PerfScore 5.25
G_M58625_IG03: ;; offset=003AH
4883C420 add rsp, 32
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
;; bbWeight=1 PerfScore 2.75
; Total bytes of code: 66 Calles with Pinvokes are expensive(or basically any callees which impact on callers' prologues/epilogues) class MyTest
{
[DllImport("user32.dll")]
static extern int MessageBeep(uint uType);
static int MessageBeepWrapper(uint uType) => MessageBeep(uType);
static void Beep(bool beep)
{
if (beep)
MessageBeepWrapper(0x20);
}
} Codegen for G_M29584_IG01: ;; offset=0000H
55 push rbp
4157 push r15
4156 push r14
4155 push r13
4154 push r12
57 push rdi
56 push rsi
53 push rbx
4883EC68 sub rsp, 104
488DAC24A0000000 lea rbp, [rsp+A0H]
8BF1 mov esi, ecx
;; bbWeight=1 PerfScore 9.00
G_M29584_IG02: ;; offset=001AH
488D4D88 lea rcx, [rbp-78H]
498BD2 mov rdx, r10
E8FA4E705F call CORINFO_HELP_INIT_PINVOKE_FRAME
488BF8 mov rdi, rax
488BCC mov rcx, rsp
48894DA8 mov qword ptr [rbp-58H], rcx
488BCD mov rcx, rbp
48894DB8 mov qword ptr [rbp-48H], rcx
4084F6 test sil, sil
744B je SHORT G_M29584_IG07
;; bbWeight=1 PerfScore 5.75
G_M29584_IG03: ;; offset=003CH
B920000000 mov ecx, 32
48B8C8018ACDFB7F0000 mov rax, 0x7FFBCD8A01C8
48894598 mov qword ptr [rbp-68H], rax
488D0516000000 lea rax, G_M29584_IG05
488945B0 mov qword ptr [rbp-50H], rax
488D4588 lea rax, bword ptr [rbp-78H]
48894710 mov qword ptr [rdi+16], rax
C6470C00 mov byte ptr [rdi+12], 0
;; bbWeight=0.50 PerfScore 3.00
G_M29584_IG04: ;; offset=0066H
FF15FCD54600 call [MyTest:MessageBeep(int):int]
;; bbWeight=0.50 PerfScore 1.50
G_M29584_IG05: ;; offset=006CH
C6470C01 mov byte ptr [rdi+12], 1
833D11C7F55F00 cmp dword ptr [(reloc 0x7ffc2d38f428)], 0
7406 je SHORT G_M29584_IG06
FF15310AF45F call [CORINFO_HELP_STOP_FOR_GC]
;; bbWeight=0.50 PerfScore 4.00
G_M29584_IG06: ;; offset=007FH
488B4590 mov rax, bword ptr [rbp-70H]
48894710 mov qword ptr [rdi+16], rax
;; bbWeight=0.50 PerfScore 1.00
G_M29584_IG07: ;; offset=0087H
4883C468 add rsp, 104
5B pop rbx
5E pop rsi
5F pop rdi
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
;; bbWeight=1 PerfScore 5.25
; Total bytes of code: 152 typeof(T) should be recognized as a constant argument (and propagated by value)void Validate(Type type)
{
if (type.IsValueType)
throw new InvalidOperationException("");
}
void CallTest() => Validate(typeof(string)); Inliner's log for
So it recognized call (cnsArg, ...) heuristic is too optimisticWhen inliner sees a pattern like runtime/src/coreclr/jit/fgbasic.cpp Lines 1230 to 1235 in f29484a
Methods with string literals from external assemblies are never inlinedSee #53726 (comment) JIT should not give up completely for large methodsIt should be able to analyze a small part of such method to find potential foldable branches in large methods or when it runs out of "time budget", related issues:
JIT should be able to resolve 'call' instructions in methods with loops.We either should allow it to resolve 'call' in TC=0/methods with loops or enable QJFL. Enabling it for TC=0 will allow users to see how it works with intrinsics on sharplab. Potential impact on TP. So currently we'll never see a foldable branch here: void Test<T>()
{
if (typeof(T) == typeof(int))
return;
for (int i = 0; i < 100; i++) {}
} ... more to come
|
This issue summarizes problems which Inliner doesn't handle well currently.
Better understanding of what will be removed if we inline a callee
TODO:
Bottom-up inlining for AOT
TODO:
"Agile" time budget
TODO:
stfld is expensive for ref types
Currently inliner doesn't realize
stfld
for ref types will end up as ASSIGN_REF calls (gc write barriers) and is cheap for value-types. There is a benchmark that shows better results with NoInlining: https://discord.com/channels/143867839282020352/312132327348240384/886606988287569950Codegen for
Create
:ldsfld over a static readonly field returns constant
static readonly
fields over primitive types most likely are converted to constants, e.g.:Inliner's log in
Test
:while it is expected from it to recognize a foldable branch here.
Callees with Pinvokes are expensive
(or basically any callee that injects code into callers' prologues/epilogues)
Codegen for
Beep
:NOTE if caller already have a pinvoke - then it should make sense to inline callees with pinvokes too.
typeof(T) should be recognized as a constant argument (and propagated by value)
Inliner's log for
CallTest
:So it recognized
get_IsValueType
intrinsic, but didn't realize we pass a constant there. #55875 is a good example of how to propagate complex trees during inlining so the same should be done withtypeof
.call (cnsArg, ...) heuristic is too optimistic
When inliner sees a pattern like
if (CallSomething(arg0, ...)
it might recognize it as a foldable branch currently if arg0 is a constant arg: it's too optimistic and doesn't check call's signature.runtime/src/coreclr/jit/fgbasic.cpp
Lines 1230 to 1235 in f29484a
Methods with string literals from external assemblies are never inlined
See #53726 (comment)
JIT should not give up completely on large methods
It should be able to analyze a small part of such method to find potential foldable branches in large methods or when it runs out of "time budget", related issues:
JIT should be able to resolve 'call' instructions in methods with loops.
We either should allow it to resolve 'call' in TC=0/methods with loops or enable QJFL. Enabling it for TC=0 will allow users to see how it works with intrinsics on sharplab. Potential impact on TP. So currently we'll never see a foldable branch here:
Cache machine code size
If JIT compiles a method with IL size, say, 100 bytes and it ends up with only 20 bytes of machine code we can keep that info for future compilations that this method is not as big as it looks (it can be just a single bit instead of machine code size).
Misc
OS/Platform specifics: Callee-saved registers and CSE, ABI differences
Explicit tail-calls are never inlined
Save some tips for inliner in tier0/crossgen
etc.
category:planning
theme:inlining
skill-level:expert
cost:large
impact:large
The text was updated successfully, but these errors were encountered: