-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
Pre ; Assembly listing for method AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this
; Lcl frame size = 48
G_M62408_IG01:
56 push rsi
4883EC30 sub rsp, 48
4889542428 mov qword ptr [rsp+28H], rdx
498BF0 mov rsi, r8
G_M62408_IG02:
498BD1 mov rdx, r9
E8C3F2FFFF call AsyncTaskMethodBuilder`1:GetStateMachineBox(byref):ref:this
488BD0 mov rdx, rax
480FBE0E movsx rcx, byte ptr [rsi]
488B0E mov rcx, gword ptr [rsi]
41B801000000 mov r8d, 1
E8968B9D5D call TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
90 nop
G_M62408_IG03:
4883C430 add rsp, 48
5E pop rsi
C3 ret
; Total bytes of code 49, prolog size 10 for method AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this Post ; Assembly listing for method AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this
; Lcl frame size = 48
G_M62419_IG01:
4156 push r14
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC30 sub rsp, 48
4889542428 mov qword ptr [rsp+28H], rdx
488BF9 mov rdi, rcx
498BF0 mov rsi, r8
498BD9 mov rbx, r9
G_M62419_IG02:
E8C37E8A5B call ExecutionContext:Capture():ref
488BE8 mov rbp, rax
488B17 mov rdx, gword ptr [rdi]
48B9906AF140FD7F0000 mov rcx, 0x7FFD40F16A90
E88E1FB75F call CORINFO_HELP_ISINSTANCEOFCLASS
4C8BF0 mov r14, rax
4D85F6 test r14, r14
7414 je SHORT G_M62419_IG04
49396E48 cmp gword ptr [r14+72], rbp
740C je SHORT G_M62419_IG03
498D4E48 lea rcx, bword ptr [r14+72]
488BD5 mov rdx, rbp
E82422B75F call CORINFO_HELP_ASSIGN_REF
G_M62419_IG03:
EB11 jmp SHORT G_M62419_IG05
G_M62419_IG04:
488BCF mov rcx, rdi
488BD3 mov rdx, rbx
4C8BC5 mov r8, rbp
E8ACF4FFFF call AsyncTaskMethodBuilder`1:CreateStateMachineBox(byref,ref):ref:this
4C8BF0 mov r14, rax
G_M62419_IG05:
498BD6 mov rdx, r14
480FBE0E movsx rcx, byte ptr [rsi]
488B0E mov rcx, gword ptr [rsi]
41B801000000 mov r8d, 1
E8BCAE915B call TaskAwaiter:UnsafeOnCompletedInternal(ref,ref,bool)
90 nop
G_M62419_IG06:
4883C430 add rsp, 48
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
C3 ret
; Total bytes of code 128, prolog size 15 for method AsyncTaskMethodBuilder`1:AwaitUnsafeOnCompleted(byref,byref):this |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
It depends on the workload. Create needs to be called for every async method that yields at least once, so if one yields exactly once (which is quite common in some workloads), this won't help and could actually make things worse, especially if the get method doesn't get inlined. There's also been some concern expressed about the size overheads added in AOT builds by the use of generics in these builders, and it would seem the introduction of another generic method will only make that worse. @jkotas? You're focused on call counts. Does this have any measurable throughput impact at all? |
The call costs would still be there regardless (when the method is combined as it doesn't inline); so the additional cost is the preamble in G_M11442_IG01:
4156 push r14
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC40 sub rsp, 64
488BD9 mov rbx, rcx
488BF2 mov rsi, rdx
498BE8 mov rbp, r8
;
G_M11445_IG12:
4883C440 add rsp, 64
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415E pop r14
C3 ret
With Tiered Jitting that should solve the issue for crossgen, but as you point out it may be an issue for AoT (depending how good it is at resolving runtime handles at they would be fixed?) Jit diffs aren't huge
|
Interestingly adding the |
@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test |
Having issues with dev machine atm, will get back to you |
I think you are looking at different instantiation in crossgen than what you are looking with JIT. This same inlining failure is going to happen with JIT when
Correct. Note that the code size bloat matters for JIT too, it is just harder to measure the impact. The code expanding optimizations are great for micro-benchmarks, but they can easily hurt e.g. non-trivial real-world workloads that run mix of hundred different async methods. |
Its a modest ~55 byte increase per method; to use a 128 byte method vs 466 byte method combination (49+417) when the async path resumes 2+ times; which should give it more chance to be instruction cache resident (inlining aside). Looped async resumption is also a pattern that is likely to be more common with the introduction of async enumerators/streams. Saying that... I don't feel too strongly about this change, |
Ubuntu arm https://github.com/dotnet/coreclr/issues/17129
|
Might be able to do something for bloat |
That's not including all of the metadata/runtime data structures/etc. for the additional generic method, though, right? |
No, so this probably wouldn't help? #21908 |
Again, this is probably more interesting for now #21909 can revisit this later |
Split
CreateStateMachineBox
out fromGetStateMachineBox
for a fast-path in common scenario.At Jit time
GetStateMachineBox
for a pre-"boxed" statemachine is generally considered a profitable inline (without forcing with AggressiveInlining), adding only 79 bytes of asm to XxOnCompleted; whereas with the creation in it is not (417 bytes of asm).At crossgen it it blocked due to
[FAILED: runtime dictionary lookup]
but that should be cleaned up by Tier1 Jit.The
GetStateMachineBox
portion is the always called section forAwaitUnsafeOnCompleted
As per the comments the Get not Create path is the more common one (here's with Get inlined):
/cc @stephentoub