[JIT] Add support to inline the field access of primitive types marked with TLS #82973
…peIndex in t_threadStaticBlocks
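Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details
Summary
This is an early prototype to add support for inlining `ThreadStatic` (TS) field accesses (for now, just primitive types). Today, access to a field marked with `ThreadStatic` always has to go through the helper. The helper first gets the current thread, then the thread local block, then the ThreadLocalModule for the current moduleIndex, and lastly the static data block. This prototype tries to inline such accesses by adding a few data structures that act as a cache.
A thread local cache `t_threadStaticBlocks` (an array of pointers) is added, which stores the static blocks corresponding to the given thread.
During codegen, the enclosing type of the TS field being accessed (load or store) is tracked and gets a unique index. This index is the position at which the static data block will be stored in the `t_threadStaticBlocks` cache at runtime. Since the index is assigned during codegen and embedded in the code, it remains the same for any thread that executes the code at runtime. Hence, any thread that executes the code gets the relevant static data block from the `t_threadStaticBlocks` cache.
The `t_threadStaticBlocks` cache is populated at runtime as well. The first time the field access code is executed, it tries to find the static data block in `t_threadStaticBlocks` but doesn't find it. It falls back to the slow path, which is the existing helper call. The helper call has been modified to update `t_threadStaticBlocks` with the static data block. The next time the TS field access code is executed, the entry is found in the cache, and we skip going to the helper.
To access `t_threadStaticBlocks` at runtime, code is generated to access the relevant cache for the current thread by fetching the `TLS` of the current thread, getting the slot for the runtime, and then getting the `t_threadStaticBlocks` present in that slot.
Disassembly
To understand how it works, consider the following C# code.
[MethodImpl(MethodImplOptions.NoInlining)]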
public int GetThreadStaticInt() => t_threadStaticIntValue;
[ThreadStatic]
private static int t_threadStaticIntValue = 0;
Today, we always generate a helper call to retrieve the `t_threadStaticIntValue` field value.
G_M000_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
G_M000_IG02: ;; offset=0004H
48B9D0482999FA7F0000 mov rcx, 0x7FFA992948D0
BA78010000 mov edx, 376
E81894A65F call CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE_NOCTOR
8B80A4040000 mov eax, dword ptr [rax+04A4H]
G_M000_IG03: ;; offset=001EH
4883C428 add rsp, 40
C3 ret
With this prototype, this is what we would generate:
G_M000_IG01: ;; offset=0000H
56 push rsi
4883EC20 sub rsp, 32
G_M000_IG02: ;; offset=0005H
65488B0C2558000000 mov rcx, qword ptr GS:[0x0058] ; Access the TLS of current thread
488B7130 mov rsi, qword ptr [rcx+30H] ; Get the runtime TLS slot from TLS[_tls_index]
4883BEB801000002 cmp qword ptr [rsi+1B8H], 2 ; See if length of `t_threadStaticBlocks` > typeIndex. Here `typeIndex == 2`.
7E22 jle SHORT G_M000_IG04 ; If yes, then proceed, else fall back to the helper
G_M000_IG03: ;; offset=001CH
48B9D048F522FA7F0000 mov rcx, 0x7FFA22F548D0
BA78010000 mov edx, 376
41B802000000 mov r8d, 2 ; This is a new parameter to the helper, which caches the static data block in `t_threadStaticBlocks` at index `2`.
E8FAF8AB5F call CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE_NOCTOR
8B80A4040000 mov eax, dword ptr [rax+04A4H]
EB16 jmp SHORT G_M000_IG06
G_M000_IG04: ;; offset=003EH
488B86B0010000 mov rax, qword ptr [rsi+1B0H] ; Get the `t_threadStaticBlocks` array.
488B4010 mov rax, qword ptr [rax+10H] ; Get the `t_threadStaticBlocks[typeIndex]`.
4885C0 test rax, rax ; Check if a valid entry is present at `t_threadStaticBlocks[typeIndex]`.
74CE je SHORT G_M000_IG03 ; If invalid, then go to the helper
G_M000_IG05: ;; offset=004EH
8B80A4040000 mov eax, dword ptr [rax+04A4H] ; If valid, then get the field value.
G_M000_IG06: ;; offset=0054H
4883C420 add rsp, 32
5E pop rsi
C3 ret
Details
TODO
TODO
Contributes to #79521.
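For readers who prefer source to disassembly, the cheap check and helper fallback shown above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `StaticBlock`, `GetThreadStaticBase`, `GetThreadStaticBaseSlow`, and `t_helperCalls` are hypothetical names, and the `std::vector` layout is not the runtime's actual data structure. Only the name `t_threadStaticBlocks` comes from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for one type's thread static data block.
struct StaticBlock {
    int32_t intField = 0;
};

// Per-thread cache indexed by the type index embedded in the jitted code.
// The name mirrors the PR; the container choice is an assumption.
thread_local std::vector<std::unique_ptr<StaticBlock>> t_threadStaticBlocks;

// Counts slow-path calls so the caching behavior can be observed.
thread_local int t_helperCalls = 0;

// Slow path: stands in for the existing helper, extended to fill the cache.
StaticBlock* GetThreadStaticBaseSlow(std::size_t typeIndex) {
    ++t_helperCalls;
    if (t_threadStaticBlocks.size() <= typeIndex) {
        t_threadStaticBlocks.resize(typeIndex + 1);  // new slots are null
    }
    if (!t_threadStaticBlocks[typeIndex]) {
        t_threadStaticBlocks[typeIndex] = std::make_unique<StaticBlock>();
    }
    return t_threadStaticBlocks[typeIndex].get();
}

// Fast path: the two checks that would be emitted inline (bounds, then null).
inline StaticBlock* GetThreadStaticBase(std::size_t typeIndex) {
    if (typeIndex < t_threadStaticBlocks.size()) {
        StaticBlock* block = t_threadStaticBlocks[typeIndex].get();
        if (block != nullptr) {
            return block;  // cache hit, no helper call
        }
    }
    return GetThreadStaticBaseSlow(typeIndex);
}
```

In this sketch the first `GetThreadStaticBase(2)` call on a thread takes the slow path and fills slot 2; later calls on the same thread return the cached pointer without touching the helper.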
@kunalspathak, can you also give an example of the use of a For example, what is the codegen given the following:
[ThreadStatic]
public static volatile int t_value;
public static int Test(uint count)
{
int sum = 0;
for (uint i = 0; i < count; i++)
{
sum += t_value;
}
return sum;
}
I'd expect we end up with the large up-front block as "hoistable", that is, the initial resolution of the TLS base. The inner loop should then remain "small" and effectively be just a direct memory access, since we'll have already resolved the base/offset of the TLS value for the given thread.
Notably, it also looks like there is a "cheap check" and an "expensive fallback" in the TLS handling here. The block ordering doesn't appear to mark the expensive fallback as "cold", which might negatively impact things, particularly for subsequent executions of the code.
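To make the hoisting expectation concrete, here is a small C++ sketch of the two loop shapes being contrasted. It is an analogy, not JIT output: `StaticBlock`, `ResolveThreadStaticBase`, `SumNaive`, `SumHoisted`, and `t_baseLookups` are hypothetical names, and whether the JIT actually hoists the base in this PR is not something the thread confirms.

```cpp
#include <cstdint>

// Hypothetical stand-in for the thread static data block holding t_value.
struct StaticBlock {
    volatile int32_t value = 0;
};

thread_local StaticBlock t_block;
thread_local int t_baseLookups = 0;  // counts base resolutions for observation

// Stands in for the TLS base resolution (inline checks plus possible helper).
StaticBlock* ResolveThreadStaticBase() {
    ++t_baseLookups;
    return &t_block;
}

// Unhoisted shape: the base is resolved on every iteration.
int32_t SumNaive(uint32_t count) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < count; i++) {
        sum += ResolveThreadStaticBase()->value;
    }
    return sum;
}

// Hoisted shape: the base is resolved once; the loop keeps only the
// volatile load of the field itself.
int32_t SumHoisted(uint32_t count) {
    StaticBlock* base = ResolveThreadStaticBase();
    int32_t sum = 0;
    for (uint32_t i = 0; i < count; i++) {
        sum += base->value;
    }
    return sum;
}
```

The point of the second shape is that `volatile` constrains the load of the field, not the computation of its address, so the address can stay outside the loop while the load stays inside it.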
…various helper expansion
Failures are known issues.
Just a couple hopefully minor questions/comments
LGTM with a couple of nits
…nullptr 1st time and valid value 2nd time onwards
Other failures are unrelated and fixed by #84649
Summary
This is an early prototype to add support for inlining `ThreadStatic` (TS) field accesses (for now, just primitive types). Today, access to a field marked with `ThreadStatic` always has to go through the helper. The helper first gets the current thread, then the thread local block, then the ThreadLocalModule for the current moduleIndex, and lastly the static data block. This prototype tries to inline such accesses by adding a few data structures that act as a cache.
A thread local cache `t_threadStaticBlocks` (an array of pointers) is added, which stores the static blocks corresponding to the given thread.
During codegen, the enclosing type of the TS field being accessed (load or store) is tracked and gets a unique index. This index is the position at which the static data block will be stored in the `t_threadStaticBlocks` cache at runtime. Since the index is assigned during codegen and embedded in the code, it remains the same for any thread that executes the code at runtime. Hence, any thread that executes the code gets the relevant static data block from the `t_threadStaticBlocks` cache.
The `t_threadStaticBlocks` cache is populated at runtime as well. The first time the field access code is executed, it tries to find the static data block in `t_threadStaticBlocks` but doesn't find it. It falls back to the slow path, which is the existing helper call. The helper call has been modified to update `t_threadStaticBlocks` with the static data block. The next time the TS field access code is executed, the entry is found in the cache, and we skip going to the helper.
To access `t_threadStaticBlocks` at runtime, code is generated to access the relevant cache for the current thread by fetching the `TLS` of the current thread, getting the slot for the runtime, and then getting the `t_threadStaticBlocks` present in that slot.
Disassembly
To understand how it works, consider the following C# code.
Today, we always generate a helper call to retrieve the `t_threadStaticIntValue` field value.
With this prototype, this is what we would generate:
Details
TODO
TODO
t_threadStaticBlocks
.typeIndex
.Contributes to #79521.
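The summary says each enclosing type gets a unique index at codegen time, and that the index is then baked into the generated code. A minimal C++ sketch of that kind of assignment is below. It is an assumption about the general shape only: `ThreadStaticTypeIndexer`, `GetOrAssign`, and `TypeHandle` are hypothetical names, and the PR does not say how the runtime stores the mapping or synchronizes access to it.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical opaque handle identifying the enclosing type of a TS field.
using TypeHandle = const void*;

// Hands out process-wide indices, one per enclosing type. Because the index
// is global rather than per-thread, every thread can use the same constant
// to index its own cache.
class ThreadStaticTypeIndexer {
public:
    // Returns the index for `type`, assigning the next free one on first use.
    std::size_t GetOrAssign(TypeHandle type) {
        std::lock_guard<std::mutex> lock(m_lock);
        auto it = m_indices.find(type);
        if (it != m_indices.end()) {
            return it->second;
        }
        std::size_t index = m_indices.size();
        m_indices.emplace(type, index);
        return index;
    }

private:
    std::mutex m_lock;  // compilation may happen on several threads
    std::unordered_map<TypeHandle, std::size_t> m_indices;
};
```

Indices are dense and never reused in this sketch, which keeps the per-thread cache a plain array that only ever grows.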