Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline TLS field access for linux/osx x64/arm64 #87082

Merged
merged 88 commits into from
Jul 6, 2023

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Jun 2, 2023

Extending #85619 and #82973 for linux.

       mov      rdi, 0x7FA9DB182F28      ; GOT descriptor address of `maxThreadStaticBlocks`
       mov      rax, 0x7FA9DB7A5DB0      ; function address of __tls_get_addr
       call     rax                      ; gets the address of `maxThreadStaticBlocks`
       cmp      dword ptr [rax], 2
       jl       SHORT G_M8151_IG06
       mov      rdi, qword ptr [rax+08H]   ; access the `threadStaticBlocks` variable
       mov      rdi, qword ptr [rdi+10H]
       test     rdi, rdi
       je       SHORT G_M8151_IG06
       mov      rax, bword ptr [rdi]
       add      rax, 16

Base:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 3.383 ns | 0.0378 ns | 0.0316 ns | 3.374 ns | 3.346 ns | 3.459 ns |         - |
| SetThreadStatic | 5.334 ns | 0.0733 ns | 0.0650 ns | 5.338 ns | 5.261 ns | 5.451 ns |         - |

Diff:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 1.048 ns | 0.0200 ns | 0.0177 ns | 1.041 ns | 1.026 ns | 1.087 ns |         - |
| SetThreadStatic | 2.844 ns | 0.0379 ns | 0.0354 ns | 2.837 ns | 2.807 ns | 2.915 ns |         - | 

This also adds mrs instruction in arm64 to access TLS on linux. The codegen is:

G_M8151_IG02:  
            mrs     x0, tpidr_el0
            ldr     w1, [x0, #0xE8]  ; `maxThreadStaticBlocks` value
            cmp     w1, #2
            blt     G_M8151_IG05
            ldr     x0, [x0, #0xF0]  ; `threadStaticBlocks` array
            ldr     x1, [x0, #0x10]
            cbz     x1, G_M8151_IG05

G_M8151_IG03:  
            mov     w0, #5
            str     w0, [x1, #0x24]

osx/arm64:

Because of register allocator doesn't reuse the x0 properly, we end up creating the same constant again in x1, but this is how osx/arm64 looks like:

            movz    x0, #0x8C00
            movk    x0, #817 LSL #16
            movk    x0, #1 LSL #32
            movz    x1, #0x8C00
            movk    x1, #817 LSL #16
            movk    x1, #1 LSL #32
            ldr     x1, [x1]
            blr     x1

osx/x64:

image

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2023
@ghost ghost assigned kunalspathak Jun 2, 2023
@ghost
Copy link

ghost commented Jun 2, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Extending #85619 and #82973 for linux.

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

src/coreclr/inc/corinfo.h Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
JITDUMP("offsetOfThreadLocalStoragePointer= %u\n", threadStaticBlocksInfo.offsetOfThreadLocalStoragePointer);
JITDUMP("offsetOfMaxThreadStaticBlocks= %u\n", offsetOfMaxThreadStaticBlocksVal);
JITDUMP("offsetOfThreadStaticBlocks= %u\n", offsetOfThreadStaticBlocksVal);
JITDUMP("tlsIndex= %u\n", (ssize_t)threadStaticBlocksInfo.tlsIndex.addr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dump threadStaticBlocksInfo.tlsIndex.accessType also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

I don't know. Structurally, it seems odd to omit it. Up to you.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 1, 2023
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsraarmarch.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsraxarch.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/helperexpansion.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/lsraxarch.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 3, 2023
@jakobbotsch jakobbotsch dismissed their stale review July 3, 2023 21:15

Addressed

@kunalspathak
Copy link
Member Author

Ping @BruceForstall

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ one nit

src/coreclr/jit/lsraarmarch.cpp Outdated Show resolved Hide resolved
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants