Add JitDasmWithAddress switch to print the process address of every instruction #43120

Merged (6 commits) on Oct 12, 2020


kunalspathak
Member

For alignment-related investigations, it is useful to be able to print the process address at which each JIT-generated instruction is placed. This PR adds the COMPlus_JitDasmWithAddress switch to enable that. When the switch is set, the offset is also printed at the top of every block. I also noticed that opts.dspEmit was never actually used, so I deleted it along with the code that referenced it.

Sample output of x64:

G_M3239_IG01:           ;; offset=0000H
[ 00007ff8`1ba29390 ]        4883EC18             sub      rsp, 24
[ 00007ff8`1ba29394 ]        33C0                 xor      rax, rax
[ 00007ff8`1ba29396 ]        4889442410           mov      qword ptr [rsp+10H], rax
[ 00007ff8`1ba2939b ]        4889442408           mov      qword ptr [rsp+08H], rax
                                                ;; bbWeight=1    PerfScore 2.50
G_M3239_IG02:           ;; offset=0010H
[ 00007ff8`1ba293a0 ]        8B4108               mov      eax, dword ptr [rcx+8]
[ 00007ff8`1ba293a3 ]        4883C10C             add      rcx, 12
[ 00007ff8`1ba293a7 ]        48894C2410           mov      bword ptr [rsp+10H], rcx
[ 00007ff8`1ba293ac ]        488B4C2410           mov      rcx, bword ptr [rsp+10H]
[ 00007ff8`1ba293b1 ]        4885D2               test     rdx, rdx
[ 00007ff8`1ba293b4 ]        7505                 jne      SHORT G_M3239_IG04
                                                ;; bbWeight=1    PerfScore 5.50
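
As a rough illustration of the kind of alignment question these addresses help answer, here is a minimal C++ sketch. It is not code from this PR, and `isAligned` and `offsetInMethod` are hypothetical helper names. It relates a printed process address to the block offset and checks it against a power-of-two alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the JIT): true when `address` is a
// multiple of `alignment`.
inline bool isAligned(std::uint64_t address, std::uint64_t alignment)
{
    // The mask trick below only works for non-zero powers of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}

// Hypothetical helper: offset of an instruction from the start of the method,
// i.e. the value shown as ";; offset=....H" at the top of a block.
inline std::uint64_t offsetInMethod(std::uint64_t methodStart, std::uint64_t address)
{
    assert(address >= methodStart);
    return address - methodStart;
}
```

With the x64 sample above, `offsetInMethod(0x00007ff81ba29390, 0x00007ff81ba293a0)` gives 0x10, matching `offset=0010H` on IG02. IG02 itself starts on a 32-byte boundary, while the method start is only 16-byte aligned.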

x86 output:

G_M3239_IG01:           ;; offset=0000H
[ 08b189c0 ]       55           push     ebp
[ 08b189c1 ]       8BEC         mov      ebp, esp
[ 08b189c3 ]       57           push     edi
[ 08b189c4 ]       56           push     esi
[ 08b189c5 ]       83EC08       sub      esp, 8
[ 08b189c8 ]       33C0         xor      eax, eax
[ 08b189ca ]       8945F4       mov      dword ptr [ebp-0CH], eax
[ 08b189cd ]       8945F0       mov      dword ptr [ebp-10H], eax
                                                ;; bbWeight=1    PerfScore 5.75
G_M3239_IG02:           ;; offset=0010H
[ 08b189d0 ]       8B4104       mov      eax, dword ptr [ecx+4]
[ 08b189d3 ]       83C108       add      ecx, 8
[ 08b189d6 ]       894DF4       mov      bword ptr [ebp-0CH], ecx
[ 08b189d9 ]       8B4DF4       mov      ecx, bword ptr [ebp-0CH]
[ 08b189dc ]       85D2         test     edx, edx
[ 08b189de ]       7504         jne      SHORT G_M3239_IG04
                                                ;; bbWeight=1    PerfScore 5.50
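
For readers who want to reproduce the address column in their own tooling, the two formats shown above can be approximated with the sketch below. This is an assumption-laden illustration, not the JIT's FMT_ADDR: `formatAddress` is a hypothetical name, and the only thing taken from the samples is the visible layout (16 hex digits split by a backtick on x64, 8 hex digits on x86).

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the JIT's FMT_ADDR): formats an address the way
// the sample output displays it.
inline std::string formatAddress(std::uint64_t address, bool is64Bit)
{
    char buffer[32];
    if (is64Bit)
    {
        // High and low 32-bit halves, separated by a backtick.
        std::snprintf(buffer, sizeof(buffer), "%08" PRIx32 "`%08" PRIx32,
                      static_cast<std::uint32_t>(address >> 32),
                      static_cast<std::uint32_t>(address & 0xFFFFFFFFu));
    }
    else
    {
        // A 32-bit target cannot have an address above 4 GB.
        assert(address <= 0xFFFFFFFFu);
        std::snprintf(buffer, sizeof(buffer), "%08" PRIx32,
                      static_cast<std::uint32_t>(address));
    }
    return buffer;
}
```

For example, `formatAddress(0x00007ff81ba29390, true)` yields the string printed on the first x64 instruction, and `formatAddress(0x08b189c0, false)` yields the one on the first x86 instruction.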

@Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Oct 6, 2020
@kunalspathak
Member Author

@dotnet/jit-contrib

@kunalspathak
Member Author

For Windows arm, we show the offsets by default, something like this:

G_M3239_IG01:      ;; offset=0000H
[ 00007ff8`1ba198a0 ] 000000  E92D 480C      push    {r2,r3,r11,lr}
[ 00007ff8`1ba198a4 ] 000004  F10D 0B08      add     r11, sp, 8
[ 00007ff8`1ba198a8 ] 000008  2200           movs    r2, 0
[ 00007ff8`1ba198aa ] 00000A  9201           str     r2, [sp+0x04]      // [V04 loc2]
[ 00007ff8`1ba198ac ] 00000C  9200           str     r2, [sp]   // [V06 loc4]
                                                ;; bbWeight=1    PerfScore 5.00
G_M3239_IG02:      ;; offset=000EH
[ 00007ff8`1ba198ae ] 00000E  6843           ldr     r3, [r0+4]
[ 00007ff8`1ba198b0 ] 000010  300C           adds    r0, 12
[ 00007ff8`1ba198b2 ] 000012  9001           str     r0, [sp+0x04]      // [V04 loc2]
[ 00007ff8`1ba198b4 ] 000014  9801           ldr     r0, [sp+0x04]      // [V04 loc2]
[ 00007ff8`1ba198b6 ] 000016  2900           cmp     r1, 0
[ 00007ff8`1ba198b8 ] 000018  D101           bne     SHORT G_M3239_IG04
                                                ;; bbWeight=1    PerfScore 6.00

I'd like to hear thoughts on whether offsets should be displayed (on all platforms/architectures) only when COMPlus_JitDasmWithAddress is set, which would turn off the offset display by default even on arm.

@BruceForstall
Member

Why would 32-bit arm show 64-bit addresses?

I'd like to hear thoughts on whether offsets should be displayed (on all platforms/architectures) only when COMPlus_JitDasmWithAddress is set, which would turn off the offset display by default even on arm.

IMO, no. I believe they are disabled for COMPlus_JitDiffableDasm, and I think they are useful for interpreting the EH dumps.

@kunalspathak
Member Author

Why would 32-bit arm show 64-bit addresses?

Those were captured using altjit, so they show the x64 address space.

IMO, no. I believe they are disabled for COMPlus_JitDiffableDasm, and I think they are useful for interpreting the EH dumps.

Got it. Is there any reason why offset display is enabled by default just on arm and not on other archs?

@BruceForstall
Member

nit: regarding the address display: do we need the [ and ]? It doesn't seem ambiguous without them.

@kunalspathak
Member Author

nit: regarding the address display: do we need the [ and ]? It doesn't seem ambiguous without them.

Sure, I will remove them.

@BruceForstall
Member

Got it. Is there any reason why offset display is enabled by default just on arm and not on other archs?

Not sure. It might be that it is only useful on arm when looking at EH or GC info, or, if it is useful on other platforms, nobody noticed and enabled it.

@kunalspathak
Member Author

Here is the x64 output after removing [ and ].

G_M3239_IG01:      ;; offset=0000H
 00007ff8`370b9390        4883EC18             sub      rsp, 24
 00007ff8`370b9394        33C0                 xor      rax, rax
 00007ff8`370b9396        4889442410           mov      qword ptr [rsp+10H], rax
 00007ff8`370b939b        4889442408           mov      qword ptr [rsp+08H], rax
                                                ;; bbWeight=1    PerfScore 2.50
G_M3239_IG02:      ;; offset=0010H
 00007ff8`370b93a0        8B4108               mov      eax, dword ptr [rcx+8]
 00007ff8`370b93a3        4883C10C             add      rcx, 12
 00007ff8`370b93a7        48894C2410           mov      bword ptr [rsp+10H], rcx
 00007ff8`370b93ac        488B4C2410           mov      rcx, bword ptr [rsp+10H]
 00007ff8`370b93b1        4885D2               test     rdx, rdx
 00007ff8`370b93b4        7505                 jne      SHORT G_M3239_IG04
                                                ;; bbWeight=1    PerfScore 5.50

@briansull
Contributor

briansull commented Oct 7, 2020

Even on 64-bit systems we can probably just display the lowest 32 bits of the addresses:
Due to how we allocate memory pages, I don't believe we will ever have a method that crosses a 2^32 address boundary.

'370b9390      4883EC18          sub      rsp, 24
'370b9394      33C0              xor      rax, rax
'370b9396      4889442410        mov      qword ptr [rsp+10H], rax
'370b939b      4889442408        mov      qword ptr [rsp+08H], rax
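
The assumption behind this suggestion can be written down as a simple check. The following is a minimal C++ sketch, not JIT code, with hypothetical helper names `low32` and `crosses4GBBoundary`. It makes no claim about how the runtime actually allocates code pages; it only shows what "crossing a 2^32 boundary" means for a method occupying `size` bytes starting at `start`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the low 32 bits of a 64-bit process address, which is
// all that the shortened display would show.
inline std::uint32_t low32(std::uint64_t address)
{
    return static_cast<std::uint32_t>(address & 0xFFFFFFFFu);
}

// Hypothetical helper: true when the byte range [start, start + size) spans a
// 2^32 boundary, i.e. its first and last bytes differ in their upper 32 bits.
// Assumes start + size does not overflow 64 bits.
inline bool crosses4GBBoundary(std::uint64_t start, std::uint64_t size)
{
    assert(size > 0);
    const std::uint64_t last = start + size - 1; // address of the final byte
    return (start >> 32) != (last >> 32);
}
```

As long as `crosses4GBBoundary` is false for a method, the low 32 bits are unique and increasing within that method, so the shortened column stays unambiguous inside a single listing.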

@kunalspathak
Member Author

Even on 64-bit systems we can probably just display the lowest 32 bits of the addresses:

Yes, I started with that, but then just reused FMT_ADDR instead of creating a separate 64-bit version of it that displays only a 32-bit address. What you suggest is more sensible, though. Will fix it.

@BruceForstall
Member

Even on 64-bit systems we can probably just display the lowest 32 bits of the addresses

But there's no downside, except for display width, right?

And there might be a benefit, if you're looking at the output while in the debugger.

@kunalspathak
Member Author

And there might be a benefit, if you're looking at the output while in the debugger.

That's a good point and I agree. With the full address, it will be much easier to map the instructions to the debugger disassembly (and to find an address there) while debugging. I will keep it as it is.

@kunalspathak merged commit b8a92e5 into dotnet:master on Oct 12, 2020
@ghost locked as resolved and limited conversation to collaborators on Dec 7, 2020