Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] x86/x64 - improved localloc codegen in some cases #76851

Merged
merged 15 commits into from
Nov 1, 2022

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented Oct 11, 2022

Description

Should resolve: #6546

@BruceForstall and I went through this, and it turns out that it was actually quite simple.

The x86/x64 emitter keeps track of the stack whenever it sees the pattern sub esp, but there are cases where we do not want to do this. To prevent it from keeping track, it would move esp into a temporary register, do sub with that register, and then move it back to esp. This causes two extra instructions to be emitted for x86 under this scenario.

The change adds a new instruction, INS_sub_hide, that does the same thing as sub, but will tell the emitter not to track the stack. We already have other instructions to do this exact thing already with push and pop:

// Does not affect the stack tracking in the emitter
INST5(push_hide,        "push",             IUM_RD, 0x0030FE,     0x000068,     BAD_CODE,     BAD_CODE,     0x000050,    INS_FLAGS_None )
INST5(pop_hide,         "pop",              IUM_WR, 0x00008E,     BAD_CODE,     BAD_CODE,     BAD_CODE,     0x000058,    INS_FLAGS_None )

Example diff:
image

Acceptance Criteria

  • Remove regTmp from genStackPointerConstantAdjustment and possibly further up.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2022
@ghost ghost assigned TIHan Oct 11, 2022
@ghost
Copy link

ghost commented Oct 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description
@BruceForstall and I went through this, and it turns out that it was actually quite simple.

The x86 emitter keeps track of the stack whenever it sees the pattern sub esp, but there are cases where we do not want to do this. To prevent it from keeping track, it would move esp into a temporary register, do sub with that register, and then move it back to esp. This causes two extra instructions to be emitted for x86 under this scenario.

The change adds a new instruction, INS_sub_hide, that does the same thing as sub, but will tell the emitter not to track the stack. We already have other instructions to do this exact thing already with push and pop:

// Does not affect the stack tracking in the emitter
INST5(push_hide,        "push",             IUM_RD, 0x0030FE,     0x000068,     BAD_CODE,     BAD_CODE,     0x000050,    INS_FLAGS_None )
INST5(pop_hide,         "pop",              IUM_WR, 0x00008E,     BAD_CODE,     BAD_CODE,     BAD_CODE,     0x000058,    INS_FLAGS_None )

Acceptance Criteria

  • RemoveregTmp from genStackPointerConstantAdjustment and possibly further up.
Author: TIHan
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@TIHan TIHan marked this pull request as ready for review October 24, 2022 19:37
@TIHan
Copy link
Contributor Author

TIHan commented Oct 24, 2022

@dotnet/jit-contrib This is ready assuming CI succeeds. cc @BruceForstall

@kunalspathak
Copy link
Member

@dotnet/jit-contrib This is ready assuming CI succeeds. cc @BruceForstall

Could you paste sample diffs?

@TIHan TIHan changed the title [JIT] x86 - improved localloc codegen in some cases [JIT] x86/x64 - improved localloc codegen in some cases Oct 24, 2022
@TIHan
Copy link
Contributor Author

TIHan commented Oct 24, 2022

Almost all diffs look like this:
image

Was actually surprised, I didn't expect to get x64 diffs but I did find another part where we did the mov, sub, mov in a potential x64 path, so I replaced it using sub_hide.

@TIHan
Copy link
Contributor Author

TIHan commented Oct 25, 2022

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TIHan
Copy link
Contributor Author

TIHan commented Oct 25, 2022

@BruceForstall Outerloop did pass, apart from the formatting failures.

src/coreclr/jit/codegenxarch.cpp Show resolved Hide resolved
src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Oct 25, 2022
@TIHan
Copy link
Contributor Author

TIHan commented Oct 25, 2022

@BruceForstall I've made changes to BuildLclHeap in LSRA - also added asserts to the temp register count in codegen to verify.
I ran SuperPMI replay locally for both x86/x64 and they succeeded.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments

src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved
TIHan and others added 3 commits October 26, 2022 09:51
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@TIHan
Copy link
Contributor Author

TIHan commented Oct 31, 2022

@dotnet/jit-contrib This is ready again.

@BruceForstall
Copy link
Member

Diffs

@TIHan TIHan merged commit 2fb0a8c into dotnet:main Nov 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RyuJIT x86: improve localloc codegen in some cases
3 participants