Contain memory operands under casts #72719
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details

Fold the sign/zero-extension into an appropriate load. The change implements full support on XARCH, partial support on ARM/64, and punts on LA. A follow-up issue will be filed to track the remaining work. The ARM/64 commit contains some notes on what is required for said support on our load/store architectures.

The diffs are nice and simple:

- ldr  lr, [sp+0x34] // [V07 arg7]
- uxtb lr, lr
+ ldrb lr, [sp+0x34] // [V07 arg7]
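For context, here is a minimal C# sketch (my own illustration, not taken from the PR) of the kind of source that produces this shape: a small-typed argument passed on the stack, whose normalize-on-load read previously required a separate extension instruction.

// With enough preceding integer arguments, 'b' is passed on the stack on
// both ARM and ARM64. Reading it as an int used to emit a plain load plus
// "uxtb"; with this change the JIT can emit a single zero-extending "ldrb".
static int UseStackByte(int a0, int a1, int a2, int a3,
                        int a4, int a5, int a6, int a7, byte b)
{
    return b + 1;
}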
Force-pushed from d3d8af4 to 6ffddae.
@dotnet/jit-contrib
The diffs look really good.
I tried doing something similar here: #70756 - but I didn't get to work on it for long.
I think it would be good to run the fuzzers and libraries stress on this change.
/azp run Fuzzlyn, Antigen, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc, runtime-coreclr jitstressregs
Azure Pipelines successfully started running 6 pipeline(s).
TODO: consider using a dedicated IND_EXT oper for ARM/ARM64 instead of containment. This would allow us to cleanly handle all indirections. It would not mean we'd give up on the casts containment, as we'd still need to handle the "reg optional" case.

IND_EXT would be much like an ordinary IND, but have a "source" and a "target" type. The "target" type would always be int/long, while the "source" could be of any integral type. This design would be a bit more natural, and nicely separable from casts.

However, the main problem with the current state of things, apart from the fact that codegen of indirections is tied strongly to "GenTreeIndir", is that changing the type of the load can invalidate LEA containment. One would think this is solvable with some tricks, like re-running containment analysis on an indirection after processing the cast, but ARM64 codegen doesn't support uncontained LEAs in some cases. A possible solution to that problem is uncontaining the whole address tree. That would be messy, but it ought to work. An additional complication is that these trees can contain a lot of contained operands as part of ADDEX and BFIZ, so those would first have to be made into proper EXOPs. In any case, this is all future work.
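To make the proposed shape concrete, here is a purely hypothetical sketch of the data an IND_EXT node would carry. All names are invented for this illustration, and C# is used only for brevity; the real node would be a C++ GenTree subclass.

// Hypothetical IND_EXT: an ordinary indirection plus a separate "source"
// (memory) type and "target" (register) type, as described above.
enum ExtendKind { ZeroExtend, SignExtend }

sealed class IndExtNode
{
    public object     Address = null!;   // the address tree, as in IND
    public int        SourceSizeBytes;   // 1, 2, or 4: the size of the load
    public int        TargetSizeBytes;   // 4 or 8: always int/long
    public ExtendKind Extension;         // zero- or sign-extending load
}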
In CAST<short>(IND<byte>(...)), "m_extendSrcSize" must be "1": the extension is performed from the one-byte load, not from the cast's two-byte target type.

Modulo the above, stress runs looked clean.
Force-pushed from 0da2d79 to 7ec2a8a.
It's curious to me that there are so many x86 improvements. Do we not already remove most of these unnecessary casts there by retyping loads in morph?
I think most of these come from normalize-on-load stack parameters, which we do not retype.
Doesn't morph introduce this IR shape itself in [...]?
It of course does, and it is necessary for in-register parameters (and for on-stack ones as well, though the latter should be relatively easy to fix by using extending loads). Morph's logic there is questionable in more than one way: if we know the local will be DNER at that point, we could retype it to its small type instead of introducing a cast. However, this turns out to sometimes not be a win, because we don't CSE locals, but do CSE casts.
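To illustrate the CSE point with a sketch of my own (not from the PR): in the method below, 's' is a normalize-on-load parameter with multiple uses.

// Each use of 's' conceptually widens the small-typed value. Identical
// widening casts are CSE candidates, so one extended value can be reused;
// a local retyped to its small type would not benefit the same way, since
// locals themselves are not CSE'd.
static int SquarePlus(short s) => s * s + s;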
Makes sense about CSE. The way [...]

Anyway, thanks for the contribution as usual.
Improvements on [...]
Possible regressions: [...]
Recognize legal patterns in lowering and then fold the sign/zero-extension into an appropriate load at codegen time.
Regressions are due to register allocation and code alignment differences.