-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix SIMD data overallocation #71043
Fix SIMD data overallocation #71043
Conversation
The JIT was allocating 16 bytes for TYP_SIMD8 static data loads, and marking the data section as requiring 16 byte alignment as well. Fix this to not allocate more than necessary, or over align.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThe JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
|
@tannergooding @kunalspathak PTAL |
Currently, the data section alignment request is ignored unless it is 8. Since the minimum is 4, this effectively means that 16-byte SIMD16 data alignment requests are ignored. This is likely because this code was written before arm64 supported SIMD, and was never revised. Cases of SIMD loads of constant data lead to larger alignment padding of the data section. This is somewhat mitigated by dotnet#71043 which fixes a bug with overallocation and overalignment of SIMD8 data loads.
if (emitComp->opts.disAddr) | ||
{ | ||
printf("; @" FMT_ADDR "\n", DBG_ADDR(dst)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an example of what this displayed previously vs what it displays now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you set COMPlus_JitDasmWithAddress=1
you'll see:
; @ 000002cc`0d8d4fc4
RWD00 dq 0020002000200020h, 0020002000200020h
; @ 000002cc`0d8d4fd4
RWD16 dq 006200720065006Bh, 0073006F00720065h
; @ 000002cc`0d8d4fe4
RWD32 dq 006F00670065006Eh, 0074006100690074h
; @ 000002cc`0d8d4ff4
RWD48 dq 0074006F00670065h, 0065007400610069h
where the ; @ ...
lines are new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume those be excluded in DiffableDasm
(I've never used JitDasmWithAddress
, didn't know it existed 👍)?
Also its worth noting that the address shown there is a multiple of 4
, but I would have expected it to be a multiple of 16
(should probably be 000002cc_0d8d4fd0
, not 000002cc_0d8d4fc4
). Was this done with MIN_DATA_ALIGN
or is there a different bug here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you set JitDasmWithAddress, you've given up on diffable (and JitDasmWithAddress is normally not set).
As for the multiple of 4 issue: that's what I've fixed with #71044.
unsigned cnsSize = sizeof(constArgValues); | ||
unsigned cnsAlign = (comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : 1; | ||
const unsigned cnsAlign = | ||
(comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : emitter::dataSection::MIN_DATA_ALIGN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I peeked at emitSimd8Const
, emitSimd16Const
, and emitSimd32Const
and they all look to be doing the right thing here, but it might be good for someone else to double check.
This change looks good/correct to me, but it would probably be better if we lowered this to GT_CNS_VEC
and went through the new normalized code path. Longer term, it would be good if we could get rid of the last 2-3 usages of SIMDIntrinsicInitN
and replace them with SimdAsHWIntrinsic
nodes instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it might be good for someone else to double check.
Looks fine to me.
* Align arm64 data section as requested Currently, the data section alignment request is ignored unless it is 8. Since the minimum is 4, this effectively means that 16-byte SIMD16 data alignment requests are ignored. This is likely because this code was written before arm64 supported SIMD, and was never revised. Cases of SIMD loads of constant data lead to larger alignment padding of the data section. This is somewhat mitigated by #71043 which fixes a bug with overallocation and overalignment of SIMD8 data loads. * Additional fixes 1. On arm64/LA64, if asking for a data alignment greater than code alignment, we need to increase the requested code alignment since the code section is where this data will live. This isn't viewable in SPMI diffs, but it does increase the alignment of some functions from 8 to 16 byte code alignment. 2. Assert that the data section is at least 4 bytes aligned (this is the default in our code, and alignment only increases). 3. Simplify the code setting the alignment flags for allocMem. * Formatting + disable alignment asserts It looks like the buffer pointer passed back from crossgen2 doesn't meet the alignment request. Perhaps it does in the final image, but not in the buffer the JIT fills in? Maybe the asserts could be used for JIT-time but not AOT (when the buffer address is the final location of the code/data)?
The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.