Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SIMD data overallocation #71043

Merged

Conversation

BruceForstall
Copy link
Member

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.
@ghost ghost assigned BruceForstall Jun 21, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 21, 2022
@ghost
Copy link

ghost commented Jun 21, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.

Author: BruceForstall
Assignees: BruceForstall
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

@tannergooding @kunalspathak PTAL
cc @dotnet/jit-contrib

BruceForstall added a commit to BruceForstall/runtime that referenced this pull request Jun 21, 2022
Currently, the data section alignment request is ignored unless
it is 8. Since the minimum is 4, this effectively means that
16-byte SIMD16 data alignment requests are ignored. This is likely
because this code was written before arm64 supported SIMD, and was
never revised.

Cases of SIMD loads of constant data lead to larger alignment
padding of the data section. This is somewhat mitigated by
dotnet#71043 which fixes a bug with overallocation
and overalignment of SIMD8 data loads.
Comment on lines +7817 to +7821
if (emitComp->opts.disAddr)
{
printf("; @" FMT_ADDR "\n", DBG_ADDR(dst));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an example of what this displayed previously vs what it displays now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set COMPlus_JitDasmWithAddress=1 you'll see:

; @ 000002cc`0d8d4fc4 
RWD00  	dq	0020002000200020h, 0020002000200020h
; @ 000002cc`0d8d4fd4 
RWD16  	dq	006200720065006Bh, 0073006F00720065h
; @ 000002cc`0d8d4fe4 
RWD32  	dq	006F00670065006Eh, 0074006100690074h
; @ 000002cc`0d8d4ff4 
RWD48  	dq	0074006F00670065h, 0065007400610069h

where the ; @ ... lines are new.

Copy link
Member

@tannergooding tannergooding Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume those be excluded in DiffableDasm (I've never used JitDasmWithAddress, didn't know it existed 👍)?

Also its worth noting that the address shown there is a multiple of 4, but I would have expected it to be a multiple of 16 (should probably be 000002cc_0d8d4fd0, not 000002cc_0d8d4fc4). Was this done with MIN_DATA_ALIGN or is there a different bug here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set JitDasmWithAddress, you've given up on diffable (and JitDasmWithAddress is normally not set).

As for the multiple of 4 issue: that's what I've fixed with #71044.

unsigned cnsSize = sizeof(constArgValues);
unsigned cnsAlign = (comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : 1;
const unsigned cnsAlign =
(comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : emitter::dataSection::MIN_DATA_ALIGN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I peeked at emitSimd8Const, emitSimd16Const, and emitSimd32Const and they all look to be doing the right thing here, but it might be good for someone else to double check.

This change looks good/correct to me, but it would probably be better if we lowered this to GT_CNS_VEC and went through the new normalized code path. Longer term, it would be good if we could get rid of the last 2-3 usages of SIMDIntrinsicInitN and replace them with SimdAsHWIntrinsic nodes instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it might be good for someone else to double check.

Looks fine to me.

@BruceForstall BruceForstall merged commit 99e58d5 into dotnet:main Jun 21, 2022
@BruceForstall BruceForstall deleted the FixAlignedDataOverallocation branch June 21, 2022 17:13
BruceForstall added a commit that referenced this pull request Jun 21, 2022
* Align arm64 data section as requested

Currently, the data section alignment request is ignored unless
it is 8. Since the minimum is 4, this effectively means that
16-byte SIMD16 data alignment requests are ignored. This is likely
because this code was written before arm64 supported SIMD, and was
never revised.

Cases of SIMD loads of constant data lead to larger alignment
padding of the data section. This is somewhat mitigated by
#71043 which fixes a bug with overallocation
and overalignment of SIMD8 data loads.

* Additional fixes

1. On arm64/LA64, if asking for a data alignment greater than code
alignment, we need to increase the requested code alignment since
the code section is where this data will live. This isn't viewable
in SPMI diffs, but it does increase the alignment of some functions
from 8 to 16 byte code alignment.
2. Assert that the data section is at least 4 bytes aligned
(this is the default in our code, and alignment only increases).
3. Simplify the code setting the alignment flags for allocMem.

* Formatting + disable alignment asserts

It looks like the buffer pointer passed back from crossgen2 doesn't
meet the alignment request. Perhaps it does in the final image, but
not in the buffer the JIT fills in? Maybe the asserts could be used
for JIT-time but not AOT (when the buffer address is the final location
of the code/data)?
@ghost ghost locked as resolved and limited conversation to collaborators Jul 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants