Fix SIMD data overallocation #71043

BruceForstall · 2022-06-21T04:54:10Z

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads, and marking the data section as requiring 16 byte alignment as well. Fix this to not allocate more than necessary, or over align.

ghost · 2022-06-21T04:54:29Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads,
and marking the data section as requiring 16 byte alignment as
well. Fix this to not allocate more than necessary, or over align.

Author:	BruceForstall
Assignees:	BruceForstall
Labels:	`area-CodeGen-coreclr`
Milestone:	-

BruceForstall · 2022-06-21T04:54:48Z

@tannergooding @kunalspathak PTAL
cc @dotnet/jit-contrib

Currently, the data section alignment request is ignored unless it is 8. Since the minimum is 4, this effectively means that 16-byte SIMD16 data alignment requests are ignored. This is likely because this code was written before arm64 supported SIMD, and was never revised. Cases of SIMD loads of constant data lead to larger alignment padding of the data section. This is somewhat mitigated by dotnet#71043 which fixes a bug with overallocation and overalignment of SIMD8 data loads.

tannergooding · 2022-06-21T15:37:43Z

src/coreclr/jit/emit.cpp

+        if (emitComp->opts.disAddr)
+        {
+            printf("; @" FMT_ADDR "\n", DBG_ADDR(dst));
+        }
+


Is there an example of what this displayed previously vs what it displays now?

If you set COMPlus_JitDasmWithAddress=1 you'll see:

; @ 000002cc`0d8d4fc4 RWD00 dq 0020002000200020h, 0020002000200020h ; @ 000002cc`0d8d4fd4 RWD16 dq 006200720065006Bh, 0073006F00720065h ; @ 000002cc`0d8d4fe4 RWD32 dq 006F00670065006Eh, 0074006100690074h ; @ 000002cc`0d8d4ff4 RWD48 dq 0074006F00670065h, 0065007400610069h

where the ; @ ... lines are new.

I'd assume those be excluded in DiffableDasm (I've never used JitDasmWithAddress, didn't know it existed 👍)?

Also its worth noting that the address shown there is a multiple of 4, but I would have expected it to be a multiple of 16 (should probably be 000002cc_0d8d4fd0, not 000002cc_0d8d4fc4). Was this done with MIN_DATA_ALIGN or is there a different bug here?

If you set JitDasmWithAddress, you've given up on diffable (and JitDasmWithAddress is normally not set).

As for the multiple of 4 issue: that's what I've fixed with #71044.

tannergooding · 2022-06-21T15:43:31Z

src/coreclr/jit/lower.cpp

-            unsigned cnsSize  = sizeof(constArgValues);
-            unsigned cnsAlign = (comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : 1;
+            const unsigned cnsAlign =
+                (comp->compCodeOpt() != Compiler::SMALL_CODE) ? cnsSize : emitter::dataSection::MIN_DATA_ALIGN;


I peeked at emitSimd8Const, emitSimd16Const, and emitSimd32Const and they all look to be doing the right thing here, but it might be good for someone else to double check.

This change looks good/correct to me, but it would probably be better if we lowered this to GT_CNS_VEC and went through the new normalized code path. Longer term, it would be good if we could get rid of the last 2-3 usages of SIMDIntrinsicInitN and replace them with SimdAsHWIntrinsic nodes instead.

but it might be good for someone else to double check.

Looks fine to me.

* Align arm64 data section as requested Currently, the data section alignment request is ignored unless it is 8. Since the minimum is 4, this effectively means that 16-byte SIMD16 data alignment requests are ignored. This is likely because this code was written before arm64 supported SIMD, and was never revised. Cases of SIMD loads of constant data lead to larger alignment padding of the data section. This is somewhat mitigated by #71043 which fixes a bug with overallocation and overalignment of SIMD8 data loads. * Additional fixes 1. On arm64/LA64, if asking for a data alignment greater than code alignment, we need to increase the requested code alignment since the code section is where this data will live. This isn't viewable in SPMI diffs, but it does increase the alignment of some functions from 8 to 16 byte code alignment. 2. Assert that the data section is at least 4 bytes aligned (this is the default in our code, and alignment only increases). 3. Simplify the code setting the alignment flags for allocMem. * Formatting + disable alignment asserts It looks like the buffer pointer passed back from crossgen2 doesn't meet the alignment request. Perhaps it does in the final image, but not in the buffer the JIT fills in? Maybe the asserts could be used for JIT-time but not AOT (when the buffer address is the final location of the code/data)?

Fix SIMD data overallocation

3d913c5

The JIT was allocating 16 bytes for TYP_SIMD8 static data loads, and marking the data section as requiring 16 byte alignment as well. Fix this to not allocate more than necessary, or over align.

ghost assigned BruceForstall Jun 21, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 21, 2022

BruceForstall requested a review from tannergooding June 21, 2022 04:54

BruceForstall requested a review from kunalspathak June 21, 2022 04:54

BruceForstall mentioned this pull request Jun 21, 2022

Align arm64 data section as requested #71044

Merged

tannergooding reviewed Jun 21, 2022

View reviewed changes

tannergooding approved these changes Jun 21, 2022

View reviewed changes

kunalspathak approved these changes Jun 21, 2022

View reviewed changes

BruceForstall merged commit 99e58d5 into dotnet:main Jun 21, 2022

BruceForstall deleted the FixAlignedDataOverallocation branch June 21, 2022 17:13

ghost locked as resolved and limited conversation to collaborators Jul 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix SIMD data overallocation #71043

Fix SIMD data overallocation #71043

BruceForstall commented Jun 21, 2022

ghost commented Jun 21, 2022

BruceForstall commented Jun 21, 2022

tannergooding Jun 21, 2022

kunalspathak Jun 21, 2022

BruceForstall Jun 21, 2022

tannergooding Jun 21, 2022 •

edited

Loading

BruceForstall Jun 21, 2022

tannergooding Jun 21, 2022

kunalspathak Jun 21, 2022

Fix SIMD data overallocation #71043

Fix SIMD data overallocation #71043

Conversation

BruceForstall commented Jun 21, 2022

ghost commented Jun 21, 2022

BruceForstall commented Jun 21, 2022

tannergooding Jun 21, 2022

Choose a reason for hiding this comment

kunalspathak Jun 21, 2022

Choose a reason for hiding this comment

BruceForstall Jun 21, 2022

Choose a reason for hiding this comment

tannergooding Jun 21, 2022 • edited Loading

Choose a reason for hiding this comment

BruceForstall Jun 21, 2022

Choose a reason for hiding this comment

tannergooding Jun 21, 2022

Choose a reason for hiding this comment

kunalspathak Jun 21, 2022

Choose a reason for hiding this comment

tannergooding Jun 21, 2022 •

edited

Loading