
Align arm64 data section as requested #71044

Merged
merged 3 commits into from
Jun 21, 2022

Conversation

BruceForstall
Member

Currently, the data section alignment request is ignored unless
it is 8. Since the minimum is 4, this effectively means that
16-byte SIMD16 data alignment requests are ignored. This is likely
because this code was written before arm64 supported SIMD, and was
never revised.

Cases of SIMD loads of constant data lead to larger alignment
padding of the data section. This is somewhat mitigated by
#71043 which fixes a bug with overallocation
and overalignment of SIMD8 data loads.

@ghost ghost assigned BruceForstall Jun 21, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 21, 2022
@ghost

ghost commented Jun 21, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details


Author: BruceForstall
Assignees: BruceForstall
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Member Author

@kunalspathak @tannergooding PTAL
cc @dotnet/jit-contrib

@@ -6330,11 +6330,9 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
}

UNATIVE_OFFSET roDataAlignmentDelta = 0;
Member

Unrelated to your change here, but there is a comment above that reads:

// For arm64/LoongArch64, we want to allocate JIT data always adjacent to code similar to what native compiler does.
// This way allows us to use a single ldr to access such data like float constant/jmp table.
// For LoongArch64 using pcaddi + ld to access such data.

I'm wondering why this is. In particular, x86/x64 explicitly say "don't do this" because it messes with the instruction decoder/cache and can lead to very poor speculative execution, etc.

I would expect Arm64 to have similar limitations and for us to likewise want this data separate from the code. This also includes for other reasons like preventing users from trying to execute "data", etc.

Member Author

It's reasonable to reconsider. However, on arm64, we have limited addressing mode range for data load instructions. If we put the data in a "data section", we would either have to (1) generate pessimistic code to allow the largest possible range, (2) ensure that data section is "close enough" to the code, or (3) optimistically assume the data is "close enough" to the code, and allow a back-off/retry if not.

Member

Perhaps @TamarChristinaArm or our other friends at ARM could provide input here on what's the recommended/optimal approach and if Arm64 has similar considerations around having data/instructions close together.

Contributor

I would expect Arm64 to have similar limitations and for us to likewise want this data separate from the code. This also includes for other reasons like preventing users from trying to execute "data", etc.

Indeed we do have similar issues on Arm64 and the NX bits are of particular interest these days. What we try to do in these cases is to create an anchor to the data section, and then subsequent loads just use offsets from the anchor.

Typically, we also consider the anchors cheap to rematerialize, to avoid spilling them around call sites, etc.

If you're doing NX bits you'd have to allocate new pages for the constants anyway, you could consider getting a page near the code. If you're within the range of an adrp+add you can use the adrp as the anchor.

Member

@tannergooding tannergooding Jun 22, 2022

Thanks for the insight here! I'll log an issue capturing this and ensuring we consider the potential impact longer term.

Member

Logged #71155

@BruceForstall
Member Author

Pushed a few improvements:

  1. On arm64/LA64, if asking for a data alignment greater than code
    alignment, we need to increase the requested code alignment since
    the code section is where this data will live. This isn't viewable
    in SPMI diffs, but it does increase the alignment of some functions
    from 8 to 16 byte code alignment.
  2. Assert that the data section is at least 4 bytes aligned
    (this is the default in our code, and alignment only increases).
  3. Simplify the code setting the alignment flags for allocMem.

Member

@kunalspathak kunalspathak left a comment

Code looks better now. Thanks!

It looks like the buffer pointer passed back from crossgen2 doesn't
meet the alignment request. Perhaps it does in the final image, but
not in the buffer the JIT fills in? Maybe the asserts could be used
at JIT time (when the buffer address is the final location of the
code/data) but not for AOT?
@@ -6280,14 +6285,14 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
const weight_t scenarioHotWeight = 256.0;
if (emitComp->fgCalledCount > (scenarioHotWeight * emitComp->fgProfileRunsCount()))
{
allocMemFlag = CORJIT_ALLOCMEM_FLG_16BYTE_ALIGN;
codeAlignment = 16;
Member

Do we know why this heuristic is 32-bit x86 only and not also for x64?

Member Author

I don't know. Maybe historical and should be revisited?

{
allocMemFlagDataAlign = CORJIT_ALLOCMEM_FLG_RODATA_16BYTE_ALIGN;
}
else if (dataAlignment == 32)
Member

Just wondering, is there a reason why this one is 16, 32 while code is 32, 16 (just order of the checks)?

What is the default for data alignment if nothing is specified, is it 4 or 8?

Member Author

Just wondering, is there a reason why this one is 16, 32 while code is 32, 16 (just order of the checks)?

No reason; probably should have been more consistent, but it doesn't actually matter.

What is the default for data alignment if nothing is specified, is it 4 or 8?

It's 8, or 4 for 32-bit platforms with <8 bytes of data.

@@ -6375,6 +6401,18 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
{
assert(((size_t)codeBlock & 31) == 0);
}
if ((allocMemFlag & CORJIT_ALLOCMEM_FLG_16BYTE_ALIGN) != 0)
Member

should this be else if that way it doesn't look like setting both 16 and 32 is valid?

Member Author

I suppose that makes sense. There should probably also be some asserts (including in the VM and crossgen) that disallow setting multiple flags; I don't see those asserts today (at least the checks test 32 before 16).

@BruceForstall BruceForstall merged commit 058f83b into dotnet:main Jun 21, 2022
@BruceForstall BruceForstall deleted the FixArm64AlignedConstSection branch June 21, 2022 23:03
@ghost ghost locked as resolved and limited conversation to collaborators Jul 22, 2022