Update the CPUID and XSAVE logics for APX #104637
Hi @tannergooding, I reopened the PR for the APX CPUID updates here. I will resolve the conflict on the GUID, let CI start soon, and fix any failures that come up. I understand this PR targets the next release cycle, i.e. .NET 10, so when your schedule allows, could you do a first-round review? Thanks!
Resolved the conflict.
The REX2 enabling PR (#106557) is up, and it is based on this one. Since main is now accepting .NET 10 changes, I will rebase onto the latest main and continue the work.
Force-pushed from d549714 to 3446e28
Resolved conflicts with main.
@tannergooding This PR is ready for review. The build failures are related to the changes in https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/arch/amd64/context2.S: the native compiler does not seem to recognize EGPRs, e.g. r16, as legal operands. I'm not sure how we can resolve this, or whether this part is needed for now. As for the changes to accommodate the XSTATE changes for EGPRs, I would welcome high-level suggestions; some changes may be missing or not reasonable at all. Any advice would be much appreciated.
We will also need to update the CONTEXT& CONTEXT::operator=(const CONTEXT& ctx) to properly copy the new registers if the XStateFeaturesMask has a mask for this feature. And also make sure that when it is not set, we copy only the necessary part of the context.
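A minimal sketch of the conditional copy described above, written in C rather than as the real `CONTEXT::operator=`. The `SketchContext` layout, the `sketch_context_copy` helper, and the bit position (19, inferred from the literal 524288 that appears in this PR's diff) are all assumptions for illustration, not the runtime's actual definitions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed APX bit in the XState feature mask (1 << 19 == 524288). */
#define SKETCH_XSTATE_MASK_APX (UINT64_C(1) << 19)

/* Hypothetical, heavily simplified stand-in for the PAL CONTEXT. */
typedef struct {
    uint64_t BaseRegs[16];        /* part of the context that is always copied */
    uint64_t XStateFeaturesMask;  /* which extended states are valid */
    uint64_t Egpr[16];            /* r16-r31, meaningful only when the APX bit is set */
} SketchContext;

/* Copy the always-present part, then the EGPR area only if the source marks it valid. */
static void sketch_context_copy(SketchContext *dst, const SketchContext *src)
{
    memcpy(dst->BaseRegs, src->BaseRegs, sizeof dst->BaseRegs);
    dst->XStateFeaturesMask = src->XStateFeaturesMask;
    if (src->XStateFeaturesMask & SKETCH_XSTATE_MASK_APX) {
        memcpy(dst->Egpr, src->Egpr, sizeof dst->Egpr);
    }
}
```

When the APX bit is clear, the destination's EGPR area is left untouched, which mirrors the "copy only the necessary part" requirement.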
@@ -183,6 +183,29 @@ LOCAL_LABEL(Done_Restore_CONTEXT_FLOATING_POINT):
kmovq k6, qword ptr [rdi + (CONTEXT_KMask0 + 6 * 8)]
kmovq k7, qword ptr [rdi + (CONTEXT_KMask0 + 7 * 8)]

// TODO-xarch-apx: the definition of XSTATE mask value for APX is now missing on the OS level,
This is Unix only code, so the XSTATE mask is not coming from the OS headers, but rather pal.h for C/C++ code and src/coreclr/pal/src/arch/amd64/asmconstants.h for asm. This comment can be removed.
// TODO-xarch-apx: the definition of XSTATE mask value for APX is now missing on the OS level,
// we are currently using bare value to hack it through the build process, and test the implementation through CI.
// those changes will be removed when we have the OS support for APX.
test BYTE PTR [rdi + CONTEXT_XStateFeaturesMask], 524288
Can you please add a symbolic definition for the mask to src/coreclr/pal/src/arch/amd64/asmconstants.h next to the other XSTATE_xxx definitions?
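As a rough illustration of what a symbolic definition could look like, here is a C sketch. `XSTATE_MASK_APX` is the name used elsewhere in this PR; `XSTATE_APX` and `sketch_has_apx_state` are hypothetical names, and the bit position 19 is inferred from the bare value 524288 in the diff.

```c
#include <stdint.h>

/* Assumed bit index of the APX state component; 1 << 19 == 524288 == 0x80000. */
#define XSTATE_APX 19
#define XSTATE_MASK_APX (UINT64_C(1) << XSTATE_APX)

/* Returns nonzero when the feature mask marks the APX (EGPR) state as present. */
static int sketch_has_apx_state(uint64_t xstate_features_mask)
{
    return (xstate_features_mask & XSTATE_MASK_APX) != 0;
}
```

One detail that may be worth double-checking in the assembly: 0x80000 does not fit in 8 bits, so a byte-wide `test` against the low byte of the mask would not see this bit. A wider operand size is probably needed.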
@tannergooding I assume you plan to review this change?
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
I will start working on the merge conflicts in this PR and let CI start.
The wasm failures look unrelated. The Linux and ARM build failures originate from the ASM code that uses EGPRs in the
Hi @tannergooding, @BruceForstall, this PR is ready for review. The reasons for the failures are described in #104637 (comment). Any input on this issue would be appreciated, thanks!
We should get #109210 merged before this or the AVX10.2 one goes in, since it fixes an issue with the thunk generator. The changes in here mostly look good to me, however. I just need to finish a final review pass over it.
#109210 has been merged. You'll need to update to fix the merge conflicts.
Did you fix this issue? We're using clang 18.1.8 to build (in Azure Linux containers: dotnet/dotnet-buildtools-prereqs-docker#1207). Which version has the required APX features?
src/coreclr/vm/threadsuspend.cpp
Outdated
@@ -71,6 +71,16 @@ extern "C" void RedirectedHandledJITCaseForGCStress_Stub(void);
#define IS_VALID_WRITE_PTR(addr, size) _ASSERTE((addr) != NULL)
#define IS_VALID_CODE_PTR(addr) _ASSERTE((addr) != NULL)

#if defined(TARGET_AMD64) || defined(TARGET_X86)
// These values should be picked up from winrt.h, defining them in case they are missing there.
nit:
- // These values should be picked up from winrt.h, defining them in case they are missing there.
+ // These values should be picked up from winnt.h, defining them in case they are missing there.
src/native/minipal/cpufeatures.c
Outdated
static uint32_t apxStateSupport()
{
#if defined(HOST_APPLE)
return false;
Shouldn't this be zero, or an assert, and not `false`?
docs/design/features/xarch-apx.md
Outdated
@@ -0,0 +1,3 @@
# APX Integration in .NET
This doc can be added only once there is something worth talking about.
Sure, I can remove this for now and may add a specification to the design doc when the REX2 or APX-EVEX changes come in.
We've talked with our LLVM folks and were told that LLVM-19 should have APX support, but eventually we will need LLVM-20 to accommodate the Avx10.2 codegen changes. Do you have a preference for how to handle this? We are okay with upgrading to LLVM-19 now and LLVM-20 later, or with upgrading once to LLVM-20. (The LLVM-20 release should be out early next year.)
mov r16, qword ptr [rdi + CONTEXT_Egpr + 0 * 8]
mov r17, qword ptr [rdi + CONTEXT_Egpr + 1 * 8]
mov r18, qword ptr [rdi + CONTEXT_Egpr + 2 * 8]
mov r19, qword ptr [rdi + CONTEXT_Egpr + 3 * 8]
mov r20, qword ptr [rdi + CONTEXT_Egpr + 4 * 8]
mov r21, qword ptr [rdi + CONTEXT_Egpr + 5 * 8]
mov r22, qword ptr [rdi + CONTEXT_Egpr + 6 * 8]
mov r23, qword ptr [rdi + CONTEXT_Egpr + 7 * 8]
mov r24, qword ptr [rdi + CONTEXT_Egpr + 8 * 8]
mov r25, qword ptr [rdi + CONTEXT_Egpr + 9 * 8]
mov r26, qword ptr [rdi + CONTEXT_Egpr + 10 * 8]
mov r27, qword ptr [rdi + CONTEXT_Egpr + 11 * 8]
mov r28, qword ptr [rdi + CONTEXT_Egpr + 12 * 8]
mov r29, qword ptr [rdi + CONTEXT_Egpr + 13 * 8]
mov r30, qword ptr [rdi + CONTEXT_Egpr + 14 * 8]
mov r31, qword ptr [rdi + CONTEXT_Egpr + 15 * 8]
We shouldn't strictly need LLVM 19 here (or LLVM 20 for AVX10.2).
Both MASM and GAS have directives that allow defining macros and directives that allow emitting raw bytes.
Correspondingly, we should be able to just do `.byte 0x00, 0x00, 0x00` (using appropriate bytes of course) and add a comment indicating what it translates to.
We could theoretically build something more robust with macros as well, allowing something like `mov_egpr(r16, ...)`, but that might be "too much" given what we need here is fairly static.
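To make the raw-byte idea concrete, here is a small C sketch that computes the bytes for `mov rN, qword ptr [rdi + disp32]` with N in 16..31. It assumes the REX2 layout as I understand it from the APX specification (prefix byte 0xD5, then a payload byte with bits M0, R4, X4, B4, W, R3, X3, B3 from high to low); the helper name `sketch_encode_mov_egpr_load` is hypothetical, and the output should be checked against an APX-aware assembler or disassembler before being pasted into a `.byte` directive.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode `mov r<reg>, qword ptr [rdi + disp32]` for reg in 16..31.
 * Writes 8 bytes to out and returns 8, or returns 0 if reg is out of range.
 */
static size_t sketch_encode_mov_egpr_load(unsigned reg, uint32_t disp, uint8_t out[8])
{
    if (reg < 16 || reg > 31) {
        return 0;
    }
    unsigned r4 = (reg >> 4) & 1;  /* fifth register bit, carried by REX2.R4 */
    unsigned r3 = (reg >> 3) & 1;  /* fourth register bit, carried by REX2.R3 */
    unsigned low = reg & 7;        /* low three bits, carried by ModRM.reg */

    out[0] = 0xD5;                                          /* REX2 prefix */
    out[1] = (uint8_t)((r4 << 6) | (1u << 3) | (r3 << 2));  /* M0=0, W=1, no X/B extension */
    out[2] = 0x8B;                                          /* MOV r64, r/m64 */
    out[3] = (uint8_t)(0x80 | (low << 3) | 0x07);           /* mod=10 (disp32), rm=111 (rdi) */
    out[4] = (uint8_t)(disp & 0xFF);                        /* disp32, little-endian */
    out[5] = (uint8_t)((disp >> 8) & 0xFF);
    out[6] = (uint8_t)((disp >> 16) & 0xFF);
    out[7] = (uint8_t)((disp >> 24) & 0xFF);
    return 8;
}
```

Under these assumptions, the load of r16 from `[rdi + 0x100]` would come out as `.byte 0xD5, 0x48, 0x8B, 0x87, 0x00, 0x01, 0x00, 0x00`, with the intended instruction kept in a comment next to it.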
Thanks for the input!
So we can replace this asm code with raw hex bytes for now to get CI to pass. Am I understanding that right?
Right, and log a TODO for us to replace it with the proper instructions when we do eventually update to a newer assembler with the APX support.
Thanks!
@@ -91,7 +93,8 @@
#define CONTEXT_KMask0 CONTEXT_Ymm0H+(16*16)
#define CONTEXT_Zmm0H CONTEXT_KMask0+(8*8)
#define CONTEXT_Zmm16 CONTEXT_Zmm0H+(32*16)
#define CONTEXT_Size CONTEXT_Zmm16+(64*16)
#define CONTEXT_Egpr CONTEXT_Zmm16+(16*8)
#define CONTEXT_Size CONTEXT_Egpr+(64*16)
This doesn't look right. It seems it should be:
#define CONTEXT_Egpr CONTEXT_Zmm16+(64*16)
#define CONTEXT_Size CONTEXT_Egpr+(16*8)
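The arithmetic behind this correction can be sketched in C. The helper names are hypothetical, and the offsets are computed relative to an arbitrary `zmm16_offset` rather than the real CONTEXT layout. The sizes follow the defines above: the Zmm16 area spans 64*16 bytes and the EGPR area spans 16*8 bytes.

```c
#include <stddef.h>

enum {
    SKETCH_ZMM16_AREA_BYTES = 64 * 16,  /* size of the area that starts at CONTEXT_Zmm16 */
    SKETCH_EGPR_AREA_BYTES = 16 * 8     /* r16-r31, 8 bytes each */
};

/* The EGPR area has to start after the whole Zmm16 area. */
static size_t sketch_egpr_offset(size_t zmm16_offset)
{
    return zmm16_offset + SKETCH_ZMM16_AREA_BYTES;
}

/* The total size is the EGPR offset plus the EGPR area itself. */
static size_t sketch_context_size(size_t zmm16_offset)
{
    return sketch_egpr_offset(zmm16_offset) + SKETCH_EGPR_AREA_BYTES;
}
```

With the two constants swapped, as in the original diff, the total size comes out the same, but the EGPR offset would land 128 bytes into the Zmm16 area and overlap it.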
@@ -22,6 +22,8 @@

; DO NOT CHANGE R2R NUMERIC VALUES OF THE EXISTING SETS. Changing R2R numeric values definitions would be R2R format breaking change.

; The ISA definiitons should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.
A nit:
- ; The ISA definiitons should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.
+ ; The ISA definitions should also be mapped to `hwintrinsicIsaRangeArray` in hwintrinsic.cpp.
@@ -125,6 +155,19 @@ static uint32_t avx512StateSupport()
return ((_xgetbv(0) & 0xE6) == 0x0E6) ? 1 : 0;
}

#ifndef XSTATE_MASK_APX
This was already defined in this file at line 75
Line 75 is the definition under UNIX. (Avx512 seems to follow a design where these masks are only explicitly defined under UNIX, so I followed it for APX.) Under Windows, I suppose the definition should come from winnt.h; I'm not very familiar with how PAL fits together with each part. If so, won't we need this definition here until the OS supports APX and defines these masks in winnt.h?
@@ -313,6 +313,16 @@ typedef int __ptrace_request;
ASSIGN_CONTROL_REGS \
ASSIGN_INTEGER_REGS \

#if defined(HOST_AMD64) && defined(XSTATE_SUPPORTED)
It was already defined in pal.h, it should be visible here.
Previously #103019
Overview of the changes:
- `XArchIntrinsicConstants`: compress all the Avx512-related flags into one, `Avx512f+bw+cd+dq+vl` to `Avx512`. This frees space in `XArchIntrinsicConstants` so that it can hold more x86 ISAs, such as APX here.
- `CR4[XSAVE]` (existing) -> `XCR0[APX_F]` -> `CPUID(7,1).EDX[APX_F]`
- The current status is that, because the macro definition for APX is missing at the OS level, the second check will fail anyway, and it may break the build on CI (to be verified).
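The check chain listed in the overview can be sketched as a pure C function over raw register values, so it runs on any host. The function and macro names are hypothetical. The bit positions are my understanding of the architecture documentation: OSXSAVE (which reflects `CR4[XSAVE]`) in CPUID.1:ECX bit 27, the APX state in XCR0 bit 19, and APX_F in CPUID.(EAX=7,ECX=1):EDX bit 21. This is not the code in `cpufeatures.c`.

```c
#include <stdint.h>

#define SKETCH_CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)  /* OS has enabled XSAVE/XGETBV */
#define SKETCH_XCR0_APX           (UINT64_C(1) << 19)  /* OS saves and restores EGPR state */
#define SKETCH_CPUID7_1_EDX_APX_F (UINT32_C(1) << 21)  /* CPU implements APX foundation */

/*
 * Returns 1 only when all three checks pass, in the order given in the overview:
 * XSAVE enabled -> XCR0 APX bit set -> CPUID reports APX_F.
 */
static int sketch_apx_usable(uint32_t cpuid1_ecx, uint64_t xcr0, uint32_t cpuid7_1_edx)
{
    if ((cpuid1_ecx & SKETCH_CPUID1_ECX_OSXSAVE) == 0) {
        return 0;  /* reading XCR0 would not be valid */
    }
    if ((xcr0 & SKETCH_XCR0_APX) == 0) {
        return 0;  /* OS does not manage the APX state */
    }
    return (cpuid7_1_edx & SKETCH_CPUID7_1_EDX_APX_F) != 0;
}
```

In real code the three inputs would come from `cpuid` and `xgetbv`; passing them in keeps the sketch independent of the host CPU.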