-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment - Remove space restriction from XarchIntrinsicConstants #102913
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
@@ -50,11 +50,11 @@ extern RhConfig * g_pRhConfig; | |||
|
|||
#if defined(HOST_X86) || defined(HOST_AMD64) || defined(HOST_ARM64) | |||
// This field is inspected from the generated code to determine what intrinsics are available. | |||
EXTERN_C int g_cpuFeatures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are bit-tested at runtime using compiler generated code. This is going to be quite complicated to do without regressing perf on 32bit x86.
I already wrote this elsewhere - I would like to get an understanding why we're wasting bits in this enum - a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256. Why do we need this duplication? Is this putting RyuJIT implementation details ("VectorT256" that is not an ISA extension as far as I know) into PAL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256
That's 3 bits (VectorT128, VectorT256, VectorT512). I would not call it a lot. Is there anything else?
Why do we need this duplication? Is this putting RyuJIT implementation details ("VectorT256" that is not an ISA extension as far as I know) into PAL?
Right, we do not need the VectorT*
bits in the minipal. They got there by sequence of copy&paste operations of code that was in CoreCLR JIT/EE interface originally.
We do need the VectorT*
bits on JIT/EE interface (that is different enum). The vector size is modelled as virtual ISA there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's 3 bits (VectorT128, VectorT256, VectorT512). I would not call it a lot. Is there anything else?
All of the VL suffixes that seem to just be shortcuts for XArchIntrinsicConstants_Avx512f_vl plus something else that already has a bit.
The question is basically whether we can get away with 32 bits until we're ready to call Sse41 and friends "baseline". The underlying oses are starting to do that on both Windows and Linux side. Or we'll need to do the more complicated thing and extend this to arbitrary lengths like this PR is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of the bits are redundant, e.g. XArchIntrinsicConstants_Avx2 and XArchIntrinsicConstants_VectorT256.
These are not duplicate. The former describes support for Avx2
the latter describes the size of Vector<T>
, these do not have to line up and it may be important to understand the encoded size of Vector<T>
for ABI purposes.
There are some opportunities to fold together bits. For example, AVX512*_VL
could just be Avx512vl
. Given that we require AVX512F+BW+CD+DQ+VL
to be provided simultaneously, we could combine all of these into one as well.
We could notably also not allow as fine-grained control as this currently is allowing. Rather than allowing individual opt-in to SSE3, SSSE3, SSE4.1
, etc, we could force these to be present at the respective target baselines (i.e. x86-64-v2
, x86-64-v3
, x86-64-v4
) and only include the standalone ISAs that are independent of the primary baselines
However, all of that is less control than most other compilers toolchains give and we're likely to run out of bits sooner than we're able to meaningfully compress them and drop support for downlevel ISAs (although I am definitively still supportive of us moving towards an x86-64-v2
baseline). So it's probably goodness that we take a look at this since we're already pushing up against the boundaries already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not duplicate.
They are duplicate/undesirable at PAL level. PAL should be policy free. XArchIntrinsicConstants_... is PAL level enum.
They are not duplicate at the JIT/EE interface level where the Vector policy size policy lives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the VL suffixes that seem to just be shortcuts for XArchIntrinsicConstants_Avx512f_vl plus something else that already has a bit.
I agree that we can use just a single bit for VL in this enum if we needed to free up more bits.
@tannergooding @jkotas @MichalStrehovsky |
Closing this discussion. |
NO NEED FOR REVIEW