-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup some handling around Avx10v1 #103241
Conversation
Note regarding the
|
Note regarding the
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
src/coreclr/vm/codeman.cpp
Outdated
if (((cpuFeatures & XArchIntrinsicConstants_Avx512f) != 0) && | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512f_vl) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512bw) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512bw_vl) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512cd) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512cd_vl) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512dq) != 0) | ||
((cpuFeatures & XArchIntrinsicConstants_Avx512dq_vl) != 0)) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might require a R2R breaking change, but it would be nice if we could just compress these down to 1 bit and cleanup the CorInfoInstructionSet
handling a bit as well.
We're never going to support, in the JIT at least, having the baseline AVX512 ISAs independent, so it'd be nicer if we just had DOTNET_EnableAVX512
, XArchIntrinsicConstants_Avx512
, and InstructionSet_Avx512
instead. That would also allow us to do simpler queries in the JIT around ISA support
At the same time, the nested classes are implied by other information. For example, the X64
, VL
, and V512
nested classes are all implied by the enclosing type being supported and another implicit bit of information such as the target architecture or if the base class is supported. So we really don't "need" all the extra logic and instruction set IDs to support those.
CC. @dotnet/jit-contrib for review and @dotnet/avx512-contrib as an FYI Just some post cleanup work for the AVX10v1 support that went in last week. |
(Resolved merge conflicts) |
CC. @jkotas for the VM side changes. Relatively straightforward as detailed above, mostly just ensuring that the As called out, it would be nice if we could simplify this even more but I expect that'd require a breaking change to the config knobs we expose and to the R2R format. |
src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt
Show resolved
Hide resolved
src/native/minipal/cpufeatures.c
Outdated
} | ||
} | ||
result |= XArchIntrinsicConstants_Evex; | ||
result |= XArchIntrinsicConstants_Avx512f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we bake the JIT policy of requiring all these features together at the same time into the minipal, we can fold all these bits into just a single bit XArchIntrinsicConstants_Avx512f
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in JIT/EE interface https://github.com/dotnet/runtime/pull/103241/files#diff-eba17a72b6caa8b52b196f3b84bfd20dfeaf39a89f410943a692d1f33f569f3bR1366 does the same thing. What's the benefit of doing this in the minipal too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we bake the JIT policy of requiring all these features together at the same time into the minipal, we can fold all these bits into just a single bit XArchIntrinsicConstants_Avx512f.
Right, and I think that's desirable long term. I've updated the PR to fold it down to Avx512
and Avx512Vbmi
What's the benefit of doing this in the minipal too?
It's helping enforce the consistency everywhere that consumes the CPUID bits.
What I ended up finding is that the implication
defined in InstructionSetDesc
don't work well with recursive implication sets. This is because it basically does a naive ordering and so what happened is that Avx512F
would be supported, which meant that Avx512BW/CD/DQ
had their requirements met. Then at end when processing the recursive implications, if one of BW/CD/DQ/VL
wasn't supported, F
would be unset but the others wouldn't leading to a torn view.
While handling this in codeman is sufficient for the JIT, it doesn't necessarily mean we followed suite in NAOT or other scenarios as they could get disconnected. So simply enforcing this at the minipal level makes it a bit easier and ensures everywhere has a consistent view of what we require to be supported together.
More ideally we'd also clean this up at the InstructionSet_*
level too as we don't need instructionset64bit
(this is implied by the target architecture bitness), we don't need instructionset Avx512F_VL
(this could be compressed down to just Avx512F
), etc. I think doing this when we're ready for the next R2R breaking change would be good, as it should free up bits and reduce the overall complexity of the JIT by just requiring a little bit of handling in the lookupIntrinsicId
function instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think doing this when we're ready for the next R2R breaking change would be good
We just took a R2R breaking change a few days ago, so if you would like to cleanup the CorInfoInstructionSet
bits, it is a good time now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I'd like to do that in a follow up PR then so that the more core fixes can go in. The CorInfoInstructionSet cleanup is just nice to have at that point.
XArchIntrinsicConstants_Serialize = 0x20000, | ||
XArchIntrinsicConstants_Avx10v1 = 0x40000, | ||
XArchIntrinsicConstants_Avx10v1_V512 = 0x80000, | ||
XArchIntrinsicConstants_Evex = 0x100000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the Evex bit here? It is implied by other bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Evex
bit is intentional and a bit special (will work on getting an explanation of it added to the docs).
The consideration is that with Avx10v1
we have two different ways to say that certain 128-bit or 256-bit EVEX encoded instructions are supported. These instructions having been made available originally as part of AVX512VL
(which requires 512-bit support) but now also being available in AVX10v1
(which does not require 512-bit support).
So, if code is using say Avx512F.VL.*
(and never uses say Avx512F.*
), then what we want is to actually track this as requiring the EVEX bit and not as requiring Avx10v1
or Avx512
specifically, since it will work provided that either of these two ISAs are supported. This allows the JIT to track and report the actual requirements for certain instructions more cleanly (we only need to check the EVEX
bit for such instructions) and allows NAOT images to be generated that don't fail if you're code was written using one but was generated using another.
If we had known about Avx10v1
prior to shipping Avx512
support, I imagine we would've exposed the managed API surface differently. We probably would've intentionally designed the surface area a bit differently (even with it not matching hardware). In particular, the actual setup we have is basically the following, where AvxF+BW+CD+DQ
and below that have a nested V512
type representing the 512-bit extensions and where we would've come up with an ISA name to represent AvxF+BW+CD+DQ
(possibly just Evex
, maybe something different):
graph TD;
Sse2-->Aes;
Vex-->Avx;
Avx-->Avx2;
Evex-->AvxF+BW+CD+DQ;
AvxF+BW+CD+DQ-->AvxVbmi;
AvxF+BW+CD+DQ-->AvxIfma;
AvxF+BW+CD+DQ-->AvxBf16;
AvxF+BW+CD+DQ-->AvxBitalg;
AvxF+BW+CD+DQ-->AvxFp16;
AvxF+BW+CD+DQ-->AvxVbmi2;
AvxF+BW+CD+DQ-->AvxVpopcntdq;
AvxVbmi-->Avx10v1;
AvxIfma-->Avx10v1;
AvxBf16-->Avx10v1;
AvxBitalg-->Avx10v1;
AvxFp16-->Avx10v1;
AvxVbmi2-->Avx10v1;
AvxVpopcntdq-->Avx10v1;
Avx2-->AvxVnni;
Vex-->Bmi1;
Vex-->Bmi2;
Avx2-->Evex;
Fma-->Evex;
Avx-->Fma;
X86Base-->Lzcnt
Sse2-->Pclmulqdq
Sse42-->Popcnt
X86Base-->Sse;
Sse-->Sse2;
Sse2-->Sse3;
Sse3-->Ssse3;
Ssse3-->Sse41;
Sse41-->Sse42;
Sse42-->Vex;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine for the JIT to have internal methods that return true when it is fine to use Vex or Evex instruction encodings. I do not see why the Evex bit has to leak outside the JIT all the way to the minipal. The Vex bit does not leak all the way to the minipal either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consideration is that for EVEX
it's not internal to the JIT.
Anytime the JIT does a compExactlyDependsOn(InstructionSet_*)
or compOpportunisticallyDependsOn(InstructionSet_*)
query, it gets reported back to the VM. The different VMs handle it as follows:
- For JIT this reporting is a
no-op
- For R2R this reporting exactly tracks the
InstructionSet_*
queried and whether it was "exactly" or "opportunistic"- This is where the cleanup of
InstructionSet_*
would be beneficial, because we have lots more flags than actually needed
- This is where the cleanup of
- For AOT this reporting gets compressed back to
XArchIntrinsicConstants
which gets queried for an exact match on AOT startup
So, we need the JIT query for EVEX
usage to map back to InstructionSet_EVEX
for R2R and XArchIntrinsicConstants_EVEX
for AOT that way if the code only uses Avx512F.VL.Abs
, it can work when Avx10v1
without V512 support exists. Otherwise we'd need it to report as Avx10v1
or Avx512
, both of which cut out hardware that would otherwise work with the binary. We only query the more exact ISA when it isn't part of that synthetic EVEX baseline
and thus needs to restrict the target.
For VEX
there isn't any need, because the hierarchy is logical and makes sense. There is no historical split where two separate ISAs give access to the same underlying instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it.
EVEX is low-level instruction encoding. I think it would be best to confine it to the JIT. The mapping between EVEX and the ISAs is a bit of mental leap outside the JIT.
Would it be better to have bits for Avx512_VLOnly
virtual ISA that is the common subset of Avx512
and Avx10v1
? I assume that all Avx512*.VL
ISAs are supported in Avx10v1
. Is that correct?
if the code only uses Avx512F.VL.Abs, it can work when Avx10v1 without V512 support exists.
Is the user code going to look like this:
if (Avx512F.VL.IsSupported || Avx10v1.IsSupported)
{
... code that uses Avx512F.VL.Abs ...
}
Or are we going to make Avx512F.VL.IsSupported
return true
for Avx10v1
so that Avx10v1.IsSupported
part of the condition is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, just as an idea, we could call it just Avx10 or Avx10v0 (representing the baseline common to Avx512 and Avx10v1)? Alternatively we could call it just AvxF_BW_CD_DQ?
I have looked around how other projects are solving this naming problem. LLVM seems to use EVEX256
name for the Avx10v1
and Avx512
intersection. I guess it is how we ended up with EVEX
here. I like the 256
suffix. It helps with clarifying that the virtual ISA represents the non-512 intersection.
On a slightly related point, I believe it would it be a binary breaking change to change from an abstract class to an interface (i.e. change public abstract class Avx to instead be public interface Avx), correct?
It should not be a breaking change for typical uses, but as you have said there are likely cases that it would break.
I am not sure how well the ISAs modeling via inheritance hierarchy is going to work going forward. I would not be surprised if we see more places where new hardware revisions carve out subsets of existing instructions and give it a new ISA name. It just happened with Avx512/Avx10v1, Arm64 streaming mode is heading that way, and RISCV may end up there too given its development process. We may want to think about different approach to accommodate these trends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is how we ended up with EVEX here. I like the 256 suffix. It helps with clarifying that the virtual ISA represents the non-512 intersection.
👍
I am not sure how well the ISAs modeling via inheritance hierarchy is going to work going forward.
I agree we should think some more, but overall I do believe this is still the correct model. We've seen quite a bit of praise on how we've modeled everything as compared to how C/C++ or Rust do it. The overall separation of xplat APIs
vs platform specific APIs
and then the division of the latter into clear classes that are based on the underlying hardware checks has helped make discoverability, approachability, and usability of the feature in .NET one of the best.
It just happened with Avx512/Avx10v1
This one I think was more x64 having tried something new with Avx512
and then finding out that it didn't work in practice, so they are now fixing it by reconverging everything with Avx10v1
. The architectural documents around Avx10v1
then cover how they plan to handle this moving forward and that it's effectively going back to the model we had seen previously with Sse
and Avx
(where you have a baseline encoding and then incremental functionality extended on top of this in a version like manner).
Arm64 streaming mode is heading that way
This is a case where there is still a clear hierarchy given we have SveStreaming
and SveNonStreaming
as the split.
Even if we had shipped class Sve
prior to streaming support becoming defined, we could have inserted a class SveStreaming
and had Sve
derive from it, moving relevant methods down (since that is safe to do for classes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- What I'll plan on doing given the conversation so far is getting this in as is first as it fixes some bugs without the refactoring being too complex and puts us in a good position for .NET 9.
I'll then follow up with a PR that does a broader cleanup of the data used by R2R and changes EVEX
to EVEX256
(which will itself require some refactoring in the JIT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we had shipped class Sve prior to streaming support becoming defined, we could have inserted a class SveStreaming and had Sve derive from it, moving relevant methods down (since that is safe to do for classes).
I think this hierarchy would look weird since the streaming and non-streaming support do not form a hierarchy. You can have one without the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
streaming and non-streaming support do not form a hierarchy
I think this is more just the names aren't the "right" ones.
What I had meant is that you have two sets of instructions. That is, there are SVE instructions that are available both in regular SVE and in SVE streaming mode. Then there are SVE instructions which are not available in streaming mode unless another architectural bit is set.
LGTM |
This does some minor cleanup around the new AVX10v1 support introduced in #101938
Namely it:
instructionsetdesc
implications to not repeat theEVEX
dependencies twiceminipal_getcpufeatures
andEEJitManager::SetCpuInfo
functions only report Avx512F if F+BW+CD+DQ+VL are supported