Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the InstructionSetDesc implications #86486

Merged
merged 28 commits into from
Jun 2, 2023

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented May 19, 2023

Extracted from #85551

This primarily handles ensuring the InstructionSetDesc implications are correct. In particular it ensures that Avx512F is dependent on both AVX2 and FMA. It likewise ensures that the nested VL classes are dependent on both the base VL class and the containing class (e.g. Avx512BW.VL is dependent on both Avx512BW and Avx512F.BW).

It also adds a recursive implication that ensures that F + BW + CD + DQ + VL are only ever enabled as a set. This is done to simplify the current implementation of the JIT without tying us to that behavior permanently. It would be non-trivial work and not "pay for play" to allow this to be represented as a single R2R flag instead.

Finally, as part of fixing this there were a couple minor bugs surfaced. In particular, avx-vnni was marked as opportunistic which in turn marked avx2 as opportunistic. This is problematic if the user decides to target avx but not avx2 and was causing Vector<T> and Vector256<T> to behave incorrectly. This was handled by ensuring avx-vnni is only marked as opportunistic if avx2 is explicitly supported and ensuring that the NI_IsSupported_Dynamic is only returned for opportunistic APIs.

Tests were added covering NAOT, CG2, and JIT validating that the full set of currently exposed ISAs behave correctly. This includes when users explicitly opt into or out-of a given ISA.

@ghost ghost assigned tannergooding May 19, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 19, 2023
@tannergooding
Copy link
Member Author

CC. @jkotas, @davidwrighton

This is the last bit to break apart from #85551 and will leave that as just the Vector<T> changes.

@tannergooding tannergooding force-pushed the prefer-vector-width-4 branch 2 times, most recently from be5596a to 381f759 Compare May 19, 2023 15:58
@tannergooding tannergooding force-pushed the prefer-vector-width-4 branch from 381f759 to 89b5aff Compare May 19, 2023 16:49
@jkotas jkotas requested a review from davidwrighton May 19, 2023 17:16
implication ,X86 ,AVX512VBMI_VL ,AVX512BW_VL

; While the AVX-512 ISAs can be individually lit-up, they really
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidwrighton Could you please review these new implications?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These implications don't entirely make sense to me. What they indicate is that if any of avx512cd, bw, f, bmi, or the vl variants are enabled, then all the various instruction sets will be considered to be enabled. If that's the case, then why do we need so many flags? I don't understand. In addition, if you're making the implications have this circular flow that isn't actually required by the flags the processor specifies, we need to treat them similarly in the avx512 detection logic in codeman.cpp. Thus, we can't enable ANY of the AVX512 support in the runtime unless all of the associated cpu flags are enabled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What they indicate is that if any of avx512cd, bw, f, bmi, or the vl variants are enabled, then all the various instruction sets will be considered to be enabled

Yes and the comment is meant to explain that, if you have any ideas on how to reword it then please feel free to suggest them.

If that's the case, then why do we need so many flags

We have an actual split in CPUID checks and in other corresponding logic that ties class names to flag names. We may also want to actually allow splitting this in the future, such as if we decide to support Knight's Landing or if some future hardware decides to only support a subset of AVX512.

In addition, if you're making the implications have this circular flow that isn't actually required by the flags the processor specifies, we need to treat them similarly in the avx512 detection logic in codeman.cpp

This is already handled in codeman by virtue of the:

    CPUCompileFlags.Set64BitInstructionSetVariants();
    CPUCompileFlags.EnsureValidInstructionSetSupport();

If Avx512F was supported but Avx512BW was not, then Avx512F would be turned off by the EnsureValidInstructionSetSupport call.

Copy link
Member

@MichalStrehovsky MichalStrehovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NativeAOT smoke test change looks good. Thanks for extending the coverage!

@tannergooding tannergooding added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 22, 2023
@ghost
Copy link

ghost commented May 22, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Extracted from #85551

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr, arch-avx512, needs-area-label

Milestone: -

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a better explanation for what we're doing here. See my comments.

optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("avxvnni");
}

if (supportedInstructionSet.HasInstructionSet(InstructionSet.X64_AVX2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding having avx2 be opportunistic because there are some potentially deeper problems and involvement with how it impacts Vector<T>, as you called out on the other PR.

AvxVnni inherits Avx2 and so it being opportunistic means that Avx2 is also opportunistic. This shipped that way in .NET 7 and so its possible there is some R2R issue there.

{
if (comp->compExactlyDependsOn(isa))
if (!comp->IsTargetAbi(CORINFO_NATIVEAOT_ABI) || comp->compExactlyDependsOn(isa))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what change is happening here? I don't see a description of what we're changing around NativeAOT behavior in the change description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an existing bug here that was surfaced if crossgen/naot targeted avx but not avx2

For most cases, the ISA initially checked and tracked as part of isIsaSupported is the same as what is tracked by the InstructionSetDesc

However, for Vector256 in particular we have the case where the implication is on Avx and we will accelerate some APIs when only Avx is supported. But, we only want IsHardwareAccelerated to report true when Avx2 is also supported.

We were then ending up in a scenario where we'd end up failing to handle IsHardwareAccelerated for the recursive case when avx was supported but avx2 was not because isIsaSupported (AVX) would be true and then we'd fail the compExactlyDependsOn check for AVX2 and then return NI_IsSupported_Dynamic, which was incorrect since avx2 was not opportunistic.

This fixes that so we now ensure that we only go down the true/dynamic path if the compiler could support AVX2 at all.

implication ,X86 ,AVX512VBMI_VL ,AVX512BW_VL

; While the AVX-512 ISAs can be individually lit-up, they really
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These implications don't entirely make sense to me. What they indicate is that if any of avx512cd, bw, f, bmi, or the vl variants are enabled, then all the various instruction sets will be considered to be enabled. If that's the case, then why do we need so many flags? I don't understand. In addition, if you're making the implications have this circular flow that isn't actually required by the flags the processor specifies, we need to treat them similarly in the avx512 detection logic in codeman.cpp. Thus, we can't enable ANY of the AVX512 support in the runtime unless all of the associated cpu flags are enabled.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels May 30, 2023
@runfoapp runfoapp bot mentioned this pull request Jun 1, 2023
@tannergooding tannergooding merged commit c81e83a into dotnet:main Jun 2, 2023
@tannergooding tannergooding deleted the prefer-vector-width-4 branch June 2, 2023 03:25
@ghost ghost locked as resolved and limited conversation to collaborators Jul 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants