Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check opportunistic ISAs in crossgen for CompExactlyDependsOn #90326

Closed
wants to merge 2 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Aug 10, 2023

@stephentoub noticed that the following method shows up as non-prejitted in an empty app:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
[CompExactlyDependsOn(typeof(Ssse3))]
[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
private static (Vector128<byte>, Vector128<byte>, Vector128<byte>) FormatGuidVector128Utf8(Guid value, bool useDashes)
{

Crossgen uses SSE2 as a baseline (+ SSE4.2 as opportunistic) we should still compile this method since it can be reached with opportunistic ISAs.

With this fix I no longer see this method non-prejitted (along with a few others)

@EgorBo
Copy link
Member Author

EgorBo commented Aug 10, 2023

@dotnet/crossgen-contrib PTAL

@@ -604,7 +604,9 @@ public static bool ShouldCodeNotBeCompiledIntoFinalImage(InstructionSetSupport i
// This instruction set isn't supported on the current platform at all.
continue;
}
if (instructionSetSupport.IsInstructionSetSupported(instructionSet) || instructionSetSupport.IsInstructionSetExplicitlyUnsupported(instructionSet))
if (instructionSetSupport.IsInstructionSetSupported(instructionSet) ||
instructionSetSupport.IsInstructionSetOptimisticallySupported(instructionSet) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above says "and none of the intrinsic types are in an opportunistic state", so the current behavior looks very intentional and not a bug.

Maybe trigger some optional crossgen2 test runs to see whether the test will identify the problem with this change?

Copy link
Member Author

@EgorBo EgorBo Aug 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minimal repro to illustrate the problem:

void Foo()
{
    if (Ssse3.IsSupported)
    {
        Bar();
    }
}

[CompExactlyDependsOn(typeof(Ssse3))]
void Bar()
{
    var a = Ssse3.Shuffle(..);
    ...
}

The current behavior for crossgen2 (exactIsaLevel=Sse2, opportunisticIsaLevel=Sse4.2) is to ignore Bar(). But at the same time it's going to compile Foo as:

void Foo()
{
    if (runtimeSsse3Check == true)
    {
        Bar(); // will trigger Tier0 jit compilation
    }
}

So feels like we do need to prejit methods demanding opportunistic ISAs. Another option is to lift [CompExactlyDependsOn(typeof(Ssse3))] to Sse2 but that won't work - analyzer will complain on missing Ssse3 check and also, analyzer is not aware about our opportunistic ISA level so this has to be done in crossgen instead.

cc @davidwrighton who introduced this attribute and the analyzer in #85481

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember we can't use Avx as opportunistic with Sse as exact so presumably we can't hit Vector<> mismatch problems

@EgorBo
Copy link
Member Author

EgorBo commented Aug 12, 2023

/azp list

@azure-pipelines

This comment was marked as resolved.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 12, 2023

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress-isas-x86, runtime-coreclr crossgen2

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Aug 15, 2023

The ShouldCodeNotBeCompiledIntoFinalImage is used for two purposes: (1) To determine whether to compile the method and (2) to determine whether to allow inlining of the method.

Can this change have bad interactions with inlining? Consider the following pattern:

void A() { ... B(); ... }

[CompExactlyDependsOn(SomeOpportunisticInstructionSet)]
void B() { .... }

If we inline B() into A(), A() is not going to be marked with SomeOpportunisticInstructionSet as a prerequisite and we end up running A() with inlined B() at runtime. Is it a problem?

@jkotas jkotas requested a review from sbomer August 15, 2023 05:16
@jkotas
Copy link
Member

jkotas commented Aug 15, 2023

@sbomer Could you please take a look at this one?

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

If we inline B() into A(), A() is not going to be marked with SomeOpportunisticInstructionSet

My understanding that B() can be used in A() only if it's guarded by if (Ssse3.IsSupported) - if it's not then the analyzer is expected to complain that A() has to be marked as [CompExactlyDependsOn(SSSE3)] too

@jkotas
Copy link
Member

jkotas commented Aug 15, 2023

My understanding that B() can be used in A() only if it's guarded by if (Ssse3.IsSupported) - if it's not then the analyzer is expected to complain that A() has to be marked as [CompExactlyDependsOn(SSSE3)] too

If it was ok to assume this, it should not be necessary to block inlining of CompExactlyDependsOn methods. Why is inlining of CompExactlyDependsOn methods blocked?

@sbomer
Copy link
Member

sbomer commented Aug 15, 2023

My understanding of the original problem is that we can't pre-jit methods that assume an unsupported instruction set because the use of unsupported instructions would be precompiled to throw. If such a method is reached at runtime, we're in trouble.

I think it would be ok to inline such a method as long as the caller's IsSupported check is also treated as constant during prejit. The problem was that tiered jitting could result in a state where the throwing helper wasn't inlined, but was reached from a jitted caller where IsSupported returns true. So we blocked prejit of such helpers.

If we allowed inlining of such helpers, we would have also had to block prejit of the callers, up until a matching IsSupported check. I think blocking inlining made it so we didn't have to think about this. I suspect the blocking of inlining is left over from before we had the intrinsics analyzer, because I can't think of a reason to block inlining now that we have the attribute automatically propagated to callers.

For the opportunistic instruction set, I don't actually see a reason why we couldn't prejit such helpers. If we produce non-throwing code for Sse3, and the IsSupported check is a runtime check, I don't see the problem, so I don't understand the comment mentioned in #90326 (comment).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

For the opportunistic instruction set, I don't actually see a reason why we couldn't prejit such helpers. If we produce non-throwing code for Sse3, and the IsSupported check is a runtime check, I don't see the problem, so I don't understand the comment mentioned in #90326 (comment).

Yep, in this PR only lifts the limitation for opportunistic ISAs, so none of instructions are expected to turn into Throw()

Why is inlining of CompExactlyDependsOn methods blocked?

I wasn't aware we block them. Do you ask because otherwise FormatGuidVector128Utf8 would not pop up due to AggressiveInlining or you know where CG2 makes them non-inlineable?

@sbomer
Copy link
Member

sbomer commented Aug 15, 2023

you know where CG2 makes them non-inlineable?

I didn't realize we block inlining either until @jkotas pointed it out. This check here:

if (CorInfoImpl.ShouldCodeNotBeCompiledIntoFinalImage(_instructionSetSupport, calleeMethod))
canInline = false;

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

So I understand the problem but am still struggling to imagine a case where CompExactlyDependsOn relaxed to opportunistic ISAs (what this PR does) can lead to problems since nothing is expected to turn into some Throw() code.

@jkotas
Copy link
Member

jkotas commented Aug 15, 2023

Do you ask because otherwise FormatGuidVector128Utf8 would not pop up due to AggressiveInlining

Yes, and also to help me to build a model around how this is meant to work.

I believe that we have the tools to precompile everything and drop this filtering. For example, if I have method marked with [CompExactlyDependsOn(typeof(Avx2))], we should be able to precompile it just fine. The method will get Avx2 fixup and we will only use the precompiled code when the Avx2 is actually present at runtime. We are not doing that today. Why not? Is there some subtle correctness issue that I am missing?

@sbomer
Copy link
Member

sbomer commented Aug 15, 2023

My (limited) understanding last time I talked to @davidwrighton was that prejitted Avx2 helpers will throw, because crossgen assumes that Avx2 is unavailable in prejitted code. If we treated Avx2 the same as the optimistic instruction set in this PR, then I agree, I don't know why we wouldn't be able to prejit them. The current behavior might be a tradeoff that reduces prejitted code size.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

I wasn't there when these things were added so I can only guess:
The analyzer itself (without crossgen's part) does look like a nice thing to have because it helps us to identify cases where we accidently might use an illegal instruction without proper if (IsSupported) and to avoid putting that IsSupported everywhere we can mark helper calls with that special attribute that will force us to mark callers as such as well (or use if (IsSupported) in them to stop that chain).

For crossgen I can understand why we should not prejit a method that e.g. assumes AVX is there, e.g.:

void Foo()
{
    if (Avx.IsSupported)
    {
        Bar();
    }
}

[CompExactlyDependsOn(typeof(Avx))]
void Bar()
{
    Avx.MoveMask();
}

Let's say we prejit both. Foo is executed and the Bar is never invoked because the whole condition is just eliminated - OK. Bar() is prejitted to just Throw because RyuJIT is not allowed to use AVX when it compiles code for given Crossgen request -- we migth expect cg2 to skip methods which are never called (or e.g. inlined at all call-sites) but I am not sure we can rely on it, right? So Bar might be prejitted to just Throw();

Then, we re-jit Foo to tier1 and this time we take the AVX path as it's actually supported but don't inline Bar in it (because, say, it's too big). We run the code, Bar() is still in Tier-R2R (not yet promoted) - we hit the Throw.

I assume you might wonder why JIT just doesn't emit AVX instructions as demanded even if it's asked to emit SSE2 level of code - I personally don't have an answer on why we don't do it today in JIT. I can only guess that it might be something to do with mixing different encodings: legacy/VEX/EVEX (perhaps, someone from @dotnet/jit-contrib knows more?)

And, back to the topic, I can't imagine a similar scenario for opportunistic ISAs (as they're quite limitted - e.g. we can't have exact SSE and opportunistic AVX)

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

Another (theoretical) problem that we might have even today is if we e.g. initialize some static (readonly) field from R2R assuming AVX is not available and e.g. create an array of 4 floats because of that. But then, after rejit we assume the array was created with AVX in mind, hence, the array has 8 floats (because there it is AVX, available now) and go out of bounds.

I kind of feared SearchValues could do something like that (cc @MihaZupan).

@jkotas
Copy link
Member

jkotas commented Aug 15, 2023

I can only guess that it might be something to do with mixing different encodings: legacy/VEX/EVEX (perhaps, someone from https://github.com/orgs/dotnet/teams/jit-contrib knows more?)

You would have to track the allowed encodings per basic block and optimizations like CSE would have to be aware of the allowed encoding - to avoid moving code that requires the more advanced encoding into a place where the more advanced encoding is not allowed. I can believe that this is hard and fragile.

@MihaZupan
Copy link
Member

Re: SearchValues, #89979 sounds very related. We wouldn't run into issues like out of bounds reads, but you could see correctness issues as described in the issue.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 15, 2023

Although, I feel like opportunistic IsSupported checks are currently broken? E.g.

static void Main()
{
    Test();
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void Test()
{
    if (Sse41.IsSupported || AdvSimd.Arm64.IsSupported)
        Console.WriteLine("SSE41 or NEON");
    else
        Console.WriteLine(":(");
}

NativeAOT:

; Assembly listing for method Prog:Test() (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; NativeAOT compilation
       sub      rsp, 40
       test     byte  ptr [(reloc 0x4000000000421990)], 16      ; data for <Module>:g_cpuFeatures
       je       SHORT G_M14333_IG05
       lea      rcx, gword ptr [(reloc 0x4000000000421998)]      ; '"SSE41 or NEON"'
       call     System.Console:WriteLine(System.String)
       nop      
       add      rsp, 40
       ret      
G_M14333_IG05:  ;; offset=0x001F
       lea      rcx, gword ptr [(reloc 0x40000000004219a0)]      ; '":("'
       call     System.Console:WriteLine(System.String)
       nop      
       add      rsp, 40
       ret      

^ it works on NAOT.

Crossgen:

; Assembly listing for method Prog:Test() (FullOpts)
; Emitting BLENDED_CODE for generic X64 - Windows
; FullOpts code
; ReadyToRun compilation
       sub      rsp, 40
       mov      rcx, qword ptr [(reloc 0x40000000004200b0)]      ; const ptr
       mov      rcx, gword ptr [rcx]
       call     [System.Console:WriteLine(System.String)]
       nop      
       add      rsp, 40
       ret      

Maybe that's why CompExactlyDependsOn was contstrained to exact only? Is it a bug or some intentional change to save binary size?

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

I feel like opportunistic IsSupported checks are currently broken?

The opportunistic checks are emitted as method fixups in R2R. In r2r dump, you can see things like CHECK_InstructionSetSupport X86Base+ Sse+ Lzcnt+. This fixup says that the prejited code can be only used when the machine supports the given instruction set. Note that the fixup also allows checks for explicitly unsupported instruction sets (e.g. avx2- means this method can be only used if the machine does not have avx2).

Notice that CompExactlyDependsOn does not produce the CHECK_InstructionSetSupport fixup. It makes it hard to reason about the correctness of this change for me. If CompExactlyDependsOn(Ssse3) produced CHECK_InstructionSetSupport ssse3 fixup on the method, it would be much easier to reason about the correctness.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 16, 2023

@jkotas got it. Does it make sense to also make IsSupported dynamic just like it's done on NativeAOT? Basically, port this code

Debug.Assert(IsIsSupportedMethod(method));
Debug.Assert(isSupportedField.IsStatic && isSupportedField.FieldType.IsWellKnownType(WellKnownType.Int32));
int flag = 0;
switch (method.Context.Target.Architecture)
{
case TargetArchitecture.X86:
case TargetArchitecture.X64:
flag = XArchIntrinsicConstants.FromInstructionSet(instructionSet);
break;
case TargetArchitecture.ARM64:
flag = Arm64IntrinsicConstants.FromInstructionSet(instructionSet);
break;
default:
Debug.Fail("Unsupported Architecture");
break;
}
var emit = new ILEmitter();
ILCodeStream codeStream = emit.NewCodeStream();
codeStream.Emit(ILOpcode.ldsfld, emit.NewToken(isSupportedField));
codeStream.EmitLdc(flag);
codeStream.Emit(ILOpcode.and);
codeStream.EmitLdc(0);
codeStream.Emit(ILOpcode.cgt_un);
codeStream.Emit(ILOpcode.ret);
return emit.Link(method);
to crossgen.

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

I do not think that it is worth it for R2R. It is ok for R2R to depend on fallback to JIT. Porting this code would not fix all cases of fallback to JIT. We would still need to fallback to JIT for methods with the legacy/VEX/EVEX problem.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 16, 2023

We would still need to fallback to JIT for methods with the legacy/VEX/EVEX problem.

On NAOT, IsSupported is only emitted for opportunistic ISA checks so no encoding problems. E.g. with the default settings, if (Avx.IsSupported) will be just if(false) while if(Sse41.IsSupported) will be dynamic. But ok, will see if I can figure out how to emit the fixup hint.

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

Note that the fixup also allows checks for explicitly unsupported instruction sets (e.g. avx2- means this method can be only used if the machine does not have avx2)

The difference between corelib and non-corelib code is that the unsupported instruction set fixups are omitted for CoreLib. We assume that the CoreLib code is going to behave correctly even when ISA.IsSupported value does not match reality some of the time.

@EgorBo
Copy link
Member Author

EgorBo commented Aug 16, 2023

@jkotas r2rdump (with this change) for FormatGuidVector128: (it seems to be correct and includes CHECK_InstructionSetSupport being ssse3 already?)

System.ValueTuple`3<System.Runtime.Intrinsics.Vector128`1<byte>,System.Runtime.Intrinsics.Vector128`1<byte>,System.Runtime.Intrinsics.Vector128`1<byte>> System.Guid.FormatGuidVector128Utf8(System.Guid, bool)
Handle: 0x06001302
Rid: 4866
EntryPointRuntimeFunctionId: 3727
Number of RuntimeFunctions: 1
Number of fixups: 3
    TableIndex 5, Offset 0021: CHECK_InstructionSetSupport X86Base+ Sse+ Sse2+ Sse3+ Ssse3+
    TableIndex 5, Offset 0793: System.Runtime.Intrinsics.Vector128`1<byte> (TYPE_HANDLE)
    TableIndex 5, Offset 0A49: System.ValueTuple`2<System.Runtime.Intrinsics.Vector128`1<byte>,System.Runtime.Intrinsics.Vector128`1<byte>> (TYPE_HANDLE)

System.ValueTuple`3<System.Runtime.Intrinsics.Vector128`1<byte>,System.Runtime.Intrinsics.Vector128`1<byte>,System.Runtime.Intrinsics.Vector128`1<byte>> System.Guid.FormatGuidVector128Utf8(System.Guid, bool)
Id: 3727
StartAddress: 0x002845C0
Size: 180 bytes
UnwindRVA: 0x00035C8C
Version:            1
Flags:              0x03 EHANDLER UHANDLER
SizeOfProlog:       0x0000
CountOfUnwindCodes: 0
FrameRegister:      None
FrameOffset:        0x0
PersonalityRVA:     0x66C278

Debug Info
    Bounds:
    Native Offset: 0x0, Prolog, Source Types: StackEmpty
    Native Offset: 0x3, IL Offset: 0x0000, Source Types: StackEmpty
    Native Offset: 0x6, IL Offset: 0x0033, Source Types: StackEmpty
    Native Offset: 0x40, IL Offset: 0x0047, Source Types: StackEmpty
    Native Offset: 0x4C, IL Offset: 0x0065, Source Types: StackEmpty
    Native Offset: 0x51, IL Offset: 0x006b, Source Types: StackEmpty
    Native Offset: 0x60, IL Offset: 0x008f, Source Types: StackEmpty
    Native Offset: 0x6F, IL Offset: 0x0112, Source Types: StackEmpty
    Native Offset: 0x96, IL Offset: 0x012f, Source Types: SourceTypeInvalid
    Native Offset: 0xA1, Epilog, Source Types: StackEmpty
    Native Offset: 0xA5, IL Offset: 0x0141, Source Types: SourceTypeInvalid
    Native Offset: 0xB3, Epilog, Source Types: StackEmpty

    Variable Locations:
    Variable Number: 4294967294
    Start Offset: 0x0
    End Offset: 0x3
    Loc Type: VLT_REG
    Register: RCX

    Variable Number: 4294967294
    Start Offset: 0x3
    End Offset: 0xA1
    Loc Type: VLT_REG
    Register: RAX

    Variable Number: 4294967294
    Start Offset: 0xA2
    End Offset: 0xB3
    Loc Type: VLT_REG
    Register: RAX

    Variable Number: 0
    Start Offset: 0x0
    End Offset: 0x3
    Loc Type: VLT_REG
    Register: RDX

    Variable Number: 1
    Start Offset: 0x0
    End Offset: 0x4C
    Loc Type: VLT_REG
    Register: R8

    Variable Number: 3
    Start Offset: 0x6
    End Offset: 0x15
    Loc Type: VLT_REG_FP
    Register: 16

    Variable Number: 4
    Start Offset: 0x4C
    End Offset: 0x76
    Loc Type: VLT_REG_FP
    Register: 19

    Variable Number: 5
    Start Offset: 0x40
    End Offset: 0x82
    Loc Type: VLT_REG_FP
    Register: 17

    Variable Number: 5
    Start Offset: 0xA2
    End Offset: 0xA5
    Loc Type: VLT_REG_FP
    Register: 17

    Variable Number: 6
    Start Offset: 0x60
    End Offset: 0x96
    Loc Type: VLT_REG_FP
    Register: 18

    Variable Number: 7
    Start Offset: 0x6F
    End Offset: 0x96
    Loc Type: VLT_REG_FP
    Register: 20

2845c0: 48 8b c1                mov     rax, rcx
2845c3: 0f 10 02                movups  xmm0, xmmword ptr [rdx]
2845c6: 0f 28 c8                movaps  xmm1, xmm0
2845c9: 66 0f 73 d1 04          psrlq   xmm1, 4
2845ce: 0f 28 d1                movaps  xmm2, xmm1
2845d1: 66 0f 60 d0             punpcklbw xmm2, xmm0
2845d5: 66 0f 68 c8             punpckhbw xmm1, xmm0
2845d9: 0f 10 05 d0 84 81 00    movups  xmm0, xmmword ptr [0xa9cab0]
2845e0: 66 0f db d0             pand    xmm2, xmm0
2845e4: 0f 10 1d d5 84 81 00    movups  xmm3, xmmword ptr [0xa9cac0]
2845eb: 66 0f 38 00 da          pshufb  xmm3, xmm2
2845f0: 66 0f db c1             pand    xmm0, xmm1
2845f4: 0f 10 0d c5 84 81 00    movups  xmm1, xmmword ptr [0xa9cac0]
2845fb: 66 0f 38 00 c8          pshufb  xmm1, xmm0
284600: 0f 10 05 c9 84 81 00    movups  xmm0, xmmword ptr [0xa9cad0]
284607: 66 0f 38 00 d8          pshufb  xmm3, xmm0
28460c: 45 84 c0                test    r8b, r8b
28460f: 74 51                   je      0x284662
284611: 0f 10 05 c8 84 81 00    movups  xmm0, xmmword ptr [0xa9cae0]
284618: 0f 28 d3                movaps  xmm2, xmm3
28461b: 66 0f 38 00 d0          pshufb  xmm2, xmm0
284620: 0f 10 05 c9 84 81 00    movups  xmm0, xmmword ptr [0xa9caf0]
284627: 0f 28 e1                movaps  xmm4, xmm1
28462a: 66 0f 38 00 e0          pshufb  xmm4, xmm0
28462f: 0f 10 05 ca 84 81 00    movups  xmm0, xmmword ptr [0xa9cb00]
284636: 66 0f 38 00 d8          pshufb  xmm3, xmm0
28463b: 0f 10 05 ce 84 81 00    movups  xmm0, xmmword ptr [0xa9cb10]
284642: 66 0f 38 00 c8          pshufb  xmm1, xmm0
284647: 66 0f eb cb             por     xmm1, xmm3
28464b: 0f 10 05 ce 84 81 00    movups  xmm0, xmmword ptr [0xa9cb20]
284652: 66 0f eb c1             por     xmm0, xmm1
284656: 0f 11 10                movups  xmmword ptr [rax], xmm2
284659: 0f 11 60 10             movups  xmmword ptr [rax + 16], xmm4
28465d: 0f 11 40 20             movups  xmmword ptr [rax + 32], xmm0
284661: c3                      ret
284662: 0f 28 c3                movaps  xmm0, xmm3
284665: 0f 11 00                movups  xmmword ptr [rax], xmm0
284668: 0f 11 48 10             movups  xmmword ptr [rax + 16], xmm1
28466c: 0f 57 c0                xorps   xmm0, xmm0
28466f: 0f 11 40 20             movups  xmmword ptr [rax + 32], xmm0
284673: c3                      ret

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

it seems to be correct and includes CHECK_InstructionSetSupport

Yes, it does pick up the fixup by some other path. The CompExactlyDependsOnAttribute does not inject this fixup. If you try to compile an empty method with CompExactlyDependsOnAttribute, it is not going to have the CHECK_InstructionSetSupport fixup.

The current spec for the attribute says this:

// Use this attribute to indicate that a function should only be compiled into a Ready2Run
// binary if the associated type will always have a well defined value for its IsSupported property
I think you are trying to change the spec to

... indicate that a native code for this function in Read2Run binary should be only used at runtime if the associated type will always have a well defined value for its IsSupported property

This spec change is safe and it should not introduce observable functional behavior changes. The implementation needs to match the spec and it should not depend on the function to pick up the fixup by some other path.

If you would like to change the spec for the attribute to something else, it is fine too - needs to be written down and we need to review and adjust the system to conform with the spec.

BTW: There is a dead duplicate definition of the attribute in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/CompExactlyDependsOn.cs . You can delete this file while you are on it.

@jkotas
Copy link
Member

jkotas commented Aug 16, 2023

Hmm, I have been thinking about it as "the IsSupported property value must be known for the given method at AOT time". Well-defined only means that the code at runtime should not be able to observe different values of IsSupported property at different times. So the question is whether it is possible for a wrong value of IsSupported to sneak in somehow with the current change. A potential bug would most likely show up as invalid instruction fault on low-end hardware that does not support the opportunistic instruction set. Do we have any test running on low-end hardware like that?

@tannergooding
Copy link
Member

tannergooding commented Aug 16, 2023

Do we have any test running on low-end hardware like that?

We have plenty of tests that stress downlevel ISAs, but none that explicitly try to fault AFAIK. Just ones that ensure the JIT throws a PNSE instead.

I don't believe we have any hardware this old on which to test and expect we won't find such a machine outside of Linux. Chromium (and therefore Chrome and potentially Edge) has required SSE3 since 2014 and the ISA has been around on both AMD/Intel since ~2005. x64 MacOS likewise defaults to core2 which includes SSSE3.

So if we wanted to try this we might be able to explicitly try around SSE4.1 (which doesn't require the VEX/EVEX encoding).

@EgorBo
Copy link
Member Author

EgorBo commented Aug 17, 2023

This spec change is safe and it should not introduce observable functional behavior changes. The implementation needs to match the spec and it should not depend on the function to pick up the fixup by some other path.

Thanks for the hints! I tried to make [ExactlyDependsOn] to report InstructionSet via EgorBo@9f1c502 and it worked (tested on an empty method with [CompExactlyDependsOn(typeof(Ssse3))] but I guess it needs to be rewritten to avoid checking custom attributes for each method - I tried to make a side-cache in _compilation object but so far it looks a bit ugly, will get back to it later.

@jkotas
Copy link
Member

jkotas commented Aug 17, 2023

I guess it needs to be rewritten to avoid checking custom attributes for each method

It should be fine to check the custom attributes before compiling every method.

@davidwrighton
Copy link
Member

I'm back from my trip, and reading through this thread, there are a couple of points...

  1. The current scheme is designed to be somewhat conservative, and prefer to avoid generating code rather than getting in a situation which can go poorly. Each release I've revisited this logic, and made it a bit better, but there is quite a bit more that we could do.
  2. Finding and diagnosing situations where we get this wrong is terrifically difficult. Our testing runs on hardware which supports all, or at least most the processor features, and so, there is no simple way to actually find out we generated bad code (as it will just work in our tests, although it may fail on customer machines which are actually old).
  3. As @jkotas points out, fallback to JIT is not terrible, although it isn't ideal.
  4. The CompExactlyDependsOnAttribute could be used to prejit Avx/Avx2 dependent code even when the system is not generally attempting to do so for most logic. In fact we could tweak that api (or add another) to allow us to specify that much of pregenerated code could consider Avx2 to be the default if there is use vectorization. We know that the VAST majority of all .NET code run on x64 processors should really just be using Avx2 from the get go. However, the trick is that enabling Avx2 will change the sizes of the Vector<T> types, which will mean that we need a different instance of the type system for compiling those methods. Now, this is possible to do (and possibly even easy from within the JIT portion of crossgen2), but managing the fixup generation at emission time is a fairly tricky detail.
  5. I've spent some hours today wrangling in my head what could go wrong with this change, and I believe it will actually work correctly. However, as Jan notes we need to update the definition of the meaning of the CompExactlyDependsOnAttribute.

Finally, we can't get rid of all of the filtering as we absolutely have to filter out instruction sets which happen to be disabled (such as Avx2, or Avx512) as otherwise we will generate incorrect code for the methods while building corelib.

@davidwrighton davidwrighton self-requested a review September 7, 2023 22:39
@EgorBo EgorBo marked this pull request as draft January 5, 2024 12:29
@ghost ghost closed this Feb 4, 2024
@ghost
Copy link

ghost commented Feb 4, 2024

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2024
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants