Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Preserve range check for HW intrinsics with non-const/out-of-range immediates #106765

Merged
merged 12 commits into from
Aug 26, 2024

Conversation

amanasifkhalid
Copy link
Member

@amanasifkhalid amanasifkhalid commented Aug 21, 2024

Fix #106608. Fix #106546. Based on discussion here, intrinsics with out-of-range immediates should always be converted to user calls, and intrinsics with unknown immediates should be given range checks -- if we're optimizing and the immediate is later known, range check removal should kick in. For .NET 10, we plan to add more fallback intrinsic implementations so this pessimization doesn't kick in as often.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 21, 2024
@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @tannergooding PTAL. Thanks!

@amanasifkhalid
Copy link
Member Author

Looks like the AdvSimd.StoreSelectedScalar tests that try the maximum index are failing -- I suspect the JIT is calculating the wrong immediate bounds for this intrinsic. Taking a look...

@tannergooding
Copy link
Member

The title Always convert intrinsics with out-of-bounds immediates to user calls is misleading/not quite correct.

What this is doing isn't converting to user calls, but rather just ensuring that a GT_BOUNDS_CHECK node is inserted, which ensures the required side effect exists in IR and persists. In the case the operand becomes an in range constant due to later constant folding, then the bounds check should get optimized away and the intrinsic will stay as an intrinsic, never getting rewritten to a user call.

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Aug 21, 2024

What this is doing isn't converting to user calls, but rather just ensuring that a GT_BOUNDS_CHECK node is inserted, which ensures the required side effect exists in IR and persists.

The GT_BOUNDS_CHECK node is inserted when the immediate is unknown during importation, but if the immediate is known to be out-of-range (i.e. CheckHWIntrinsicImmRange returns false, and the immediate is constant) and we don't have a fallback, we bail in Compiler::impHWIntrinsic, which means it will become a user call, right?

I can update the title to include the GT_BOUNDS_CHECK insertion for immediates not known during importation.

@tannergooding
Copy link
Member

The GT_BOUNDS_CHECK node is inserted when the immediate is unknown during importation, but if the immediate is known to be out-of-range (i.e. CheckHWIntrinsicImmRange returns false, and the immediate is constant) and we don't have a fallback, we bail in Compiler::impHWIntrinsic, which means it will become a user call, right?

Ah, yes. I missed that bit of code, that will cause a user call to be emitted. We don't just import the throw directly (which would be more ideal) because the inliner wasn't happy with that.

The important part of the fix is that we're preserving the bounds check, which for most typical code will be due to values that aren't known to be constant during importation. The path where we have an out of bounds constant during import will likely never be hit in production (as users would have to knowingly pass in a bad value and ignore the analyzer warning).

@amanasifkhalid amanasifkhalid changed the title JIT: Always convert intrinsics with out-of-bounds immediates to user calls JIT: Preserve range check for HW intrinsics with non-const/out-of-range immediates Aug 21, 2024
{
assert(immOp != nullptr);
// Full-range imm-intrinsics do not need the range-check
// because the imm-parameter of the intrinsic method is a byte.
// AVX2 Gather intrinsics no not need the range-check
// because their imm-parameter have discrete valid values that are handle by managed code
if (mustExpand && HWIntrinsicInfo::isImmOp(intrinsic, immOp)
if (!immOp->IsCnsIntOrI() && HWIntrinsicInfo::isImmOp(intrinsic, immOp)
#ifdef TARGET_XARCH
&& !HWIntrinsicInfo::isAVX2GatherIntrinsic(intrinsic) && !HWIntrinsicInfo::HasFullRangeImm(intrinsic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to explicitly check that AVX2.Gather* works correctly given its special.

The valid values are 1, 2, 4, and 8; other values should result in an exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. I'll add a few more test cases.

@amanasifkhalid
Copy link
Member Author

Looks like the AdvSimd.StoreSelectedScalar tests that try the maximum index are failing -- I suspect the JIT is calculating the wrong immediate bounds for this intrinsic. Taking a look...

The logic for calculating the max index size in lookupImmBounds is correct, but the table definition for StoreSelectedScalar (and similar APIs) had SIMD size hard-coded to 8, despite this API having overloads to operate on Vector128. After changing this to -1, the dynamic path in HWIntrinsicInfo::lookupSimdSize hit asserts because it was using the first argument to get the intrinsic's base type. After updating StoreSelectedScalar's flags to use the second argument, the size calculation is now correct.

@tannergooding I'm nervous that similar intrinsics have similar logic issues, and our test coverage just isn't exposing them. From a quick parse of nearby intrinsics, Store and StoreVectorAndZip also have hard-coded SIMD sizes of 8 and HW_Flag_BaseTypeFromFirstArg set, despite the second argument being the vector arg. Do we want to pursue fixing these for .NET 9?

@tannergooding
Copy link
Member

I'm nervous that similar intrinsics have similar logic issues, and our test coverage just isn't exposing them.

Looks like it is a bug that was introduced in .NET 9 as part of #103689, #93223, etc as part of the general work to expose the tuple overloads of the Load/Store* APIs. CC. @kunalspathak

They were correctly encoded as -1 in .NET 8 and we didn't have the potential issue where the second argument couldn't be used to discern the type due to being TYP_STRUCT then either.

@amanasifkhalid
Copy link
Member Author

@tannergooding thanks for pointing that out. It's not the most elegant solution, but special-casing StoreSelectedScalar to always return the size of Vector64 when the second argument is a ValueTuple seems to work. I don't know of any way to hard-code both 8 and 16 as its SIMD sizes in hwintrinsiclistarm64.h without creating another named intrinsic entry, so I think we want to keep looking up its SIMD size at runtime. If we look up the SIMD size using the first arg, HWIntrinsicInfo::lookupSimdSize will assert, because the first arg isn't an intrinsic type. If we use the second arg, the lookup works for all the StoreSelectedScalar invariants, except for the ValueTuple ones. I think going for a surgical fix at this point makes sense, unless there's some trivial way to handle the second arg being a non-intrinsic TYP_STRUCT.

@tannergooding
Copy link
Member

It's not the most elegant solution, but special-casing StoreSelectedScalar to always return the size of Vector64 when the second argument is a ValueTuple seems to work.

I don't think that's sufficient as there's overloads that take a tuple of Vector128 in addition to the overloads that take a tuple of Vector64. It's also not sufficient o only cover StoreSelectedScalar as this would also impact StoreVectorAndZip, Store, LoadAndInsertScalar, and potentially some of the other Load variants that take a tuple.

without creating another named intrinsic entry, so I think we want to keep looking up its SIMD size at runtime. If we look up the SIMD size using the first arg, HWIntrinsicInfo::lookupSimdSize will assert, because the first arg isn't an intrinsic type.

Right. The "proper" way is likely to have a flag on these intrinsics that lets us know we may need to check for System.ValueTuple<...>, get the type handle for the first generic type of the value tuple, and then pass that down to getBaseJitTypeAndSizeOfSIMDType instead.

You can see that some handling similar to this concept is being done for LoadAndInsertScalar already here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L1710-L1744 (gets the num instance fields, get the type of the field, then passes that into getBaseJitTypeAndSizeOfSIMDType to determine the simdBaseType and simdSize)

@amanasifkhalid
Copy link
Member Author

@tannergooding thanks for pointing me in the right direction. I've added a new flag, HW_Flag_BaseTypeFromValueTupleArg, to denote this special case, and added handling for it to impHWIntrinsic. Do you know if any other intrinsics need this handling at the moment? I think it's just StoreSelectedScalar, since we can't hard-code its SIMD size. All the other ones you mentioned seem to be able to get away with that, or they have multiple named intrinsics for each variant (which can probably be cleaned up after this change goes through, though I don't know if it's worth changing the API surface).

@@ -487,7 +487,7 @@ HARDWARE_INTRINSIC(AdvSimd, SignExtendWideningLower,
HARDWARE_INTRINSIC(AdvSimd, SignExtendWideningUpper, 16, 1, {INS_sxtl2, INS_invalid, INS_sxtl2, INS_invalid, INS_sxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(AdvSimd, SqrtScalar, 8, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_fsqrt, INS_fsqrt}, HW_Category_SIMD, HW_Flag_SIMDScalar)
HARDWARE_INTRINSIC(AdvSimd, Store, 8, 2, {INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_st1_2regs, INS_invalid, INS_invalid, INS_st1_2regs, INS_invalid}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
HARDWARE_INTRINSIC(AdvSimd, StoreSelectedScalar, 8, 3, {INS_st1, INS_st1, INS_st1, INS_st1, INS_st1, INS_st1, INS_st1, INS_st1, INS_st1, INS_st1}, HW_Category_MemoryStore, HW_Flag_BaseTypeFromFirstArg|HW_Flag_HasImmediateOperand|HW_Flag_SIMDScalar|HW_Flag_SpecialCodeGen|HW_Flag_SpecialImport|HW_Flag_NeedsConsecutiveRegisters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's several other methods that take ValueTuple, like Store, that need the same fix.

They aren't APIs taking an immediate, so I think it's fine to handle in a separate PR, but they do need to be handled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a follow-up PR for those -- thanks

@amanasifkhalid
Copy link
Member Author

/ba-g Linux NativeAOT/Mono legs timed out

@amanasifkhalid amanasifkhalid merged commit 9a31a5b into dotnet:main Aug 26, 2024
111 of 123 checks passed
@amanasifkhalid amanasifkhalid deleted the intrinsic-range-check branch August 26, 2024 22:11
@amanasifkhalid
Copy link
Member Author

/backport to release/9.0

Copy link
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10568056920

Copy link
Contributor

@amanasifkhalid backporting to release/9.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Always convert out-of-range constants to user call
Applying: Add tests
Applying: Tweak StoreSelectedScalar definition
Applying: Remove unnecessary arg
Applying: Add AVX tests
Applying: Use first arg of StoreSelectedScalar for type info
Applying: Update HWIntrinsicInfo::lookupSimdSize to handle pointer to primitive as base type arg
Applying: Revert "Update HWIntrinsicInfo::lookupSimdSize to handle pointer to primitive as base type arg"
Applying: Special-case StoreSelectedScalar during importation
Applying: Revert "Special-case StoreSelectedScalar during importation"
Applying: Add handling for base type from ValueTuple
Using index info to reconstruct a base tree...
M	src/coreclr/jit/hwintrinsic.h
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/jit/hwintrinsic.h
CONFLICT (content): Merge conflict in src/coreclr/jit/hwintrinsic.h
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0011 Add handling for base type from ValueTuple
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Copy link
Contributor

@amanasifkhalid an error occurred while backporting to release/9.0, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

amanasifkhalid added a commit to amanasifkhalid/runtime that referenced this pull request Aug 26, 2024
jeffschwMSFT pushed a commit that referenced this pull request Aug 27, 2024
jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
2 participants