Expose cross-platform helpers for Vector64, Vector128, and Vector256 #53450

tannergooding · 2021-05-28T22:01:06Z

This mostly resolves #49397 by exposing cross-platform helpers for core functionality, mirroring Vector.
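As a rough illustration of what "cross-platform helpers" means for callers, the sketch below contrasts per-ISA code with the shared form. It assumes .NET 7 or later, where `Vector128<T>` has an `operator +`; the class and method names (`AddExample`, `AddPerPlatform`, `AddCrossPlatform`) are hypothetical and are not taken from this PR.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical example type; not part of the PR.
public static class AddExample
{
    // Platform-specific style: one branch per ISA plus a hand-written fallback.
    public static Vector128<float> AddPerPlatform(Vector128<float> left, Vector128<float> right)
    {
        if (Sse.IsSupported)
        {
            return Sse.Add(left, right);
        }

        if (AdvSimd.IsSupported)
        {
            return AdvSimd.Add(left, right);
        }

        // Scalar fallback, element by element.
        return Vector128.Create(
            left.GetElement(0) + right.GetElement(0),
            left.GetElement(1) + right.GetElement(1),
            left.GetElement(2) + right.GetElement(2),
            left.GetElement(3) + right.GetElement(3));
    }

    // Cross-platform style (assumes .NET 7+): the runtime picks the instruction
    // or the software fallback.
    public static Vector128<float> AddCrossPlatform(Vector128<float> left, Vector128<float> right)
        => left + right;
}
```

Both methods should produce the same result on any platform; the second simply moves the ISA selection out of user code.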

This PR is mostly tests; the actual product changes are much smaller than the total suggests. As with the other HWIntrinsic PRs, the tests are templated and largely autogenerated from a small table. I recommend reviewing a couple of the templates and the template input data rather than trying to review every test individually. There aren't any new concepts here, just clones of the existing templates with a few tweaks for the shared types.

In particular this PR is (numbers are slightly off now due to a few fixes, but largely still representative):

+3171, -1226 for the JIT - Largely a refactoring to share SimdAsHWIntrinsic logic with the Vector64/128/256 code paths
+7494, -3090 for the Libraries - Small refactoring + Software fallback logic for each API
+1523, -0001 for the Test Metadata Table - Largely copy/pasted from our existing hwintrinsic scenarios
+3856, -0000 for the Test Templates - Largely copy/pasted from our existing hwintrinsic scenarios
The remaining 444,136 lines are the generated tests (roughly 10 tests per API exposed, covering all the scenarios we said were important and are covering for the existing HWIntrinsics)

This does not cover Narrow, Widen, ConvertTo*, or IsHardwareAccelerated. These remaining APIs will go in a separate PR since they require a bit more refactoring to make work (Vector<T> still implements these using the "legacy" SIMD support, and that PR will need to update it to use SimdAsHWIntrinsic and make the logic shareable with the Vector64/128/256<T> paths).

This will unblock some work on related issues as well as allow us to merge several code paths on the managed side where the core logic is the same between various platforms. The related issues that will be resolved following this PR include:

Implementing Narrow, Widen, and ConvertTo
Extend Vector64<T>, Vector128<T>, and Vector256<T> to support nint and nuint #52017 - Which will update Vector64/128/256 to support nint/nuint, bringing them in line with the support added for Vector this release. This is expected to be fairly small and should mostly just be removing a check
Extend System.Runtime.Intrinsics.X86 to support nint and nuint #52021 and Extend System.Runtime.Intrinsics.Arm to support nint and nuint #52027 - Which will expose nint/nuint functionality for the actual intrinsics; this should likewise be fairly small and should mostly be managed code after #52017

dotnet-issue-labeler · 2021-05-28T22:01:40Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

tannergooding · 2021-05-28T22:02:07Z

Going to run isa tests and make sure everything is passing before marking this as ready-for-review.

tannergooding · 2021-06-01T01:09:21Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2021-06-01T01:09:58Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2021-06-01T16:17:44Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2021-06-01T16:18:38Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2021-06-01T19:35:24Z

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

azure-pipelines · 2021-06-01T19:36:01Z

Azure Pipelines successfully started running 3 pipeline(s).

tannergooding · 2021-09-29T17:12:10Z

> Just merged the workaround. Try rebasing again.

Thanks! Just did that, hopefully the AOT legs pass now

echesakov

Sorry for the delay.

Overall, the JIT changes look good. I left a couple of suggestions and one nit.

src/coreclr/vm/class.cpp

src/coreclr/jit/gtlist.h

echesakov · 2021-09-30T20:55:15Z

src/coreclr/jit/hwintrinsicarm64.cpp

+            break;
+        }
+
+        case NI_Vector128_GetUpper:


I don't think this is the most optimal way of implementing Vector128.GetUpper()...

Perhaps, something that does DUP Vd, Vn.D[1] should be used instead.
This would correspond to vector.AsUInt64().GetElement(1).As<T>()
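One possible managed-code reading of the shorthand above is sketched here. It is an assumption about what the expression is meant to convey, not the JIT change itself: since `GetUpper` returns a `Vector64<T>`, the extracted 64-bit element is wrapped with `Vector64.CreateScalar` before being reinterpreted. The names `GetUpperExample` and `GetUpperViaElement` are hypothetical.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical example type; not part of the PR.
public static class GetUpperExample
{
    // Reads the upper 64 bits as a single ulong element and reinterprets
    // them as a Vector64<T>, instead of extracting a sub-vector.
    public static Vector64<T> GetUpperViaElement<T>(Vector128<T> vector)
        where T : struct
    {
        ulong upperBits = vector.AsUInt64().GetElement(1);
        return Vector64.CreateScalar(upperBits).As<ulong, T>();
    }
}
```

For any supported `T`, the result should compare equal to `vector.GetUpper()`; whether the JIT emits a single `DUP` for this shape is exactly the open question in this thread.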

Do you mind if I fix this in a follow-up PR (just so we can go ahead and get this one resolved and avoid further downstream conflicts)?

Also noting, this might be a case where we want to recognize the ExtractVector128 pattern and replace it with DUP if we know it's always going to be more efficient...

Sure, I meant to write that this work can wait when I posted the message (but forgot).

kunalspathak · 2021-10-07T15:20:20Z

Need to double check if these arm64 improvements are related - dotnet/perf-autofiling-issues#1715

jkotas · 2021-10-10T13:00:03Z

This added close to 500,000 lines of auto-generated tests. Auto-generated tests like this are starting to kill us - in the repo footprint, build times and test times.

tannergooding · 2021-10-10T15:38:12Z

> Auto-generated tests like this are starting to kill us - in the repo footprint, build times and test times.

@jkotas, we have several thousand APIs here, each of which needs its basic functionality tested. Additionally, since they add support to various phases of the JIT, including new instructions and new encodings, they need some validation that various operations (such as memory reads/writes getting folded into the instruction, CSE, etc.) work as expected.

Since we can't unit test phases of the JIT, we are stuck with functional tests written in C# that try to ensure trees are generated that cover the relevant patterns. If we could unit test phases, such as emit, we could have a much higher bar of confidence with many fewer tests, as we could directly validate based on the known encoding scenarios.

Many of these could likely be moved to an outerloop run and only run on PRs known to touch or impact HWIntrinsic codegen (reducing the cost of the average PR/commit); but as is, these tests have caught a number of bugs and other issues that wouldn't have been exposed otherwise, so I don't think they should be removed from the repo until we have a suitable replacement.

jkotas · 2021-10-10T17:41:53Z

Some data:

src/tests before this PR: 488,952,913 bytes
src/tests after this PR: 509,473,324 bytes

This PR grew src/tests footprint by more than 4%.
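The growth figure can be checked directly from the two byte counts above. The snippet below is only a small arithmetic sketch; the type and member names are made up for illustration.

```csharp
// Hypothetical helper for checking the numbers quoted in this comment.
public static class FootprintExample
{
    public const long BytesBefore = 488_952_913; // src/tests before this PR
    public const long BytesAfter = 509_473_324;  // src/tests after this PR

    // Absolute growth in bytes.
    public static long GrowthBytes() => BytesAfter - BytesBefore;

    // Relative growth as a percentage of the original size.
    public static double GrowthPercent() => 100.0 * GrowthBytes() / BytesBefore;
}
```

The difference is 20,520,411 bytes, which is roughly 4.2% of the original size and consistent with "more than 4%".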

src/tests is about 60% of the dotnet/runtime repo disk size footprint. Yes, the src/tests sources are 1.5x the size of the sources for both runtimes, all libraries, and all library tests combined.

I understand that we do not have a better option currently, but we may need to create one if this continues to grow like this.

tannergooding · 2021-10-10T19:02:41Z

> I understand that we do not have a better option currently, but we may need to create one if this continues to grow like this.

I fully support having a better alternative here. I think the ideal scenario would be:

- Emitter is unit testable; we can actually write tests for scenarios like INS_addps xmm0, xmm0, INS_addps xmm0, xmm15, and INS_addps xmm0, [addr] directly
  - Such tests would be significantly cheaper to run and validate; they also give a higher bar of confidence with less work for core scenarios
- All of the existing src/tests/JIT/HardwareIntrinsics jobs are removed and replaced with Library tests that do basic functionality validation (e.g. ensure Sse.Add is behaving as expected and does per-element addition).

This would cover the majority of bugs and issues we've hit so far at a much cheaper cost. There would still be some things, like lowering, containment, and value numbering that wouldn't be covered here; but we've not hit any issues there that weren't immediately caught by general use of the intrinsics in the BCL.
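To make the "basic functionality validation" idea concrete, here is a minimal sketch of what such a library-style check for Sse.Add might look like. The helper name `SseAddValidation.AddIsPerElement` is hypothetical, and a real test would live in a test framework rather than a static helper; this only shows the shape of the check.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical validation helper; not an existing test in the repo.
public static class SseAddValidation
{
    // Returns true when Sse.Add matches scalar per-element addition for the
    // first four elements of each input, or when SSE is not supported and
    // there is nothing to validate on this machine.
    public static bool AddIsPerElement(float[] left, float[] right)
    {
        if (!Sse.IsSupported)
        {
            return true;
        }

        Vector128<float> result = Sse.Add(
            Vector128.Create(left[0], left[1], left[2], left[3]),
            Vector128.Create(right[0], right[1], right[2], right[3]));

        for (int i = 0; i < Vector128<float>.Count; i++)
        {
            // Compare bit patterns so that signed zeros are distinguished.
            float expected = left[i] + right[i];
            if (BitConverter.SingleToInt32Bits(result.GetElement(i)) !=
                BitConverter.SingleToInt32Bits(expected))
            {
                return false;
            }
        }

        return true;
    }
}
```

A check like this validates behavior only; it says nothing about which encoding the JIT chose, which is the part the emitter unit tests described above would cover.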

fanyang-mono · 2021-11-11T22:47:15Z

This PR caused some regressions in the TE benchmarks when running with Mono. I've created an issue to track it: #61484. @tannergooding, could you take a look?

dotnet-issue-labeler bot added the area-CodeGen-coreclr and new-api-needs-documentation labels May 28, 2021

tannergooding force-pushed the fix-49397 branch 14 times, most recently from 36751ca to 51f1ba7 Compare May 31, 2021 19:50

dotnet deleted a comment from azure-pipelines bot Jun 1, 2021

tannergooding force-pushed the fix-49397 branch from 51f1ba7 to 2dc5e64 Compare June 1, 2021 01:09

dotnet deleted a comment from azure-pipelines bot Jun 1, 2021

SingleAccretion mentioned this pull request Jun 1, 2021

Vector.Sum(Vector<T>) API implementation for horizontal add. #53527

Merged

tannergooding force-pushed the fix-49397 branch from 703c8c5 to a3db54f Compare June 1, 2021 19:34

tannergooding marked this pull request as ready for review June 1, 2021 23:38

tannergooding and others added 9 commits September 29, 2021 09:58

Minor cleanup of impBaseIntrinsic for x86/x64

6cddb02

Intrinsify the Vector64/128/256 methods

8c28873

Make the internal helper named IsTypeSupported to not conflict with mono's existing check

1b665fc

Ensure we lie about the type for TYP_SIMD32 bitwise ops when only AVX is supported

f917ea6

Use gtNewSimdZeroNode rather than gtNewSIMDVectorZero

10dfe0c

Applying formatting patch

6d34d27

Apply suggestions from code review

57ce0a9

Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>

Apply suggestions from code review

f2529d7

Split HardwareIntrinsic tests into 3 groups

54610f9

tannergooding force-pushed the fix-49397 branch from 035783a to 54610f9 Compare September 29, 2021 17:11

echesakov reviewed Sep 30, 2021

Update src/coreclr/vm/class.cpp

fc5c5bc

Co-authored-by: Egor Chesakov <Egor.Chesakov@microsoft.com>

echesakov approved these changes Sep 30, 2021

tannergooding mentioned this pull request Sep 30, 2021

Investigate codegen for Vector128.GetUpper #59838

Closed

tannergooding merged commit 826f49d into dotnet:main Sep 30, 2021

EgorBo mentioned this pull request Oct 5, 2021

Regressions in System.Buffers.Binary.Tests.BinaryReadAndWriteTests #60001

Closed

BruceForstall mentioned this pull request Oct 5, 2021

Test failure: Assert failure: Verify_TypeLayout 'System.Numerics.Vector`1' failed to verify type layout #60036

Closed

tannergooding mentioned this pull request Oct 6, 2021

Implement Narrow and Widen using SIMDAsHWIntrinsic #60094

Merged

tannergooding mentioned this pull request Oct 10, 2021

Investigate ways to reduce overhead of the hardware intrinsics test matrix #60233

Open

This was referenced Oct 14, 2021

[Perf] Changes at 10/1/2021 10:57:04 AM dotnet/perf-autofiling-issues#1826

Closed

[Perf] Changes at 9/30/2021 11:27:06 PM dotnet/perf-autofiling-issues#1825

Closed

ghost locked as resolved and limited conversation to collaborators Nov 10, 2021

tannergooding deleted the fix-49397 branch November 11, 2022 15:26
