-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose cross-platform helpers for Vector64, Vector128, and Vector256 #53450
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
Going to run isa tests and make sure everything is passing before marking this as ready-for-review. |
36751ca
to
51f1ba7
Compare
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
…ono's existing check
Co-authored-by: SingleAccretion <62474226+SingleAccretion@users.noreply.github.com>
035783a
to
54610f9
Compare
Thanks! Just did that, hopefully the AOT legs pass now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay.
Overall, the JIT changes look good. I left couple suggestions and one nit.
break; | ||
} | ||
|
||
case NI_Vector128_GetUpper: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the most optimal way of implementing Vector128.GetUpper()
...
Perhaps, something that does DUP Vd, Vn.D[1]
should be used instead.
This would correspond to vector.AsUInt64().GetElement(1).As<T>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind if I fix this in a follow up PR (just so we can go ahead and get this one resolved and avoid further downstream conflicts)?
Also noting, this might be a case where we want to recognize the ExtractVector128
pattern and replace it with DUP
if we know its always going to be more efficient...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I meant to write that this work can wait when posted the message (but forgot).
Co-authored-by: Egor Chesakov <Egor.Chesakov@microsoft.com>
Need to double check if these arm64 improvements are related - dotnet/perf-autofiling-issues#1715 |
This added close to 500,000 lines of auto-generated tests. Auto-generated tests like this are starting to kill us - in the repo footprint, build times and test times. |
@jkotas, We have several thousand APIs here, each of which need to have their basic functionality testing. Additionally, since they add a range of support to various phases of the JIT including new instructions and new encodings they need some validation that various operations, such as memory read/writes get folded into the instruction, CSE, etc work as expected. Since we can't unit test phases of the JIT, we are stuck with functional tests written in C# that try to ensure trees are generated that cover the relevant patterns. If we could unit test phases, such as emit, we could have a much higher bar of confidence with many fewer tests, as we could directly validate based on the known encoding scenarios. Many of these could likely be moved to some outerloop and only run on PRs known to be touching/impacting HWIntrinsic codegen (reducing the cost of your average PR/commit); but as is, these tests as is have caught a number of bugs and other issues that wouldn't have been exposed otherwise and so I don't think they should be removed from the repo until we have a suitable replacement. |
Some data: src/tests before this PR: 488,952,913 bytes This PR grew src/tests footprint by more than 4%. src/tests is about 60% of the dotnet/runtime repo disk size footprint. Yes, src/tests sources are 1.5x more than sources for both runtimes, all libraries and all libraries tests combined. I understand that we do not have better option currently, but we may need to have to create one if this continues to grow like this. |
I fully support having a better alternative here. I think the ideal scenario would be:
This would cover the majority of bugs and issues we've hit so far at a much cheaper cost. There would still be some things, like |
This PR caused some regression for TE benchmarks when running with Mono. I've created an issue to track it #61484. @tannergooding Could you take a look? |
This mostly resolves #49397 by exposes cross platform helpers for core functionality, mirroring Vector.
This PR is mostly the tests and the actual product changes are much much smaller than the sum. As with the other HWIntrinsic PRs, the tests are templated and are largely autogenerated from a small table. It is recommended to give a pass over a couple of the templates and the template input data, and not to try and review every test individually. There aren't any new concepts here, just cloning the existing templates with a few tweaks for the shared types.
In particular this PR is (numbers are slightly off now due to a few fixes, but largely still representative):
This does not cover
Narrow
,Widen
,ConverTo*
, orIsHardwareAccelerated
. These remaining APIs will go in a separate PR since it requires a bit more refactoring to make work (Vector<T>
still implements these using the "legacy" SIMD support and the PR will need to update that to useSimdAsHWIntrinsic
and to have the logic shareable with theVector64/128/256<T>
paths).This will unblock some work on related issues as well as allow us to merge several code paths on the managed side where the core logic is the same between various platforms. The related issues that will be resolved following this PR include:
Narrow
,Widen
, andConvertTo