Make intrinsic nodes multi op (aka delete GT_LIST) #59912

Merged
merged 138 commits into from
Nov 20, 2021

SingleAccretion
Contributor

@SingleAccretion SingleAccretion commented Oct 3, 2021

This is the final result of the work detailed here, with one very significant addition: GTF_REVERSE_OPS is supported for the multi-op nodes (as much as I would have liked to drop that handling, 3 vector methods showed up in benchmarks with regressions since the time I originally wrote the code).

This is a zero-diff change across all configurations according to SPMI.

The history of this branch has been heavily rewritten to assist in review: there is exactly one commit for each changed function, with the exception of the first commit and the last two commits.

A few TODO-List-Cleanup comments have been added highlighting further work/simplifications enabled by this change (which will be addressed if it is accepted).

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 3, 2021
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 3, 2021
@ghost

ghost commented Oct 3, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion SingleAccretion force-pushed the Make-Intrinsic-Nodes-Multi-Op branch 7 times, most recently from df6320a to 9a6ee5e Compare October 4, 2021 23:49
@SingleAccretion SingleAccretion marked this pull request as ready for review October 6, 2021 13:37
@SingleAccretion
Contributor Author

SingleAccretion commented Oct 6, 2021

@dotnet/jit-contrib, @tannergooding

@AndyAyersMS
Member

@SingleAccretion thanks again for what looks like a very nice contribution.

@dotnet/jit-contrib who would like to be involved in reviewing this?

@echesakov
Contributor

@SingleAccretion Thank you for the contribution.

@dotnet/jit-contrib who would like to be involved in reviewing this?

@AndyAyersMS I can take a look later this week.

@JulieLeeMSFT
Member

@echesakovMSFT PTAL.

@echesakov
Contributor

> @echesakovMSFT PTAL.

Sure, I will take a look this week; I didn't have time to do it last week.

}
#endif

GenTree*& Op(size_t index)
Contributor

I feel that these should be

GenTree* GetOp(size_t index) const;
void SetOp(size_t index, GenTree* value);

An assignment like

tree->Op(1) = op1;

takes me at least a couple of seconds to parse.

@dotnet/jit-contrib Does anyone have the same opinion?

Member

@EgorBo EgorBo Nov 6, 2021

Personally, I'm fine with both; maybe other members have a different opinion? We mostly use the Get/Set pattern, but it's more verbose, and we already use rhs assignments elsewhere (the screenshot that illustrated this was not preserved).

Contributor Author

@SingleAccretion SingleAccretion Nov 6, 2021

FWIW, the reason I made these short is that I find the pattern of "op maintenance" (see here for an example) displeasing and bug-prone (I spent at least a few hours fixing one occurrence), so I would like to encourage people to use the accessors more liberally and let the native compiler do the CSEs for us.

Because of this I wanted it to be Op, but then it could not return a plain GenTree*, because in our codebase this is the prevailing pattern:

TThing& Thing();
TThing GetThing() const;
void SetThing(TThing thing);

But I have no strong attachment to this (or arguments for it), and will happily rewrite them as we decide is best.
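
To make the trade-off in this thread concrete, here is a minimal sketch of the two accessor styles on a toy node type. `ToyNode`, `ToyMultiOp`, and `SumAfterUpdates` are hypothetical stand-ins and not the JIT's actual node definitions; only the accessor shapes (`Op`, `GetOp`, `SetOp`) come from the discussion above. The 1-based indexing is an assumption made to match the `tree->Op(1) = op1` example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a JIT tree node; not the real GenTree.
struct ToyNode
{
    int value;
};

// Hypothetical multi-operand node that keeps its operands in an array.
class ToyMultiOp
{
    std::vector<ToyNode*> m_operands;

public:
    explicit ToyMultiOp(std::size_t count) : m_operands(count, nullptr)
    {
    }

    // Style 1: a reference-returning accessor, usable for reads and writes.
    // Indices are 1-based (an assumption, see the lead-in).
    ToyNode*& Op(std::size_t index)
    {
        assert((1 <= index) && (index <= m_operands.size()));
        return m_operands[index - 1];
    }

    // Style 2: an explicit getter/setter pair.
    ToyNode* GetOp(std::size_t index) const
    {
        assert((1 <= index) && (index <= m_operands.size()));
        return m_operands[index - 1];
    }

    void SetOp(std::size_t index, ToyNode* value)
    {
        assert((1 <= index) && (index <= m_operands.size()));
        m_operands[index - 1] = value;
    }
};

// Exercises both styles and returns the sum of the final operand values.
int SumAfterUpdates()
{
    ToyNode a{1};
    ToyNode b{2};
    ToyNode c{40};
    ToyMultiOp node(2);

    node.Op(1) = &a;   // write through the returned reference
    node.SetOp(2, &b); // write through the setter
    node.Op(2) = &c;   // replace operand 2 in place; no cached local to keep in sync

    return node.Op(1)->value + node.GetOp(2)->value;
}
```

Both styles read and write the same slot. The reference-returning form is shorter at call sites, while the getter/setter pair makes each write explicit, which is the readability concern raised in the review.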

Contributor

@echesakov echesakov left a comment

Overall, this looks great!
I like how simple it is to handle multi-op intrinsics after the change.
I left a couple of comments that we should address before merging this.

src/coreclr/jit/codegenlinear.cpp (outdated)
src/coreclr/jit/flowgraph.cpp
src/coreclr/jit/hwintrinsicarm64.cpp (outdated)
src/coreclr/jit/hwintrinsicxarch.cpp (outdated)
@echesakov
Contributor

@SingleAccretion Can you please resolve the conflicts? I would like to measure the JIT throughput impact before signing off on this.

@SingleAccretion
Contributor Author

@echesakovMSFT out of curiosity: what setup will you be using?

@echesakov
Contributor

> @echesakovMSFT out of curiosity: what setup will you be using?

@SingleAccretion Previously, we used Pin with crossgen (v1) running on SPC. Since we don't build crossgen.exe anymore, I am going to use Pin with superpmi. As you know, Pin runs only on Intel platforms, so for the Arm64 assessment I will do crossjitting. Note that given the specifics of the changes (they would primarily affect code that extensively uses hardware intrinsics), I am most interested in running superpmi on the coreclr_tests collection rather than the libraries collection (but I will do both anyway). As an alternative, I might use or create a superpmi collection that has JIT/HardwareIntrinsics tests only.

@SingleAccretion
Contributor Author

SingleAccretion commented Nov 18, 2021

Ehh, I was hoping that recent changes to the intrinsic code would apply cleanly; however, it seems that's not the case, and I will need to do more conflict resolution (it's fine, it will just take some time, as the base of this branch is not the base of my currently built fork)...

Edit: looks done.

@echesakov
Contributor

I measured the JIT throughput impact using the following setup: superpmi replay under the Pin tool with clrjit_win_x64_x64.dll and clrjit_universal_arm64_x64.dll, on two collections: coreclr_tests.pmi (which contains all the JIT\HardwareIntrinsics tests) and libraries.pmi (which contains all the vectorized code that we have in .NET). Each combination was run three times.

The results show a slight improvement with this change on both x64 and arm64.

| collection name | base (instr. count) | diff (instr. count) | base mean | diff mean | relative difference |
| --- | --- | --- | --- | --- | --- |
| coreclr_tests.pmi.windows.arm64.checked.mch | 473931392931, 474021284550, 473784971639 | 472193993230, 472957155424, 472498138042 | 473912549707 | 472549762232 | -0.29% |
| libraries.pmi.windows.arm64.checked.mch | 309239932345, 308904007474, 308783025160 | 308501962090, 309184810211, 308615556152 | 308975654993 | 308767442818 | -0.07% |
| coreclr_tests.pmi.windows.x64.checked.mch | 432614447265, 431224537281, 433067453527 | 431036382674, 431957329675, 430281590377 | 432302146024 | 431091767575 | -0.28% |
| libraries.pmi.windows.x64.checked.mch | 289881794617, 290333310434, 289971314603 | 289876347257, 289774232789, 289510475775 | 290062139885 | 289720351940 | -0.12% |
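
For readers who want to check the last column, the sketch below recomputes it for the arm64 coreclr_tests row. It assumes the relative difference is (diff mean - base mean) / base mean, which is consistent with the reported numbers but is not stated in the thread; the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Mean of three instruction counts. The counts are far below 2^53,
// so converting them to double loses no precision.
double MeanCount(const std::array<std::uint64_t, 3>& runs)
{
    double sum = 0.0;
    for (std::uint64_t count : runs)
    {
        sum += static_cast<double>(count);
    }
    return sum / 3.0;
}

// Relative difference in percent; negative means the diff JIT
// executed fewer instructions than the base JIT.
double RelativeDifferencePercent(double baseMean, double diffMean)
{
    assert(baseMean != 0.0);
    return (diffMean - baseMean) / baseMean * 100.0;
}

// Recomputes the coreclr_tests.pmi.windows.arm64.checked.mch row.
double Arm64CoreclrTestsDelta()
{
    const std::array<std::uint64_t, 3> base = {473931392931ULL, 474021284550ULL, 473784971639ULL};
    const std::array<std::uint64_t, 3> diff = {472193993230ULL, 472957155424ULL, 472498138042ULL};
    return RelativeDifferencePercent(MeanCount(base), MeanCount(diff));
}
```

Under that assumption the result is roughly -0.29%, in line with the reported value for that row.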

Contributor

@echesakov echesakov left a comment

Thank you for the contribution, @SingleAccretion! Great work!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2021
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member
NO-SQUASH: The PR should not be squashed

5 participants