-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make intrinsic nodes multi op (aka delete GT_LIST
)
#59912
Make intrinsic nodes multi op (aka delete GT_LIST
)
#59912
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis is the final result of the work detailed here, with one very significant detail: This is a zero-diff change across all configurations according to SPMI. The history of this branch has been heavily rewritten to assist in review: there is exactly one commit for each changed function, with the exception of the first and two last commits.
|
df6320a
to
9a6ee5e
Compare
@dotnet/jit-contrib, @tannergooding |
9a6ee5e
to
d9bbae0
Compare
@SingleAccretion thanks again for what looks like a very nice contribution. @dotnet/jit-contrib who would like to be involved in reviewing this? |
@SingleAccretion Thank you for the contribution.
@AndyAyersMS I can take a look later this week. |
d9bbae0
to
e56386f
Compare
@echesakovMSFT PTAL. |
Sure, I will take a look this week, didn't have time to do it last week |
e56386f
to
e8260cd
Compare
} | ||
#endif | ||
|
||
GenTree*& Op(size_t index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that these should be
GenTree* GetOp(size_t index) const;
void SetOp(size_t index, GenTree* value);
The assignment like
tree->Op(1) = op1;
take at least couple seconds from me to parse.
@dotnet/jit-contrib Anyone has the same opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the reason I made these short is that I find the pattern of "op
maintenance" (see here for an example) displeasing, and bug-prone (I spent at least a few hours on fixing one occurrence), so I would like to encourage people to use the accessors more liberally and let the native compiler do the CSEs for us.
Because of this I wanted it to be Op
, but then it could not be GenTree*
because in our codebase this is the prevailing pattern:
TThing& Thing();
TThing GetThing() const;
SetThing(TThing thing);
But I have no strong attachment to this (or arguments for it), and will happily rewrite them as we decided is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks great!
I like how it is simple to handle multi-op intrinsics after the change.
I left couple comments we should address before merging this.
@SingleAccretion Can you please resolve the conflicts? I would want to measure the JIT throughput impact before signing off on this. |
@echesakovMSFT out of curiosity: what setup will you be using? |
@SingleAccretion Previously, we used Pin with crossgen (v1) running on SPC. Since we don't built crossgen.exe anymore, I am going to use Pin with superpmi. As you know, Pin runs only on Intel platforms, so for Arm64 assessment I will do crossjitting. Note that given the specifics of the changes (they would primarily affect code that extensively uses hardware intrinsics) I am the most interested in running superpmi on coreclr_tests collection rather than libraries collection (but I will do both anyway). As an alternative, I might use/create a superpmi collection that has JIT/HardwareIntrinsics tests only. |
Ehh, I was hoping that recent changes to the intrinsic code would apply cleanly, seems however like that's not the case and I will need to do more conflict resolution (it's fine, just will take some time as the base of this branch is not the base of my currently built fork)... Edit: looks done. |
I measured the JIT throughput impact using the following setup: running superpmi replay under Pin tool with clrjit_win_x64_x64.dll and clrjit_universal_arm64_x64.dll running on two collections - coreclr_tests.pmi (that contains all the JIT\HardwareIntrinsics tests) and libraries.pmi (that contains all vectorized code that we have in .NET). Each combination was run three times. The results show that there is slight improvement with this change on both x64 and arm64.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for contribution, @SingleAccretion! Great work!
This is the final result of the work detailed here, with one very significant addition:
GTF_REVERSE_OPS
is supported for the multi-op nodes (as much as I would have liked to drop that handling, 3 vector methods showed up in benchmarks with regressions since the time I originally wrote the code).This is a zero-diff change across all configurations according to SPMI.
The history of this branch has been heavily rewritten to assist in review: there is exactly one commit for each changed function, with the exception of the first commit and the two last commits.
A few
TODO-List-Cleanup
comments have been added highlighting further work/simplifications enabled by this change (which will be addressed if it is accepted).