Allow removal of HWIntrinsic nodes that do not have side effects #84110
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details: This resolves #9626
CC. @dotnet/jit-contrib
Diffs from before the "minimize TP impact" commit
For that set of diffs,
However, if you measure with PGO for both the base and diff, you instead get:
So most of the impact is already "resolved" from PGO. With the most recent commit I got this down to:
or with PGO enabled for both base and diff:
There is a problem with modeling "special" side effects for intrinsics: it doesn't work for all cases.
Consider:
COMMA
  HWI_Prefetch (or other "special" intrinsic)
  NODE
Today, morph will simplify this down to just NODE, because side effect extraction doesn't know about the "special" side effects. The same problem exists anywhere code does gtFlags & GTF_SIDE_EFFECT (a lot of places). The only reason this isn't a problem currently is that the problematic nodes are top-level. If we ever add a "special" node that produces values, that will cease to be the case.
Can we get away, CQ-wise, with labeling the special side effect with GTF_ASG? It would make the model self-consistent, simpler and more robust.
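To make the concern concrete, here is a minimal sketch of the COMMA case above. It is not the JIT's code: the `Node` struct, the helper names, and the flag values are hypothetical stand-ins, and only the `gtFlags & GTF_SIDE_EFFECT` check mirrors what the comment describes. It assumes generic code decides whether a COMMA operand can be dropped purely from the flags.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag values only; not claimed to match the JIT's definitions.
enum : uint32_t
{
    GTF_ASG         = 0x1,
    GTF_CALL        = 0x2,
    GTF_EXCEPT      = 0x4,
    GTF_SIDE_EFFECT = GTF_ASG | GTF_CALL | GTF_EXCEPT,
};

// Hypothetical, heavily simplified stand-in for a JIT tree node.
struct Node
{
    const char* name;
    uint32_t    gtFlags;          // side effects visible to generic code
    bool        hasSpecialEffect; // "special" effect; never consulted by the generic code below
    Node*       op1;
    Node*       op2;
};

// Generic check of the kind the comment describes: it looks only at the flags.
bool hasFlaggedSideEffect(const Node* node)
{
    return (node->gtFlags & GTF_SIDE_EFFECT) != 0;
}

// Simplifies COMMA(op1, op2) to op2 when op1 appears to have no side effects.
Node* simplifyComma(Node* comma)
{
    assert(comma != nullptr && comma->op1 != nullptr && comma->op2 != nullptr);
    return hasFlaggedSideEffect(comma->op1) ? comma : comma->op2;
}

// Builds COMMA(HWI_Prefetch, NODE) and reports whether the prefetch survives simplification.
bool prefetchSurvives(bool markAsAsg)
{
    uint32_t prefetchFlags = markAsAsg ? GTF_ASG : 0u;
    Node     value{"NODE", 0, false, nullptr, nullptr};
    Node     prefetch{"HWI_Prefetch", prefetchFlags, true, nullptr, nullptr};
    Node     comma{"COMMA", prefetchFlags, false, &prefetch, &value};
    return simplifyComma(&comma) == &comma;
}
```

In this model, `prefetchSurvives(false)` returns false because the flag-only check cannot see the special effect, while `prefetchSurvives(true)` returns true. That is the trade-off being proposed: a more pessimistic flag in exchange for every existing flag check doing the right thing without modification.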
The main reason for the TP impact difference between PGO vs non-PGO is that the PGO data is still mostly correct and it still has information for things like
Without PGO, MSVC sees that the hwintrinsic path exists, is the only usage of the call, and doesn't bloat the method too much, so it inlines it. With PGO, MSVC sees that the hwintrinsic path is cold and so it opts to not inline the code. That in turn allows other inlining and optimizations, making it overall much cheaper. GT_CALL is also moved earlier in the checks with PGO due to it being hotter, which also improves TP overall.
If we add such nodes, that's potentially something we'll need to handle at that time. As is, we're already modeling things accurately with the main flags. We're really only accounting for the desire to not remove the small handful of very-special void nodes.
We're not doing that for other cases, like
I am not sure I understand. There aren't any other cases today where the side effects go outside what is supported by the flags.
Sorry, was mixing up my examples. For the barriers, we are marking with
It would probably be a better long-term investment to have a different side effect kind for things like barriers or other node types which aren't actually assignments but which also shouldn't be reordered or removed.
Well, it's a question of what to do now. I think it is unlikely to have any real downside to have the special nodes be a bit more pessimistic about their side effects than they strictly need to be, in exchange for a consistent side effects model. We have prior art in this area:
I guess we can mark it as a call or assignment, but in general I don't think doing so is "good practice". Ideally we'd simply mark this appropriately so that small optimizations can still happen as appropriate.
Responded to all the feedback and resolved everything except for the overload resolution, for which I gave a reason above. Should be ready for another review pass.
LGTM modulo comments. Thank you for addressing the feedback!