Backport #30385 + dependent changes to 1.0.4 #30650
Conversation
Previously these sorts of functions would block constant propagation. Hopefully #28955 will just fix this, but until then, add a surgical fix and a test.
Inlining incorrectly computed the new atypes for an _apply call, leading to a cache miss and lack of inlining for call targets that are worth inlining for the given constant arguments, but not necessarily in general.
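For illustration, here is a sketch of the kind of call this affects (hypothetical names; assumes the Julia 1.0.x lowering of splatting to `Core._apply`):

```julia
# Hypothetical example names. On Julia 1.0.x, the splatting call in `g`
# lowers to roughly `Core._apply(f, Core.tuple(2, 3))`. With both arguments
# constant, `f` is worth inlining here (the call can fold to a constant),
# even though `f` is not necessarily worth inlining for arbitrary arguments.
f(a, b) = a^b
g() = f((2, 3)...)

# julia> @code_typed g()   # check whether the `_apply` target was inlined/folded
```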
The design taken here is that CFG transformations are allowed during compacting, but BBs are only removed (i.e. BB numbers are only changed at the beginning of compaction).
CFG transforms can currently cause issues like #29107, but I'm still a few days away from fixing this properly. In the meantime, disable the transform.
…ment for "partially constant" tuples

Previously, we hacked in an additional `InferenceResult` field to store varargs type information in order to facilitate better constant propagation through varargs methods. There were many other places, however, where constants moving in/out of tuples/varargs thwarted constant propagation. This commit removes the varargs hack, replacing it with a new inference lattice element (`PartialTuple`) that represents tuples where some (but not all) of the elements are constants. This allows us to follow through with constant propagation in more situations involving tuple construction/destructuring, and also enables a clean-up of the `InferenceResult` caching code.
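As a hedged sketch of the kind of pattern this is meant to help (names are made up, not taken from the PR):

```julia
# The tuple built in `make` mixes a constant with a runtime value. Without a
# lattice element for partially constant tuples, the constant 42 is lost once
# it enters the tuple; with `PartialTuple`, inference can still see that
# `make(x)[1]` is always 42 and fold the addition.
make(x) = (42, x)
use(x)  = make(x)[1] + 1

# julia> @code_typed use(0)   # ideally shows the folded result (return 43)
```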
E.g. if we had `PiNode(1, CartesianIndex)`, we would eliminate that because `1` was a constant. However, leaving this in allows the compiler to realize that this code is unreachable, as well as guarding codegen against having to work through invalid IR.
Unfortunately, we cannot always rely on :invokes to have argument values that match the declared ssa types (e.g. if the :invoke is dynamically unreachable). Because of that, we cannot assert here, but must instead emit a runtime trap.
Reimplement a larger portion of the optimizations in jl_f__apply in the fallback function, so we can reduce the performance wall in more cases. General fix for #29133-like performance issues.
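As a rough illustration (hypothetical names; whether this matches the exact case in #29133 is an assumption on my part):

```julia
# Splatting a tuple or an array can use `jl_f__apply`'s fast paths, while
# splatting a general iterable such as a generator goes through the generic
# fallback; moving more of the optimizations into that fallback makes calls
# like the last one less of a performance cliff.
f(a, b, c) = a + b + c

f((1, 2, 3)...)           # tuple: fast path
f([1, 2, 3]...)           # array: fast path
f((i for i in 1:3)...)    # generator: generic fallback path
```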
This feels like a rather large change for a patch release, and it touches a lot of internals, so I marked it for triage to discuss it.
Agree, this feels pretty large and risky for a patch release.
Also agree. Only critical bug fixes should go in patch releases, not major improvements to type inference. Before anybody asks: no, inferring less accurate types is not a bug 😄
Ah, sorry! I thought non-API-changing performance improvements were also fair game for patch releases, so I backported this grab-bag of compiler improvements to solve the tuple perf issues in 1.0.x. It's not really one big change so much as a bunch of related little changes, but I see your point w.r.t. risk. I'll just close this then, since there seems to be consensus.
Had to do a bit of git sleuthing to find some previous changes that #30385 depended on that hadn't yet been backported.
I'm opening this PR against `backport-1.0.4` instead of committing directly to that branch, just to make sure I don't screw anything up 😛