turn on linear IR #24113
Conversation
Force-pushed from 3d46a01 to 4345023
Nanosoldier won't be functional again until JuliaIO/JLD.jl#196 is fixed.
Ah,
Force-pushed from e819d82 to 430c537
If you rebase this, we can now run nanosoldier against it.
Force-pushed from 430c537 to c280f8a
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Force-pushed from c280f8a to 61b1e56
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Force-pushed from 61b1e56 to c1aa491
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Oy vey. Some of those regressions!
Looks like it's almost entirely
Yep, looks scary but not a huge deal. I've been picking these off one by one.
How is the performance of this for building the system image and running tests? On my machine, it seems to be about
EDIT: fixed build time – I realized I used the wrong reference initially
Yes, I see the same numbers. I think we'll be able to simplify some of the optimization passes (probably including merging #23240) and add some more compact encodings of common patterns like ssavalue assignment. There are also lots of sequences like this:
that we can peephole optimize away. We'll see how far that gets us.
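For illustration only, here is a minimal sketch of the kind of peephole pass being described, using toy stand-in types rather than Julia's real IR nodes. It fuses the adjacent pair `%n = rhs; slot = %n` into a single `slot = rhs` statement; a real pass would additionally have to verify that the ssavalue has no other uses.

```julia
# Hypothetical stand-ins for the real IR node types (SSAValue, SlotNumber, etc.).
struct SSAVal; id::Int; end
struct Slot;   id::Int; end
struct Assign; lhs; rhs; end

# Collapse `%n = rhs; slot = %n` into `slot = rhs`.
# Note: this sketch does not check for other uses of %n, which a real pass must do.
function peephole(code::Vector{Assign})
    out = Assign[]
    i = 1
    while i <= length(code)
        a = code[i]
        if i < length(code) && a.lhs isa SSAVal &&
           code[i + 1].rhs isa SSAVal && code[i + 1].rhs.id == a.lhs.id
            push!(out, Assign(code[i + 1].lhs, a.rhs))  # fuse the pair
            i += 2
        else
            push!(out, a)
            i += 1
        end
    end
    return out
end

# %1 = f(x); slot2 = %1   ==>   slot2 = f(x)
peephole([Assign(SSAVal(1), :(f(x))), Assign(Slot(2), SSAVal(1))])
```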
Force-pushed from c1aa491 to 867fd7d
This now includes #23240, but using front-end linearization instead of its own linearize pass. Seems to help clean up some of the extra allocations in the benchmarks here. Let's see. @nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan
Force-pushed from b6a8088 to ec20c0b
Ok, it appears that adding some @yuyichao magic has indeed fixed the remaining performance regressions. I worked through a couple more bugs and I think this is working now.
Everything’s coming up Milhouse!
Force-pushed from 67cb6f0 to 5edaa85
Force-pushed from 5edaa85 to 1a2b3af
This recent test is failing on this branch:
The reason appears to be that the better optimizations here are able to eliminate the
became this:
I'm not sure why
Ok, I believe I've fixed that, by avoiding replacing
... In particular, in my latest commit I wasn't fully sure whether to return
Force-pushed from 292f959 to 3a9ed80
Looks like this needs a rebase; but the last CI run had quite a bit of green.
Force-pushed from 3a9ed80 to 0f4884b
These objects make it really hard to mutate the AST correctly, since a mutation can accidentally be applied in a place where it is invalid.
The hardest part of running non-local optimization passes (i.e. transformations that do not rely on only one or a few neighboring expressions) is avoiding re-analysis of the code. Our current IR, though easy to scan linearly, interpret, generate code from and, to a certain degree, store, is not easy to update randomly. This works around the issue in two ways:

1. Never resize the code array when making updates. Instead, insert nested arrays (to be spliced back in later) for code additions, and use `nothing` for code deletions. This way, the array indices cached in other metadata about the code stay valid.
2. Building on that, pre-scan the use-def info for all variables before starting the optimization and run the optimization recursively. Code changes also update this use-def data so that it is always valid for the user. Changes that can affect the use or def of another value re-trigger the optimization so that we can take advantage of new optimization opportunities.

This optimization pass should now handle most of the control-flow-insensitive optimizations. Code patterns that are handled only partially by this pass but would benefit greatly from a control-flow-sensitive version include:

1. Splitting slots (based on control flow). This would completely eliminate the surprising cost of variable-name conflicts, even when one of the def-uses is not type stable. (The pass currently handles the case where all the defs/uses are type stable.)
2. Delaying allocations. There are cases where an allocation escapes only in some branches. This is especially true on error paths: we cannot eliminate some `SubArray` allocations only because we need to keep them around for the bounds error. That is very wasteful; we should be able to perform the allocation only when we throw the error, leaving the performance-critical non-error path allocation-free.
3. Reordering assignments. It is in general illegal to move an assignment when the slot assigned to is not SSA, but there are many cases where it is actually legal (i.e. when there is no other use or def in between). This shows up a lot in code like

   ```
   SSA = alloc
   slot = SSA
   ```

   which we currently can't optimize, since through the slot we can't see that the assignment is actually an allocation and not a generic black box. We should be able to merge these and eliminate the SSA value based on control-flow info. For this case, def info that looks through SSA values would also help.
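A rough sketch of workaround 1 above, with hypothetical helper names rather than the actual pass: deletions and insertions never resize the statement array, so cached statement indices stay valid until a final flattening step splices nested insertions back in and drops the `nothing` markers.

```julia
# Delete a statement by overwriting it with `nothing`; indices of all other
# statements are unchanged.
delete_stmt!(code::Vector{Any}, i) = (code[i] = nothing)

# Insert statements "before" position i by nesting them together with the
# original statement; again, no index shifts. (This sketch handles only one
# level of nesting.)
insert_before!(code::Vector{Any}, i, stmts...) = (code[i] = Any[stmts..., code[i]])

# Final pass: splice nested insertions back in and drop the `nothing` markers.
function flatten(code::Vector{Any})
    out = Any[]
    for stmt in code
        if stmt isa Vector{Any}
            for s in stmt
                s === nothing || push!(out, s)
            end
        elseif stmt !== nothing
            push!(out, stmt)
        end
    end
    return out
end

code = Any[:(a = f(x)), :(b = g(a)), :(return b)]
delete_stmt!(code, 2)
insert_before!(code, 3, :(b = a))
flatten(code)  # Any[:(a = f(x)), :(b = a), :(return b)]
```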
Force-pushed from 0f4884b to 952c7a5
🎉!
This should be a fairly agreeable version of #24027. All calls are pulled out of argument position, but are still allowed as any assignment RHS and as arguments to `return`. The `Expr.typ` field is still there. I updated codevalidation.jl with the rules implemented here, and got it passing on everything. I hacked in a solution for `cglobal` by pre-evaluating constant tuples in `jl_resolve_globals`. This only increases the sysimg by about 15%, and with a few more things like #24109 I think we'll be fine. I think we should merge this soon and work on optimizations later.
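For illustration, a hand-written sketch (not actual `code_lowered` output) of what pulling calls out of argument position looks like:

```julia
# Source form: calls nested in argument position.
f(x) = h(g(x) + 1)

# Linearized sketch: each call is hoisted to its own statement and bound to an
# ssavalue; calls remain legal only as an assignment RHS or a `return` argument.
#   %1 = g(x)
#   %2 = %1 + 1
#   %3 = h(%2)
#   return %3
```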
@nanosoldier `runbenchmarks(ALL, vs=":master")`