Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

egraphs: don't let rematerialization override LICM. #7306

Merged

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Oct 19, 2023

This reworks the way that remat and LICM interact during aegraph elaboration. In principle, both happen during the same single-pass "code placement" algorithm: we decide where to place pure instructions (those that are eligible for movement), and remat pushes them one way while LICM pushes them the other.

The interaction is a little more subtle than simple heuristic priority, though -- it's really a decision ordering issue. A remat'd value wants to sink as deep into the loop nest as it can (to the use's block), but we don't know where the uses go until we process them (and make LICM-related choices), and we process uses after defs during elaboration. Or more precisely, we have some work at the use before recursively processing the def, and some work after the recursion returns; and the LICM decision happens after recursion returns, because LICM wants to know where the defs are to know how high we can hoist. (The recursion is itself unrolled into a state machine on an explicit stack so that's a little hard to see but that's what is happening in principle.)

The solution here is to make remat a separate just-in-time thing, once we have arg values. Just before we plug the final arg values into the elaborated instruction, we ask: is this a remat'd value, and if so, do we have a copy of the computation in this block yet. If not, we make one. This has to happen in two places (the main elab loop and the toplevel driver from the skeleton).

The one downside of this solution is that it doesn't handle recursive rematerialization by default. This means that if we, for example, decide to remat single-constant-arg adds (as we actually do in our current rules), we won't then also recursively remat the constant arg to those adds. This can be seen in the licm.clif test case. This doesn't seem to be a dealbreaker to me because most such cases will be able to fold the constants anyway (they happen mostly because of pointer pre-computations: a loop over structs in Wasm computes heap_base + p + offset, and naive LICM pulls a heap_base + offset out of the loop for every struct field accessed in the loop, with horrible register pressure resulting; that's why we have that remat rule. Most such offsets are pretty small.).

Fixes #7283.

@cfallin cfallin requested a review from a team as a code owner October 19, 2023 23:22
@cfallin cfallin requested review from fitzgen, elliottt and alexcrichton and removed request for a team and fitzgen October 19, 2023 23:22
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Oct 20, 2023
@cfallin cfallin force-pushed the licm-in-the-correct-direction branch from de91dc7 to a094b16 Compare October 20, 2023 03:13
Copy link
Member

@elliottt elliottt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! I had some questions, but none of it should block merging.

cranelift/codegen/src/fx.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/egraph/elaborate.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/egraph/elaborate.rs Show resolved Hide resolved
cranelift/codegen/src/egraph/elaborate.rs Show resolved Hide resolved
This reworks the way that remat and LICM interact during aegraph
elaboration. In principle, both happen during the same single-pass "code
placement" algorithm: we decide where to place pure instructions (those
that are eligible for movement), and remat pushes them one way while
LICM pushes them the other.

The interaction is a little more subtle than simple heuristic priority,
though -- it's really a decision ordering issue. A remat'd value wants to sink
as deep into the loop nest as it can (to the use's block), but we don't
know *where* the uses go until we process them (and make LICM-related
choices), and we process uses after defs during elaboration. Or more
precisely, we have some work at the use before recursively processing
the def, and some work after the recursion returns; and the LICM
decision happens after recursion returns, because LICM wants to know
where the defs are to know how high we can hoist. (The recursion is
itself unrolled into a state machine on an explicit stack so that's a
little hard to see but that's what is happening in principle.)

The solution here is to make remat a separate just-in-time thing, once
we have arg values. Just before we plug the final arg values into the
elaborated instruction, we ask: is this a remat'd value, and if so, do
we have a copy of the computation in this block yet. If not, we make
one. This has to happen in two places (the main elab loop and the
toplevel driver from the skeleton).

The one downside of this solution is that it doesn't handle *recursive*
rematerialization by default. This means that if we, for example, decide
to remat single-constant-arg adds (as we actually do in our current
rules), we won't then also recursively remat the constant arg to those
adds. This can be seen in the `licm.clif` test case. This doesn't seem
to be a dealbreaker to me because most such cases will be able to fold
the constants anyway (they happen mostly because of pointer
pre-computations: a loop over structs in Wasm computes heap_base + p +
offset, and naive LICM pulls a `heap_base + offset` out of the loop for
every struct field accessed in the loop, with horrible register pressure
resulting; that's why we have that remat rule. Most such offsets are
pretty small.).

Fixes bytecodealliance#7283.
@cfallin cfallin force-pushed the licm-in-the-correct-direction branch from a094b16 to d230360 Compare October 20, 2023 19:37
@cfallin cfallin enabled auto-merge October 20, 2023 19:37
@cfallin cfallin added this pull request to the merge queue Oct 20, 2023
Merged via the queue into bytecodealliance:main with commit 77d030c Oct 20, 2023
@cfallin cfallin deleted the licm-in-the-correct-direction branch October 20, 2023 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cranelift sinking loop-invariant code into a loop
2 participants