egraph support: rewrite to work in terms of CLIF data structures. #5382
Conversation
I'll see about splitting this into logical pieces in separate commits tomorrow; I wanted to get the rebased / cleaned-up state up first. I'll run some more benchmarks as well.
Force-pushed from 1d69034 to 743f8f1.
How are alias values printed in the textual format of clif ir? And can it be read back?
Force-pushed from 6096ff0 to 88e6d00.
They currently aren't; because union values aren't present in the layout, the usual CLIF writer won't see them (it ignores all insts and values not "in the function" as per the layout). I agree it would be great to come up with a text format for egraph-stage CLIF; maybe as a followup to the initial PR?
Force-pushed from 88e6d00 to 777bd97.
This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself rather than as a separate data structure to which and from which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows a value to have *multiple representations*: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration).

Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work (we could not eagerly rewrite if there were cycles; and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form). This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving with alias analysis allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This pass tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving it later.

This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows 5.5% compile-time reduction and 7.7% runtime improvement (speedup).

Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
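To make the "union" value idea described above concrete, here is a minimal, self-contained Rust sketch. The names (`ValueDef`, `Dfg`, `eclass_members`) are hypothetical stand-ins rather than Cranelift's actual `DataFlowGraph` types; the sketch only illustrates how a tree of union values can encode an eclass of equivalent values:

```rust
// Hypothetical, simplified stand-ins for CLIF's entity handles; the real
// DataFlowGraph stores value definitions in entity-indexed tables.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Value(u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Inst(u32);

// A value is either the result of an instruction, or a "union": a reference
// to two other values that rewrite rules claim are equivalent. Because a
// union only refers to previously created values, the structure is acyclic.
enum ValueDef {
    Result(Inst),
    Union(Value, Value),
}

struct Dfg {
    defs: Vec<ValueDef>, // indexed by Value
}

impl Dfg {
    // Collect every instruction-defined representation reachable from `v`
    // by walking the union tree; together these form the eclass of `v`.
    fn eclass_members(&self, v: Value) -> Vec<Value> {
        let mut out = Vec::new();
        let mut stack = vec![v];
        while let Some(v) = stack.pop() {
            match &self.defs[v.0 as usize] {
                ValueDef::Result(_) => out.push(v),
                ValueDef::Union(x, y) => {
                    stack.push(*x);
                    stack.push(*y);
                }
            }
        }
        out
    }
}
```

In this picture, elaboration would pick one member of the eclass (by some cost heuristic), place its defining instruction back into the layout the first time it is needed, and clone the instruction if it must be materialized in more than one place.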
Force-pushed from 777bd97 to 149b8c8.
(posting partial review because the diff keeps changing as I go and making things outdated)
cranelift/codegen/src/egraph.rs
// A pure node always has exactly one result.
let orig_value = self.func.dfg.inst_results(inst)[0];
- What about something like `iadd_carry`, which is a pure instruction with two results; how does that fit in here? We used to return tuples and then project fields out of them, but now that we are using the DFG directly, how is this handled?
- This could be simplified. Suggested change: replace

// A pure node always has exactly one result.
let orig_value = self.func.dfg.inst_results(inst)[0];

with

let orig_value = self.func.dfg.first_result(inst);
Ah, great point with `first_result`, I hadn't remembered that method existed. Updated!

Regarding multi-output instructions in general: yes, this was a bit of a compromise for now. Jamey and I went back and forth a bit on an `Inst`-centric vs. `Value`-centric design in the elaborator, and ultimately basing everything on `Value`s was far, far simpler; so to keep everything straight, we need to deal only with single-result instructions.

We judged this to be acceptable-ish because the most common multi-result instructions are calls. There are indeed also adds-with-carries but, for now, not optimizing these probably will not have a material impact on speedups.

Ultimately I'm curious whether we might be able to adapt a tuple-types-and-projection approach to CLIF as well, though that's a much larger change.
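As an illustration of the trade-off described above (purely a sketch with hypothetical types, not the PR's actual elaborator): keying elaboration on `Value` works out cleanly precisely because each pure instruction has a single result, so "has this value already been materialized?" is a simple one-to-one map. With multi-result instructions, the same bookkeeping would also have to record which result of a cloned instruction corresponds to which original value, which is the tuple-and-projection problem mentioned above.

```rust
use std::collections::HashMap;

// Hypothetical handles, standing in for CLIF entity references.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Value(u32);
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Inst(u32);

// Sketch of elaborator state: a Value -> Value memo of what has already
// been materialized. This stays a flat map only because every pure
// instruction has exactly one result.
struct Elaborator {
    materialized: HashMap<Value, Value>,
}

impl Elaborator {
    fn elaborate(&mut self, v: Value, def: Inst) -> Value {
        *self.materialized.entry(v).or_insert_with(|| {
            // The real pass would insert (or clone) `def` into the layout
            // and return its single result; here we just return `v`.
            let _ = def;
            v
        })
    }
}
```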
Sure.
LGTM with all my feedback addressed. Really excited for this!
Updated and addressed everything, I think! (Left the one thread about multi-result instructions open; that's something we'll want to address somehow eventually, but I think it isn't a quick/easy thing to address in this PR.) Thanks for the review!
Force-pushed from f1e5926 to 40928ab.
Interesting benchmark discovery:
This seems to indicate to me that on worse code (explicit bounds checks), this approach is a little more efficient, while otherwise it's more-or-less equivalent. Given the two other benefits of flexibility (writing rewrite rules in a simple way) and verifiability, on balance I suspect this will still make sense to turn on by default once fuzz-clean. (I'm doing a followup PR to add it to fuzzing configs.) But I'm very open to discussion on that; one alternative is to keep adding rewrite rules until we see a very clear code-quality (runtime) win.
This PR reverts bytecodealliance#5128 (commit b3333bf), adding back the ability for the fuzzing config generator to set the `use_egraphs` Cranelift option. This will start to fuzz the egraphs-based optimization framework again, now that bytecodealliance#5382 has landed.
I just did some benchmarking of egraphs and the perf improvement is huge on the benchmark I tried:
(release uses cranelift's speed_and_size opt_level as well as mir inlining and other rustc mir optimizations)
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default. Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on. This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!). Fixes #5181.
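For context, here is a sketch of how an embedder would have configured these flags through the Cranelift settings API at the time; the helper name is made up, and the flag and value names are simply those referenced in this thread:

```rust
use cranelift_codegen::settings::{self, Configurable};

// Hypothetical helper: build Flags with optimization and egraph opts enabled.
fn optimized_flags() -> settings::Flags {
    let mut builder = settings::builder();
    // After this change, `use_egraphs` no longer forces optimization on its
    // own: egraph optimization runs only when the opt level asks for it.
    builder.set("opt_level", "speed").unwrap();
    builder.set("use_egraphs", "true").unwrap();
    settings::Flags::new(builder)
}
```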
In #5382 ("egraph support: rewrite to work in terms of CLIF data structures"), we added the `trace-log` feature to the set of default features for `cranelift-codegen`. I think this was an accident, probably added while debugging and overlooked when cleaning up to merge. So let's undo that change. Fixes #6548.