Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

egraph support: rewrite to work in terms of CLIF data structures. #5382

Merged
merged 5 commits into from
Dec 6, 2022

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Dec 6, 2022

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have multiple representations: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

  • One, a forward pass in RPO (to see defs before uses), that (i) removes
    "pure" instructions from the layout and (ii) optimizes as it goes. As
    before, we eagerly optimize, so we form the entire union of optimized
    forms of a value before we see any uses of that value. This lets us
    rewrite uses to use the most "up-to-date" form of the value and
    canonicalize and optimize that form.

    The eager rewriting and acyclic representation make each other work
    (we could not eagerly rewrite if there were cycles; and acyclicity
    does not miss optimization opportunities only because the first time
    we introduce a value, we immediately produce its "best" form). This
    design choice is also what allows us to avoid the "parent pointers"
    and fixpoint loop of traditional egraphs.

    This forward optimization pass keeps a scoped hashmap to "intern"
    nodes (thus performing GVN), and also interleaves on a per-instruction
    level with alias analysis. The interleaving with alias analysis allows
    alias analysis to see the most optimized form of each address (so it
    can see equivalences), and allows the next value to see any
    equivalences (reuses of loads or stored values) that alias analysis
    uncovers.

  • Two, a forward pass in domtree preorder, that "elaborates" pure enodes
    back into the layout, possibly in multiple places if needed. This
    tracks the loop nest and hoists nodes as needed, performing LICM as it
    goes. Note that by doing this in forward order, we avoid the
    "fixpoint" that traditional LICM needs: we hoist a def before its
    uses, so when we place a node, we place it in the right place the
    first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On spidermonkey.wasm running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup). More benchmarks to come.

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp jsharp@fastly.com

@cfallin cfallin requested a review from fitzgen December 6, 2022 05:21
@cfallin
Copy link
Member Author

cfallin commented Dec 6, 2022

I'll see about splitting this into logical pieces in separate commits tomorrow; I wanted to get the rebased / cleaned-up state up first. I'll run some more benchmarks as well.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:meta Everything related to the meta-language. labels Dec 6, 2022
@cfallin cfallin force-pushed the egraph-in-dfg-rebase branch from 1d69034 to 743f8f1 Compare December 6, 2022 05:56
cranelift/codegen/Cargo.toml Outdated Show resolved Hide resolved
@bjorn3
Copy link
Contributor

bjorn3 commented Dec 6, 2022

How are alias values printed in the textual format of clif ir? And can it be read back?

@cfallin cfallin force-pushed the egraph-in-dfg-rebase branch 2 times, most recently from 6096ff0 to 88e6d00 Compare December 6, 2022 17:37
@cfallin
Copy link
Member Author

cfallin commented Dec 6, 2022

How are alias values printed in the textual format of clif ir? And can it be read back?

They currently aren't; because union values aren't present in the layout, the usual CLIF writer won't see them (it ignores all insts and values not "in the function" as per the layout). I agree it would be great to come up with a text format for egraph-stage CLIF; maybe as a followup to the initial PR?

@cfallin cfallin force-pushed the egraph-in-dfg-rebase branch from 88e6d00 to 777bd97 Compare December 6, 2022 17:51
This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one.  This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have *multiple representations*: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes
  "pure" instructions from the layout and (ii) optimizes as it goes. As
  before, we eagerly optimize, so we form the entire union of optimized
  forms of a value before we see any uses of that value. This lets us
  rewrite uses to use the most "up-to-date" form of the value and
  canonicalize and optimize that form.

  The eager rewriting and acyclic representation make each other work
  (we could not eagerly rewrite if there were cycles; and acyclicity
  does not miss optimization opportunities only because the first time
  we introduce a value, we immediately produce its "best" form). This
  design choice is also what allows us to avoid the "parent pointers"
  and fixpoint loop of traditional egraphs.

  This forward optimization pass keeps a scoped hashmap to "intern"
  nodes (thus performing GVN), and also interleaves on a per-instruction
  level with alias analysis. The interleaving with alias analysis allows
  alias analysis to see the most optimized form of each address (so it
  can see equivalences), and allows the next value to see any
  equivalences (reuses of loads or stored values) that alias analysis
  uncovers.

- Two, a forward pass in domtree preorder, that "elaborates" pure enodes
  back into the layout, possibly in multiple places if needed. This
  tracks the loop nest and hoists nodes as needed, performing LICM as it
  goes. Note that by doing this in forward order, we avoid the
  "fixpoint" that traditional LICM needs: we hoist a def before its
  uses, so when we place a node, we place it in the right place the
  first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup).

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
@cfallin cfallin force-pushed the egraph-in-dfg-rebase branch from 777bd97 to 149b8c8 Compare December 6, 2022 18:43
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(posting partial review because the diff keeps changing as I go and making things outdated)

cranelift/codegen/src/context.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/ctxhash.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/ctxhash.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/egraph.rs Outdated Show resolved Hide resolved
Comment on lines 199 to 200
// A pure node always has exactly one result.
let orig_value = self.func.dfg.inst_results(inst)[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What about something like iadd_carry, which is a pure instruction with two results, how does that fit in here? We used to return tuples and then project fields out of them, but now that we are using the DFG directly, how is this handled?
  • This could be simplified as
    Suggested change
    // A pure node always has exactly one result.
    let orig_value = self.func.dfg.inst_results(inst)[0];
    let orig_value = self.func.dfg.first_result(inst);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great point with first_result, I hadn't remembered that method existed. Updated!

Regarding multi-output instructions in general: yes, this was a bit of a compromise for now. Jamey and I went back and forth a bit on an Inst-centric vs. Value-centric design in the elaborator and ultimately basing everything on Values was far, far simpler; so to keep everything straight, we need to deal only with single-result instructions.

We judged this to be acceptable-ish because the most common multi-result instructions are calls. There are indeed also adds-with-carries but, for now, not optimizing these probably will not have a material impact on speedups.

Ultimately I'm curious whether we might be able to adapt a tuple-types-and-projection approach to CLIF as well, though that's a much larger change.

cranelift/codegen/src/egraph.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/egraph.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/egraph.rs Outdated Show resolved Hide resolved
@bjorn3
Copy link
Contributor

bjorn3 commented Dec 6, 2022

How are alias values printed in the textual format of clif ir? And can it be read back?

They currently aren't; because union values aren't present in the layout, the usual CLIF writer won't see them (it ignores all insts and values not "in the function" as per the layout). I agree it would be great to come up with a text format for egraph-stage CLIF; maybe as a followup to the initial PR?

Sure.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with all my feedback addressed. Really excited for this!

cranelift/codegen/src/egraph/cost.rs Show resolved Hide resolved
cranelift/codegen/src/loop_analysis.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/context.rs Outdated Show resolved Hide resolved
@cfallin
Copy link
Member Author

cfallin commented Dec 6, 2022

Updated and addressed everything, I think! (Left the one thread about multi-result instructions open; that's something we'll want to address somehow eventually, but I think isn't a quick/easy thing to address in this PR.) Thanks for the review!

@cfallin cfallin force-pushed the egraph-in-dfg-rebase branch from f1e5926 to 40928ab Compare December 6, 2022 22:26
@cfallin cfallin merged commit f980def into bytecodealliance:main Dec 6, 2022
@cfallin cfallin deleted the egraph-in-dfg-rebase branch December 6, 2022 22:59
@cfallin
Copy link
Member Author

cfallin commented Dec 6, 2022

Interesting benchmark discovery:

This seems to indicate to me that on worse code (explicit bounds checks), this approach is a little more efficient, while otherwise it's more-or-less equivalent.

Given the two other benefits of flexibility (writing rewrite rules in a simple way) and verifiability, on balance I suspect this will still make sense to turn on by default once fuzz-clean. (I'm doing a followup PR to add it to fuzzing configs.) But I'm very open to discussion on that; one alternative is to keep adding rewrite rules until we see a very clear code-quality (runtime) win.

cfallin added a commit to cfallin/wasmtime that referenced this pull request Dec 6, 2022
This PR reverts bytecodealliance#5128 (commit b3333bf),
adding back the ability for the fuzzing config generator to set
the `use_egraphs` Cranelift option. This will start to fuzz the
egraphs-based optimization framework again, now that bytecodealliance#5382 has landed.
cfallin added a commit that referenced this pull request Dec 7, 2022
…5388)

This PR reverts #5128 (commit b3333bf),
adding back the ability for the fuzzing config generator to set
the `use_egraphs` Cranelift option. This will start to fuzz the
egraphs-based optimization framework again, now that #5382 has landed.
@bjorn3
Copy link
Contributor

bjorn3 commented Dec 14, 2022

I just did some benchmarking of egraphs and the perf improvement is huge on the benchmark I tried:

Benchmark 1: ./raytracer_cg_clif
  Time (mean ± σ):      8.553 s ±  0.010 s    [User: 8.539 s, System: 0.014 s]
  Range (min … max):    8.543 s …  8.568 s    10 runs
 
Benchmark 2: ./raytracer_cg_clif_egraph
  Time (mean ± σ):      6.068 s ±  0.017 s    [User: 6.057 s, System: 0.011 s]
  Range (min … max):    6.047 s …  6.108 s    10 runs
 
Benchmark 3: ./raytracer_cg_clif_release
  Time (mean ± σ):      6.450 s ±  0.021 s    [User: 6.439 s, System: 0.012 s]
  Range (min … max):    6.410 s …  6.482 s    10 runs
 
Benchmark 4: ./raytracer_cg_clif_release_egraph
  Time (mean ± σ):      5.853 s ±  0.053 s    [User: 5.841 s, System: 0.012 s]
  Range (min … max):    5.779 s …  5.908 s    10 runs
 
Summary
  './raytracer_cg_clif_release_egraph' ran
    1.04 ± 0.01 times faster than './raytracer_cg_clif_egraph'
    1.10 ± 0.01 times faster than './raytracer_cg_clif_release'
    1.46 ± 0.01 times faster than './raytracer_cg_clif'

(release uses cranelift's speed_and_size opt_level as well as mir inlining and other rustc mir optimizations)

cfallin added a commit that referenced this pull request Jan 19, 2023
This PR follows up on #5382 and #5391, which rebuilt the egraph-based optimization framework to be more performant, by enabling it by default.

Based on performance results in #5382 (my measurements on SpiderMonkey and bjorn3's independent confirmation with cg_clif), it seems that this is reasonable to enable. Now that we have been fuzzing compiler configurations with egraph opts (#5388) for 6 weeks, having fixed a few fuzzbugs that came up (#5409, #5420, #5438) and subsequently received no further reports from OSS-Fuzz, I believe it is stable enough to rely on.

This PR enables `use_egraphs`, and also normalizes its meaning: previously it forced optimization (it basically meant "turn on the egraph optimization machinery"), now it runs egraph opts if the opt level indicates (it means "use egraphs to optimize if we are going to optimize"). The conditionals in the top-level pass driver are a little subtle, but will get simpler once we can remove the non-egraph path (which we plan to do eventually!).

Fixes #5181.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jun 9, 2023
In bytecodealliance#5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes bytecodealliance#6548.
jameysharp added a commit that referenced this pull request Jun 9, 2023
In #5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes #6548.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jun 9, 2023
…e#6549)

In bytecodealliance#5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes bytecodealliance#6548.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jun 9, 2023
…e#6549)

In bytecodealliance#5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes bytecodealliance#6548.
alexcrichton pushed a commit that referenced this pull request Jun 12, 2023
In #5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes #6548.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Jun 12, 2023
…e#6549)

In bytecodealliance#5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes bytecodealliance#6548.
jameysharp added a commit that referenced this pull request Jun 12, 2023
In #5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.

So let's undo that change. Fixes #6548.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants