Pathological rebuild performance #239

Closed
meithecatte opened this issue Feb 21, 2023 · 18 comments · Fixed by #253

Comments

@meithecatte (Contributor)

Consider an analysis that is overwhelmingly likely to propagate updates throughout the e-graph. In my use case, I was computing AstSize in my Analysis, as I needed it for some heuristics. The example below models this by always returning DidMerge(true, true).

use egg::*;

// Runtime is extremely sensitive to this argument
const SIZE: usize = 14;

type EGraph = egg::EGraph<SymbolLang, MyAnalysis>;

struct MyAnalysis;

impl Analysis<SymbolLang> for MyAnalysis {
    type Data = ();

    fn make(_: &EGraph, _: &SymbolLang) {}
    fn merge(&mut self, _: &mut (), _: ()) -> DidMerge {
        DidMerge(true, true)
    }
}

fn wrap(egraph: &mut EGraph, mut v: Id) {
    for _ in 0..SIZE {
        let f = egraph.add(SymbolLang { op: "f".into(), children: vec![v] });
        let g = egraph.add(SymbolLang { op: "g".into(), children: vec![v] });
        egraph.union(f, g);
        v = f;
    }
}

fn main() {
    env_logger::init();

    let mut egraph = EGraph::new(MyAnalysis);
    let x = egraph.add(SymbolLang { op: "x".into(), children: vec![] });
    let y = egraph.add(SymbolLang { op: "y".into(), children: vec![] });
    wrap(&mut egraph, x);
    wrap(&mut egraph, y);
    egraph.rebuild();

    egraph.union(x, y);
    egraph.rebuild();
}

Running this with RUST_LOG=egg=info cargo r --release prints:

[2023-02-21T05:20:57Z INFO  egg::egraph] REBUILT! in 0.000s
      Old: hc size 58, eclasses: 30
      New: hc size 58, eclasses: 30
      unions: 0, trimmed nodes: 0
[2023-02-21T05:21:15Z INFO  egg::egraph] REBUILT! in 18.454s
      Old: hc size 58, eclasses: 29
      New: hc size 58, eclasses: 15
      unions: 14, trimmed nodes: 28

The issue is that process_unions iterates through analysis_pending in a random order instead of bottom-up. Performing a true bottom-up traversal might be non-trivial, but, while I haven't thought this through fully, I believe a very good approximation that would recover the optimal asymptotic complexity is to count how many times the analysis data of each e-node has been handled during the current rebuild, and in each iteration pick the one where this count is lowest (with something like a BinaryHeap).
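For illustration, here is a stdlib-only sketch of that "least-recomputed first" heuristic on a toy model (plain usize node ids, a toy size analysis recomputed from children; this is not egg's actual API, and the function name `propagate` is made up). Pending nodes sit in a min-heap keyed on (recompute count, id); stale heap entries are skipped by comparing their count against the current one:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Toy model: `parents[i]` / `children[i]` give the neighbors of node i,
// and the analysis value is size[i] = 1 + sum of children's sizes.
// Among the dirty nodes, we always process the one whose analysis has
// been recomputed the fewest times during this "rebuild".
fn propagate(parents: &[Vec<usize>], children: &[Vec<usize>], seeds: &[usize]) -> Vec<usize> {
    let n = parents.len();
    let mut size = vec![0usize; n];
    let mut count = vec![0usize; n];
    // Min-heap on (recompute count, id), via Reverse.
    let mut heap: BinaryHeap<Reverse<(usize, usize)>> =
        seeds.iter().map(|&i| Reverse((0, i))).collect();
    while let Some(Reverse((c, i))) = heap.pop() {
        if c != count[i] {
            continue; // stale entry: node i was already processed since this push
        }
        count[i] += 1;
        let new = 1 + children[i].iter().map(|&ch| size[ch]).sum::<usize>();
        if new != size[i] {
            size[i] = new;
            // Re-dirty the parents, keyed on their current count.
            for &p in &parents[i] {
                heap.push(Reverse((count[p], p)));
            }
        }
    }
    size
}
```

On a chain x, f(x), f(f(x)), ... this processes each node once, bottom-up, instead of repeatedly reprocessing parents before their children have settled.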

@meithecatte (Contributor, Author)

Hmm, I think the counting solution won't be optimal: it'll probably avoid the exponential behavior, but it can still be quadratic in the worst case.

@mwillsey (Member)

I'm happy to take a fix here if you have one. Is this something you ran into in practice?

@meithecatte (Contributor, Author)

Yes, I ran into this after I tried including the minimum AstSize in my analysis. Most things worked fine, and then it encountered an e-graph that seemed to hang during rebuilding. At first I thought I had screwed up how another part of the analysis handles cycles.

@mwillsey (Member)

Ah, interesting. Good catch.

I won't have time to get to this for a while, but I'm happy to merge a fix if you happen to make one.

@meithecatte (Contributor, Author)

I'm not sure what the best approach here is. One option I'm considering is to compute a topological sort of the e-graph at some point during the rebuild, and use it to guide the process of rebuilding the analysis data (always picking the e-class that comes first in the topological order).

This ordering might become stale after the rebuild continues, but it should still be useful as a heuristic.

The question, then, is when to compute the toposort. One option would be the first time we reach the analysis update loop, but if we only end up updating 3 nodes, computing a full toposort is obviously wasteful. So perhaps maintain a HashMap that remembers how many times we've updated the analysis for a node, and once that passes some small threshold, compute the toposort?

The issue then is to pick this threshold. The optimal choice will depend on how expensive the analysis is to compute, and thus will vary across applications. This makes me want to expose it as a tweakable knob, but on the other hand that is an entirely different game than avoiding rebuild times measured in days 🙃

I suppose we'll have to see how introducing this toposort affects rebuild times in the typical case.
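For reference, the toposort itself would just be Kahn's algorithm on the child-to-parent edges, so that every node comes after all of its children. A stdlib-only sketch on the same toy representation (usize ids, `children[i]` listing node i's child ids; the function name `toposort` is made up):

```rust
use std::collections::VecDeque;

// Kahn's algorithm: emit a node once all of its children have been emitted.
// Assumes each child appears at most once per node's child list.
fn toposort(children: &[Vec<usize>]) -> Vec<usize> {
    let n = children.len();
    let mut pending = vec![0usize; n]; // children of i not yet emitted
    let mut parents = vec![Vec::new(); n];
    for (p, chs) in children.iter().enumerate() {
        pending[p] = chs.len();
        for &c in chs {
            parents[c].push(p);
        }
    }
    // Start from the leaves (nodes with no children).
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::new();
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &p in &parents[i] {
            pending[p] -= 1;
            if pending[p] == 0 {
                queue.push_back(p);
            }
        }
    }
    order // shorter than n if the graph has cycles
}
```

Note that on a cyclic e-graph the returned order is incomplete, which matches the caveat above that cycles need separate handling.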

@Bastacyclop (Contributor)

In my custom implementation of analyses, I used a queue instead of a stack for analysis_pending. Would that help here?

@Bastacyclop (Contributor)

I wrote a quick benchmark and it seems that my implementation using a queue does not have the pathological performance issue:

> cargo test --release -- --ignored compare_analyses
---- original analysis
first rebuild: 0.000s
second rebuild: 20.056s
---- custom analysis
first rebuild: 0.000s
second rebuild: 0.000s
analysis: 0.000s

The 'original analysis' case is the example from @meithecatte.
The 'custom analysis' case is the same example, but with no analysis built into the e-graph. Instead, the analysis is performed from scratch after rebuilding, using a custom queue-based implementation. It's also possible to use the queue-based implementation during rebuilding (done in the Scala implementation mentioned above), and I think it should perform the same.
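To make the contrast concrete, here is a stdlib-only sketch of the queue-based (FIFO) propagation on the same toy model as above (usize ids, a toy size analysis; the name `propagate_fifo` is made up and this is not egg's API). The only difference from a stack-based worklist is the processing order, which is what avoids the pathological reprocessing:

```rust
use std::collections::VecDeque;

// FIFO worklist: dirty nodes are processed in the order they were
// discovered, and a node is never enqueued twice while already pending.
fn propagate_fifo(parents: &[Vec<usize>], children: &[Vec<usize>], seeds: &[usize]) -> Vec<usize> {
    let n = parents.len();
    let mut size = vec![0usize; n];
    let mut queued = vec![false; n];
    let mut queue: VecDeque<usize> = VecDeque::new();
    for &s in seeds {
        if !queued[s] {
            queued[s] = true;
            queue.push_back(s);
        }
    }
    while let Some(i) = queue.pop_front() {
        queued[i] = false;
        let new = 1 + children[i].iter().map(|&c| size[c]).sum::<usize>();
        if new != size[i] {
            size[i] = new;
            for &p in &parents[i] {
                if !queued[p] {
                    queued[p] = true;
                    queue.push_back(p);
                }
            }
        }
    }
    size
}
```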

@mwillsey (Member)

I think switching to a queue seems reasonable, I'd merge such a change. @meithecatte do you think it's possible to check if that works on your workload?

@meithecatte (Contributor, Author)

I will have to find the relevant revision of my repo again, but check the following first: whereas my test case iteratively expands ?a => (f ?a), (g ?a) and unions the two together, try ?a => (f ?a), (g (h ?a)).

@mwillsey (Member)

I don't quite understand those rules. What's the , doing? Are those multipatterns?

@meithecatte (Contributor, Author)

I had something like this in mind:

fn wrap<A: Analysis<SymbolLang>>(egraph: &mut EGraph<SymbolLang, A>, mut v: Id) {
    for _ in 0..SIZE {
        let f = egraph.add(SymbolLang { op: "f".into(), children: vec![v] });
        let h = egraph.add(SymbolLang { op: "h".into(), children: vec![v] });
        let g = egraph.add(SymbolLang { op: "g".into(), children: vec![h] });
        egraph.union(f, g);
        v = f;
    }
}

I now had the opportunity to run this test, and it looks like this example has quadratic behavior, with SIZE = 1000 taking over a second.

This is an improvement over the current potentially-exponential behavior.

I'll think this through to see if there's a case with worse performance.

@meithecatte (Contributor, Author)

I'd like to note that if O(n²) performance, as above, is acceptable, then the following rebuilding algorithm can be proven to achieve this worst-case complexity, provided there are no cycles:

Throughout the timespan of one rebuild, track the number of times we've recomputed the analysis for a particular node. At each step, pick from the set of nodes we know are "dirty" the node with the lowest value of this count. Ties can be broken as desired.

Proof sketch: since we're assuming that there are no cycles, consider the topological sort topo[1..=n], numbered in the direction where topo[1] is not the parent of any node. Note that once a prefix of the toposort doesn't have any elements in the dirty set, it'll stay that way. Inductively prove that the analysis of topo[i] will be computed at most i times, and that when this first happens for a particular i, all the other counters are also at most i.

If there are cycles, I don't think we can guarantee much, apart from termination with reasonable assumptions about the algebraic properties of the analysis itself. However from what I can see when thinking through an example with a cycle, it seems to me that both "least-recomputed" and "simple queue" handle this with similar performance characteristics.

Now that I think of it, though, it might be the case that the "simple queue" algorithm can also be proven to be worst-case quadratic with these assumptions. I'll have to think about this further.

@mwillsey (Member) commented Apr 20, 2023

Nice! So the solution is really just use a queue? @Bastacyclop (or @meithecatte) would you like to file a PR?

@Bastacyclop (Contributor)

Happy to make a simple PR towards the end of next week. It would then be interesting to know what kind of performance impact the change has on some big applications.

I've been using a simple implementation of a queue-ordered set (/ queue with unique elements) with O(1) push/pop but double memory footprint:

pub struct HashSetQueuePop<T> {
    map: HashSet<T>,
    queue: std::collections::VecDeque<T>,
}

Any idea if there exists a better implementation of this in the Rust ecosystem? Although with T = (L, Id) it's not such a big deal. Maybe one can customize the currently used IndexSet implementation to maintain queue ordering?
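For concreteness, one possible way to fill in push and pop for such a structure (a stdlib-only sketch; the struct above is repeated so the snippet is self-contained, and the exact deduplication policy, ignoring pushes of elements still in the queue, is an assumption):

```rust
use std::collections::{HashSet, VecDeque};
use std::hash::Hash;

pub struct HashSetQueuePop<T> {
    map: HashSet<T>,
    queue: VecDeque<T>,
}

impl<T: Eq + Hash + Clone> HashSetQueuePop<T> {
    pub fn new() -> Self {
        Self { map: HashSet::new(), queue: VecDeque::new() }
    }

    // O(1) amortized; an element already pending in the queue is not
    // enqueued a second time.
    pub fn push(&mut self, t: T) {
        if self.map.insert(t.clone()) {
            self.queue.push_back(t);
        }
    }

    // O(1) pop in FIFO order; the element becomes pushable again.
    pub fn pop(&mut self) -> Option<T> {
        let t = self.queue.pop_front();
        if let Some(t) = &t {
            self.map.remove(t);
        }
        t
    }
}
```

The Clone bound (and the doubled memory) could be avoided by storing indices or hashes in the set instead, at the cost of a more involved implementation.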

@mwillsey (Member) commented Apr 24, 2023

I tried a simpler solution in #250, could y'all check if it meets your needs?

Edit: whoops I said #240 instead of #250.

@yihozhang (Contributor)

I think there is a worst-case O(n) algorithm for e-class analysis: first run the rebuilding to fixpoint without doing any e-class analysis, while tracking which set of old e-classes each new e-class corresponds to. We can then use this information to do an amortized maintenance of the e-class analysis in one scan. This works since rebuilding does not require an up-to-date e-class analysis.

@meithecatte (Contributor, Author)

Do note that running the analysis is allowed to introduce new unions.

@yihozhang (Contributor)

Oh, never mind, this might not work.
