Pathological rebuild performance #239

Closed
meithecatte opened this issue Feb 21, 2023 · 18 comments · Fixed by #253

Comments

@meithecatte (Contributor)

Consider an analysis that is overwhelmingly likely to propagate updates throughout the e-graph. In my use case, I was computing AstSize in my Analysis, as I needed it for some heuristics. The example below models this by always returning DidMerge(true, true).

use egg::*;

// Runtime is extremely sensitive to this argument
const SIZE: usize = 14;

type EGraph = egg::EGraph<SymbolLang, MyAnalysis>;

struct MyAnalysis;

impl Analysis<SymbolLang> for MyAnalysis {
    type Data = ();

    fn make(_: &EGraph, _: &SymbolLang) {}
    fn merge(&mut self, _: &mut (), _: ()) -> DidMerge {
        DidMerge(true, true)
    }
}

fn wrap(egraph: &mut EGraph, mut v: Id) {
    for _ in 0..SIZE {
        let f = egraph.add(SymbolLang { op: "f".into(), children: vec![v] });
        let g = egraph.add(SymbolLang { op: "g".into(), children: vec![v] });
        egraph.union(f, g);
        v = f;
    }
}

fn main() {
    env_logger::init();

    let mut egraph = EGraph::new(MyAnalysis);
    let x = egraph.add(SymbolLang { op: "x".into(), children: vec![] });
    let y = egraph.add(SymbolLang { op: "y".into(), children: vec![] });
    wrap(&mut egraph, x);
    wrap(&mut egraph, y);
    egraph.rebuild();

    egraph.union(x, y);
    egraph.rebuild();
}

Running this with RUST_LOG=egg=info cargo r --release prints:

[2023-02-21T05:20:57Z INFO  egg::egraph] REBUILT! in 0.000s
      Old: hc size 58, eclasses: 30
      New: hc size 58, eclasses: 30
      unions: 0, trimmed nodes: 0
[2023-02-21T05:21:15Z INFO  egg::egraph] REBUILT! in 18.454s
      Old: hc size 58, eclasses: 29
      New: hc size 58, eclasses: 15
      unions: 14, trimmed nodes: 28

The issue is that process_unions iterates through analysis_pending in a random order instead of bottom-up. Performing a true bottom-up traversal might be non-trivial, but, while I haven't thought this through fully, I believe a very good approximation that would recover the optimal asymptotic complexity is to count how many times the analysis data of each e-node has been handled during the current rebuild, and in each iteration pick the one where this count is lowest (with something like a BinaryHeap).
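For illustration, here is a stdlib-only sketch of that "least-recomputed first" heuristic on a toy model (plain usize node ids, a toy size analysis recomputed from children; this is not egg's actual API, and the function name `propagate` is made up). Pending nodes sit in a min-heap keyed on (recompute count, id); stale heap entries are skipped by comparing their count against the current one:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Toy model: `parents[i]` / `children[i]` give the neighbors of node i,
// and the analysis value is size[i] = 1 + sum of children's sizes.
// Among the dirty nodes, we always process the one whose analysis has
// been recomputed the fewest times during this "rebuild".
fn propagate(parents: &[Vec<usize>], children: &[Vec<usize>], seeds: &[usize]) -> Vec<usize> {
    let n = parents.len();
    let mut size = vec![0usize; n];
    let mut count = vec![0usize; n];
    // Min-heap on (recompute count, id), via Reverse.
    let mut heap: BinaryHeap<Reverse<(usize, usize)>> =
        seeds.iter().map(|&i| Reverse((0, i))).collect();
    while let Some(Reverse((c, i))) = heap.pop() {
        if c != count[i] {
            continue; // stale entry: node i was already processed since this push
        }
        count[i] += 1;
        let new = 1 + children[i].iter().map(|&ch| size[ch]).sum::<usize>();
        if new != size[i] {
            size[i] = new;
            // Re-dirty the parents, keyed on their current count.
            for &p in &parents[i] {
                heap.push(Reverse((count[p], p)));
            }
        }
    }
    size
}
```

On a chain x, f(x), f(f(x)), ... this processes each node once, bottom-up, instead of repeatedly reprocessing parents before their children have settled.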

@meithecatte (Contributor, Author)

Hmm, I think the counting solution won't be optimal: it'll probably avoid the exponential behavior, but it can still be quadratic in the worst case.

@mwillsey (Member)

I'm happy to take a fix here if you have one. Is this something you ran into in practice?

@meithecatte (Contributor, Author)

Yes, I ran into this after I tried including the minimum AstSize in my analysis. Most things worked fine, and then it encountered an e-graph that seemed to hang during rebuilding. At first I thought I had screwed up how another part of the analysis handles cycles.

@mwillsey (Member)

Ah, interesting. Good catch.

I won't have time to get to this for a while, but I'm happy to merge a fix if you happen to make one.

@meithecatte (Contributor, Author)

I'm not sure what the best approach here is. One option I'm considering is to compute a topological sort of the e-graph at some point during the rebuild, and use it to guide the process of rebuilding the analysis data (always picking the e-class that comes first in the topological order).

This ordering might become stale after the rebuild continues, but it should still be useful as a heuristic.

The question, then, is when to compute the toposort. One option would be the first time we reach the analysis update loop, but if we only end up updating 3 nodes, computing a full toposort is obviously wasteful. So perhaps maintain a HashMap that remembers how many times we've updated the analysis for a node, and once that passes some small threshold, compute the toposort?

The issue then is to pick this threshold. The optimal choice will depend on how expensive the analysis is to compute, and thus will vary across applications. This makes me want to expose it as a tweakable knob, but on the other hand that is an entirely different game than avoiding rebuild times measured in days 🙃

I suppose we'll have to see how introducing this toposort affects rebuild times in the typical case.
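For reference, the toposort itself would just be Kahn's algorithm on the child-to-parent edges, so that every node comes after all of its children. A stdlib-only sketch on the same toy representation (usize ids, `children[i]` listing node i's child ids; the function name `toposort` is made up):

```rust
use std::collections::VecDeque;

// Kahn's algorithm: emit a node once all of its children have been emitted.
// Assumes each child appears at most once per node's child list.
fn toposort(children: &[Vec<usize>]) -> Vec<usize> {
    let n = children.len();
    let mut pending = vec![0usize; n]; // children of i not yet emitted
    let mut parents = vec![Vec::new(); n];
    for (p, chs) in children.iter().enumerate() {
        pending[p] = chs.len();
        for &c in chs {
            parents[c].push(p);
        }
    }
    // Start from the leaves (nodes with no children).
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::new();
    while let Some(i) = queue.pop_front() {
        order.push(i);
        for &p in &parents[i] {
            pending[p] -= 1;
            if pending[p] == 0 {
                queue.push_back(p);
            }
        }
    }
    order // shorter than n if the graph has cycles
}
```

Note that on a cyclic e-graph the returned order is incomplete, which matches the caveat above that cycles need separate handling.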

@Bastacyclop (Contributor)

In my custom implementation of analyses, I used a queue instead of a stack for analysis_pending. Would that help here?

@Bastacyclop (Contributor)

I wrote a quick benchmark and it seems that my implementation using a queue does not have the pathological performance issue:

> cargo test --release -- --ignored compare_analyses
---- original analysis
first rebuild: 0.000s
second rebuild: 20.056s
---- custom analysis
first rebuild: 0.000s
second rebuild: 0.000s
analysis: 0.000s

The 'original analysis' case is the example from @meithecatte.
The 'custom analysis' case is the same example, but with no analysis built into the e-graph. Instead, the analysis is performed from scratch after rebuilding, using a custom queue-based implementation. It's also possible to use the queue-based implementation during rebuilding (done in the Scala implementation mentioned above), and I think it should perform the same.
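To make the contrast concrete, here is a stdlib-only sketch of the queue-based (FIFO) propagation on the same toy model as above (usize ids, a toy size analysis; the name `propagate_fifo` is made up and this is not egg's API). The only difference from a stack-based worklist is the processing order, which is what avoids the pathological reprocessing:

```rust
use std::collections::VecDeque;

// FIFO worklist: dirty nodes are processed in the order they were
// discovered, and a node is never enqueued twice while already pending.
fn propagate_fifo(parents: &[Vec<usize>], children: &[Vec<usize>], seeds: &[usize]) -> Vec<usize> {
    let n = parents.len();
    let mut size = vec![0usize; n];
    let mut queued = vec![false; n];
    let mut queue: VecDeque<usize> = VecDeque::new();
    for &s in seeds {
        if !queued[s] {
            queued[s] = true;
            queue.push_back(s);
        }
    }
    while let Some(i) = queue.pop_front() {
        queued[i] = false;
        let new = 1 + children[i].iter().map(|&c| size[c]).sum::<usize>();
        if new != size[i] {
            size[i] = new;
            for &p in &parents[i] {
                if !queued[p] {
                    queued[p] = true;
                    queue.push_back(p);
                }
            }
        }
    }
    size
}
```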

@mwillsey (Member)

I think switching to a queue seems reasonable, I'd merge such a change. @meithecatte do you think it's possible to check if that works on your workload?

@meithecatte (Contributor, Author)

I will have to find the relevant revision of my repo again, but check the following first: whereas my test case iteratively expands ?a => (f ?a), (g ?a) and unions the two together, try ?a => (f ?a), (g (h ?a)).

@mwillsey (Member)

I don't quite understand those rules. What's the , doing? Are those multipatterns?

@meithecatte (Contributor, Author)

I had something like this in mind:

fn wrap<A: Analysis<SymbolLang>>(egraph: &mut EGraph<SymbolLang, A>, mut v: Id) {
    for _ in 0..SIZE {
        let f = egraph.add(SymbolLang { op: "f".into(), children: vec![v] });
        let h = egraph.add(SymbolLang { op: "h".into(), children: vec![v] });
        let g = egraph.add(SymbolLang { op: "g".into(), children: vec![h] });
        egraph.union(f, g);
        v = f;
    }
}

I now had the opportunity to run this test, and it looks like this example has quadratic behavior, with SIZE = 1000 taking over a second.

This is an improvement over the current potentially-exponential behavior.

I'll think this through to see if there's a case with worse performance.

@meithecatte (Contributor, Author)

I'd like to note that if O(n²) performance, as above, is acceptable, then the following rebuilding algorithm can be proven to achieve this worst-case complexity, provided there are no cycles:

Throughout the timespan of one rebuild, track the number of times we've recomputed the analysis for a particular node. At each step, pick from the set of nodes we know are "dirty" the node with the lowest value of this count. Ties can be broken as desired.

Proof sketch: since we're assuming that there are no cycles, consider the topological sort topo[1..=n], numbered in the direction where topo[1] is not the parent of any node. Note that once a prefix of the toposort doesn't have any elements in the dirty set, it'll stay that way. Inductively prove that the analysis of topo[i] will be computed at most i times, and that when this first happens for a particular i, all the other counters are also at most i.

If there are cycles, I don't think we can guarantee much, apart from termination with reasonable assumptions about the algebraic properties of the analysis itself. However from what I can see when thinking through an example with a cycle, it seems to me that both "least-recomputed" and "simple queue" handle this with similar performance characteristics.

Now that I think of it, though, it might be the case that the "simple queue" algorithm can also be proven to be worst-case quadratic with these assumptions. I'll have to think about this further.

@mwillsey (Member) commented Apr 20, 2023

Nice! So the solution is really just use a queue? @Bastacyclop (or @meithecatte) would you like to file a PR?

@Bastacyclop (Contributor)

Happy to make a simple PR towards the end of next week. It would then be interesting to know what kind of performance impact the change has on some big applications.

I've been using a simple implementation of a queue-ordered set (/ queue with unique elements) with O(1) push/pop but double memory footprint:

pub struct HashSetQueuePop<T> {
    map: HashSet<T>,
    queue: std::collections::VecDeque<T>,
}

Any idea if there exists a better implementation of this in the Rust ecosystem? Although with T = (L, Id) it's not such a big deal. Maybe one can customize the currently used IndexSet implementation to maintain queue ordering?
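For concreteness, one possible way to fill in push and pop for such a structure (a stdlib-only sketch; the struct above is repeated so the snippet is self-contained, and the exact deduplication policy, ignoring pushes of elements still in the queue, is an assumption):

```rust
use std::collections::{HashSet, VecDeque};
use std::hash::Hash;

pub struct HashSetQueuePop<T> {
    map: HashSet<T>,
    queue: VecDeque<T>,
}

impl<T: Eq + Hash + Clone> HashSetQueuePop<T> {
    pub fn new() -> Self {
        Self { map: HashSet::new(), queue: VecDeque::new() }
    }

    // O(1) amortized; an element already pending in the queue is not
    // enqueued a second time.
    pub fn push(&mut self, t: T) {
        if self.map.insert(t.clone()) {
            self.queue.push_back(t);
        }
    }

    // O(1) pop in FIFO order; the element becomes pushable again.
    pub fn pop(&mut self) -> Option<T> {
        let t = self.queue.pop_front();
        if let Some(t) = &t {
            self.map.remove(t);
        }
        t
    }
}
```

The Clone bound (and the doubled memory) could be avoided by storing indices or hashes in the set instead, at the cost of a more involved implementation.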

@mwillsey (Member) commented Apr 24, 2023

I tried a simpler solution in #250, could y'all check if it meets your needs?

Edit: whoops I said #240 instead of #250.

@yihozhang (Contributor)

I think there is a worst-case O(n) algorithm for e-class analysis: first run the rebuilding to fixpoint without doing any e-class analysis, while tracking which set of old e-classes each new e-class corresponds to. We can then use this information to do an amortized maintenance of the e-class analysis in one scan. This works since rebuilding does not require an up-to-date e-class analysis.

@meithecatte (Contributor, Author)

Do note that running the analysis is allowed to introduce new unions.

@yihozhang (Contributor)

Oh, never mind, this might not work.
