-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pathological rebuild performance #239
Comments
Hmm, I think that the counting solution won't be optimal - it'll probably avoid the exponential behavior, but in some cases it'll still be quadratic in the worst case. |
I'm happy to take a fix here if you have one. Is this something you ran into in practice? |
Yes, I ran into this after I tried including the minimum AstSize in my analysis. Most stuff worked fine, and then it encountered an e-graph that seemed to have hung during rebuilding. At first I thought I screwed up how another part of the analysis handles cycles. |
Ah, interesting. Good catch. I won't have time to get to this for a while, but I'm happy to merge a fix if you happen to make one. |
I'm not sure what's the best approach here. One option I'm considering is to compute a topological sort of the e-graph at some point during the rebuild, and use that to guide the process of rebuilding the analysis data (picking the e-class that comes first in the topological order). This ordering might become stale after the rebuild continues, but it should still be useful as a heuristic. The question, then, is when to compute the toposort. One option would be the first time we reach the analysis update loop, but if we end up updating 3 nodes, that's obviously inefficient. So perhaps maintain a The issue then is to pick this threshold. The optimal choice will depend on how expensive the analysis is to compute, and thus will vary across applications. This makes me want to expose it as a tweakable knob, but on the other hand that is an entirely different game than avoiding rebuild times measured in days 🙃 I suppose we'll have to see how introducing this toposort affects rebuild times in the typical case. |
In my custom implementation of analyses, I used a queue instead of a stack for
|
I wrote a quick benchmark and it seems that my implementation using a queue does not have the pathological performance issue:
The 'original analysis' case is the example from @meithecatte . |
I think switching to a queue seems reasonable, I'd merge such a change. @meithecatte do you think it's possible to check if that works on your workload? |
I will have to find the relevant revision of my repo again, but check the following first: whereas my testcase iteratively expands |
I don't quite understand those rules. What's the |
I had something like this in mind: fn wrap<A: Analysis<SymbolLang>>(egraph: &mut EGraph<SymbolLang, A>, mut v: Id) {
for _ in 0..SIZE {
let f = egraph.add(SymbolLang { op: "f".into(), children: vec![v] });
let h = egraph.add(SymbolLang { op: "h".into(), children: vec![v] });
let g = egraph.add(SymbolLang { op: "g".into(), children: vec![h] });
egraph.union(f, g);
v = f;
}
} I now had the opportunity to run this test, and it looks like this example has quadratic behavior, with This is an improvement over the current potentially-exponential behavior. I'll think this through to see if there's a case with worse performance. |
I'd like to note, that if O(n²) performance, as above, is acceptable, then the following rebuilding algorithm can be proven to have this complexity in the worst case, in the case where there are no cycles:
Proof sketch: since we're assuming that there are no cycles, consider the topological sort If there are cycles, I don't think we can guarantee much, apart from termination with reasonable assumptions about the algebraic properties of the analysis itself. However from what I can see when thinking through an example with a cycle, it seems to me that both "least-recomputed" and "simple queue" handle this with similar performance characteristics. Now that I think of it, though, it might be the case that the "simple queue" algorithm can also be proven to be worst-case quadratic with these assumptions. I'll have to think about this further. |
Nice! So the solution is really just use a queue? @Bastacyclop (or @meithecatte) would you like to file a PR? |
Happy to make a simple PR towards the end of next week. It would then be interesting to know what kind of performance impact the change has on some big applications. I've been using a simple implementation of a queue-ordered set (/ queue with unique elements) with O(1) push/pop but double memory footprint: pub struct HashSetQueuePop<T> {
map: HashSet<T>,
queue: std::collections::VecDeque<T>,
} Any idea if there exists a better implementation of this in the Rust ecosystem? Although with |
I think there is a worst-case O(n) algorithm for e-class analysis, which is to first run the rebuilding to fixpoint without doing any e-class analysis, at the same time tracking what set of old e-classes each new e-class corresponds to. We can then use this information to do an amortized maintenance of the e-class analysis in one scan. This works since rebuilding does not require an up-to-date e-class analysis |
Do note that running the analysis is allowed to introduce new unions. |
Oh nevermind, this might not work |
Consider an analysis that is overwhelmingly likely to propagate updates throughout the e-graph. In my use-case, I was computing
AstSize
in myAnalysis
, as I needed it for some heuristics. The below example models this by always returningDidMerge(true, true)
.Running this with
RUST_LOG=egg=info cargo r --release
prints:The issue is that
process_unions
iterates throughanalysis_pending
in a random order, instead of bottom-up. Performing the latter accurately might be non-trivial, but while I haven't thought through this fully, I believe a very good approximation, that would recover the optimal asymptotic complexity, is to count how many times the analysis data of each e-node has been handled during the current rebuild, and in each iteration pick the one where this count is lowest (with something likeBinaryHeap
).The text was updated successfully, but these errors were encountered: