Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An extra bottom-up recursive extractor #9

Merged
merged 4 commits into from
Dec 14, 2023
Merged

Conversation

TrevorHansen
Copy link
Contributor

@TrevorHansen TrevorHansen commented Sep 8, 2023

Hi,

Before I saw #8 I made a a recursive extractor, too.

This code is much ugglier than #8, but is faster on some problems (for example acyclic graphs). On acyclic graphs it's able to extract in one pass through, without building the dependencies.

Taking the cumulative times for all three extractors on my machine:

Loaded 454 jsons.
...
Cumulative time for bottom-up: 3386ms
Cumulative time for bottom-up-analysis: 2483ms
Cumulative time for bottom-up-recursive: 977ms

bottom-up is the one currently in,
bottom-up-analysis is #8,
bottom-up-recursive is this PR.

Is there benefit in having all three, or should the two new ones be combined?

@mwillsey
Copy link
Member

Good question! @Bastacyclop what do you think? Offhand, I'm leaning toward fewer, more distinct algorithms, so unless we can characterize that these have different advantages, I'd prefer to have just one.

@Bastacyclop
Copy link
Contributor

Running the benchmarks with my plot.py version gives:

Loaded 675 jsons.
---- output/babble -- bottom-up results:
tree        mean: 383.4740
dag         mean: 202.1272
micros      mean: 4099.8324
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 145.00, 1710.50, 3735.00, 5617.50, 16994.00
---- output/babble -- bottom-up-analysis results:
tree        mean: 383.4740
dag         mean: 202.1618
micros      mean: 9924.3410
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 293.00, 3834.50, 9014.00, 13213.50, 34602.00
---- output/babble -- bottom-up-recursive results:
tree        mean: 383.4740
dag         mean: 202.1272
micros      mean: 6283.2775
tree   quantiles:    38.00, 201.00, 368.00, 531.00, 1138.00
dag    quantiles:    31.00, 124.50, 199.00, 266.50, 476.00
micros quantiles: 173.00, 2303.50, 5119.00, 8275.50, 25213.00

---- output/egg -- bottom-up results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 7137.1071
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 12.00, 20.50, 57.00, 463.25, 148309.00
---- output/egg -- bottom-up-analysis results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 16172.1786
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 19.00, 50.25, 88.00, 861.75, 317778.00
---- output/egg -- bottom-up-recursive results:
tree        mean: 3.7143
dag         mean: 3.2857
micros      mean: 9666.6786
tree   quantiles:    1.00, 1.00, 3.00, 6.00, 13.00
dag    quantiles:    1.00, 1.00, 3.00, 5.00, 13.00
micros quantiles: 13.00, 29.00, 78.50, 693.75, 183555.00

---- output/flexc -- bottom-up results:
tree        mean: 322.0000
dag         mean: 85.0000
micros      mean: 50959.7857
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 30125.00, 33549.75, 54544.50, 58975.50, 102810.00
---- output/flexc -- bottom-up-analysis results:
tree        mean: 322.0000
dag         mean: 84.4286
micros      mean: 43439.8571
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 36.00, 91.50, 136.00, 137.00
micros quantiles: 28394.00, 31866.75, 44750.00, 49550.25, 79446.00
---- output/flexc -- bottom-up-recursive results:
tree        mean: 322.0000
dag         mean: 85.0000
micros      mean: 36553.7857
tree   quantiles:    89.00, 92.00, 404.50, 546.00, 546.00
dag    quantiles:    35.00, 37.00, 91.50, 137.00, 137.00
micros quantiles: 26775.00, 30454.50, 35102.00, 38385.25, 73826.00

---- output/tensat -- bottom-up results:
tree        mean: 1858437671761.2795
dag         mean: 5.9862
micros      mean: 737891.6000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.35, 18.81
micros quantiles: 869.00, 22696.25, 230507.50, 1727811.50, 2385394.00
---- output/tensat -- bottom-up-analysis results:
tree        mean: 1858437671761.2795
dag         mean: 5.9793
micros      mean: 584151.0000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.34, 18.80
micros quantiles: 495.00, 18215.00, 143353.50, 1247708.75, 2311130.00
---- output/tensat -- bottom-up-recursive results:
tree        mean: 1858437671761.2795
dag         mean: 5.9916
micros      mean: 69755.5000
tree   quantiles:    2.69, 4.32, 966.23, 2320915813054.39, 9300713475693.02
dag    quantiles:    0.82, 0.94, 4.41, 8.36, 18.84
micros quantiles: 168.00, 10949.75, 54289.00, 69088.25, 349218.00
  1. tree costs are basically the same, as expected.
  2. bottom up recursive is faster than bottom up analysis for all datasets, in particular for tensat: is this only from skipping building dependencies or is there more going on?

I feel like we should be able to combine the three bottom up versions, with some more experiments and code cleanup. Maybe by (1) doing a first pass without building dependencies; and (2) if dependencies are used, doing a bottom up analysis based on unique queues. The question would be whether using unique queues for the second stage brings performance benefits or not (I think it should: egraphs-good/egg#239). To properly evaluate that I would like to see datasests with more costly to compute, child-dependent cost functions.

@Bastacyclop
Copy link
Contributor

PS: one thing to consider is that in egg the dependencies don't need to be computed as they are already stored in the e-graph.

@mwillsey
Copy link
Member

mwillsey commented Nov 2, 2023

I would like to consider computation of dependencies (parents) as somewhat negligible, as its only linear and as @Bastacyclop says it's already there in many contexts.

I'd like to preserve the "dumb" bottom up extractor as a base case. Ideally, we could consolidate the "smarter" bottom-up extractors into one (i.e., those that do not aim to do cost sharing, as a possible definition). Thoughts on that?

@oflatt
Copy link
Member

oflatt commented Dec 13, 2023

Before I read this closely I also worked on a bottom-up extractor (#20)
Sorry for duplicated work! I would also be happy with consolidating this, #20, and #8 if possible.

@TrevorHansen
Copy link
Contributor Author

With the recent changes to the bottom-up extractor(#20), there's now only a small time advantage for the extractor that I proposed introducing (Note these times differ from before because extra problems have been added). Currently:
Cumulative time for faster-bottom-up: 2060ms [The one in #20]
Cumulative time for bottom-up-recursive: 1533ms [The one in this PR].
Cumulative time for bottom-up: 4471ms

Meaning that the extractor in this PR is only about 25% faster than the others, but is much uglier.

However, there are some tweaks to the "faster-bottom-up" extractor which brings down its runtime to almost the same as the extractor in this PR:

###################################################
faster-bottom-up vs faster-bottom-up-old


extractors: ['faster-bottom-up', 'faster-bottom-up-old']
cumulative time for faster-bottom-up: 1649ms
cumulative time for faster-bottom-up-old: 2060ms
cumulative tree cost for faster-bottom-up: 18584377265237
cumulative tree cost for faster-bottom-up-old: 18584377265237
cumulative dag cost for faster-bottom-up: 78037
cumulative dag cost for faster-bottom-up-old: 78037
Cumulative time for faster-bottom-up: 1649ms
Cumulative time for faster-bottom-up-old: 2060ms
faster-bottom-up / faster-bottom-up-old
geo mean
tree: 1.0000
dag: 1.0000
micros: 0.8184
quantiles
tree:   1.0000, 1.0000, 1.0000, 1.0000, 1.0000
dag:    1.0000, 1.0000, 1.0000, 1.0000, 1.0000
micros: 0.3611, 0.7825, 0.8227, 0.8611, 1.8333

So I've changed this PR to now just introduce some small speedups to the faster-bottom-up extractor, as well as fixing up attribution to @Bastacyclop.

Copy link
Member

@oflatt oflatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks for cleaning this up and summarizing the results.

@oflatt oflatt merged commit 8b33f88 into egraphs-good:main Dec 14, 2023
2 checks passed
@Bastacyclop
Copy link
Contributor

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants