Scalability: incorporate early pruning optimizations #368
* added flag for turning on/off lazy maintain optimization
…xperiment results; Add warning message and test for early pruning
* fixed sampling config test * improved Executor documentation
Codecov Report
@@ Coverage Diff @@
## master #368 +/- ##
==========================================
+ Coverage 84.46% 84.67% +0.20%
==========================================
Files 51 51
Lines 3902 3948 +46
==========================================
+ Hits 3296 3343 +47
+ Misses 606 605 -1
Hey Doris the changes overall look good, also thanks for doing the quick fix for the SQLExecutor. I think there is more work that needs to be done to make the pruning actually compatible with the SQLExecutor case, but maybe we can merge in the PandasExecutor implementation first.
Thanks @thyneb19! There definitely needs to be more work to support pruning in SQLExecutor in the future, which would require adding an implementation for |
…ales (#262)
* Add support to improve temporal action to display different timescales
* Resolve PR comments
* Add support to improve temporal action to display different timescales
* Resolve PR comments
* Reformat files using black
* "All-column" vis when only few columns in dataframe #199 (#336)
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
* documentation and cleaning
* added notebook gallery
* update README
* removed scatterplot message in SQLExecutor
* fixed typo in SQL documentation
* update README and bump version
* bump version
* clear propagated vis data intent after PandasExecutor completes execute (#297)
* fix black to stable version
* Scalability: incorporate early pruning optimizations (#368)
* changes from perf branch to config
* added flag for turning on/off lazy maintain optimization
* merged in approx early pruning code
* increase overall sampling start and cap
* Adjust width and length criteria for early pruning vislist based on experiment results; Add warning message and test for early pruning
* black version update
* version lock on black
* fixed sql tests (added approx to execute constructor)
* fixed sampling config test
* improved Executor documentation
* timescale feature
* adding weekday
* adding docs
* bugfix for y axis line chart export
* fixing temporal axis by adding timescale variable in Clause
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Overview
This PR incorporates the early pruning optimizations described in our recent paper to speed up cases where the visualization search space is large (e.g., the VisList contains more than the top K visualizations and the dataframe has a large number of rows).
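To make the idea concrete, here is a minimal sketch of early pruning in plain Python. It is not the code in this PR: the function name `early_prune`, the `margin` parameter, and the `score` callback are all assumptions made for illustration. Candidates are ranked on a random sample first, only the best few (slightly more than K) are kept, and just those survivors are rescored on the full data.

```python
import random
from typing import Callable, List, Sequence, Tuple


def early_prune(
    candidates: Sequence[str],
    score: Callable[[str, Sequence[dict]], float],
    rows: Sequence[dict],
    k: int = 15,
    sample_size: int = 1000,
    margin: int = 2,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return the top-k (candidate, exact score) pairs.

    Hypothetical helper: all names and defaults here are illustrative.
    """
    if len(candidates) <= k:
        # Search space is already small: nothing to prune.
        survivors = list(candidates)
    else:
        rng = random.Random(seed)
        if len(rows) <= sample_size:
            sample = list(rows)
        else:
            sample = rng.sample(list(rows), sample_size)
        # Cheap pass: rank every candidate using the sample only.
        approx = sorted(candidates, key=lambda c: score(c, sample), reverse=True)
        # Keep a few more than k so near-ties are not lost to sampling noise.
        survivors = approx[: k + margin]
    # Expensive pass: exact scores on the full data, survivors only.
    exact = [(c, score(c, rows)) for c in survivors]
    exact.sort(key=lambda pair: pair[1], reverse=True)
    return exact[:k]
```

The saving comes from the cheap pass touching `sample_size` rows per candidate while the expensive pass touches all rows for only `k + margin` candidates.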
Changes
* execute_approx_sample: for approximating samples for early pruning
Others:
* df.unique()
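The commit notes mention a sampling start and cap. One way such thresholds could bound the size of an approximate sample is sketched below. This is an assumption-laden illustration, not the body of execute_approx_sample: the function `approx_sample`, its default values, and the `frac` parameter are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def approx_sample(
    rows: Sequence[T],
    start: int = 10_000,
    cap: int = 30_000,
    frac: float = 0.75,
    seed: int = 0,
) -> List[T]:
    """Return all rows for small inputs, else a bounded random sample.

    Hypothetical helper: thresholds and names are illustrative only.
    """
    n = len(rows)
    if n <= start:
        # Small enough that sampling would not save meaningful time.
        return list(rows)
    # Sample a fraction of the rows, but never more than `cap`.
    size = min(cap, int(n * frac))
    return random.Random(seed).sample(rows, size)
```

Fixing the seed keeps the sample stable across calls, so repeated approximate scores are comparable.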
Example Output
Here are some results from preliminary experiments on the 1M Airbnb dataset. We duplicate a single visualization N times and measure the performance. The reported time is the sum of the time it takes to go through the search space and the time it takes to recompute the top K visualizations when the VisList is displayed. The significant speedup in the former outweighs the extra time needed for the recompute, which covers around 17 visualizations, typically only a few more than 15 (where k=15).
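The two-part timing described above can be measured with a small harness like the following sketch. The function `time_phases` and its two callbacks are hypothetical stand-ins for the search pass and the recompute-on-display step; they are not part of the PR.

```python
import time
from typing import Callable, Dict


def time_phases(
    search: Callable[[], object],
    recompute: Callable[[object], object],
) -> Dict[str, float]:
    """Time the search pass and the recompute pass separately (in seconds)."""
    t0 = time.perf_counter()
    shortlist = search()  # go through the (pruned) search space
    t1 = time.perf_counter()
    recompute(shortlist)  # exact recompute of the visualizations to display
    t2 = time.perf_counter()
    search_time = t1 - t0
    recompute_time = t2 - t1
    return {
        "search": search_time,
        "recompute": recompute_time,
        "total": search_time + recompute_time,
    }
```

Reporting both phases separately makes it visible when the recompute cost starts to eat into the savings from pruning.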
Zooming out, the speedup can be significant when searching through more than 100 visualizations.