Scalability: incorporate early pruning optimizations #368

dorisjlee · 2021-04-27T20:35:41Z

Overview

This PR incorporates the early pruning optimizations described in our recent paper to speed up cases where the visualization search space is large (e.g., the VisList contains more than the top K visualizations and the dataframe has a large number of rows).
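As a rough illustration of the two conditions above, the applicability check can be written as a predicate over the VisList size ("width") and the dataframe row count ("length"). Mapping width and length this way is our reading of the PR, not something it spells out. This is a minimal sketch: `PruningCriteria`, `should_early_prune`, and the `min_rows` threshold are hypothetical, and only k = 15 matches the experiment setting in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PruningCriteria:
    """Hypothetical thresholds for illustration only (not Lux's tuned values)."""

    top_k: int = 15          # number of visualizations shown to the user
    min_rows: int = 100_000  # assumed row count above which full execution is costly


def should_early_prune(num_vis: int, num_rows: int,
                       criteria: PruningCriteria = PruningCriteria()) -> bool:
    # Width: the VisList holds more candidates than will be displayed.
    # Length: the dataframe is large enough that exact execution is expensive.
    return num_vis > criteria.top_k and num_rows >= criteria.min_rows


print(should_early_prune(num_vis=100, num_rows=1_000_000))  # True
print(should_early_prune(num_vis=10, num_rows=1_000_000))   # False
```

Both conditions must hold: a small VisList gains nothing from pruning because every candidate is displayed anyway, and a small dataframe is cheap enough to process exactly.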

Changes

Added execute_approx_sample, which computes approximate visualization results on a sample of the data for early pruning
Applied early pruning based on empirically tested width and length criteria
Added a warning message and tests for when the early pruning conditions are met
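The flow these changes describe (score every candidate on a sample, prune to the top k, then compute exact results only for the survivors) can be sketched with the standard library. This is a minimal sketch under stated assumptions: `early_prune_top_k` is a hypothetical helper, each candidate visualization is modeled as a scoring function over rows, and a uniform random sample stands in for whatever `execute_approx_sample` actually does.

```python
import heapq
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Scorer = Callable[[Sequence[Row]], float]  # stand-in for an interestingness score


def early_prune_top_k(candidates: Dict[str, Scorer], rows: List[Row], k: int,
                      sample_size: int, seed: int = 0) -> List[Tuple[str, float]]:
    """Hypothetical helper: approximate on a sample, prune, then rescore survivors."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)

    # Phase 1: cheap approximate scores for the whole search space.
    approx = {name: score(sample) for name, score in candidates.items()}
    survivors = heapq.nlargest(k, approx, key=approx.__getitem__)

    # Phase 2: exact scores only for the k survivors (what gets displayed).
    exact = [(name, candidates[name](rows)) for name in survivors]
    return sorted(exact, key=lambda item: item[1], reverse=True)


rows = [{"a": float(i % 10), "b": float(i % 3), "c": 5.0} for i in range(1000)]
candidates = {
    col: (lambda data, col=col: sum(r[col] for r in data) / len(data))
    for col in ("a", "b", "c")
}
print(early_prune_top_k(candidates, rows, k=2, sample_size=100))  # [('c', 5.0), ('a', 4.5)]
```

Only k full-data evaluations happen no matter how many candidates there are, which is where the savings come from when the VisList is large. The trade-off is that a candidate ranked just outside the top k on the sample can be missed.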

Others:

Increased the sampling start and cap for overall sampling
Added a config flag to turn lazy maintenance on/off
Added a cached method for df.unique()
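Caching unique values helps because the same column's unique values can be requested repeatedly while the search space is explored. A minimal, hypothetical sketch of the idea: `UniqueCache` is illustrative and not Lux's implementation, plain lists stand in for dataframe columns, and a column's cached entry is invalidated when it is overwritten.

```python
from typing import Dict, Hashable, List


class UniqueCache:
    """Hypothetical per-column cache of unique values (illustration only)."""

    def __init__(self, columns: Dict[str, List[Hashable]]):
        self._columns = columns
        self._cache: Dict[str, List[Hashable]] = {}
        self.computations = 0  # counts cache misses, to show the cache working

    def unique(self, col: str) -> List[Hashable]:
        if col not in self._cache:
            self.computations += 1
            # dict.fromkeys keeps first-appearance order, like pandas Series.unique()
            self._cache[col] = list(dict.fromkeys(self._columns[col]))
        return self._cache[col]

    def set_column(self, col: str, values: List[Hashable]) -> None:
        self._columns[col] = values
        self._cache.pop(col, None)  # drop the stale entry so it is recomputed


cache = UniqueCache({"room_type": ["Private", "Entire", "Private", "Shared"]})
print(cache.unique("room_type"))  # ['Private', 'Entire', 'Shared']
cache.unique("room_type")         # served from the cache
print(cache.computations)         # 1
```

The essential design point is invalidation: any write to a column must evict its cached entry, otherwise later recommendations would be computed from stale values.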

Example Output

Here are some results from preliminary experiments on the 1M Airbnb dataset. We duplicate a single visualization N times and measure the performance. The reported time is the sum of the time it takes to go through the search space and the time it takes to recompute the top K visualizations when the VisList is displayed. The significant speedup in the search outweighs the extra recomputation time, since the number of visualizations recomputed is typically around 17, only a few more than k = 15.

Zooming out, the speedup can be significant when searching through 100+ visualizations.
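The timing decomposition above suggests a back-of-the-envelope cost model for why the benefit grows with the number of candidates N. Without pruning, every candidate is computed on all R rows; with pruning, every candidate is computed on an S-row sample and only about r visualizations are recomputed on the full data. This is a simplified sketch assuming cost is linear in rows processed; the sample size S is a hypothetical value, r = 17 comes from the observation above, and the outputs are model predictions, not measurements.

```python
def pruning_speedup(n_vis: int, total_rows: int, sample_rows: int, recomputed: int) -> float:
    """Ratio of unpruned to pruned cost, assuming cost is linear in rows processed."""
    unpruned = n_vis * total_rows                           # every vis on every row
    pruned = n_vis * sample_rows + recomputed * total_rows  # sample pass + recompute
    return unpruned / pruned


# Hypothetical parameters: 1M rows, a 30k-row sample, about 17 recomputed vis.
for n in (20, 100, 500):
    print(n, f"{pruning_speedup(n, 1_000_000, 30_000, 17):.2f}")
```

Under these assumed parameters the model gives 5x at N = 100. It also shows the limits: as N grows the ratio approaches R/S, while for N close to r the fixed recomputation cost dominates and pruning brings little benefit.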


codecov · 2021-04-28T15:02:16Z

Codecov Report

Merging #368 (1cf6439) into master (1dbbcb9) will increase coverage by 0.20%.
The diff coverage is 97.93%.

@@            Coverage Diff             @@
##           master     #368      +/-   ##
==========================================
+ Coverage   84.46%   84.67%   +0.20%     
==========================================
  Files          51       51              
  Lines        3902     3948      +46     
==========================================
+ Hits         3296     3343      +47     
+ Misses        606      605       -1
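The percentages in the report follow from the hit and line counts. A small check, assuming Codecov truncates (rounds down) to two decimals, which is consistent with the numbers shown here; `coverage_pct` is an illustrative helper, not part of Codecov.

```python
from decimal import ROUND_DOWN, Decimal

CENT = Decimal("0.01")


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Unrounded line coverage in percent."""
    return Decimal(hits) * 100 / Decimal(lines)


base = coverage_pct(3296, 3902)  # master (1dbbcb9)
head = coverage_pct(3343, 3948)  # #368 (1cf6439)

print(base.quantize(CENT, rounding=ROUND_DOWN))           # 84.46
print(head.quantize(CENT, rounding=ROUND_DOWN))           # 84.67
print((head - base).quantize(CENT, rounding=ROUND_DOWN))  # 0.20
print(3902 - 3296, 3948 - 3343)                           # 606 605
```

The +0.20% delta appears to be computed from the unrounded values; subtracting the rounded percentages (84.67 - 84.46) would give 0.21 instead.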

Impacted Files	Coverage Δ
lux/executor/Executor.py	`80.85% <88.88%> (+1.36%)`	⬆️
lux/executor/PandasExecutor.py	`95.93% <95.00%> (-0.18%)`	⬇️
lux/_config/config.py	`86.86% <100.00%> (+0.60%)`	⬆️
lux/core/frame.py	`82.06% <100.00%> (+0.32%)`	⬆️
lux/core/series.py	`55.55% <100.00%> (+1.70%)`	⬆️
lux/executor/SQLExecutor.py	`84.45% <100.00%> (ø)`
lux/interestingness/interestingness.py	`90.90% <100.00%> (+2.95%)`	⬆️
lux/processor/Compiler.py	`98.01% <100.00%> (+0.04%)`	⬆️
lux/vis/Vis.py	`75.73% <100.00%> (+0.14%)`	⬆️
lux/vis/VisList.py	`52.84% <100.00%> (+1.51%)`	⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1dbbcb9...1cf6439.

thyneb19

Hey Doris, the changes overall look good. Also, thanks for doing the quick fix for the SQLExecutor. I think there is more work that needs to be done to make the pruning actually compatible with the SQLExecutor case, but maybe we can merge in the PandasExecutor implementation first.

dorisjlee · 2021-04-28T19:14:46Z

Thanks @thyneb19! There definitely needs to be more work to support pruning in SQLExecutor in the future, which would require adding an implementation for execute_approx_sample. Will merge this in for now!
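One way to picture this follow-up work is a shared executor interface in which each backend opts in to early pruning by implementing `execute_approx_sample`, while backends without it fall back to exact execution. This is a hypothetical sketch: `Executor` here is a simplified stand-in (not `lux/executor/Executor.py`), and `PandasLikeExecutor`, `SQLLikeExecutor`, and `run_search` are illustrative names, not Lux classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Executor(ABC):
    """Hypothetical base class: method names echo the discussion above, but the
    signatures are illustrative and do not reproduce Lux's actual API."""

    @abstractmethod
    def execute(self, vis_names: List[str]) -> Dict[str, str]:
        """Compute every visualization on the full data."""

    def execute_approx_sample(self, vis_names: List[str]) -> Dict[str, str]:
        # Backends opt in to early pruning by overriding this method.
        raise NotImplementedError(f"{type(self).__name__} has no approximate path")


class PandasLikeExecutor(Executor):
    def execute(self, vis_names: List[str]) -> Dict[str, str]:
        return {name: "full" for name in vis_names}

    def execute_approx_sample(self, vis_names: List[str]) -> Dict[str, str]:
        return {name: "sample" for name in vis_names}


class SQLLikeExecutor(Executor):
    # No execute_approx_sample override yet, so pruning is unavailable.
    def execute(self, vis_names: List[str]) -> Dict[str, str]:
        return {name: "full" for name in vis_names}


def run_search(executor: Executor, vis_names: List[str]) -> Dict[str, str]:
    """Prefer the approximate path; fall back to exact execution if unsupported."""
    try:
        return executor.execute_approx_sample(vis_names)
    except NotImplementedError:
        return executor.execute(vis_names)


print(run_search(PandasLikeExecutor(), ["v1"]))  # {'v1': 'sample'}
print(run_search(SQLLikeExecutor(), ["v1"]))     # {'v1': 'full'}
```

A capability flag checked before the call would work equally well; the try/except version keeps the fallback at the call site, so a new backend gains pruning simply by overriding one method.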


dorisjlee added 4 commits April 26, 2021 10:32

changes from perf branch to config

5797898

* added flag for turning on/off lazy maintain optimization

merged in approx early pruning code

33591df

increase overall sampling start and cap

a9ef4a2

Adjust width and length criteria for early pruning vislist based on experiment results; Add warning message and test for early pruning

93249fa

dorisjlee requested a review from thyneb19 April 27, 2021 20:35

dorisjlee added 3 commits April 27, 2021 17:45

black version update

9f240f6

version lock on black

f8ab446

* fixed sql tests (added approx to execute constructor)

1cf6439

* fixed sampling config test * improved Executor documentation

thyneb19 approved these changes Apr 28, 2021


dorisjlee merged commit a0cb921 into lux-org:master Apr 28, 2021
