Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(selectors): remove janky Predicate class and unify Selectors under a single interface #9917

Merged
merged 9 commits into from
Aug 27, 2024

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Aug 25, 2024

Description of changes

This is a refactor of selectors internals.

The disparity between the Predicate class and the Selector has always
bothered me, because not every selector can be easily made into a predicate.

And, for the ones that can, there are some often-used ones that are really
awkward to express as predicates, like the c selector, which expressed as
a predicate led to performance problems and IMO unnecessarily complex solutions
to that performance problem.

I thought to myself, "What are selectors doing fundamentally?"

And the answer I came up with is that they are producing a sequence of column
names that can be used to subset columns in a table.

Set operations ended up being a much more natural way to model the problem, while
allowing subclasses of Selector to customize operations for whatever reason
e.g., avoiding any overhead if there's a fast path (selecting no columns) or if
overriding is necessary for the functionality (e.g., And).

So, I turned Predicate into Where, and rewrote a bunch of the selectors
that were previously based on where into their own specialized classes, and
boiled everything down to expanding the names of a selector based on an input
table, and rewrote the top-level expand method in terms of name expansion.

Previously we were also incurring the overhead of table column selection in
every case, and for a good chunk of selectors, we can avoid that overhead by
computing directly on the table's columns.

@cpcloud cpcloud added this to the 9.4 milestone Aug 25, 2024
@cpcloud cpcloud added internals Issues or PRs related to ibis's internal APIs refactor Issues or PRs related to refactoring the codebase labels Aug 25, 2024
@cpcloud cpcloud force-pushed the cleanup-selectors branch 4 times, most recently from 6bd0a97 to fde885c Compare August 25, 2024 18:51
@cpcloud
Copy link
Member Author

cpcloud commented Aug 25, 2024

Since we benchmark selectors for the wide table case, I ran them, and there's a consistent 10% improvement.

Nothing to write home about, but I'll take the win.

-------------------------------------------------------------------------------- benchmark 'test_selectors[10000]': 2 tests --------------------------------------------------------------------------------
Name (time in ms)                             Min                 Max                Mean            StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_selectors[10000] (0001_949fbea)     919.4184 (1.10)     937.7005 (1.11)     928.4652 (1.10)     8.3591 (2.51)     927.5156 (1.10)     15.7539 (2.96)          2;0  1.0770 (0.91)          5           1
test_selectors[10000] (NOW)              837.6641 (1.0)      845.8418 (1.0)      841.4304 (1.0)      3.3308 (1.0)      840.3802 (1.0)       5.3216 (1.0)           2;0  1.1885 (1.0)           5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------- benchmark 'test_selectors[1000]': 2 tests -------------------------------------------------------------------------------
Name (time in ms)                           Min                 Max               Mean             StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_selectors[1000] (0001_949fbea)     86.8199 (1.13)      87.4022 (1.0)      87.0923 (1.05)      0.1968 (1.0)      87.0289 (1.12)     0.3323 (1.0)           3;0  11.4821 (0.95)          9           1
test_selectors[1000] (NOW)              76.9421 (1.0)      110.7883 (1.27)     82.6303 (1.0)      12.3097 (62.55)    77.4166 (1.0)      1.0610 (3.19)          2;2  12.1021 (1.0)          13           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
========================================================================================= 2 passed in 9.53s ==========================================================================================

@cpcloud cpcloud force-pushed the cleanup-selectors branch 2 times, most recently from 5478f1f to ce9d31d Compare August 25, 2024 19:04
ibis/selectors.py Outdated Show resolved Hide resolved
@cpcloud cpcloud added the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Aug 25, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Aug 25, 2024
@cpcloud cpcloud force-pushed the cleanup-selectors branch 3 times, most recently from a170ef9 to 2fd8f0c Compare August 25, 2024 21:11
@cpcloud cpcloud requested a review from jcrist August 26, 2024 13:16
Copy link
Member

@jcrist jcrist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like a nice improvement!

ibis/common/selectors.py Show resolved Hide resolved
ibis/common/selectors.py Outdated Show resolved Hide resolved
ibis/common/selectors.py Show resolved Hide resolved
ibis/common/selectors.py Show resolved Hide resolved
ibis/selectors.py Show resolved Hide resolved
@cpcloud cpcloud force-pushed the cleanup-selectors branch from e1dcfe3 to c5b8e3a Compare August 26, 2024 15:05
@cpcloud cpcloud requested a review from jcrist August 26, 2024 18:40
@cpcloud cpcloud merged commit c15a229 into ibis-project:main Aug 27, 2024
81 checks passed
@cpcloud cpcloud deleted the cleanup-selectors branch August 27, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internals Issues or PRs related to ibis's internal APIs refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants