-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add projection to FilterExec #12281
feat: Add projection to FilterExec #12281
Conversation
cc744be
to
9456c00
Compare
b0aacd6
to
80b2ab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @eejbyfeldt. It seems like the with_projection()
API could end up being a method of the ExecutionPlan
in the long run, wdyt? (It would improve the performance of all operators which internally refer to some columns but its downstream operators do not actually require that column)
@@ -535,11 +538,27 @@ fn try_pushdown_through_union( | |||
Ok(Some(Arc::new(UnionExec::new(new_children)))) | |||
} | |||
|
|||
trait EmbededProjection: ExecutionPlan + Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EmbededProjection -> EmbeddedProjection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0c05fb9
@@ -82,6 +82,11 @@ impl ProjectionMapping { | |||
.map(|map| Self { map }) | |||
} | |||
|
|||
pub fn from_indices(indices: &[usize], schema: &SchemaRef) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A brief docstring would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0c05fb9
@@ -191,6 +194,28 @@ impl Partitioning { | |||
_ => false, | |||
} | |||
} | |||
|
|||
pub fn project( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we write a docstring here also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 0c05fb9
80a879a
to
0c05fb9
Compare
Probably worth considering in the future. The argument against making it part of ExecutionPlan is that there are some operators like Addressed the comments. |
08)--------------AggregateExec: mode=Partial, gby=[v1@0 as v1, v2@1 as v2], aggr=[max(having_test.v1)] | ||
09)----------------MemoryExec: partitions=1, partition_sizes=[1] | ||
01)CoalesceBatchesExec: target_batch_size=8192 | ||
02)--FilterExec: max(having_test.v1)@2 = 3, projection=[v1@0, v2@1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very cool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice improvement -- thank you @eejbyfeldt .
Thank you @Dandandan and @berkaysynnada for the reviews
I went throught he plan changes carefully and they are 👌 👨🍳
@@ -123,64 +123,61 @@ physical_plan | |||
22)------------------------------------------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(p_partkey@0, ps_partkey@0)], projection=[p_partkey@0, p_mfgr@1, ps_suppkey@3, ps_supplycost@4] | |||
23)--------------------------------------------CoalesceBatchesExec: target_batch_size=8192 | |||
24)----------------------------------------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), input_partitions=4 | |||
25)------------------------------------------------ProjectionExec: expr=[p_partkey@0 as p_partkey, p_mfgr@1 as p_mfgr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these plans are doubly good -- they save two copies -- one in FilterExec
and one in CoalesceBatchesExec
🚀 |
FWIW I ran clickbench and also saw improvement on Q30 and Q31
Which correspond to several queries where the filter is on a string column ( |
Which issue does this PR close?
Closes #5436.
Rationale for this change
Improves performance by avoiding copying some of the columns during execution of FilterExec.
Comparision tpch_mem10
What changes are included in this PR?
Adds a projection to FilterExec similarly to #9236 for HashJoinExec. It also resuses a lot of the logic from that PR to implement the pushdown.
Are these changes tested?
Existing tests.
Are there any user-facing changes?
Yes, projections can now be pushed into the FilterExec.