-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Pushdown dereference expression to Parquet reader #187
Conversation
Add PushDownDereferenceExpression to pushdown dereferences right above TableScan Add MergeNestedColumn to convert valid dereference into ColumnHandle in TableScan Add NestedColumn into HiveColumnHandle Change in ParquetReader to use NestedColumn in file reading
Thanks for submitting this @qqibrow! I've started looking at the PR a and have a couple of initial high-level comments:
I'll post later with some other comments about the abstraction and some thoughts I have in relation to #18 |
@martint thanks for replying!
Current code rely on the recursion of PlanOptimizer. https://github.com/prestosql/presto/pull/187/files#diff-bceb73c4a557ca9212fb58de061b4076R155 |
With BTW, I started capturing more thoughts about how this and other forms of pushdown could work in the short term: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations |
@qqibrow Thanks a lot of reposting this PR! In case you need some examples of how to use Rules to enforce dereference expression. This was my approach: prestodb/presto@3773a21#diff-f17c5e8c28d5cd670607a75e09f8cbbb In the The Side note: You may find out there are some |
@Yaliang thanks for the help! I will take a look. will ping you in slack if more questions :) |
@qqibrow Are you still working on this? |
@phd1994 Yes. This is a working version, but needs refactoring. As discussed, we will break this into 3 parts:
|
@qqibrow - wanted to see if I could help in any way with this effort. I noticed zhenxiao had a PR over in prestodb here that looks to be (1) above. Not sure if it helps, but I ported that over to prestosql here. Looks like your patch has the necessary changes to the Hive connector. Is the remaining work this item from the plan? Just trying to gauge how close we are and what I can do to help. |
@qqibrow Are you still working on this? |
This has already been merged as part of another PR |
Following design:
Dereference expressions will be pushed down to the projection right above tableScan, which saves CPU/Memory/Network cost.
for query:
current plan:
enable dereference pushdown:
MergeNestedColumns
MergeNestedColumns will detect "project above tableScan" pattern and try to push nestedColumn metadata into TableScan. An new Metadata API
getNestedColumnHandles
is created to support custom nested column pushdown logic.getNestedColumnHandles in Hive
getNestedColumnHandles
in Hive return every dereference as independent HiveColumnHandle. AddedOptional<NestedColumn>
in HiveColumnHandle.After all query plan will looks like:
before:
after: