Enable Parquet Row and Page Filtering by default (WIP) #3828
A small update: when I ran the TPC-H benchmarks against the default Parquet files created by the benchmark, I did not see any improvement. There was also an error in the page index code, which I still need to track down.
Specifically, I made the Parquet files like this:
And then ran:
FYI @Ted-Jiang: I haven't had a chance to file this as a ticket or look into it more carefully.
Thanks for testing this; I will try to figure it out tomorrow.
Draft until ConfigOptions #3822 is merged.
Which issue does this PR close?
Closes #3463
closes #4085
re #3462
Rationale for this change
This PR turns on Parquet scan predicate pushdown (see #3462) by default. I am putting it up early as part of the testing process, so we can work through any issues it uncovers.
This feature promises to be one of the most significant performance improvements in a while for DataFusion reading from Parquet. All the hard work was done by @Ted-Jiang, @thinkharderdev, and @tustvold.
What changes are included in this PR?
Enable pushing filters into the scan directly
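To illustrate what pushing a filter into the scan buys us, here is a minimal, self-contained Rust sketch of the two ideas in the title: page filtering (skip whole pages whose min/max statistics prove no row can match) and row filtering (evaluate the predicate per value inside the pages that survive). This is a conceptual model only, not DataFusion's or the `parquet` crate's implementation; `Page`, `RangePredicate`, and `scan_with_pushdown` are hypothetical names, and a real Parquet page index also tracks null counts and row offsets across multiple columns.

```rust
// Hypothetical, simplified model of one Parquet column split into pages.
// Real pages expose min/max via the page index; here they are computed on construction.
struct Page {
    min: i64,
    max: i64,
    values: Vec<i64>,
}

impl Page {
    fn new(values: Vec<i64>) -> Self {
        let min = *values.iter().min().expect("page must not be empty");
        let max = *values.iter().max().expect("page must not be empty");
        Page { min, max, values }
    }
}

/// Hypothetical range predicate `lo <= x <= hi`, the kind of filter that can be pushed into a scan.
struct RangePredicate {
    lo: i64,
    hi: i64,
}

impl RangePredicate {
    /// Page filtering: returns false only when no value in [min, max] can match,
    /// so skipping the page never drops a matching row.
    fn may_match(&self, page: &Page) -> bool {
        page.max >= self.lo && page.min <= self.hi
    }

    /// Row filtering: exact per-value evaluation.
    fn matches(&self, v: i64) -> bool {
        v >= self.lo && v <= self.hi
    }
}

/// Scan with the predicate pushed down: prune pages using statistics, then filter
/// rows inside the surviving pages. Returns the matching values and how many pages were read.
fn scan_with_pushdown(pages: &[Page], pred: &RangePredicate) -> (Vec<i64>, usize) {
    let mut out = Vec::new();
    let mut pages_read = 0;
    for page in pages.iter().filter(|p| pred.may_match(p)) {
        pages_read += 1;
        out.extend(page.values.iter().copied().filter(|&v| pred.matches(v)));
    }
    (out, pages_read)
}

fn main() {
    let pages = vec![
        Page::new(vec![1, 5, 9]),
        Page::new(vec![10, 15, 19]),
        Page::new(vec![20, 25, 29]),
    ];
    let pred = RangePredicate { lo: 12, hi: 21 };
    let (rows, pages_read) = scan_with_pushdown(&pages, &pred);
    // prints "rows = [15, 19, 20], pages read = 2"
    println!("rows = {:?}, pages read = {}", rows, pages_read);
}
```

In this toy example the first page (max 9) is never decoded, and the row filter then discards 10, 25, and 29 from the pages that were read. Without pushdown, all three pages would be decoded and the filter would run afterwards, which is the work this PR aims to avoid by default.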
Note: this feature can be disabled by setting the `datafusion.execution.parquet.pushdown_filters` configuration option to `false`.
Are there any user-facing changes?
Hopefully faster performance