Unify the logic for column pruning, projection, and filtering of table scans. #213
Conversation
@liancheng and @AndreSchumacher, please take a look.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Two unused imports are left in
object ParquetScans extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // TODO: need to support writing to other types of files.
    case logical.WriteToFile(path, child) =>
Seems a little confusing to have a write operation within a scan strategy here... Maybe we should move this into a separate strategy object similar to HiveStrategies.DataSinks?
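The separation suggested above can be sketched in miniature. This is a hedged, self-contained toy model of the Catalyst strategy pattern, not Spark's actual classes: the types TableScan, PhysicalScan, InsertIntoFile, and the DataSinks object here are illustrative stand-ins, showing how write planning can live in its own strategy instead of inside a scan strategy.

```scala
// Toy logical and physical plan nodes (illustrative names, not Spark's).
sealed trait LogicalPlan
case class TableScan(table: String) extends LogicalPlan
case class WriteToFile(path: String, child: LogicalPlan) extends LogicalPlan

sealed trait SparkPlan
case class PhysicalScan(table: String) extends SparkPlan
case class InsertIntoFile(path: String, child: SparkPlan) extends SparkPlan

trait Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan]
}

// Scan-only strategy: plans reads and nothing else.
object ParquetScans extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case TableScan(t) => Seq(PhysicalScan(t))
    case _            => Nil
  }
}

// Write planning in its own strategy, in the spirit of HiveStrategies.DataSinks.
object DataSinks extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case WriteToFile(path, child) => ParquetScans(child).map(InsertIntoFile(path, _))
    case _                        => Nil
  }
}
```

With this split, ParquetScans returns Nil for a WriteToFile node and only DataSinks matches it, so neither strategy has to know about the other's concerns.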
LGTM, much cleaner :)
…e scans for both Hive and Parquet relations. Fix tests now that we are doing a better job of column pruning.
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed.
Merged, thanks
This removes duplicated logic, dead code, and casting when planning Parquet table scans and Hive table scans.
Other changes:
Prevents predicates like WHERE false from getting pushed into table scans, since HiveTableScan (reasonably) refuses to apply partition pruning predicates to non-partitioned tables.
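The pushdown rule above can be illustrated with a small sketch. This is a hedged, self-contained model, not Spark's implementation: the Expr hierarchy and the splitPredicates helper are hypothetical names. The idea it demonstrates is that a predicate is pushed into the scan only when every column it references is a partition column; a bare literal like WHERE false references no columns at all, so it is kept in the Filter above the scan rather than handed to partition pruning.

```scala
// Minimal expression model (illustrative, not Catalyst's Expression tree).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Boolean) extends Expr

object PredicateSplit {
  // Column names referenced by a predicate; a literal references none.
  def references(e: Expr): Set[String] = e match {
    case Attr(n) => Set(n)
    case Lit(_)  => Set.empty
  }

  // Partition the predicates into (pushed-into-scan, kept-in-Filter).
  // Push only when the predicate references at least one column and all of
  // its references are partition columns, so `WHERE false` is never pushed.
  def splitPredicates(preds: Seq[Expr], partitionCols: Set[String]): (Seq[Expr], Seq[Expr]) =
    preds.partition { p =>
      val refs = references(p)
      refs.nonEmpty && refs.subsetOf(partitionCols)
    }
}
```

For example, with partition column ds, the predicate Attr("ds") is pushed into the scan while Lit(false) stays behind, matching the behavior described in the PR.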