
Unify the logic for column pruning, projection, and filtering of table scans. #213

Closed
wants to merge 3 commits

Conversation

marmbrus
Contributor

This removes duplicated logic, dead code, and unnecessary casting when planning Parquet and Hive table scans.

Other changes:

  • Fix tests now that column pruning is more thorough: since pruning predicates are applied before we even start scanning tuples, columns required only by these predicates no longer need to appear in the output of the scan unless they are also part of the final output of the logical plan fragment.
  • Add a rule to simplify trivial filters. This is required to keep WHERE false from being pushed into table scans, since HiveTableScan (reasonably) refuses to apply partition pruning predicates to non-partitioned tables.
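The trivial-filter rule described above can be sketched in isolation. This is a minimal toy model, not Spark's actual Catalyst API: the `Expression`/`LogicalPlan` types and the `EmptyRelation` node here are illustrative stand-ins. The idea is that a filter whose condition folds to a boolean literal either disappears (true) or short-circuits the whole subtree (false), so WHERE false never reaches a table scan.

```scala
// Toy expression and plan ADTs standing in for Catalyst's (illustrative only).
sealed trait Expression
case class Literal(value: Boolean) extends Expression
case class Predicate(sql: String) extends Expression // an arbitrary, non-constant condition

sealed trait LogicalPlan
case class TableScan(table: String) extends LogicalPlan
case class Filter(condition: Expression, child: LogicalPlan) extends LogicalPlan
case object EmptyRelation extends LogicalPlan // result of a provably-false filter

object SimplifyFilters {
  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    // WHERE true: the filter is a no-op, keep simplifying the child.
    case Filter(Literal(true), child) => apply(child)
    // WHERE false: nothing can pass, so replace the subtree with an empty relation
    // instead of pushing the predicate down into the scan.
    case Filter(Literal(false), _) => EmptyRelation
    // Non-trivial condition: keep the filter, recurse into the child.
    case Filter(cond, child) => Filter(cond, apply(child))
    case other => other
  }
}
```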

@marmbrus
Contributor Author

@liancheng and @AndreSchumacher, please take a look.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13383/

@liancheng
Contributor

Two unused imports are left in HiveStrategies.scala.

object ParquetScans extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    // TODO: need to support writing to other types of files.
    case logical.WriteToFile(path, child) =>
Seems a little confusing to have a write operation within a scan strategy here... Maybe we should move this into a separate strategy object similar to HiveStrategies.DataSinks?
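The reviewer's suggestion of separating sinks from scans can be sketched with toy types. The shapes below (`ParquetDataSinks`, `WriteExec`, `ScanExec`) are hypothetical illustrations of the split, not Spark's real planner classes; only `HiveStrategies.DataSinks` is named in the discussion.

```scala
// Toy plan and physical-operator types (illustrative stand-ins, not Spark's).
sealed trait LogicalPlan
case class ParquetRelation(path: String) extends LogicalPlan
case class WriteToFile(path: String, child: LogicalPlan) extends LogicalPlan

sealed trait SparkPlan
case class ScanExec(path: String) extends SparkPlan
case class WriteExec(path: String) extends SparkPlan

// A strategy maps a logical plan to zero or more physical candidates.
trait Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] }

// Scans only: no write logic mixed in.
object ParquetScans extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ParquetRelation(path) => ScanExec(path) :: Nil
    case _ => Nil
  }
}

// Sinks get their own strategy, analogous to HiveStrategies.DataSinks.
object ParquetDataSinks extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case WriteToFile(path, _) => WriteExec(path) :: Nil
    case _ => Nil
  }
}
```

With this split, a `WriteToFile` node is ignored by the scan strategy and claimed only by the sink strategy, which keeps each strategy's pattern match focused on one concern.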

@liancheng
Contributor

LGTM, much cleaner :)

…e scans for both Hive and Parquet relations. Fix tests now that we are doing a better job of column pruning.
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13420/

@pwendell
Contributor

Merged, thanks

@asfgit asfgit closed this in b637f2d Mar 25, 2014
@marmbrus marmbrus deleted the strategyCleanup branch March 27, 2014 00:06
mccheah pushed a commit to mccheah/spark that referenced this pull request Oct 12, 2017
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
* use beta candidate

* all backports added to build

* move/add supervize with checkpointing test to hdfs

* add kerberos args to supervise test

* fix job watchers

* add native blas

* remove old supervise test
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019