[SPARK-25363][SQL] Fix schema pruning in where clause by ignoring unnecessary root fields #22357
Conversation
cc @dbtsai
```scala
testSchemaPruning("select a single complex field and in where clause") {
  val query = sql("select name.first from contacts where name.first = 'Jane'")
  checkScan(query, "struct<name:struct<first:string>>")
  checkAnswer(query, Row("Jane") :: Nil)
```
Can you add another test that selects `name.first` and `name.last`, and applies the where clause on `name.first`? We should only read `name.first` and `name.last`, without `name.middle`.
Yes. Added test case for it.
```diff
@@ -196,6 +196,7 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
    */
   private def getRootFields(expr: Expression): Seq[RootField] = {
     expr match {
+      case IsNotNull(_: Attribute) | IsNull(_: Attribute) => Seq.empty
```
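As an illustration of why this works, here is a self-contained toy sketch (plain Scala with hypothetical, simplified types, not Spark's actual `Expression` or `RootField` classes): ignoring a null check on a bare attribute means the only root fields left are the ones the projection actually touches.

```scala
// Toy model of root-field collection, assuming simplified expression types.
sealed trait Expr
case class Attr(name: String) extends Expr                 // a top-level column
case class Field(attr: Attr, field: String) extends Expr   // e.g. name.first
case class IsNotNull(child: Expr) extends Expr

// Collect (column, Some(nestedField)); None would mean "whole column needed".
def rootFields(e: Expr): Seq[(String, Option[String])] = e match {
  case IsNotNull(_: Attr) => Seq.empty  // the fix: a null check needs no nested fields
  case Attr(n)            => Seq((n, None))
  case Field(Attr(n), f)  => Seq((n, Some(f)))
  case IsNotNull(c)       => rootFields(c)
}

val project = Field(Attr("name"), "first")  // SELECT name.first
val filter  = IsNotNull(Attr("name"))       // isnotnull(name), implied by the WHERE clause
val fields  = (rootFields(project) ++ rootFields(filter)).distinct
println(fields)  // List((name,Some(first))): name.middle / name.last are never requested
```

Without the `IsNotNull(_: Attr)` case, the filter would contribute `(name, None)` and force the full `name` struct into the read schema.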
Hmmmm .. shouldn't we exclude this only for filters?
If this is in projects, I think we also don't need to include all nested fields?
But the case mentioned here looks specific to the pushed filter itself. Can we add a simple test for project case as well?
I mean, for instance, for the case `select address is not null, name.last from contacts`, it wouldn't work. I thought this was a quick bandaid fix to resolve a basic case.
Thanks. This was a case I didn't test. Fixed it and added test case.
Test build #95780 has finished for PR 22357 at commit

retest this please

Test build #95794 has finished for PR 22357 at commit

Test build #95787 has finished for PR 22357 at commit

retest this please.

Test build #95799 has finished for PR 22357 at commit
Hi @viirya, thanks for this PR! I have an alternative implementation which I'd like to submit for comparison. My implementation was something I removed from my original patch. I hope to have my PR submitted sometime today. I have another PR to submit, too. I'll be sure to refer to your PR in mine. Cheers.
I have reconstructed my original patch for this issue, but I've discovered it will require more work to complete. However, as part of that reconstruction I've discovered a couple of cases where our patches create different physical plans. The query results are the same, but I'm not sure which—if either—plan is correct. I want to go into detail on that, but it's complicated and I have to call it quits tonight. I have a flight in the morning, and I'll be on break next week. In the meantime, I'll just copy and paste two queries—based on the data in

First query:
This PR (as of d68f808) produces:
My WIP patch produces:
Second query:
This PR produces:
My WIP patch produces:
I wanted to give my thoughts on the differences of these in detail, but I have to wrap up my work for the night. I'll be visiting family next week. I don't know how responsive I'll be in that time, but I'll at least try to check back. Cheers.
Thanks, @mallman! For the first query, I think the query plan produced by your WIP patch is not correct. We don't need to read the

For the second, your WIP patch doesn't push down

That is the important difference I noticed for now.

@mallman It would be great to have this fix in the 2.4 release, as it can dramatically reduce the data being read in many applications, which is the purpose of the original work. As we're going to cut the 2.4 RC soon, and you will not be available next week, can you post your implementation so we can work together to make this happen? We can now track co-authorship in Spark and GitHub with contribution stats, so it will be cool to see multiple people collaborate on one PR :) Thanks.
```diff
@@ -196,6 +201,9 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
    */
   private def getRootFields(expr: Expression): Seq[RootField] = {
     expr match {
+      // Those expressions don't really use the nested fields of a root field.
+      case i @ (IsNotNull(_: Attribute) | IsNull(_: Attribute)) =>
+        getRootFields(i.children(0)).map(_.copy(contentAccessed = false))
       case att: Attribute =>
         RootField(StructField(att.name, att.dataType, att.nullable), derivedFromAtt = true) :: Nil
       case SelectedField(field) => RootField(field, derivedFromAtt = false) :: Nil
```
How about

```scala
case IsNotNull(_: Attribute) | IsNull(_: Attribute) =>
  expr.children.flatMap(getRootFields).map(_.copy(contentAccessed = false))
case _ =>
  expr.children.flatMap(getRootFields)
```
In general, the approach looks right to me except for a couple of minor points. Thanks.
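A runnable sketch of this suggested shape, using hypothetical simplified types rather than Spark's Catalyst classes: the null-check case reuses the children's root fields and just flips `contentAccessed` off.

```scala
// Toy expression tree, assuming simplified types (not Catalyst's Expression).
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class IsNull(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

case class RootField(name: String, contentAccessed: Boolean = true)

def getRootFields(expr: Expr): Seq[RootField] = expr match {
  case IsNull(_: Attr) =>
    // same traversal as the default case, but mark content as not accessed
    expr.children.flatMap(getRootFields).map(_.copy(contentAccessed = false))
  case a: Attr => Seq(RootField(a.name))
  case _       => expr.children.flatMap(getRootFields)
}

println(getRootFields(IsNull(Attr("name"))))  // List(RootField(name,false))
```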
```diff
    */
-  private case class RootField(field: StructField, derivedFromAtt: Boolean)
+  private case class RootField(field: StructField, derivedFromAtt: Boolean,
+    contentAccessed: Boolean = true)
```
Fix the formatting, and please elaborate on the comment.
```diff
@@ -17,7 +17,7 @@

 package org.apache.spark.sql.execution.datasources.parquet

-import org.apache.spark.sql.catalyst.expressions.{And, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, AttributeReference, Expression, IsNotNull, IsNull, NamedExpression}
```
line too long.
This can be a wildcard import when there are more than 6 entities, per https://github.com/databricks/scala-style-guide#imports
cc @beettlle
```diff
@@ -110,7 +110,12 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
       val projectionRootFields = projects.flatMap(getRootFields)
       val filterRootFields = filters.flatMap(getRootFields)

-      (projectionRootFields ++ filterRootFields).distinct
+      val (rootFields, optRootFields) = (projectionRootFields ++ filterRootFields)
+        .distinct.partition(_.contentAccessed)
```
Some comments here please.
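For illustration, the intent of the partition can be sketched like this (assumed semantics in plain Scala, not the actual rule's code): fields that appear only under null checks are kept only when no content-accessed field already covers their column.

```scala
// Hypothetical simplified RootField; Spark's real one wraps a StructField.
case class RootField(column: String, nested: Option[String], contentAccessed: Boolean)

val all = Seq(
  RootField("name", Some("first"), contentAccessed = true),  // SELECT name.first
  RootField("name", None, contentAccessed = false)           // WHERE name IS NOT NULL
)
// Split content-accessed fields from "optional" ones that only feed null checks.
val (rootFields, optRootFields) = all.distinct.partition(_.contentAccessed)
// Keep an optional field only when its column isn't already being read.
val kept = rootFields ++ optRootFields.filterNot(o => rootFields.exists(_.column == o.column))
println(kept)  // List(RootField(name,Some(first),true)): the full name struct is dropped
```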
```scala
checkAnswer(query2, Row("Jane", "Doe") :: Nil)

val query3 = sql("select name.first from contacts " +
  "where employer.company.name = 'abc' and p = 1")
```
Let's say a user adds `where employer.company is not null`. Can we still read the schema as `employer:struct<company:struct<name:string>>`, given that we only mark `contentAccessed = false` when `IsNotNull` is on an attribute?
Added one query test for this case. Thanks.
When there is a nested field access in the query like `employer.company.name`, we don't need other fields inside `employer.company` other than `name`. But if there is no such access, just `employer.company is not null` in the where clause, it will read the full schema of `employer.company`.
```diff
@@ -196,6 +201,9 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
    */
   private def getRootFields(expr: Expression): Seq[RootField] = {
     expr match {
+      // Those expressions don't really use the nested fields of a root field.
+      case i@(IsNotNull(_: Attribute) | IsNull(_: Attribute)) =>
```
nit: -> i @ (IsNotNull(_: ...
```diff
@@ -156,7 +161,7 @@ private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
       // in the resulting schema may differ from their ordering in the logical relation's
       // original schema
       val mergedSchema = requestedRootFields
-        .map { case RootField(field, _) => StructType(Array(field)) }
+        .map { case RootField(field, _, _) => StructType(Array(field)) }
```
Not a big deal, but `.map { root: RootField => StructType(Array(root.field)) }` per https://github.com/databricks/scala-style-guide#pattern-matching
Thanks @dbtsai and @HyukjinKwon. Your comments are addressed.
```scala
case IsNotNull(SelectedField(field)) =>
  RootField(field, derivedFromAtt = false, contentAccessed = false) :: Nil
case IsNull(SelectedField(field)) =>
  RootField(field, derivedFromAtt = false, contentAccessed = false) :: Nil
```
@dbtsai The question you mentioned at https://github.com/apache/spark/pull/22357/files#r216204022 was addressed by this.

Btw, this PR isn't intended to address filter pushdown for schema pruning. I think that should be a separate topic.
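A small sketch of what the `IsNotNull(SelectedField(...))` cases buy (toy types, an assumed simplification of the real `SelectedField` extractor): a null check on a nested field yields that pruned field, not the whole root struct.

```scala
// Toy expression types; the real code pattern-matches Catalyst expressions.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Nested(parent: Expr, field: String) extends Expr  // e.g. employer.company
case class IsNotNull(child: Expr) extends Expr

case class RootField(path: String, contentAccessed: Boolean = true)

def path(e: Expr): String = e match {
  case Attr(n)      => n
  case Nested(p, f) => path(p) + "." + f
  case IsNotNull(c) => path(c)
}

def getRootFields(e: Expr): Seq[RootField] = e match {
  // null check on a (possibly nested) selected field: keep the pruned path only
  case IsNotNull(child) => Seq(RootField(path(child), contentAccessed = false))
  case other            => Seq(RootField(path(other)))
}

println(getRootFields(IsNotNull(Nested(Attr("employer"), "company"))))
// List(RootField(employer.company,false))
```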
Test build #95884 has finished for PR 22357 at commit

Test build #95871 has finished for PR 22357 at commit
FYI, after further checking the code and discussing predicate pushdown with @dbtsai, we know that predicate pushdown only works for primitive types in the Parquet datasource. So both
If I recall correctly, the Parquet reader can have filter pushdown? It's only not so in the Spark Parquet data source?
I would expect
```scala
// For them, if there are any nested fields accessed in the query, we don't need to add root
// field access of above expressions.
// For example, for a query `SELECT name.first FROM contacts WHERE name IS NOT NULL`,
// we don't need to read nested fields of `name` struct other than `first` field.
```
I'm having trouble accepting this statement, but perhaps I'm reading too much into it (or not enough). Let me illustrate with a couple of queries and their physical plans.

Assuming the data model in `ParquetSchemaPruningSuite.scala`, the physical plan for the query

```sql
select employer.id from contacts where employer is not null
```

is

```
== Physical Plan ==
*(1) Project [employer#36.id AS id#46]
+- *(1) Filter isnotnull(employer#36)
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: Parquet,
      PartitionCount: 2, PartitionFilters: [], PushedFilters: [IsNotNull(employer)],
      ReadSchema: struct<employer:struct<id:int>>
```

The physical plan for the query

```sql
select employer.id from contacts where employer.id is not null
```

is

```
== Physical Plan ==
*(1) Project [employer#36.id AS id#47]
+- *(1) Filter (isnotnull(employer#36) && isnotnull(employer#36.id))
   +- *(1) FileScan parquet [employer#36,p#37] Batched: false, Format: Parquet,
      PartitionCount: 2, PartitionFilters: [], PushedFilters: [IsNotNull(employer)],
      ReadSchema: struct<employer:struct<id:int>>
```

The read schemata are the same, but the query filters are not. The file scan for the second query looks as I would expect, but the scan for the first query appears to only read `employer.id` even though it needs to check `employer is not null`. If it only reads `employer.id`, how does it check that `employer.company` is not null? Perhaps `employer.id` is null but `employer.company` is not null for some row...

I have run some tests to validate that this PR is returning the correct results for both queries, and it is. But I don't understand why.
For the first query, the constraint is `employer is not null`. When `employer.id` is not null, `employer` will always be not null; as a result, this PR will work. However, when `employer.id` is null, `employer` can be null or something, so we need to check whether `employer` is something to return a null for `employer.id`.

I checked in the `ParquetFilter`: `IsNotNull(employer)` will be ignored since it's not a valid Parquet filter, as Parquet doesn't support pushdown on a struct; thus, with this PR, this query will return a wrong answer.

I think in this scenario, as @mallman suggested, we might need to read the full data.
> I checked in the ParquetFilter, IsNotNull(employer) will be ignored since it's not a valid parquet filter as parquet doesn't support pushdown on the struct; thus, with this PR, this query will return wrong answer.

We may not need to worry about wrong answers from a datasource like Parquet in predicate pushdown. As not all predicates are supported by pushdown, we always have a Spark SQL `Filter` on top of the scan node to make sure we receive the correct answer.
A complex column being null and its fields being null are different things. I think we don't need to read all the fields to check whether the complex column is null. In other words, in the above case, when we only read `employer.id` and it is null, the predicate `employer is not null` will still be true because it is a complex column containing a null field.
@viirya, I see your point about the difference between a complex type being null and a subfield being null. So to answer the following query `select address from contacts where name is not null`, do we need to read any of the fields in `name`? Or perhaps just read one arbitrary field of simple type, like `name.first`? That's surprising, but I'm starting to believe it's true.
Currently under this PR, `name` will be fully read. This is not perfect. However, picking one arbitrary field from `name` sounds a little bit hacky to me. WDYT? cc @dbtsai @cloud-fan
Btw, I think this is not really a schema pruning case in the sense of the original PR, so maybe we can leave it as is for now.
Yeah, I'm okay with leaving it as-is.
Instead of reading an arbitrary field of simple type (which may not exist if it's a deeply nested struct), I think we should implement pushdown with complex types in Parquet with similar logic, and let the Parquet reader handle it. @viirya Can you create a follow-up JIRA for this? Thanks.
FYI, @mallman I'm working on having
Can anyone point out any unaddressed comments or problems here? Looks pretty good to me. I think this is rather a bandaid: a small and safe fix to get into branch-2.4.
Test build #95931 has finished for PR 22357 at commit
```scala
  checkAnswer(query4, Row("Jane", "abc") :: Nil)
}

testSchemaPruning("select nullable complex field and having is null predicate") {
```
Do you mean `having is not null predicate`?
Oops, yes, thanks.
@viirya Please amend Lines 303 to 306 in d684a0f
I have some bad news. The methods

Based on what I'm seeing, I think it's fair to say that schema pruning is broken under certain circumstances when using a table schema that includes column names with upper-case characters (note that the test schema for contacts in

Fortunately, schema pruning is disabled by default, and I think it's still considered "experimental" technology. I think that fixing

In any case, I will create an issue in Jira and submit a PR.
Test build #95945 has finished for PR 22357 at commit

retest this please.
FYI, the PR I previously mentioned about fixing the use of

This LGTM. I'm not going to submit a PR for my approach to this problem. Thanks @viirya!

And FYI this is the Jira issue I promised in

That would be pretty cool.
LGTM. Thank you all for participating in the discussion. @cloud-fan and @gatorsmile, do you have any further comments? If not, I would like to merge it tomorrow into both master and the RC branch, as it's an important performance fix for schema pruning. Thanks.
Test build #95950 has finished for PR 22357 at commit

LGTM from me too.
…ecessary root fields ## What changes were proposed in this pull request? Schema pruning doesn't work if nested column is used in where clause. For example, ``` sql("select name.first from contacts where name.first = 'David'") == Physical Plan == *(1) Project [name#19.first AS first#40] +- *(1) Filter (isnotnull(name#19) && (name#19.first = David)) +- *(1) FileScan parquet [name#19] Batched: false, Format: Parquet, PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:struct<first:string,middle:string,last:string>> ``` In above query plan, the scan node reads the entire schema of `name` column. This issue is reported by: #21320 (comment) The cause is that we infer a root field from expression `IsNotNull(name)`. However, for such expression, we don't really use the nested fields of this root field, so we can ignore the unnecessary nested fields. ## How was this patch tested? Unit tests. Closes #22357 from viirya/SPARK-25363. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: DB Tsai <d_tsai@apple.com> (cherry picked from commit 3030b82) Signed-off-by: DB Tsai <d_tsai@apple.com>
Thanks all again. Merged into the 2.4 branch and master.
What changes were proposed in this pull request?

Schema pruning doesn't work if a nested column is used in the where clause. For example,

```
sql("select name.first from contacts where name.first = 'David'")

== Physical Plan ==
*(1) Project [name#19.first AS first#40]
+- *(1) Filter (isnotnull(name#19) && (name#19.first = David))
   +- *(1) FileScan parquet [name#19] Batched: false, Format: Parquet, PartitionFilters: [],
      PushedFilters: [IsNotNull(name)], ReadSchema: struct<name:struct<first:string,middle:string,last:string>>
```

In the above query plan, the scan node reads the entire schema of the `name` column.

This issue is reported by: #21320 (comment)

The cause is that we infer a root field from the expression `IsNotNull(name)`. However, such an expression doesn't really use the nested fields of this root field, so we can ignore the unnecessary nested fields.

How was this patch tested?

Unit tests.