[SPARK-26677][BUILD] Update Parquet to 1.10.1 with notEq pushdown fix.
## What changes were proposed in this pull request?

Update to Parquet Java 1.10.1.

## How was this patch tested?

Added a test from HyukjinKwon that validates the notEq case from SPARK-26677.
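
As a rough illustration of what the test guards against, the sketch below models row-group pruning for a not-equals predicate in plain Scala. It is a simplified model, not Parquet's code: `RowGroup`, `canDropNaive`, and `canDropNullAware` are hypothetical names, and it assumes the pruning decision is made from a column dictionary that never stores nulls, as the test data (`Some("A"), Some("A"), None`) suggests.

```scala
// Simplified model of dictionary-based row-group pruning.
// All names here are hypothetical; none of them are Parquet or Spark APIs.
final case class RowGroup(dictionary: Set[String], nullCount: Long)

object NotEqPruning {
  // Unsafe: decides from the dictionary alone, which never contains nulls.
  def canDropNaive(rg: RowGroup, literal: String): Boolean =
    rg.dictionary == Set(literal)

  // Safe: a null row satisfies NOT (value <=> literal),
  // so a row group holding any nulls must be kept.
  def canDropNullAware(rg: RowGroup, literal: String): Boolean =
    rg.nullCount == 0 && rg.dictionary == Set(literal)

  def main(args: Array[String]): Unit = {
    // Mirrors the test data: two "A" values and one null in a single row group.
    val rg = RowGroup(Set("A"), nullCount = 1)
    println(s"naive drop: ${canDropNaive(rg, "A")}")
    println(s"null-aware drop: ${canDropNullAware(rg, "A")}")
  }
}
```

In this model the naive check drops the row group even though its null row matches `NOT (value <=> 'A')`, while the null-aware check keeps it.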

Closes apache#23704 from rdblue/SPARK-26677-fix-noteq-parquet-bug.

Lead-authored-by: Ryan Blue <blue@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Ryan Blue <rdblue@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit f72d217)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
3 people authored and kai-chi committed Jul 25, 2019
1 parent 555287d commit d5316d7
Showing 4 changed files with 26 additions and 11 deletions.
10 changes: 5 additions & 5 deletions dev/deps/spark-deps-hadoop-2.7
@@ -160,13 +160,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.7.jar
 pyrolite-4.13.jar
10 changes: 5 additions & 5 deletions dev/deps/spark-deps-hadoop-3.1
@@ -178,13 +178,13 @@ orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
-parquet-column-1.10.0.jar
-parquet-common-1.10.0.jar
-parquet-encoding-1.10.0.jar
+parquet-column-1.10.1.jar
+parquet-common-1.10.1.jar
+parquet-encoding-1.10.1.jar
 parquet-format-2.4.0.jar
-parquet-hadoop-1.10.0.jar
+parquet-hadoop-1.10.1.jar
 parquet-hadoop-bundle-1.6.0.jar
-parquet-jackson-1.10.0.jar
+parquet-jackson-1.10.1.jar
 protobuf-java-2.5.0.jar
 py4j-0.10.7.jar
 pyrolite-4.13.jar
2 changes: 1 addition & 1 deletion pom.xml
@@ -131,7 +131,7 @@
     <!-- Version used for internal directory structure -->
     <hive.version.short>3.0.0.1</hive.version.short>
     <derby.version>10.12.1.1</derby.version>
-    <parquet.version>1.10.0</parquet.version>
+    <parquet.version>1.10.1</parquet.version>
     <orc.version>1.5.4</orc.version>
     <orc.classifier></orc.classifier>
     <hive.parquet.version>1.6.0</hive.parquet.version>
@@ -891,6 +891,21 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       }
     }
   }
+
+  test("SPARK-26677: negated null-safe equality comparison should not filter matched row groups") {
+    (true :: false :: Nil).foreach { vectorized =>
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized.toString) {
+        withTempPath { path =>
+          // Repeated values for dictionary encoding.
+          Seq(Some("A"), Some("A"), None).toDF.repartition(1)
+            .write.parquet(path.getAbsolutePath)
+          val df = spark.read.parquet(path.getAbsolutePath)
+          checkAnswer(stripSparkFilter(df.where("NOT (value <=> 'A')")), df)
+        }
+      }
+    }
+  }
+
 }
 
 object TestingUDT {