
[SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmark results #21625

Closed · wants to merge 2 commits

Conversation

@maropu (Member) commented Jun 24, 2018

What changes were proposed in this pull request?

This PR corrects the default configuration (`spark.master=local[1]`) for benchmarks. It also updates the performance results measured on an AWS r3.xlarge instance.
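
For context, here is a minimal sketch (not from the PR itself) of why the old `setIfMissing` default never took effect when the benchmark was launched through spark-submit, which always injects a value for `spark.master`; the `local[*]` stand-in value is assumed for illustration:

```scala
import org.apache.spark.SparkConf

// Stand-in for the value spark-submit injects (assumed for illustration).
val conf = new SparkConf(loadDefaults = false)
  .set("spark.master", "local[*]")

// setIfMissing is a no-op once the key exists, so the old default was
// silently ignored and the benchmark could run on more than one core.
conf.setIfMissing("spark.master", "local[1]")
println(conf.get("spark.master")) // local[*]

// set always overrides, which is what this PR switches to.
conf.set("spark.master", "local[1]")
println(conf.get("spark.master")) // local[1]
```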

How was this patch tested?

N/A

@maropu (Member, Author) commented Jun 24, 2018

This PR is a follow-up of #21288 (comment).

```
/*
Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Partitioned Table:                       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------
```

A reviewer (Member) commented on the results block above:

Seems this part was missed in the update.

maropu (Author): oh, thanks. I'll update soon.

maropu (Author): Oh, I hit a bug in CSV parsing while updating this benchmark...

```
scala> val dir = "/tmp/spark-csv/csv"
scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
scala> spark.read.csv(dir).selectExpr("sum(p)").collect()
18/06/25 13:12:51 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 5)
java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert(UnivocityParser.scala:197)
        at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:190)
        at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
        at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
        at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
        ...
```

@HyukjinKwon (Member) commented Jun 25, 2018

@maropu, if the JIRA blocks this PR and it looks like it will take a while to fix, please feel free to set the configuration to false within this benchmark and proceed. Technically, that looks like what the benchmark originally covered at the time it was merged in. Setting it to true can be done separately in the JIRA you opened.
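
A sketch of that stopgap (hypothetical: the flag name is taken from the repro in the next comment, and treating it as the configuration being discussed here is an assumption; as that comment shows, this path hit a different bug):

```scala
import org.apache.spark.SparkConf

// Hypothetical stopgap inside the benchmark's conf: disable CSV column
// pruning until the JIRA is resolved.
val conf = new SparkConf()
  .setAppName("DataSourceReadBenchmark")
  .set("spark.master", "local[1]")
  .set("spark.sql.csv.parser.columnPruning.enabled", "false")
```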

maropu (Author): Yea, I thought I would do that first, but I couldn't because I hit another bug when column pruning is disabled...

```
./bin/spark-shell --conf spark.sql.csv.parser.columnPruning.enabled=false

scala> val dir = "/tmp/spark-csv/csv"
scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
scala> spark.read.csv(dir).selectExpr("sum(p)").collect()
18/06/25 13:48:46 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
        at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getInt(rows.scala:41)
        ...
```

maropu (Author): @HyukjinKwon I'm currently fixing this. But it seems this bug is similar to SPARK-24645, so would it be better to merge this fix into SPARK-24645?

maropu (Author): Anyway, I updated the results by applying #21631.

@SparkQA commented Jun 24, 2018

Test build #92264 has finished for PR 21625 at commit 2352820.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```diff
@@ -39,9 +39,11 @@ import org.apache.spark.util.{Benchmark, Utils}
 object DataSourceReadBenchmark {
   val conf = new SparkConf()
     .setAppName("DataSourceReadBenchmark")
-    .setIfMissing("spark.master", "local[1]")
+    // Since `spark.master` always exists, overrides this value
+    .set("spark.master", "local[1]")
```
A reviewer (Member) commented:

Thank you for fixing this and updating the result, @maropu.

@SparkQA commented Jun 25, 2018

Test build #92294 has finished for PR 21625 at commit 4e76ffd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member) reviewed:

LGTM

@HyukjinKwon (Member):

LGTM too

Merged to master.

@asfgit closed this in 1c9acc2 on Jun 28, 2018.