
[SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13 #29078

Closed · 4 commits

Conversation

@srowen (Member) commented Jul 12, 2020

What changes were proposed in this pull request?

Continuation of #28971 which lets streaming, catalyst and sql compile for 2.13. Same idea.

Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

@@ -46,7 +46,7 @@ class HadoopFileLinesReader(

def this(file: PartitionedFile, conf: Configuration) = this(file, None, conf)

- private val iterator = {
+ private val _iterator = {
srowen (Member Author):

This is to resolve a weird compile error:

[ERROR] [Error] /Users/seanowen/Documents/spark_2.13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala:49: weaker access privileges in overriding
final def iterator: Iterator[org.apache.hadoop.io.Text] (defined in trait Iterator)
  override should not be private
[ERROR] [Error] /Users/seanowen/Documents/spark_2.13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala:38: weaker access privileges in overriding
final def iterator: Iterator[org.apache.hadoop.io.Text] (defined in trait Iterator)
  override should not be private

Not sure of the root cause, but it's easy to avoid the name conflict entirely.
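The clash can be reproduced in miniature. In Scala 2.13 the `Iterator` trait carries a final `iterator` member (as the error above shows, "final def iterator ... defined in trait Iterator"), so a subclass's same-named private val becomes an illegal weaker-access override. A minimal sketch with made-up class and field names:

```scala
// Hypothetical reader extending Iterator, mirroring HadoopFileLinesReader's shape.
// In 2.13, naming the field `iterator` fails with "weaker access privileges in
// overriding ... override should not be private"; renaming it (here `_iterator`)
// sidesteps the conflict and compiles under both 2.12 and 2.13.
class LinesReader(lines: List[String]) extends Iterator[String] {
  private val _iterator: Iterator[String] = lines.iterator

  override def hasNext: Boolean = _iterator.hasNext
  override def next(): String = _iterator.next()
}
```

The rename changes nothing observable; the class still just delegates to the wrapped iterator.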

@@ -223,16 +223,6 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSparkSession {
checkDataset(Seq(Queue(true)).toDS(), Queue(true))
checkDataset(Seq(Queue("test")).toDS(), Queue("test"))
checkDataset(Seq(Queue(Tuple1(1))).toDS(), Queue(Tuple1(1)))

- checkDataset(Seq(ArrayBuffer(1)).toDS(), ArrayBuffer(1))
srowen (Member Author):

For complex reasons, I don't think these tests / use cases are possible as-is in Scala 2.13. They are niche, so I just removed them.

Reviewer (Member):

Although this means removing test coverage in Scala 2.12 too, I'm +1 for now. We can add them back later after we finish everything in Scala 2.13.

@dongjoon-hyun (Member):
Thank you, @srowen .

@SparkQA commented Jul 12, 2020

Test build #125735 has finished for PR 29078 at commit 1cd9dd7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 13, 2020

Test build #125738 has finished for PR 29078 at commit 00eceec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 13, 2020

Test build #125742 has finished for PR 29078 at commit 77fb2be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member Author) commented Jul 13, 2020

Jenkins retest this please

@SparkQA commented Jul 13, 2020

Test build #125745 has started for PR 29078 at commit 77fb2be.

@dongjoon-hyun (Member):
Retest this please

@SparkQA commented Jul 13, 2020

Test build #125779 has finished for PR 29078 at commit 77fb2be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member Author) commented Jul 13, 2020

Weird one:

KafkaContinuousSinkSuite:
[info] - streaming - write to kafka with topic field *** FAILED *** (31 seconds, 238 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 14 times over 30.741770862000003 seconds. Last failure message: 
[info]   Decoded objects do not match expected objects:
[info]   expected: WrappedArray(1, 2, 3, 4, 5)
[info]   actual:   WrappedArray()
[info]   assertnotnull(upcast(getcolumnbyordinal(0, IntegerType), IntegerType, - root class: "scala.Int"))
[info]   +- upcast(getcolumnbyordinal(0, IntegerType), IntegerType, - root class: "scala.Int")
[info]      +- getcolumnbyordinal(0, IntegerType)
[info]   
[info]            . (KafkaSinkSuite.scala:340)

It's consistent, so I believe this patch causes it, but not yet seeing any obvious reason. I'll try to reproduce and isolate locally.

@dongjoon-hyun (Member) commented Jul 13, 2020

@srowen . Interestingly, the root cause of the failure is the following inside ContinuousMemoryStream. Can we revert .map(_.toSeq) and use a different approach?

  private val records = Seq.fill(numPartitions)(new ListBuffer[UnsafeRow])
- private val recordEndpoint = new ContinuousRecordEndpoint(records, this)
+ private val recordEndpoint = new ContinuousRecordEndpoint(records.map(_.toSeq), this)

@srowen (Member Author) commented Jul 13, 2020

Oh, good catch; I think that's it. The issue is that in 2.13, ListBuffer.toSeq no longer returns the buffer itself (toSeq now returns an immutable Seq, so it has to copy). So this passed copies of the mutable ListBuffers, which were updated later, and the endpoint never saw the updates.

This is the kind of bug to look out for, to be sure, even in 2.12, apparently, given this. I'm slightly worried there's a case the tests don't catch. I manually reviewed this PR against occurrences of ListBuffer and don't think there are more instances here, but I'll examine the previous PR too.

I suspect this kind of thing will come up in a few more places in the 2.13 build, as .toSeq will in these cases definitely make a copy of a mutable collection. That's even good in a way, as we should update the code to be clearer that a mutable Seq is specifically being passed, but it will probably require more debugging in 2.13 later.
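The behavior change behind the bug is easy to demonstrate in isolation. A small sketch, assuming a 2.13 (or Scala 3) collections library, where toSeq on a mutable collection returns an immutable copy:

```scala
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2)
// 2.13: toSeq returns an immutable Seq, so the buffer must be copied.
// 2.12: ListBuffer is itself a scala.Seq, and toSeq returns it unchanged.
val snapshot: Seq[Int] = buf.toSeq

buf += 3 // mutate after the snapshot was taken

// Under 2.13 the snapshot is frozen at the point of the toSeq call,
// which is exactly how the endpoint stopped seeing newly appended records.
println(snapshot)   // 2.13: List(1, 2)
println(buf.toList) // List(1, 2, 3)
```

Under 2.12 the same program would print the mutated contents both times, which is why the tests only failed after the `.map(_.toSeq)` change.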

@dongjoon-hyun (Member):
Thank you for the fix, @srowen !

@SparkQA commented Jul 14, 2020

Test build #125790 has finished for PR 29078 at commit 370dabe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ContinuousRecordEndpoint(buckets: Seq[mutable.Seq[UnsafeRow]], lock: Object)
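The merged signature above shows the shape of the fix: the buckets are declared as mutable.Seq and the live buffers are passed directly, so the endpoint observes appends made after construction. A simplified sketch, with hypothetical names (RecordEndpoint, recordAt) and Int standing in for UnsafeRow:

```scala
import scala.collection.mutable

// The endpoint holds references to the live mutable buffers, not snapshots.
class RecordEndpoint(buckets: Seq[mutable.Seq[Int]]) {
  def recordAt(partition: Int, offset: Int): Int = buckets(partition)(offset)
}

val records = Seq.fill(2)(mutable.ListBuffer[Int]())
val endpoint = new RecordEndpoint(records) // no .map(_.toSeq), so no copies
records(0) += 42                           // append is visible to the endpoint
// endpoint.recordAt(0, 0) == 42
```

Had the constructor taken `Seq[Seq[Int]]` and been handed `records.map(_.toSeq)`, the 2.13 copies would leave `recordAt` reading permanently empty buckets, matching the observed "expected: WrappedArray(1, 2, 3, 4, 5), actual: WrappedArray()" failure.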

@HyukjinKwon (Member) left a comment:

Looks fine to me

@dongjoon-hyun (Member) left a comment:

+1, LGTM. Thank you, @srowen and @HyukjinKwon .
Merged to master for Apache Spark 3.1.0, expected in December 2020.

dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2020
… for Scala 2.13 compilation

### What changes were proposed in this pull request?

Same as #29078 and #28971. This makes the rest of the default modules (i.e. those you get without specifying `-Pyarn` etc.) compile under Scala 2.13. As a result, it does not close the JIRA. This also, of course, does not demonstrate that tests pass yet in 2.13.

Note, this does not fix the `repl` module; that's separate.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29111 from srowen/SPARK-29292.3.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@srowen srowen deleted the SPARK-29292.2 branch August 16, 2020 18:25