
[SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13 #29078

Closed · 4 commits

Conversation

@srowen (Member) commented Jul 12, 2020

What changes were proposed in this pull request?

Continuation of #28971 which lets streaming, catalyst and sql compile for 2.13. Same idea.

Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

@@ -46,7 +46,7 @@ class HadoopFileLinesReader(

def this(file: PartitionedFile, conf: Configuration) = this(file, None, conf)

- private val iterator = {
+ private val _iterator = {
srowen (Member Author):

This is to resolve a weird compile error:

[ERROR] [Error] /Users/seanowen/Documents/spark_2.13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala:49: weaker access privileges in overriding
final def iterator: Iterator[org.apache.hadoop.io.Text] (defined in trait Iterator)
  override should not be private
[ERROR] [Error] /Users/seanowen/Documents/spark_2.13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala:38: weaker access privileges in overriding
final def iterator: Iterator[org.apache.hadoop.io.Text] (defined in trait Iterator)
  override should not be private

Not sure of the root cause, but it's easy to avoid the name conflict entirely.
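The clash can be reproduced in miniature. In Scala 2.13 the `Iterator` trait carries a final `iterator` member (as the error above shows, "final def iterator ... defined in trait Iterator"), so a subclass's same-named private val becomes an illegal weaker-access override. A minimal sketch with made-up class and field names:

```scala
// Hypothetical reader extending Iterator, mirroring HadoopFileLinesReader's shape.
// In 2.13, naming the field `iterator` fails with "weaker access privileges in
// overriding ... override should not be private"; renaming it (here `_iterator`)
// sidesteps the conflict and compiles under both 2.12 and 2.13.
class LinesReader(lines: List[String]) extends Iterator[String] {
  private val _iterator: Iterator[String] = lines.iterator

  override def hasNext: Boolean = _iterator.hasNext
  override def next(): String = _iterator.next()
}
```

The rename changes nothing observable; the class still just delegates to the wrapped iterator.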

@@ -223,16 +223,6 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSparkSession {
checkDataset(Seq(Queue(true)).toDS(), Queue(true))
checkDataset(Seq(Queue("test")).toDS(), Queue("test"))
checkDataset(Seq(Queue(Tuple1(1))).toDS(), Queue(Tuple1(1)))

- checkDataset(Seq(ArrayBuffer(1)).toDS(), ArrayBuffer(1))
srowen (Member Author):

For complex reasons, I don't think these tests / use cases are possible as-is in Scala 2.13. They are niche, so I just removed them.

Reviewer (Member):

Although this means removing test coverage in Scala 2.12 too, I'm +1 for now. We can add them back later after we finish everything in Scala 2.13.

@dongjoon-hyun (Member):
Thank you, @srowen .

@SparkQA commented Jul 12, 2020

Test build #125735 has finished for PR 29078 at commit 1cd9dd7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 13, 2020

Test build #125738 has finished for PR 29078 at commit 00eceec.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jul 13, 2020

Test build #125742 has finished for PR 29078 at commit 77fb2be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member Author) commented Jul 13, 2020

Jenkins retest this please

@SparkQA commented Jul 13, 2020

Test build #125745 has started for PR 29078 at commit 77fb2be.

@dongjoon-hyun (Member):
Retest this please

@SparkQA commented Jul 13, 2020

Test build #125779 has finished for PR 29078 at commit 77fb2be.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member Author) commented Jul 13, 2020

Weird one:

KafkaContinuousSinkSuite:
[info] - streaming - write to kafka with topic field *** FAILED *** (31 seconds, 238 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 14 times over 30.741770862000003 seconds. Last failure message: 
[info]   Decoded objects do not match expected objects:
[info]   expected: WrappedArray(1, 2, 3, 4, 5)
[info]   actual:   WrappedArray()
[info]   assertnotnull(upcast(getcolumnbyordinal(0, IntegerType), IntegerType, - root class: "scala.Int"))
[info]   +- upcast(getcolumnbyordinal(0, IntegerType), IntegerType, - root class: "scala.Int")
[info]      +- getcolumnbyordinal(0, IntegerType)
[info]   
[info]            . (KafkaSinkSuite.scala:340)

It's consistent, so I believe this patch causes it, but not yet seeing any obvious reason. I'll try to reproduce and isolate locally.

@dongjoon-hyun (Member) commented Jul 13, 2020

@srowen . Interestingly, the root cause of the failure is the following inside ContinuousMemoryStream. Can we revert .map(_.toSeq) and use a different approach?

  private val records = Seq.fill(numPartitions)(new ListBuffer[UnsafeRow])
- private val recordEndpoint = new ContinuousRecordEndpoint(records, this)
+ private val recordEndpoint = new ContinuousRecordEndpoint(records.map(_.toSeq), this)

@srowen (Member Author) commented Jul 13, 2020

Oh, good catch; I think that's it. The issue is that in 2.13, ListBuffer.toSeq no longer returns the buffer itself (toSeq now returns an immutable Seq, so it has to copy). So this passed copies of the mutable ListBuffers, which were updated later, and the endpoint never saw the updates.

This is the kind of bug to look out for, to be sure, even in 2.12, apparently, given this. I'm slightly worried there's a case the tests don't catch. I manually reviewed this PR against occurrences of ListBuffer and don't think there are more instances here, but I'll examine the previous PR too.

I suspect this kind of thing will come up in a few more places in the 2.13 build, as .toSeq will in these cases definitely make a copy of a mutable collection. That's even good in a way, as we should update the code to be clearer that a mutable Seq is specifically being passed, but it will probably require more debugging in 2.13 later.
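The behavior change behind the bug is easy to demonstrate in isolation. A small sketch, assuming a 2.13 (or Scala 3) collections library, where toSeq on a mutable collection returns an immutable copy:

```scala
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2)
// 2.13: toSeq returns an immutable Seq, so the buffer must be copied.
// 2.12: ListBuffer is itself a scala.Seq, and toSeq returns it unchanged.
val snapshot: Seq[Int] = buf.toSeq

buf += 3 // mutate after the snapshot was taken

// Under 2.13 the snapshot is frozen at the point of the toSeq call,
// which is exactly how the endpoint stopped seeing newly appended records.
println(snapshot)   // 2.13: List(1, 2)
println(buf.toList) // List(1, 2, 3)
```

Under 2.12 the same program would print the mutated contents both times, which is why the tests only failed after the `.map(_.toSeq)` change.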

@dongjoon-hyun (Member):
Thank you for the fix, @srowen !

@SparkQA commented Jul 14, 2020

Test build #125790 has finished for PR 29078 at commit 370dabe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ContinuousRecordEndpoint(buckets: Seq[mutable.Seq[UnsafeRow]], lock: Object)
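The merged signature above shows the shape of the fix: the buckets are declared as mutable.Seq and the live buffers are passed directly, so the endpoint observes appends made after construction. A simplified sketch, with hypothetical names (RecordEndpoint, recordAt) and Int standing in for UnsafeRow:

```scala
import scala.collection.mutable

// The endpoint holds references to the live mutable buffers, not snapshots.
class RecordEndpoint(buckets: Seq[mutable.Seq[Int]]) {
  def recordAt(partition: Int, offset: Int): Int = buckets(partition)(offset)
}

val records = Seq.fill(2)(mutable.ListBuffer[Int]())
val endpoint = new RecordEndpoint(records) // no .map(_.toSeq), so no copies
records(0) += 42                           // append is visible to the endpoint
// endpoint.recordAt(0, 0) == 42
```

Had the constructor taken `Seq[Seq[Int]]` and been handed `records.map(_.toSeq)`, the 2.13 copies would leave `recordAt` reading permanently empty buckets, matching the observed "expected: WrappedArray(1, 2, 3, 4, 5), actual: WrappedArray()" failure.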

@HyukjinKwon (Member) left a comment:

Looks fine to me

@dongjoon-hyun (Member) left a comment:

+1, LGTM. Thank you, @srowen and @HyukjinKwon .
Merged to master for Apache Spark 3.1.0, expected in December 2020.

dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2020
… for Scala 2.13 compilation

### What changes were proposed in this pull request?

Same as #29078 and #28971. This makes the rest of the default modules (i.e. those you get without specifying `-Pyarn` etc.) compile under Scala 2.13. As a result, it does not close the JIRA. This also, of course, does not demonstrate that tests pass yet in 2.13.

Note, this does not fix the `repl` module; that's separate.

### Why are the changes needed?

Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. (2.13 was not tested; this is about getting it to compile without breaking 2.12)

Closes #29111 from srowen/SPARK-29292.3.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@srowen srowen deleted the SPARK-29292.2 branch August 16, 2020 18:25