[SPARK-34060][SQL][FOLLOWUP] Preserve serializability of canonicalized CatalogTable #31197

Closed

MaxGekk
Member

@MaxGekk MaxGekk commented Jan 15, 2021

What changes were proposed in this pull request?

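Replace toMap with map(identity).toMap when computing the canonicalized representation of CatalogTable. CatalogTable stopped being serializable after #31112 because of its use of filterKeys: in Scala 2.12, filterKeys returns a non-serializable lazy wrapper around the original map, and calling toMap on that wrapper does not copy the entries into a new map. Adding map(identity) forces a strict copy. The workaround was taken from scala/bug#7005.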
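
A minimal sketch of the difference between the two patterns, written against Scala 2.12 (the version where the exception in this PR occurs). The object name, the `isJavaSerializable` helper, the sample `props` map and the `keep` predicate are all illustrative assumptions, not code from Spark or from this patch. On Scala 2.12 the first check is expected to report `false`, in line with the `NotSerializableException` quoted in this PR; other Scala versions may behave differently.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object FilterKeysSerializationSketch {
  // Hypothetical helper: true if Java serialization accepts the value.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Illustrative stand-in for table properties, not the real CatalogTable fields.
  val props: Map[String, String] =
    Map("owner" -> "spark", "transient_lastDdlTime" -> "1610668800")

  // Hypothetical predicate; the actual filter used in Spark may differ.
  val keep: String => Boolean = key => !key.startsWith("transient_")

  // Pattern that broke serializability: on Scala 2.12, toMap on the
  // filterKeys result hands back the same lazy wrapper.
  def viewBacked: Map[String, String] = props.filterKeys(keep).toMap

  // Pattern from this PR: map(identity) copies the entries into a new map.
  def strictCopy: Map[String, String] = props.filterKeys(keep).map(identity).toMap

  def main(args: Array[String]): Unit = {
    println(s"filterKeys.toMap serializable: ${isJavaSerializable(viewBacked)}")
    println(s"filterKeys.map(identity).toMap serializable: ${isJavaSerializable(strictCopy)}")
  }
}
```

Both methods produce maps with the same entries; only the strict copy is safe to ship inside a serialized task.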

Why are the changes needed?

This prevents errors like:

[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1

Does this PR introduce any user-facing change?

Should not.

How was this patch tested?

By running the test suite affected by #31112:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"

@SparkQA

SparkQA commented Jan 15, 2021

Test build #134107 has started for PR 31197 at commit 81d6786.

@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38691/

@github-actions github-actions bot added the SQL label Jan 15, 2021
@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38691/

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.
According to the original patch, do we need this for master/3.1/3.0, @MaxGekk ?

@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38707/

@MaxGekk
Member Author

MaxGekk commented Jan 15, 2021

According to the original patch, do we need this for master/3.1/3.0 ...

@dongjoon-hyun Yes

@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38707/

@SparkQA

SparkQA commented Jan 15, 2021

Test build #134124 has finished for PR 31197 at commit d26ea30.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun pushed a commit that referenced this pull request Jan 16, 2021
…d CatalogTable

### What changes were proposed in this pull request?
Replace `toMap` with `map(identity).toMap` when computing the canonicalized representation of `CatalogTable`. `CatalogTable` stopped being serializable after #31112 because of its use of `filterKeys`. The workaround was taken from scala/bug#7005.

### Why are the changes needed?
This prevents errors like:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
```

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the test suite affected by #31112:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31197 from MaxGekk/fix-caching-hive-table-2-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c3d81fb)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Member

Thank you, @MaxGekk . Merged to master/3.1/3.0.

@dongjoon-hyun
Member

Do you think we can find a way to prevent this kind of regression, @cloud-fan , @HyukjinKwon , @MaxGekk ?

@dongjoon-hyun
Member

Also, cc @gatorsmile

@HyukjinKwon
Member

HyukjinKwon commented Jan 17, 2021

I think it's a bug that was fixed in Scala 2.13. Once we switch to 2.13 by default and drop 2.12, I guess this will be prevented permanently.

@dongjoon-hyun
Member

I don't think we can easily drop 2.12 in the Apache Spark 3.x line, just as with Hadoop 2.7.x.
