[SPARK-34060][SQL][FOLLOWUP] Preserve serializability of canonicalized CatalogTable #31197

MaxGekk · 2021-01-15T14:27:43Z

What changes were proposed in this pull request?

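Replace toMap with map(identity).toMap when computing the canonicalized representation of CatalogTable. After #31112, CatalogTable became non-serializable because of filterKeys. On Scala 2.12, filterKeys returns a lazy view that does not implement Serializable, and calling toMap on an immutable map returns that same view instead of building a new map. map(identity) forces a strict copy. The workaround was taken from scala/bug#7005.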
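A minimal Scala sketch of the pitfall and the workaround, assuming a Scala 2.12 build (the default for Spark 3.0/3.1). The object name, the isJavaSerializable helper and the sample properties are hypothetical and not part of this patch. The check uses plain Java serialization, which is what Spark uses for task closures. On Scala 2.12 the toMap variant is expected to fail the check (the MapLike$$anon$1 view survives) while the map(identity) variant passes. On Scala 2.13, where filterKeys is deprecated and returns a MapView, both variants should build a strict map.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object FilterKeysSerializability {
  // Returns true if `obj` can be written with plain Java serialization.
  def isJavaSerializable(obj: AnyRef): Boolean = Try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(obj) finally out.close()
  }.isSuccess

  def main(args: Array[String]): Unit = {
    // Hypothetical table properties, for illustration only.
    val props = Map("owner" -> "alice", "transient_lastDdlTime" -> "123", "comment" -> "demo")
    val keep: String => Boolean = k => !k.startsWith("transient_")

    // Scala 2.12: filterKeys yields a lazy, non-serializable view, and
    // toMap on an immutable Map returns that same view unchanged.
    val viaToMap = props.filterKeys(keep).toMap
    // map(identity) iterates the view and builds a new strict immutable Map.
    val viaIdentity = props.filterKeys(keep).map(identity).toMap

    println(s"toMap:               ${isJavaSerializable(viaToMap)}")
    println(s"map(identity).toMap: ${isJavaSerializable(viaIdentity)}")
  }
}
```

The strict copy costs one pass over the filtered entries, which is negligible for table properties.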

Why are the changes needed?

This prevents errors such as:

```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
[info]   Cause: java.io.NotSerializableException: scala.collection.immutable.MapLike$$anon$1
```

Does this PR introduce any user-facing change?

Should not.

How was this patch tested?

By running the test suite affected by #31112:

```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

SparkQA · 2021-01-15T14:54:49Z

Test build #134107 has started for PR 31197 at commit 81d6786.

SparkQA · 2021-01-15T15:43:19Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38691/

SparkQA · 2021-01-15T16:15:29Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38691/

dongjoon-hyun

+1, LGTM.
According to the original patch, do we need this for master/3.1/3.0, @MaxGekk?

SparkQA · 2021-01-15T19:50:08Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38707/

MaxGekk · 2021-01-15T20:06:19Z

> According to the original patch, do we need this for master/3.1/3.0 ...

@dongjoon-hyun Yes

SparkQA · 2021-01-15T20:27:40Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38707/

SparkQA · 2021-01-15T21:33:48Z

Test build #134124 has finished for PR 31197 at commit d26ea30.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

…d CatalogTable

Closes #31197 from MaxGekk/fix-caching-hive-table-2-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit c3d81fb)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun · 2021-01-16T01:03:35Z

Thank you, @MaxGekk. Merged to master/3.1/3.0.

dongjoon-hyun · 2021-01-16T23:33:20Z

Do you think there is a way to prevent this kind of regression, @cloud-fan, @HyukjinKwon, @MaxGekk?
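One possible guard, sketched under assumptions: a unit test that round-trips the canonicalized value through Java serialization, so a lazy, non-serializable view fails the test instead of failing a job at runtime. The roundTrip helper and the sample map are hypothetical and are not existing Spark test utilities.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object SerializationRoundTrip {
  // Writes `value` with plain Java serialization and reads it back.
  // Throws java.io.NotSerializableException if any reachable object is not Serializable.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the properties of a canonicalized table.
    val canonicalProps = Map("owner" -> "alice", "comment" -> "demo")
    val restored = roundTrip(canonicalProps)
    assert(restored == canonicalProps, "value changed after a serialization round trip")
    println("round trip OK")
  }
}
```

Checking equality after deserialization also catches values that serialize but lose information on the way back.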

dongjoon-hyun · 2021-01-16T23:34:03Z

Also, cc @gatorsmile

HyukjinKwon · 2021-01-17T09:32:40Z

I think this is a bug that was fixed in Scala 2.13. Once we switch to 2.13 by default and drop 2.12, this class of problem should be prevented permanently, I guess.

dongjoon-hyun · 2021-01-18T00:10:11Z

I don't think we can easily drop 2.12 in the Apache Spark 3.x line, just as with Hadoop 2.7.x.

toMap -> map(identity)

81d6786

HyukjinKwon approved these changes Jan 15, 2021


cloud-fan approved these changes Jan 15, 2021


github-actions bot added the SQL label Jan 15, 2021

Fix for scala 2.13

d26ea30

dongjoon-hyun approved these changes Jan 15, 2021


dongjoon-hyun closed this in c3d81fb Jan 16, 2021
