Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41708][SQL][FOLLOWUP] WriteFiles should replace exprId using new query #39468

Closed
wants to merge 2 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

This is the followup of #39277, does three things:

  • replace WriteFiles attribute exprId using new query to avoid potential issue
  • remove unnecessary explain info with WriteFiles
  • cleanup unnecessary Logging

Why are the changes needed?

Improve the implementation of WriteFiles

Does this PR introduce any user-facing change?

no

How was this patch tested?

add test

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

@ulysses-you ulysses-you closed this Jan 9, 2023
@ulysses-you ulysses-you reopened this Jan 9, 2023
@github-actions github-actions bot added the SQL label Jan 9, 2023
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 21e9691 Jan 10, 2023
@ulysses-you ulysses-you deleted the SPARK-41708-followup branch January 10, 2023 23:55
dongjoon-hyun pushed a commit that referenced this pull request Jan 16, 2023
…elper#replaceNotIncludedMsg` to remove `@hashCode`

### What changes were proposed in this pull request?
The daily build GA `Master Scala 2.13 + Hadoop 3 + JDK 8` failed after #39468 merged, the failed tests includes:

- org.apache.spark.sql.SQLQueryTestSuite
- org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite

and the failure error message is similar to the following:

```
2023-01-11T01:03:46.3478323Z 01:03:46.347 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs:
2023-01-11T01:03:46.3677880Z [info]- explain.sql *** FAILED *** (348 milliseconds)
2023-01-11T01:03:46.3678710Z [info]  explain.sql
2023-01-11T01:03:46.3679479Z [info]  Expected "...es.CatalogFileIndex[7d811218, [key, val]
2023-01-11T01:03:46.3680509Z [info]  +- Project [key#x, val#x]
2023-01-11T01:03:46.3681033Z [info]     +- SubqueryAlias spark_catalog.default.explain_temp4
2023-01-11T01:03:46.3684259Z [info]        +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
2023-01-11T01:03:46.3684922Z [info]
2023-01-11T01:03:46.3685766Z [info]  == Optimized Logical Plan ==
2023-01-11T01:03:46.3687590Z [info]  InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218, [key, val]
2023-01-11T01:03:46.3688465Z [info]  +- WriteFiles
2023-01-11T01:03:46.3690929Z [info]     +- Sort [val#x ASC NULLS FIRST], false
2023-01-11T01:03:46.3691387Z [info]        +- Project [key#x, empty2null(val#x) AS val#x]
2023-01-11T01:03:46.3692078Z [info]           +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
2023-01-11T01:03:46.3692549Z [info]
2023-01-11T01:03:46.3693443Z [info]  == Physical Plan ==
2023-01-11T01:03:46.3695233Z [info]  Execute InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218], [key, val]
2023-01-11T01:03:46.3696100Z [info]  +- Writ...", but got "...es.CatalogFileIndex[cdfa8472, [key, val]
2023-01-11T01:03:46.3698327Z [info]  +- Project [key#x, val#x]
2023-01-11T01:03:46.3698881Z [info]     +- SubqueryAlias spark_catalog.default.explain_temp4
2023-01-11T01:03:46.3699680Z [info]        +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
2023-01-11T01:03:46.3704986Z [info]
2023-01-11T01:03:46.3705457Z [info]  == Optimized Logical Plan ==
2023-01-11T01:03:46.3717140Z [info]  InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472, [key, val]
2023-01-11T01:03:46.3718309Z [info]  +- WriteFiles
2023-01-11T01:03:46.3718964Z [info]     +- Sort [val#x ASC NULLS FIRST], false
2023-01-11T01:03:46.3719752Z [info]        +- Project [key#x, empty2null(val#x) AS val#x]
2023-01-11T01:03:46.3723046Z [info]           +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet
2023-01-11T01:03:46.3723598Z [info]
2023-01-11T01:03:46.3726955Z [info]  == Physical Plan ==
2023-01-11T01:03:46.3728111Z [info]  Execute InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472], [key, val]
2023-01-11T01:03:46.3898445Z [info]  +- Writ..." Result did not match for query #21
2023-01-11T01:03:46.3902948Z [info]  EXPLAIN EXTENDED INSERT INTO TABLE explain_temp5 SELECT * FROM explain_temp4 (SQLQueryTestSuite.scala:495)
2023-01-11T01:03:46.3903881Z [info]  org.scalatest.exceptions.TestFailedException:
2023-01-11T01:03:46.3904492Z [info]  at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
2023-01-11T01:03:46.3905449Z [info]  at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
2023-01-11T01:03:46.3906493Z [info]  at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
2023-01-11T01:03:46.3907683Z [info]  at org.scalatest.Assertions.assertResult(Assertions.scala:847)
2023-01-11T01:03:46.3908243Z [info]  at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
2023-01-11T01:03:46.3908812Z [info]  at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
2023-01-11T01:03:46.3910011Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runQueries$11(SQLQueryTestSuite.scala:495)
2023-01-11T01:03:46.3910611Z [info]  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
2023-01-11T01:03:46.3911163Z [info]  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
2023-01-11T01:03:46.3912094Z [info]  at scala.collection.AbstractIterable.foreach(Iterable.scala:926)
2023-01-11T01:03:46.3912781Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runQueries$9(SQLQueryTestSuite.scala:486)
2023-01-11T01:03:46.3913371Z [info]  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
2023-01-11T01:03:46.3914237Z [info]  at org.scalatest.Assertions.withClue(Assertions.scala:1065)
2023-01-11T01:03:46.3915165Z [info]  at org.scalatest.Assertions.withClue$(Assertions.scala:1052)
2023-01-11T01:03:46.3915725Z [info]  at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564)
2023-01-11T01:03:46.3916341Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.runQueries(SQLQueryTestSuite.scala:462)
2023-01-11T01:03:46.3917485Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runTest$35(SQLQueryTestSuite.scala:364)
2023-01-11T01:03:46.3918517Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runTest$35$adapted(SQLQueryTestSuite.scala:362)
2023-01-11T01:03:46.3919102Z [info]  at scala.collection.immutable.List.foreach(List.scala:333)
2023-01-11T01:03:46.3919675Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.runTest(SQLQueryTestSuite.scala:362)
2023-01-11T01:03:46.3921754Z [info]  at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$createScalaTestCase$6(SQLQueryTestSuite.scala:269)
2023-01-11T01:03:46.3922358Z [info]  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
2023-01-11T01:03:46.3923784Z [info]  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
2023-01-11T01:03:46.3924473Z [info]  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
2023-01-11T01:03:46.3925286Z [info]  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
2023-01-11T01:03:46.3926199Z [info]  at org.scalatest.Transformer.apply(Transformer.scala:22)
2023-01-11T01:03:46.3927071Z [info]  at org.scalatest.Transformer.apply(Transformer.scala:20)
2023-01-11T01:03:46.3928583Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
2023-01-11T01:03:46.3929225Z [info]  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
2023-01-11T01:03:46.3930091Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
2023-01-11T01:03:46.3933329Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
2023-01-11T01:03:46.3933893Z [info]  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
2023-01-11T01:03:46.3934875Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
2023-01-11T01:03:46.3935479Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
2023-01-11T01:03:46.3936453Z [info]  at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66)
2023-01-11T01:03:46.3937318Z [info]  at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
2023-01-11T01:03:46.3940707Z [info]  at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
2023-01-11T01:03:46.3941350Z [info]  at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66)
2023-01-11T01:03:46.3941962Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
2023-01-11T01:03:46.3943332Z [info]  at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
2023-01-11T01:03:46.3944504Z [info]  at scala.collection.immutable.List.foreach(List.scala:333)
2023-01-11T01:03:46.3950194Z [info]  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
2023-01-11T01:03:46.3950748Z [info]  at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
2023-01-11T01:03:46.3951912Z [info]  at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
2023-01-11T01:03:46.3952515Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
2023-01-11T01:03:46.3953476Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
2023-01-11T01:03:46.3954069Z [info]  at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
2023-01-11T01:03:46.3966445Z [info]  at org.scalatest.Suite.run(Suite.scala:1114)
2023-01-11T01:03:46.3967583Z [info]  at org.scalatest.Suite.run$(Suite.scala:1096)
2023-01-11T01:03:46.3968377Z [info]  at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
2023-01-11T01:03:46.3969537Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
2023-01-11T01:03:46.3970510Z [info]  at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
2023-01-11T01:03:46.3971298Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
2023-01-11T01:03:46.3972182Z [info]  at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
2023-01-11T01:03:46.3973529Z [info]  at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:66)
2023-01-11T01:03:46.3974433Z [info]  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
2023-01-11T01:03:46.3977778Z [info]  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
2023-01-11T01:03:46.3984781Z [info]  at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
2023-01-11T01:03:46.3985521Z [info]  at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:66)
2023-01-11T01:03:46.3986684Z [info]  at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
2023-01-11T01:03:46.3987264Z [info]  at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
2023-01-11T01:03:46.3987774Z [info]  at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
2023-01-11T01:03:46.3988269Z [info]  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2023-01-11T01:03:46.3989260Z [info]  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2023-01-11T01:03:46.3996895Z [info]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2023-01-11T01:03:46.3997550Z [info]  at java.lang.Thread.run(Thread.java:750)
```

The reason for the failure is that the result of `CatalogFileIndex` printed when using Scala 2.12 is different from Scala 2.13:

When using Scala 2.12, the `hashCode` of `CatalogFileIndex` is `7d811218`

```
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218, [key, val]
```

When using Scala 2.13, the `hashCode` of `CatalogFileIndex` is `cdfa8472`

```
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472, [key, val]

```

So this add a new `replaceAll` action to `SQLQueryTestHelper#replaceNotIncludedMsg` to remove `hashCode`
to make `CatalogFileIndex` print the same results when using Scala 2.12 and 2.13

### Why are the changes needed?
Make daily build GA `Master Scala 2.13 + Hadoop 3 + JDK 8` can run successfully

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GitHub Actions
- Manual test with this pr:

```
gh pr checkout 39598
dev/change-scala-version.sh 2.13
build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z explain.sql" -Pscala-2.13
build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z explain-aqe.sql" -Pscala-2.13
build/sbt -Phive-thriftserver "hive-thriftserver/testOnly org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -- -z explain.sql" -Pscala-2.13
build/sbt -Phive-thriftserver "hive-thriftserver/testOnly org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -- -z explain-aqe.sql" -Pscala-2.13
```

Closes #39598 from LuciferYang/SPARK-41708-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants