ODP-1304 [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2
This PR aims to upgrade Apache Ivy to 2.5.2 and to protect old Ivy-based systems (such as older Spark releases) from Apache Ivy 2.5.2's incompatibility by introducing a new `.ivy2.5.2` directory.

- Apache Spark 4.0.0 will create this directory once and reuse it, while all other systems, including older Spark releases, keep using the old one, `.ivy2`. The behavior is therefore the same as installing and using Apache Spark 4.0.0 on a fresh machine.

- For environments with a user-provided Ivy path, users might still hit the incompatibility. However, they can mitigate it because they already have full control over the Ivy path; see the sketch below.
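
As a hedged illustration of that control (the directory path and package coordinates here are made up), a user-provided Ivy path is simply the `spark.jars.ivy` configuration, e.g. in `spark-shell`:

```scala
// Hypothetical example: pin the Ivy user directory explicitly so the cache
// location does not depend on the Spark default (`.ivy2` vs `.ivy2.5.2`).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ivy-path-example")
  .config("spark.jars.ivy", "/opt/shared/ivy-spark4")  // user-provided Ivy path (illustrative)
  .config("spark.jars.packages", "org.apache.commons:commons-lang3:3.14.0")
  .getOrCreate()
```

With an explicit path like this, the `.ivy2` vs `.ivy2.5.2` default never applies, so old and new Spark installations can be pointed at separate caches deliberately.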

This upgrade was tried once before and effectively reverted due to Java 11 and Java 17 failures in the daily CIs.
- apache#42613
- apache#42668

The PR builder also fails at the moment. If this PR passes the CIs, we achieve the following.

- [Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr)
    - FIX: CVE-2022-46751: Apache Ivy is vulnerable to XML External Entity injection (see the hardening sketch below)
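
For background, CVE-2022-46751 is an XML External Entity (XXE) issue in Ivy's XML parsing; the real fix ships in Apache Ivy 2.5.2 itself. Purely as an illustration of the technique (this is not Ivy's patch), the standard JAXP hardening against XXE looks roughly like this:

```scala
// Illustration of the general XXE hardening pattern (NOT Ivy's actual patch):
// refuse DOCTYPE declarations and external entities before parsing untrusted XML.
import javax.xml.XMLConstants
import javax.xml.parsers.{DocumentBuilder, DocumentBuilderFactory}

object SafeXml {
  def newHardenedBuilder(): DocumentBuilder = {
    val dbf = DocumentBuilderFactory.newInstance()
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
    dbf.setFeature("http://xml.org/sax/features/external-general-entities", false)
    dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
    dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "")
    dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "")
    dbf.newDocumentBuilder()
  }
}
```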

No.

Pass the CIs including `HiveExternalCatalogVersionsSuite`.

No.

Closes apache#45075 from dongjoon-hyun/SPARK-44914.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 3baa60a)
[SPARK-44968][BUILD] Downgrade ivy from 2.5.2 to 2.5.1

### What changes were proposed in this pull request?
After upgrading Ivy from 2.5.1 to 2.5.2 in SPARK-44914, the daily tests for Java 11 and Java 17 began to abort in the `HiveExternalCatalogVersionsSuite` test.

Java 11

- https://github.com/apache/spark/actions/runs/5953716283/job/16148657660
- https://github.com/apache/spark/actions/runs/5966131923/job/16185159550

Java 17

- https://github.com/apache/spark/actions/runs/5956925790/job/16158714165
- https://github.com/apache/spark/actions/runs/5969348559/job/16195073478

```
2023-08-23T23:00:49.6547573Z [info]   2023-08-23 16:00:48.209 - stdout> : java.lang.RuntimeException: problem during retrieve of org.apache.spark#spark-submit-parent-4c061f04-b951-4d06-8909-cde5452988d9: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error.
2023-08-23T23:00:49.6548745Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:238)
2023-08-23T23:00:49.6549572Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:89)
2023-08-23T23:00:49.6550334Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.ivy.Ivy.retrieve(Ivy.java:551)
2023-08-23T23:00:49.6551079Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1464)
2023-08-23T23:00:49.6552024Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.$anonfun$downloadVersion$2(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6552884Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.util.package$.quietly(package.scala:42)
2023-08-23T23:00:49.6553755Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.downloadVersion(IsolatedClientLoader.scala:138)
2023-08-23T23:00:49.6554705Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.liftedTree1$1(IsolatedClientLoader.scala:65)
2023-08-23T23:00:49.6555637Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.client.IsolatedClientLoader$.forVersion(IsolatedClientLoader.scala:64)
2023-08-23T23:00:49.6556554Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:443)
2023-08-23T23:00:49.6557340Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:356)
2023-08-23T23:00:49.6558187Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
2023-08-23T23:00:49.6559061Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
2023-08-23T23:00:49.6559962Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6560766Z [info]   2023-08-23 16:00:48.209 - stdout> 	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
2023-08-23T23:00:49.6561584Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
2023-08-23T23:00:49.6562510Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
2023-08-23T23:00:49.6563435Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
2023-08-23T23:00:49.6564323Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
2023-08-23T23:00:49.6565340Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
2023-08-23T23:00:49.6566321Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
2023-08-23T23:00:49.6567363Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118)
2023-08-23T23:00:49.6568372Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118)
2023-08-23T23:00:49.6569393Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:490)
2023-08-23T23:00:49.6570685Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155)
2023-08-23T23:00:49.6571842Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
2023-08-23T23:00:49.6572932Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
2023-08-23T23:00:49.6573996Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
2023-08-23T23:00:49.6575045Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
2023-08-23T23:00:49.6576066Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
2023-08-23T23:00:49.6576937Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
2023-08-23T23:00:49.6577807Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
2023-08-23T23:00:49.6578620Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
2023-08-23T23:00:49.6579432Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
2023-08-23T23:00:49.6580357Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
2023-08-23T23:00:49.6581331Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
2023-08-23T23:00:49.6582239Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
2023-08-23T23:00:49.6583101Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
2023-08-23T23:00:49.6584088Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
2023-08-23T23:00:49.6585236Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
2023-08-23T23:00:49.6586519Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
2023-08-23T23:00:49.6587686Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
2023-08-23T23:00:49.6588898Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
2023-08-23T23:00:49.6590014Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
2023-08-23T23:00:49.6590993Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
2023-08-23T23:00:49.6591930Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93)
2023-08-23T23:00:49.6592914Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80)
2023-08-23T23:00:49.6593856Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78)
2023-08-23T23:00:49.6594687Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
2023-08-23T23:00:49.6595379Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
2023-08-23T23:00:49.6596103Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
2023-08-23T23:00:49.6596807Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
2023-08-23T23:00:49.6597520Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
2023-08-23T23:00:49.6598276Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
2023-08-23T23:00:49.6599022Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
2023-08-23T23:00:49.6599819Z [info]   2023-08-23 16:00:48.209 - stdout> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2023-08-23T23:00:49.6600723Z [info]   2023-08-23 16:00:48.209 - stdout> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
2023-08-23T23:00:49.6601707Z [info]   2023-08-23 16:00:48.209 - stdout> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023-08-23T23:00:49.6602513Z [info]   2023-08-23 16:00:48.209 - stdout> 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2023-08-23T23:00:49.6603272Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
2023-08-23T23:00:49.6604007Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
2023-08-23T23:00:49.6604724Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.Gateway.invoke(Gateway.java:282)
2023-08-23T23:00:49.6605416Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
2023-08-23T23:00:49.6606209Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
2023-08-23T23:00:49.6606969Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
2023-08-23T23:00:49.6607743Z [info]   2023-08-23 16:00:48.209 - stdout> 	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
2023-08-23T23:00:49.6608415Z [info]   2023-08-23 16:00:48.209 - stdout> 	at java.base/java.lang.Thread.run(Thread.java:833)
2023-08-23T23:00:49.6609288Z [info]   2023-08-23 16:00:48.209 - stdout> Caused by: java.lang.RuntimeException: Multiple artifacts of the module log4j#log4j;1.2.17 are retrieved to the same file! Update the retrieve pattern to fix this error.
2023-08-23T23:00:49.6610288Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.ivy.core.retrieve.RetrieveEngine.determineArtifactsToCopy(RetrieveEngine.java:426)
2023-08-23T23:00:49.6611332Z [info]   2023-08-23 16:00:48.209 - stdout> 	at org.apache.ivy.core.retrieve.RetrieveEngine.retrieve(RetrieveEngine.java:122)
2023-08-23T23:00:49.6612046Z [info]   2023-08-23 16:00:48.209 - stdout> 	... 66 more
2023-08-23T23:00:49.6612498Z [info]   2023-08-23 16:00:48.209 - stdout>
```
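
For context on the error itself: Ivy's retrieve step copies resolved artifacts to a destination pattern, and 2.5.2 turns "two artifacts of one module land on the same destination file" into a hard error. The following is a minimal sketch of that API, assuming Ivy 2.5.x's `RetrieveOptions` interface; it is not Spark's actual `resolveMavenCoordinates` code, only an illustration of how the pattern's tokens decide whether artifacts collide:

```scala
// Minimal sketch of Ivy's retrieve step (not Spark's actual code). A destination
// pattern without enough tokens ([classifier], [ext], ...) can map two artifacts
// of the same module onto one file, which Ivy 2.5.2 rejects with the
// "Multiple artifacts ... are retrieved to the same file!" error shown above.
import org.apache.ivy.Ivy
import org.apache.ivy.core.module.id.ModuleRevisionId
import org.apache.ivy.core.retrieve.RetrieveOptions

def retrieveTo(ivy: Ivy, mrid: ModuleRevisionId, destDir: String): Unit = {
  val options = new RetrieveOptions()
    .setDestArtifactPattern(
      // [classifier] and [ext] keep e.g. a jar and its sources artifact apart.
      destDir + "/[organization]_[artifact]-[revision](-[classifier]).[ext]")
  ivy.retrieve(mrid, options) // throws RuntimeException on destination collisions
}
```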

So this PR downgrades Ivy from 2.5.2 to 2.5.1 to restore the Java 11/17 daily tests.

### Why are the changes needed?
To restore Java 11/17 daily tests.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Verified by changing the default Java version in `build_and_test.yml` to 17: after downgrading Ivy to 2.5.1, the tests succeed.

- https://github.com/LuciferYang/spark/actions/runs/5972232677/job/16209970934

![image](https://github.com/apache/spark/assets/1475305/cd4002d8-893d-4845-8b2e-c01ff3106f7f)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42668 from LuciferYang/test-java17.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 4f8a199)
[SPARK-44914][BUILD] Upgrade `Apache ivy` from 2.5.1 to 2.5.2

Upgrade Apache Ivy from 2.5.1 to 2.5.2.

[Release notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr)

[CVE-2022-46751](https://www.cve.org/CVERecord?id=CVE-2022-46751)

The upstream fix: apache/ant-ivy@2be17bc

No.

Pass GA

No.

Closes apache#42613 from bjornjorgensen/ivy-2.5.2.

Authored-by: Bjørn Jørgensen <bjornjorgensen@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit 611e17e)
[SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1

Upgrade `Apache Ivy` from 2.5.0 to 2.5.1
[Release notes](https://ant.apache.org/ivy/history/2.5.1/release-notes.html)

[CVE-2022-37865](https://www.cve.org/CVERecord?id=CVE-2022-37865)
and
[CVE-2022-37866](https://nvd.nist.gov/vuln/detail/CVE-2022-37866)

No.

Pass GA

Closes apache#38539 from bjornjorgensen/ivy-2.5.1.

Authored-by: Bjørn <bjornjorgensen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 4bbdca6)
(cherry picked from commit 0e5fa79)

# Conflicts:
#	dev/deps/spark-deps-hadoop-2-hive-2.3
#	dev/deps/spark-deps-hadoop-3-hive-2.3
#	docs/core-migration-guide.md
#	pom.xml
(cherry picked from commit 222356d)
dongjoon-hyun authored and prabhjyotsingh committed May 12, 2024
1 parent 555afba commit 58ad52c
Showing 8 changed files with 955 additions and 258 deletions.
658 changes: 658 additions & 0 deletions common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala

Large diffs are not rendered by default.

```diff
@@ -2286,10 +2286,10 @@ package object config {
       .doc("Path to specify the Ivy user directory, used for the local Ivy cache and " +
         "package files from spark.jars.packages. " +
         "This will override the Ivy property ivy.default.ivy.user.dir " +
-        "which defaults to ~/.ivy2.")
+        "which defaults to ~/.ivy2.5.2")
       .version("1.3.0")
       .stringConf
-      .createOptional
+      .createWithDefault("~/.ivy2.5.2")
 
   private[spark] val JAR_IVY_SETTING_PATH =
     ConfigBuilder("spark.jars.ivySettings")
```
```diff
@@ -377,7 +377,8 @@ private[deploy] object IvyTestUtils {
       f(repo.toURI.toString)
     } finally {
       // Clean up
-      if (repo.toString.contains(".m2") || repo.toString.contains(".ivy2")) {
+      if (repo.toString.contains(".m2") || repo.toString.contains(".ivy2") ||
+          repo.toString.contains(".ivy2.5.2")) {
         val groupDir = getBaseGroupDirectory(artifact, useIvyLayout)
         FileUtils.deleteDirectory(new File(repo, groupDir + File.separator + artifact.artifactId))
         deps.foreach { _.foreach { dep =>
```
212 changes: 108 additions & 104 deletions dev/deps/spark-deps-hadoop-2-hive-2.3

Large diffs are not rendered by default.

298 changes: 150 additions & 148 deletions dev/deps/spark-deps-hadoop-3-hive-2.3

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions dev/run-tests.py
```diff
@@ -472,6 +472,8 @@ def main():
     rm_r(os.path.join(SPARK_HOME, "work"))
     rm_r(os.path.join(USER_HOME, ".ivy2", "local", "org.apache.spark"))
     rm_r(os.path.join(USER_HOME, ".ivy2", "cache", "org.apache.spark"))
+    rm_r(os.path.join(USER_HOME, ".ivy2.5.2", "local", "org.apache.spark"))
+    rm_r(os.path.join(USER_HOME, ".ivy2.5.2", "cache", "org.apache.spark"))
 
     os.environ["CURRENT_BLOCK"] = str(ERROR_CODES["BLOCK_GENERAL"])
```
34 changes: 32 additions & 2 deletions docs/core-migration-guide.md
```markdown
@@ -22,12 +22,42 @@ license: |
* Table of contents
{:toc}

## Upgrading from Core 3.5 to 4.0

- Since Spark 4.0, Spark will roll event logs to archive them incrementally. To restore the behavior before Spark 4.0, you can set `spark.eventLog.rolling.enabled` to `false`.

- Since Spark 4.0, Spark will compress event logs. To restore the behavior before Spark 4.0, you can set `spark.eventLog.compress` to `false`.

- Since Spark 4.0, Spark workers will clean up worker and stopped application directories periodically. To restore the behavior before Spark 4.0, you can set `spark.worker.cleanup.enabled` to `false`.

- Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB` by default which means Spark will use RocksDB store for shuffle service. To restore the behavior before Spark 4.0, you can set `spark.shuffle.service.db.backend` to `LEVELDB`.

- In Spark 4.0, support for Apache Mesos as a resource manager was removed.

- Since Spark 4.0, Spark uses `ReadWriteOncePod` instead of `ReadWriteOnce` access mode in persistence volume claims. To restore the legacy behavior, you can set `spark.kubernetes.legacy.useReadWriteOnceAccessMode` to `true`.

- Since Spark 4.0, Spark uses `~/.ivy2.5.2` as Ivy user directory by default to isolate the existing systems from Apache Ivy's incompatibility. To restore the legacy behavior, you can set `spark.jars.ivy` to `~/.ivy2`.

## Upgrading from Core 3.4 to 3.5

- Since Spark 3.5, `spark.yarn.executor.failuresValidityInterval` is deprecated. Use `spark.executor.failuresValidityInterval` instead.

- Since Spark 3.5, `spark.yarn.max.executor.failures` is deprecated. Use `spark.executor.maxNumFailures` instead.

## Upgrading from Core 3.3 to 3.4

- Since Spark 3.4, Spark driver will own `PersistentVolumnClaim`s and try to reuse if they are not assigned to live executors. To restore the behavior before Spark 3.4, you can set `spark.kubernetes.driver.ownPersistentVolumeClaim` to `false` and `spark.kubernetes.driver.reusePersistentVolumeClaim` to `false`.

- Since Spark 3.4, Spark driver will track shuffle data when dynamic allocation is enabled without shuffle service. To restore the behavior before Spark 3.4, you can set `spark.dynamicAllocation.shuffleTracking.enabled` to `false`.

- Since Spark 3.4, Spark will try to decommission cached RDD and shuffle blocks if both `spark.decommission.enabled` and `spark.storage.decommission.enabled` are true. To restore the behavior before Spark 3.4, you can set both `spark.storage.decommission.rddBlocks.enabled` and `spark.storage.decommission.shuffleBlocks.enabled` to `false`.

- Since Spark 3.4, Spark will use RocksDB store if `spark.history.store.hybridStore.enabled` is true. To restore the behavior before Spark 3.4, you can set `spark.history.store.hybridStore.diskBackend` to `LEVELDB`.

## Upgrading from Core 3.2 to 3.3

- Since Spark 3.3, Spark migrates its log4j dependency from 1.x to 2.x because log4j 1.x has reached end of life and is no longer supported by the community. Vulnerabilities reported after August 2015 against log4j 1.x were not checked and will not be fixed. Users should rewrite original log4j properties files using log4j2 syntax (XML, JSON, YAML, or properties format). Spark rewrites the `conf/log4j.properties.template` which is included in Spark distribution, to `conf/log4j2.properties.template` with log4j2 properties format.

- Since Spark 3.3.3, `spark.submit.proxyUser.allowCustomClasspathInClusterMode` allows users to disable custom class path in cluster mode by proxy users. It still defaults to `true` to maintain backward compatibility.

## Upgrading from Core 3.1 to 3.2

- Since Spark 3.2, `spark.scheduler.allocation.file` supports read remote file using hadoop filesystem which means if the path has no scheme Spark will respect hadoop configuration to read it. To restore the behavior before Spark 3.2, you can specify the local scheme for `spark.scheduler.allocation.file` e.g. `file:///path/to/file`.
```
2 changes: 1 addition & 1 deletion pom.xml
```diff
@@ -150,7 +150,7 @@
     <jetty.version>9.4.48.v20220622</jetty.version>
     <jakartaservlet.version>4.0.3</jakartaservlet.version>
     <chill.version>0.10.0</chill.version>
-    <ivy.version>2.5.1</ivy.version>
+    <ivy.version>2.5.2</ivy.version>
     <oro.version>2.0.8</oro.version>
     <deltalake.version>2.3.0</deltalake.version>
     <hudi.version>0.14.2</hudi.version>
```
