[SPARK-45771][CORE] Enable spark.eventLog.rolling.enabled by default
### What changes were proposed in this pull request?

This PR aims to enable `spark.eventLog.rolling.enabled` by default for Apache Spark 4.0.0.

### Why are the changes needed?

Since Apache Spark 3.0.0, we have been using event log rolling not only for **long-running jobs**, but also for **some failed jobs** to archive the partial event logs incrementally.
- #25670
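
With rolling enabled, Spark writes the event log as a per-application directory of incrementally rolled files instead of a single file, so partial logs survive when an application dies mid-run. A sketch of the resulting layout under the `eventlog_v2_` naming introduced in Spark 3.0 (the application ID and sequence numbers here are illustrative, not from this PR):

```
spark-events/
└── eventlog_v2_app-20231102120000-0000/
    ├── appstatus_app-20231102120000-0000.inprogress
    ├── events_1_app-20231102120000-0000
    └── events_2_app-20231102120000-0000
```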

### Does this PR introduce _any_ user-facing change?

- No, because `spark.eventLog.enabled` is disabled by default.
- For users with `spark.eventLog.enabled=true`, yes: the `spark-events` directory will have a different layout. However, every `Spark History Server` release since 3.3 can read both the old and the new event log layouts. I believe event log users are already setting this configuration to avoid losing event logs from long-running jobs and from some failed jobs.
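
Users who prefer the pre-4.0 single-file layout can opt out explicitly; a minimal `spark-defaults.conf` fragment (property names as in this PR, values illustrative):

```properties
spark.eventLog.enabled              true
# Opt out of the new 4.0 default and restore single-file event logs
spark.eventLog.rolling.enabled      false
# Only consulted when rolling is enabled; shown commented out for reference
# spark.eventLog.rolling.maxFileSize  128m
```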

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43638 from dongjoon-hyun/SPARK-45771.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun committed Nov 2, 2023
1 parent 5970d35 commit 653b31e
Showing 6 changed files with 8 additions and 3 deletions.
@@ -238,7 +238,7 @@ package object config {
"each event log file to the configured size.")
.version("3.0.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)

private[spark] val EVENT_LOG_ROLLING_MAX_FILE_SIZE =
ConfigBuilder("spark.eventLog.rolling.maxFileSize")
@@ -731,6 +731,7 @@ class SparkSubmitSuite
"--conf", "spark.master.rest.enabled=false",
"--conf", "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
"--conf", "spark.eventLog.enabled=true",
+ "--conf", "spark.eventLog.rolling.enabled=false",
"--conf", "spark.eventLog.testing=true",
"--conf", s"spark.eventLog.dir=${testDirPath.toUri.toString}",
"--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
@@ -66,7 +66,7 @@ abstract class EventLogFileWritersSuite extends SparkFunSuite with LocalSparkCon
conf.set(EVENT_LOG_DIR, testDir.toString)

// default config
- buildWriterAndVerify(conf, classOf[SingleEventLogFileWriter])
+ buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])

conf.set(EVENT_LOG_ENABLE_ROLLING, true)
buildWriterAndVerify(conf, classOf[RollingEventLogFilesWriter])
@@ -38,6 +38,7 @@ object EventLogTestHelper {
def getLoggingConf(logDir: Path, compressionCodec: Option[String] = None): SparkConf = {
val conf = new SparkConf
conf.set(EVENT_LOG_ENABLED, true)
+ conf.set(EVENT_LOG_ENABLE_ROLLING, false)
conf.set(EVENT_LOG_BLOCK_UPDATES, true)
conf.set(EVENT_LOG_TESTING, true)
conf.set(EVENT_LOG_DIR, logDir.toString)
@@ -33,7 +33,7 @@ import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.deploy.history.{EventLogFileReader, SingleEventLogFileWriter}
import org.apache.spark.deploy.history.EventLogTestHelper._
import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
- import org.apache.spark.internal.config.{EVENT_LOG_COMPRESS, EVENT_LOG_DIR, EVENT_LOG_ENABLED}
+ import org.apache.spark.internal.config.{EVENT_LOG_COMPRESS, EVENT_LOG_DIR, EVENT_LOG_ENABLE_ROLLING, EVENT_LOG_ENABLED}
import org.apache.spark.io._
import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem}
import org.apache.spark.resource.ResourceProfile
@@ -163,6 +163,7 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
test("SPARK-31764: isBarrier should be logged in event log") {
val conf = new SparkConf()
conf.set(EVENT_LOG_ENABLED, true)
+ conf.set(EVENT_LOG_ENABLE_ROLLING, false)
conf.set(EVENT_LOG_COMPRESS, false)
conf.set(EVENT_LOG_DIR, testDirPath.toString)
val sc = new SparkContext("local", "test-SPARK-31764", conf)
2 changes: 2 additions & 0 deletions docs/core-migration-guide.md
@@ -24,6 +24,8 @@ license: |

## Upgrading from Core 3.5 to 4.0

+ - Since Spark 4.0, Spark will roll event logs to archive them incrementally. To restore the behavior before Spark 4.0, you can set `spark.eventLog.rolling.enabled` to `false`.

- Since Spark 4.0, Spark will compress event logs. To restore the behavior before Spark 4.0, you can set `spark.eventLog.compress` to `false`.

- Since Spark 4.0, `spark.shuffle.service.db.backend` is set to `ROCKSDB` by default which means Spark will use RocksDB store for shuffle service. To restore the behavior before Spark 4.0, you can set `spark.shuffle.service.db.backend` to `LEVELDB`.
