Spark heuristic modification #407
Conversation
@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
     }
   }.max

-  lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-    Severity.NONE
+  lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Make these values configurable
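One way to address this comment is to read the thresholds from the heuristic's parameter map instead of hard-coding them. The sketch below assumes a simple `Map[String, String]` of parameters; the key names and helper functions are illustrative, not Dr. Elephant's actual `HeuristicConfigurationData` API.

```scala
object ConfigurableThresholds {
  // Hypothetical configuration keys for the two hard-coded values above.
  val SPARK_MEMORY_FRACTION_THRESHOLD_KEY = "spark_memory_fraction_threshold"
  val ALLOCATED_MEMORY_THRESHOLD_KEY = "allocated_unified_memory_threshold"

  // Fall back to the previously hard-coded defaults when no override is configured.
  def doubleParam(params: Map[String, String], key: String, default: Double): Double =
    params.get(key).map(_.toDouble).getOrElse(default)

  def longParam(params: Map[String, String], key: String, default: Long): Long =
    params.get(key).map(_.toLong).getOrElse(default)
}
```

With helpers like these, the condition could become `sparkMemoryFraction > doubleParam(params, SPARK_MEMORY_FRACTION_THRESHOLD_KEY, 0.05D)` rather than comparing against a literal.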
@@ -107,7 +107,12 @@ object ExecutorGcHeuristic {

   var ratio: Double = jvmTime.toDouble / executorRunTimeTotal.toDouble

-  lazy val severityTimeA: Severity = executorGcHeuristic.gcSeverityAThresholds.severityOf(ratio)
+  // If the total executor runtime is less than 5 minutes, we won't consider the severity due to GC
+  lazy val severityTimeA: Severity = if ((executorRunTimeTotal / Statistics.MINUTE_IN_MS) >= 5.0D)
Please make this configurable
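The minimum-runtime gate could take the cutoff as a parameter rather than the literal `5.0D`. This is a minimal sketch, assuming a configurable number of minutes; `MINUTE_IN_MS` stands in for `Statistics.MINUTE_IN_MS` and the parameter name is hypothetical.

```scala
object GcRuntimeGate {
  val MINUTE_IN_MS: Long = 60L * 1000L

  // Only evaluate GC severity when the executors ran long enough
  // for the GC-time ratio to be meaningful.
  def shouldEvaluateGcSeverity(executorRunTimeTotalMs: Long,
                               minRuntimeMinutes: Double = 5.0D): Boolean =
    (executorRunTimeTotalMs.toDouble / MINUTE_IN_MS) >= minRuntimeMinutes
}
```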
@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
     }
   }.max

-  lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-    Severity.NONE
+  lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Use MemoryFormatUtils library instead of hard coding the value.
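The magic number `268435456L` is exactly 256 MB, so the comparison can be written against a readable memory string. The simplified parser below is a stand-in for `MemoryFormatUtils.stringToBytes`, whose real implementation lives in the Dr. Elephant codebase.

```scala
object MemoryStrings {
  private val MemoryPattern = "(?i)([0-9]+)\\s*([KMGT]?)B?".r

  // Simplified stand-in for MemoryFormatUtils.stringToBytes.
  def stringToBytes(s: String): Long = s.trim match {
    case MemoryPattern(num, unit) =>
      // Each step up the K/M/G/T ladder multiplies by 1024 (a 10-bit shift).
      val shift = if (unit.isEmpty) 0 else 10 * ("KMGT".indexOf(unit.toUpperCase) + 1)
      num.toLong << shift
    case other => throw new IllegalArgumentException(s"Unparseable memory string: $other")
  }
}
```

With this, `maxMemory > MemoryStrings.stringToBytes("256MB")` states the intent that `maxMemory > 268435456L` hides.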
if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
  Severity.NONE
} else {
  PEAK_UNIFIED_MEMORY_THRESHOLDS.severityOf(maxUnifiedMemory)
It would be great if you could add the points mentioned in the pull request's description as comments in the code.
LGTM
@@ -204,7 +204,7 @@ object ConfigurationHeuristic {
   SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
     severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
   val DEFAULT_SPARK_CORES_THRESHOLDS =
     SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Let's change for the other threshold levels as well, since it's a strict inequality.
Do you mean to change it like: SeverityThresholds(low = 5, moderate = 7, severe = 9, critical = 11, ascending = true)?
Yes, the original design was assuming >= rather than > for the comparison with the threshold values, so please make the change as you've suggested above.
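The exchange above can be made concrete with a stand-in for Dr. Elephant's `SeverityThresholds`. The point is that under `>=` semantics, `low = 4` flags exactly 4 cores, while bumping every level by one keeps `cores <= 4` at severity NONE. Severity is modeled as an `Int` here (0 = NONE), which is a simplification of the project's `Severity` type.

```scala
case class CoreThresholds(low: Int, moderate: Int, severe: Int, critical: Int) {
  // Original design: a value equal to a threshold reaches that severity (>=).
  def severityOf(value: Int): Int =
    if (value >= critical) 4
    else if (value >= severe) 3
    else if (value >= moderate) 2
    else if (value >= low) 1
    else 0
}
```

`CoreThresholds(5, 7, 9, 11).severityOf(4)` yields 0 (NONE), whereas the old `CoreThresholds(4, 6, 8, 10).severityOf(4)` yielded 1 (LOW).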
Added the suggested changes.
@@ -172,7 +175,7 @@ object DriverHeuristic {

   // The following thresholds are for checking if the memory and cores values (driver) are above normal. These thresholds are experimental, and may change in the future.
   val DEFAULT_SPARK_MEMORY_THRESHOLDS =
-    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
+    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), moderate = MemoryFormatUtils.stringToBytes("15G"),
     severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
   val DEFAULT_SPARK_CORES_THRESHOLDS =
     SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Please change driver core thresholds to match executor core thresholds.
Thanks for making the changes, looks good.
Force-pushed from c226088 to 35b57d7.
Could the changes already reviewed be merged and deployed, and the changes for unified memory be done in a separate PR? We have been having a lot of issues with Dr. E (user confusion and questions), so getting the existing fixes out would be very helpful and would reduce the support workload for our team.
val JVM_USED_MEMORY = "jvmUsedMemory"
val MAX_EXECUTOR_PEAK_JVM_USED_MEMORY_THRESHOLD_KEY = "executor_peak_jvm_memory_threshold"

// 300 * FileUtils.ONE_MB (300 * 1024 * 1024)
It looks like this is copying a lot of the code and logic from JvmUsedMemoryHeuristic.scala. Instead, could the code for evaluation be factored out and shared?
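One shape the suggested refactor could take is pulling the duplicated peak-JVM-memory check out of `JvmUsedMemoryHeuristic` and the new heuristic into a trait both can mix in. All names below are hypothetical; this is a sketch of the idea, not the project's actual design.

```scala
// Shared evaluation logic that each heuristic implements by supplying
// its own measured value and configured threshold.
trait PeakJvmMemoryEvaluator {
  def peakJvmUsedMemoryBytes: Long
  def thresholdBytes: Long

  // The check previously duplicated across the two heuristics.
  def exceedsPeakJvmThreshold: Boolean = peakJvmUsedMemoryBytes > thresholdBytes
}
```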
Force-pushed from d34dc93 to 35b57d7.
* Revert "Dr. Elephant Tez Support working patch (linkedin#313)". This reverts commit a0470a3.
* Rerevert "Dr. Elephant Tez Support working patch (linkedin#313)" including attribution. This reverts commit e3fd598. Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
* Auto tuning: Support for parameter set multi-try (linkedin#386)
* Changes in some of the Spark Heuristics
* Adding tests for changes to the executor GC heuristic and unified memory heuristic
* Update ExecutorGcHeuristic.scala
* Update UnifiedMemoryHeuristic.scala
* Changed some hard-coded values to variables
* Due to strict inequality, changed the other threshold levels for executor and driver
Description
Making some changes to the current Spark heuristics for executor GC, configuration, and unified memory.
Changes made are as follows:
* Removed the driver GC heuristic.
* If the total executor runtime is less than 5 minutes, the executor GC heuristic will not flag the application.
* Peak unified memory severity is only considered if spark.memory.fraction > 0.05 and allocated unified memory > 256 MB.
* Changes in the configuration heuristic for spark.executor.cores: severity will be NONE if cores <= 4.
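The unified-memory change above can be sketched as a single gate: peak unified memory severity is only evaluated when both conditions hold, and otherwise the severity is NONE. Severity is modeled as an `Int` here (0 = NONE) and the names are simplified from the actual heuristic.

```scala
object UnifiedMemoryGate {
  val MIN_UNIFIED_MEMORY_BYTES: Long = 256L * 1024L * 1024L  // 256 MB

  // peakSeverity is only computed (call-by-name) when the gate passes.
  def severity(sparkMemoryFraction: Double, maxMemoryBytes: Long,
               peakSeverity: => Int): Int =
    if (sparkMemoryFraction > 0.05D && maxMemoryBytes > MIN_UNIFIED_MEMORY_BYTES)
      peakSeverity
    else
      0 // Severity.NONE
}
```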
How this is tested
Tests for these changes are included in the respective test suites for the heuristics.