Spark heuristic modification #407
Conversation
@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
     }
   }.max

-  lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-    Severity.NONE
+  lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Make these values configurable
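One way to address this comment is to read the thresholds from the heuristic's parameter map instead of hard-coding them. The sketch below assumes a simple `Map[String, String]` of parameters; the key names and helper functions are illustrative, not Dr. Elephant's actual `HeuristicConfigurationData` API.

```scala
object ConfigurableThresholds {
  // Hypothetical configuration keys for the two hard-coded values above.
  val SPARK_MEMORY_FRACTION_THRESHOLD_KEY = "spark_memory_fraction_threshold"
  val ALLOCATED_MEMORY_THRESHOLD_KEY = "allocated_unified_memory_threshold"

  // Fall back to the previously hard-coded defaults when no override is configured.
  def doubleParam(params: Map[String, String], key: String, default: Double): Double =
    params.get(key).map(_.toDouble).getOrElse(default)

  def longParam(params: Map[String, String], key: String, default: Long): Long =
    params.get(key).map(_.toLong).getOrElse(default)
}
```

With helpers like these, the condition could become `sparkMemoryFraction > doubleParam(params, SPARK_MEMORY_FRACTION_THRESHOLD_KEY, 0.05D)` rather than comparing against a literal.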
@@ -107,7 +107,12 @@ object ExecutorGcHeuristic {

   var ratio: Double = jvmTime.toDouble / executorRunTimeTotal.toDouble

-  lazy val severityTimeA: Severity = executorGcHeuristic.gcSeverityAThresholds.severityOf(ratio)
+  // If the total executor runtime is less than 5 minutes, we won't consider the severity due to GC
+  lazy val severityTimeA: Severity = if ((executorRunTimeTotal / Statistics.MINUTE_IN_MS) >= 5.0D)
Please make this configurable
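The minimum-runtime gate could take the cutoff as a parameter rather than the literal `5.0D`. This is a minimal sketch, assuming a configurable number of minutes; `MINUTE_IN_MS` stands in for `Statistics.MINUTE_IN_MS` and the parameter name is hypothetical.

```scala
object GcRuntimeGate {
  val MINUTE_IN_MS: Long = 60L * 1000L

  // Only evaluate GC severity when the executors ran long enough
  // for the GC-time ratio to be meaningful.
  def shouldEvaluateGcSeverity(executorRunTimeTotalMs: Long,
                               minRuntimeMinutes: Double = 5.0D): Boolean =
    (executorRunTimeTotalMs.toDouble / MINUTE_IN_MS) >= minRuntimeMinutes
}
```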
@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
     }
   }.max

-  lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-    Severity.NONE
+  lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Use MemoryFormatUtils library instead of hard coding the value.
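The magic number `268435456L` is exactly 256 MB, so the comparison can be written against a readable memory string. The simplified parser below is a stand-in for `MemoryFormatUtils.stringToBytes`, whose real implementation lives in the Dr. Elephant codebase.

```scala
object MemoryStrings {
  private val MemoryPattern = "(?i)([0-9]+)\\s*([KMGT]?)B?".r

  // Simplified stand-in for MemoryFormatUtils.stringToBytes.
  def stringToBytes(s: String): Long = s.trim match {
    case MemoryPattern(num, unit) =>
      // Each step up the K/M/G/T ladder multiplies by 1024 (a 10-bit shift).
      val shift = if (unit.isEmpty) 0 else 10 * ("KMGT".indexOf(unit.toUpperCase) + 1)
      num.toLong << shift
    case other => throw new IllegalArgumentException(s"Unparseable memory string: $other")
  }
}
```

With this, `maxMemory > MemoryStrings.stringToBytes("256MB")` states the intent that `maxMemory > 268435456L` hides.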
if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
  Severity.NONE
} else {
  PEAK_UNIFIED_MEMORY_THRESHOLDS.severityOf(maxUnifiedMemory)
It would be great if you could add the points mentioned in the pull request's description as comments in the code.
LGTM
@@ -204,7 +204,7 @@ object ConfigurationHeuristic {
   SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
     severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
   val DEFAULT_SPARK_CORES_THRESHOLDS =
     SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Let's change for the other threshold levels as well, since it's a strict inequality.
Do you mean to change it like: SeverityThresholds(low = 5, moderate = 7, severe = 9, critical = 11, ascending = true)?
Yes, the original design was assuming >= rather than > for the comparison with the threshold values, so please make the change as you've suggested above.
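The exchange above can be made concrete with a stand-in for Dr. Elephant's `SeverityThresholds`. The point is that under `>=` semantics, `low = 4` flags exactly 4 cores, while bumping every level by one keeps `cores <= 4` at severity NONE. Severity is modeled as an `Int` here (0 = NONE), which is a simplification of the project's `Severity` type.

```scala
case class CoreThresholds(low: Int, moderate: Int, severe: Int, critical: Int) {
  // Original design: a value equal to a threshold reaches that severity (>=).
  def severityOf(value: Int): Int =
    if (value >= critical) 4
    else if (value >= severe) 3
    else if (value >= moderate) 2
    else if (value >= low) 1
    else 0
}
```

`CoreThresholds(5, 7, 9, 11).severityOf(4)` yields 0 (NONE), whereas the old `CoreThresholds(4, 6, 8, 10).severityOf(4)` yielded 1 (LOW).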
Added the suggested changes.
@@ -172,7 +175,7 @@ object DriverHeuristic {

   // The following thresholds are for checking if the memory and cores values (driver) are above normal. These thresholds are experimental, and may change in the future.
   val DEFAULT_SPARK_MEMORY_THRESHOLDS =
-    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
+    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), moderate = MemoryFormatUtils.stringToBytes("15G"),
     severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
   val DEFAULT_SPARK_CORES_THRESHOLDS =
     SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Please change driver core thresholds to match executor core thresholds.
Thanks for making the changes, looks good.
Force-pushed from c226088 to 35b57d7.
Could the changes already reviewed be merged and deployed, and the changes for unified memory be done in a separate PR? We have been having a lot of issues with Dr. E (user confusion and questions), so getting the existing fixes out would be very helpful and would reduce the support workload for our team.
val JVM_USED_MEMORY = "jvmUsedMemory"
val MAX_EXECUTOR_PEAK_JVM_USED_MEMORY_THRESHOLD_KEY = "executor_peak_jvm_memory_threshold"

// 300 * FileUtils.ONE_MB (300 * 1024 * 1024)
It looks like this is copying a lot of the code and logic from JvmUsedMemoryHeuristic.scala. Instead, could the code for evaluation be factored out and shared?
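One shape the suggested refactor could take is pulling the duplicated peak-JVM-memory check out of `JvmUsedMemoryHeuristic` and the new heuristic into a trait both can mix in. All names below are hypothetical; this is a sketch of the idea, not the project's actual design.

```scala
// Shared evaluation logic that each heuristic implements by supplying
// its own measured value and configured threshold.
trait PeakJvmMemoryEvaluator {
  def peakJvmUsedMemoryBytes: Long
  def thresholdBytes: Long

  // The check previously duplicated across the two heuristics.
  def exceedsPeakJvmThreshold: Boolean = peakJvmUsedMemoryBytes > thresholdBytes
}
```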
Force-pushed from d34dc93 to 35b57d7.
* Revert "Dr. Elephant Tez Support working patch (linkedin#313)". This reverts commit a0470a3.
* Rerevert "Dr. Elephant Tez Support working patch (linkedin#313)" including attribution. This reverts commit e3fd598. Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>
* Auto tuning: Support for parameter set multi-try (linkedin#386)
* Changes in some of the Spark Heuristics
* Adding tests for changes to the executor GC heuristic and unified memory heuristic
* Update ExecutorGcHeuristic.scala
* Update UnifiedMemoryHeuristic.scala
* Changed some hard-coded values to variables
* Due to strict inequality, changed the other threshold levels for executor and driver
Description
Making some changes to the current Spark heuristics for executor GC, configuration, and unified memory.
Changes made are as follows:
* Removed the driver GC heuristic.
* If the total executor runtime is less than 5 minutes, the executor GC heuristic will not flag the application.
* Peak unified memory severity is only considered if spark.memory.fraction > 0.05 and allocated unified memory > 256 MB.
* Changes in the configuration heuristic for spark.executor.cores: severity will be NONE if cores <= 4.
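The unified-memory change above can be sketched as a single gate: peak unified memory severity is only evaluated when both conditions hold, and otherwise the severity is NONE. Severity is modeled as an `Int` here (0 = NONE) and the names are simplified from the actual heuristic.

```scala
object UnifiedMemoryGate {
  val MIN_UNIFIED_MEMORY_BYTES: Long = 256L * 1024L * 1024L  // 256 MB

  // peakSeverity is only computed (call-by-name) when the gate passes.
  def severity(sparkMemoryFraction: Double, maxMemoryBytes: Long,
               peakSeverity: => Int): Int =
    if (sparkMemoryFraction > 0.05D && maxMemoryBytes > MIN_UNIFIED_MEMORY_BYTES)
      peakSeverity
    else
      0 // Severity.NONE
}
```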
How this is tested
Tests for these changes are included in the respective test suites for the heuristics.