
Spark heuristic modification #407

Merged

Conversation

ShubhamGupta29 (Contributor)

Description

Making some changes to the current Spark heuristics for executor GC, configuration, and unified memory.
The changes are as follows:

  • Removed the driver GC heuristic

  • The executor GC heuristic no longer flags applications whose total executor runtime is under 5 minutes

  • Peak unified memory severity is only computed when spark.memory.fraction > 0.05 and the allocated unified memory is greater than 256 MB

  • In the configuration heuristic, the severity for spark.executor.cores is NONE when cores <= 4

How this is tested

Tests for these changes are included in the existing test suites for the respective heuristics.
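The gating rules above can be sketched as follows. This is a minimal stand-alone sketch, not the dr-elephant code: Severity is reduced to two stand-in values, and the `rated` argument stands for whatever severity the existing thresholds would otherwise produce.

```scala
// Minimal sketch of the new gates, using simplified stand-ins for
// dr-elephant's Severity values (not the project's actual classes).
object HeuristicGates {
  sealed trait Severity
  case object NONE extends Severity
  case object MODERATE extends Severity

  val MINUTE_IN_MS: Long = 60L * 1000L

  // Executor GC: rate severity only when total executor runtime >= 5 minutes.
  def gcSeverity(executorRunTimeTotalMs: Long, rated: => Severity): Severity =
    if (executorRunTimeTotalMs / MINUTE_IN_MS.toDouble >= 5.0) rated else NONE

  // Unified memory: rate peak unified memory only when the memory fraction
  // and the allocated unified memory are both non-trivial.
  def unifiedSeverity(sparkMemoryFraction: Double, maxMemoryBytes: Long,
                      rated: => Severity): Severity =
    if (sparkMemoryFraction > 0.05 && maxMemoryBytes > 256L * 1024 * 1024) rated
    else NONE

  // Configuration: spark.executor.cores of 4 or fewer is never flagged.
  def coresSeverity(cores: Int, rated: => Severity): Severity =
    if (cores <= 4) NONE else rated
}
```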

@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
}
}.max

-    lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-      Severity.NONE
+    lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Contributor:

Make these values configurable
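One way to act on this suggestion — hedged, since the key names below are hypothetical — is to read the constants from the heuristic's parameter map with the current hard-coded values as defaults:

```scala
// Hypothetical sketch: pull thresholds from a string-keyed parameter map,
// falling back to the current hard-coded defaults. Key names are invented.
object ConfigurableThresholds {
  val SPARK_MEMORY_FRACTION_THRESHOLD_KEY = "spark_memory_fraction_threshold"
  val ALLOCATED_UNIFIED_MEMORY_THRESHOLD_KEY = "allocated_unified_memory_threshold"

  // Parse a double-valued parameter, using the default when absent.
  def doubleParam(params: Map[String, String], key: String, default: Double): Double =
    params.get(key).map(_.toDouble).getOrElse(default)

  // Parse a long-valued parameter, using the default when absent.
  def longParam(params: Map[String, String], key: String, default: Long): Long =
    params.get(key).map(_.toLong).getOrElse(default)
}
```

The severity check would then use `doubleParam(params, SPARK_MEMORY_FRACTION_THRESHOLD_KEY, 0.05)` instead of a literal `0.05D`.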

@@ -107,7 +107,12 @@ object ExecutorGcHeuristic {

var ratio: Double = jvmTime.toDouble / executorRunTimeTotal.toDouble

-    lazy val severityTimeA: Severity = executorGcHeuristic.gcSeverityAThresholds.severityOf(ratio)
+    // If the total executor runtime is less than 5 minutes, we won't consider severity due to GC
+    lazy val severityTimeA: Severity = if ((executorRunTimeTotal / Statistics.MINUTE_IN_MS) >= 5.0D)
Contributor:

Please make this configurable

@@ -116,10 +116,14 @@ object UnifiedMemoryHeuristic {
}
}.max

-    lazy val severity: Severity = if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
-      Severity.NONE
+    lazy val severity: Severity = if (sparkMemoryFraction > 0.05D && maxMemory > 268435456L) {
Contributor:

Use MemoryFormatUtils library instead of hard coding the value.

if (sparkExecutorMemory <= MemoryFormatUtils.stringToBytes(unifiedMemoryHeuristic.sparkExecutorMemoryThreshold)) {
Severity.NONE
} else {
PEAK_UNIFIED_MEMORY_THRESHOLDS.severityOf(maxUnifiedMemory)
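The suggestion above would turn the magic number `268435456L` into `MemoryFormatUtils.stringToBytes("256 MB")`. A simplified stand-in for that conversion (not the actual MemoryFormatUtils implementation) shows the two are equivalent:

```scala
// Simplified stand-in for dr-elephant's MemoryFormatUtils.stringToBytes,
// for illustration only: splits a memory string into number and unit.
object MemoryFormat {
  private val units: Map[String, Long] = Map(
    "B" -> 1L, "K" -> 1024L, "KB" -> 1024L,
    "M" -> 1024L * 1024, "MB" -> 1024L * 1024,
    "G" -> 1024L * 1024 * 1024, "GB" -> 1024L * 1024 * 1024)

  def stringToBytes(s: String): Long = {
    // span() splits at the first non-numeric character, e.g. "256 MB" -> ("256", " MB").
    val (num, unit) = s.trim.span(c => c.isDigit || c == '.')
    (num.toDouble * units(unit.trim.toUpperCase)).toLong
  }
}
```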

Contributor:

It would be great if you could add the points mentioned in the pull request's description as comments in the code.

@skakker (Contributor) left a comment:

LGTM

@pralabhkumar (Contributor):

LGTM

@@ -204,7 +204,7 @@ object ConfigurationHeuristic {
SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
val DEFAULT_SPARK_CORES_THRESHOLDS =
SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Review comment:

Let's change for the other threshold levels as well, since it's a strict inequality.

Contributor (author):

Do you mean to change it like: SeverityThresholds(low = 5, moderate = 7, severe = 9, critical = 11, ascending = true)?

Review comment:

Yes, the original design was assuming >= rather than > for the comparison with the threshold values, so please make the change as you've suggested above.
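A simplified stand-in for SeverityThresholds (ascending scale, compared with >= as the original design assumed) makes the off-by-one concrete: with low = 5, exactly 4 cores rates NONE and 5 rates LOW, matching the intended "more than 4" semantics.

```scala
// Simplified stand-in for dr-elephant's SeverityThresholds, assuming an
// ascending scale where each level fires on value >= threshold.
case class Thresholds(low: Long, moderate: Long, severe: Long, critical: Long) {
  def severityOf(value: Long): String =
    if (value >= critical) "CRITICAL"
    else if (value >= severe) "SEVERE"
    else if (value >= moderate) "MODERATE"
    else if (value >= low) "LOW"
    else "NONE"
}
```

For example, `Thresholds(5, 7, 9, 11).severityOf(4)` yields NONE while `severityOf(5)` yields LOW.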

Contributor (author):

Added the suggested changes.

@@ -172,7 +175,7 @@ object DriverHeuristic {

//The following thresholds are for checking if the memory and cores values (driver) are above normal. These thresholds are experimental, and may change in the future.
val DEFAULT_SPARK_MEMORY_THRESHOLDS =
-    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), MemoryFormatUtils.stringToBytes("15G"),
+    SeverityThresholds(low = MemoryFormatUtils.stringToBytes("10G"), moderate = MemoryFormatUtils.stringToBytes("15G"),
severe = MemoryFormatUtils.stringToBytes("20G"), critical = MemoryFormatUtils.stringToBytes("25G"), ascending = true)
val DEFAULT_SPARK_CORES_THRESHOLDS =
SeverityThresholds(low = 4, moderate = 6, severe = 8, critical = 10, ascending = true)
Review comment:

Please change driver core thresholds to match executor core thresholds.

@edwinalu left a comment:

Thanks for making the changes, looks good.

@ShubhamGupta29 ShubhamGupta29 force-pushed the sparkHeuristicModification branch from c226088 to 35b57d7 Compare August 7, 2018 04:32
@edwinalu left a comment:

Could the changes already reviewed be merged and deployed, and the changes for unified memory be done in a separate PR? We have been having a lot of issues with Dr. E (user confusion and questions), so getting the existing fixes out would be very helpful, and help reduce the support workload for our team.

val JVM_USED_MEMORY = "jvmUsedMemory"
val MAX_EXECUTOR_PEAK_JVM_USED_MEMORY_THRESHOLD_KEY = "executor_peak_jvm_memory_threshold"

// 300 * FileUtils.ONE_MB (300 * 1024 * 1024)
Review comment:

It looks like this is copying a lot of the code and logic from JvmUsedMemoryHeuristic.scala. Instead, could the code for evaluation be factored out and shared?
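One hedged way to address this — the names below are invented for illustration, not taken from the codebase — is to factor the shared peak-JVM-memory evaluation into a trait that both heuristics mix in:

```scala
// Hypothetical sketch of sharing evaluation logic between heuristics via a
// trait instead of duplicating it; all names are invented for illustration.
trait PeakJvmMemoryEvaluation {
  // Each heuristic supplies its own threshold (e.g. from configuration).
  def peakJvmMemoryThresholdBytes: Long

  // Shared rule: flag when peak JVM used memory exceeds the threshold.
  def exceedsPeakThreshold(peakJvmUsedMemoryBytes: Long): Boolean =
    peakJvmUsedMemoryBytes > peakJvmMemoryThresholdBytes
}

// A heuristic-specific object only needs to provide its threshold.
object DriverMemoryCheck extends PeakJvmMemoryEvaluation {
  val peakJvmMemoryThresholdBytes: Long = 300L * 1024 * 1024 // 300 MB
}
```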

@ShubhamGupta29 ShubhamGupta29 force-pushed the sparkHeuristicModification branch 2 times, most recently from d34dc93 to 35b57d7 Compare August 8, 2018 05:09
@pralabhkumar pralabhkumar merged commit 17fcb60 into linkedin:customSHSWork Aug 9, 2018
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
* Revert "Dr. Elephant Tez Support working patch (linkedin#313)"

This reverts commit a0470a3.

* Rerevert "Dr. Elephant Tez Support working patch (linkedin#313)" including attribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>

* Auto tuning: Support for parameter set multi-try (linkedin#386)

* Changes in some of the Spark Heuristics

* Adding tests for changes to the executor GC heuristic and unified memory heuristic

* Update ExecutorGcHeuristic.scala

* Update UnifiedMemoryHeuristic.scala

* Changed some hard coded values to variables

* Due to strict inequality, changing the other threshold levels for executor and driver
varunsaxena pushed a commit that referenced this pull request Oct 16, 2018
* Revert "Dr. Elephant Tez Support working patch (#313)"

This reverts commit a0470a3.

* Rerevert "Dr. Elephant Tez Support working patch (#313)" including attribution.

This reverts commit e3fd598.

Co-authored-by: Abhishek Das <abhishekdas99@users.noreply.github.com>

* Auto tuning: Support for parameter set multi-try (#386)

* Changes in some of the Spark Heuristics

* Adding tests for changes to the executor GC heuristic and unified memory heuristic

* Update ExecutorGcHeuristic.scala

* Update UnifiedMemoryHeuristic.scala

* Changed some hard coded values to variables

* Due to strict inequality, changing the other threshold levels for executor and driver
@ShubhamGupta29 ShubhamGupta29 deleted the sparkHeuristicModification branch April 13, 2020 03:02