[SPARK-7214] Reserve space for unrolling even when MemoryStore nearly full #5784
Conversation
Test build #31305 has finished for PR 5784 at commit
Force-pushed from 6c57a97 to 16797ea.
Test build #31308 has finished for PR 5784 at commit
@brennonyork -- something seems strange w/the output of pr_public_classes.sh... i checked the jenkins output also, and it's reporting that the SHA1 is "SHA1: origin/pr/5784/merge", which seems not right. thoughts?
@shaneknapp well, to start, I couldn't agree more that something looks very fishy here. As for the SHA1 hash, it's getting pulled from, as I'm sure you know, this code here, which, I'll admit, is interesting in and of itself in that it could produce the SHA1 as an actual hash or as what we see above (in the case that the patch can merge without conflict into master). That said, my only thought, and I'm hoping you could shed some light here, would be a possible race condition from some shared state on each Jenkins box such that the Bash calls (or environment variables) aren't atomic to the PR they're building. Thoughts on that? I'll continue to dig and see what I can find.
so, believe it or not, the jenkins bash environment is actually atomic (all machines are identical hardware, same kernel, system updates, package versions, ssh bash environment, etc). you can see the bash environment for the worker this job executed on here (https://amplab.cs.berkeley.edu/jenkins/computer/amp-jenkins-worker-08/log), and it's identical to all of the other ones.

regarding sha1: derp, i totally knew that. :p

anyways, i'll poke around those shell scripts some more and see if there's anything i can tweak to better report/fail on errors. i'm also going to redirect a lot of the git error output to /dev/null as it's been cluttering up the test logs. i know there's a PR to move this stuff to python, but for the time being i'll see if i can't make things better. once the PR is done i'll add you (@brennonyork) as a reviewer.
jenkins, test this please
thanks for that clarity @shaneknapp. Looks like I don't have access to see the Jenkins environment variables from the link you sent (unless it became stale before I clicked), but I'll look for the review note and provide what I can!
Test build #31441 has finished for PR 5784 at commit
I've been trying to come up with a scenario where this would be undesirable, and I've only found one: on the second put, you have less than the initial room available, AND the second block is bigger than your total memory. After this change you'll end up with nothing in the cache, whereas before you would still have the first block. However, that is quite a stretch: you could end up in this situation before in any case, if you had more than the initial room available to start with but the second block is bigger than your total memory. Does that seem right? Not that this is a show stopper; just trying to make sure I understand.
@@ -280,20 +283,10 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
      val currentSize = vector.estimateSize()
      if (currentSize >= memoryThreshold) {
        val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
        // Hold the accounting lock, in case another thread concurrently puts a block that
        // takes up the unrolling space we just ensured here
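For context on the hunk above: during unrolling, the estimated size of the partially unrolled block is checked periodically, and when it crosses the current threshold, extra memory is requested using a growth factor. A minimal sketch of that request calculation, using the names from the diff (`memoryGrowthFactor`, `memoryThreshold`); the surrounding scaffolding and the example values are hypothetical, not Spark's actual code:

```scala
// Toy sketch of the unroll-growth request shown in the diff above.
// `memoryGrowthFactor` and the formula come from the hunk; the object
// wrapper and example sizes are made up for illustration.
object UnrollCheckSketch {
  val memoryGrowthFactor = 1.5 // grow the reservation ahead of actual need

  // Request enough to cover the grown estimate, minus what is already reserved.
  def amountToRequest(currentSize: Long, memoryThreshold: Long): Long =
    (currentSize * memoryGrowthFactor - memoryThreshold).toLong

  def main(args: Array[String]): Unit = {
    // e.g. current estimate 2 MB with a 1 MB threshold: ask for 2 MB more
    println(amountToRequest(2L << 20, 1L << 20))
  }
}
```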
move this comment into the new method
The idea about it happening anyway with a slightly different amount of space is right; often fragmentation means there will be more than unrollMemoryThreshold free in the MemoryStore even though it is essentially full. Note that the total amount passed to ensureFreeSpace() is limited by maxUnrollMemory (taking into account already-reserved unrolling space), so, if there are no "huge" blocks, there's a limit to how much can be spuriously dropped.
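The cap described in the comment above can be sketched as follows. This is a hypothetical illustration, not Spark's implementation: the name `cappedRequest` and the example sizes are invented, while `maxUnrollMemory` and the already-reserved unroll space are the quantities the comment refers to:

```scala
// Hypothetical sketch of the cap on unroll-driven eviction: a request made
// on behalf of unrolling is limited by maxUnrollMemory minus what is
// already reserved for unrolling by other threads.
object UnrollCapSketch {
  def cappedRequest(requested: Long,
                    maxUnrollMemory: Long,
                    alreadyReservedUnroll: Long): Long =
    math.min(requested, math.max(0L, maxUnrollMemory - alreadyReservedUnroll))

  def main(args: Array[String]): Unit = {
    // 10 MB requested, 8 MB unroll cap, 3 MB already reserved: at most 5 MB
    println(cappedRequest(10L << 20, 8L << 20, 3L << 20))
  }
}
```

This is why, absent "huge" blocks, the amount that can be spuriously dropped is bounded.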
 * necessary. If blocks are dropped, adds them to droppedBlocks. Returns whether the
 * request was granted, and any blocks that were dropped trying to grant it.
 */
def reserveUnrollMemoryForThisThreadDroppingBlocks(
make this private
just some teeny style things, otherwise lgtm
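For readers skimming the thread, the behavior of the method under review, reserveUnrollMemoryForThisThreadDroppingBlocks, can be sketched with a toy, single-threaded model. Everything here is illustrative: the real MemoryStore tracks BlockIds under an accounting lock and has a more involved eviction policy; this toy uses (id, size) pairs and drops oldest-first.

```scala
import scala.collection.mutable

// Toy, single-threaded model of "reserve unroll memory, dropping cached
// blocks if needed". Not Spark's actual MemoryStore.
class ToyMemoryStore(val maxMemory: Long) {
  private val blocks = mutable.LinkedHashMap[String, Long]() // id -> size
  private var unrollReserved = 0L

  private def freeMemory: Long = maxMemory - blocks.values.sum - unrollReserved

  def put(id: String, size: Long): Boolean =
    if (size <= freeMemory) { blocks(id) = size; true } else false

  /** Try to reserve `amount` for unrolling; if there is not enough free
   *  space, drop cached blocks (oldest first here) until there is, collecting
   *  the dropped ids. Returns (granted, droppedBlocks). */
  def reserveUnrollMemoryDroppingBlocks(amount: Long): (Boolean, Seq[String]) = {
    val dropped = mutable.Buffer[String]()
    while (freeMemory < amount && blocks.nonEmpty) {
      val (id, _) = blocks.head
      blocks.remove(id)
      dropped += id
    }
    if (freeMemory >= amount) { unrollReserved += amount; (true, dropped.toSeq) }
    else (false, dropped.toSeq)
  }
}
```

For example, with maxMemory = 100, cached blocks of sizes 60 and 30, a reservation of 50 drops the first block and is then granted, which is the new fallback path this PR adds.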
just curious, I assume this matters only when
Test build #31481 has finished for PR 5784 at commit
No; one just needs to have enough small blocks to store that the free space gets below 1MB (and not be unpersisting blocks etc., and have the timing work out so that storing the last block isn't stopped by unroll space reservations).
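A hypothetical numeric illustration of the condition described above, filling a store with small blocks until free space drops below an assumed 1 MB initial unroll reservation, at which point the next reservation would fail without this PR's fallback. All sizes here are made up:

```scala
// Made-up sizes illustrating "free space below the initial unroll
// reservation" from the comment above. Not Spark code.
object FreeSpaceSketch {
  val maxMemory = 10L << 20       // assume a 10 MB store
  val unrollThreshold = 1L << 20  // assume a 1 MB initial unroll reservation

  // How many equally-sized blocks fit before free space drops below the
  // threshold (each put succeeds while >= 1 MB remains free).
  def blocksUntilReservationFails(blockSize: Long): Int = {
    var used = 0L
    var n = 0
    while (maxMemory - used >= unrollThreshold) { used += blockSize; n += 1 }
    n
  }

  def main(args: Array[String]): Unit =
    println(blocksUntilReservationFails(512L << 10)) // 512 KB blocks
}
```

After 19 blocks of 512 KB in this toy setup, only 0.5 MB is free, so a fresh 1 MB unroll reservation cannot be satisfied without dropping something.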
Test build #31495 has finished for PR 5784 at commit
Jenkins, retest this please
ah I see, I'm glad I asked! thanks for clarifying. I'm pretty sure that last set of test failures was spurious, so assuming the latest set of tests pass, I think this is good. Since the memory store is so central, though, I'd like to err on the side of caution. @aarondav @rxin do you want to take a quick look?
Test build #31560 has finished for PR 5784 at commit
cc @andrewor14 who worked a lot on this
@andrewor14 @rxin Update on this?
Hi @woggle, thanks for the patch, but I won't have the bandwidth to look at this in detail until after the 1.4 release. I will have a look then.
Hey @andrewor14, given our recent discussions / work on unrolling, do you think this PR is still necessary / relevant?
I think this patch is largely out of date by now given the new memory management patches. I would recommend we close this issue.
Agreed. I haven't had a chance to actually make sure your patch #9000 fixes the problem, but it looks like it should.
If the initial reservation of space for unrolling a block fails, attempt to drop blocks to make room before giving up.