
[SPARK-10956] Common MemoryManager interface for storage and execution #9000

Closed
wants to merge 16 commits

Conversation

andrewor14 (Contributor)

This patch introduces a MemoryManager that is the central arbiter of how much memory to grant to storage and execution. This patch is primarily concerned only with refactoring while preserving the existing behavior as much as possible.

This is the first step away from the existing rigid separation of storage and execution memory, which has several major drawbacks discussed on the issue. It is the precursor of a series of patches that will attempt to address those drawbacks.
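As a rough sketch of the shape such a central arbiter takes: the names, fields, and the fixed-split policy below are illustrative assumptions, not the actual Spark API.

```scala
// Illustrative sketch only; the real MemoryManager interface differs.
abstract class MemoryArbiterSketch {
  def maxStorageMemory: Long
  def maxExecutionMemory: Long

  private var storageUsed = 0L
  private var executionUsed = 0L

  // Storage and execution both ask the same arbiter, so a later policy
  // can let the two regions borrow from each other in one place.
  def acquireStorageMemory(numBytes: Long): Boolean = synchronized {
    val ok = storageUsed + numBytes <= maxStorageMemory
    if (ok) storageUsed += numBytes
    ok
  }

  def acquireExecutionMemory(numBytes: Long): Boolean = synchronized {
    val ok = executionUsed + numBytes <= maxExecutionMemory
    if (ok) executionUsed += numBytes
    ok
  }

  def releaseStorageMemory(numBytes: Long): Unit = synchronized {
    storageUsed = math.max(0L, storageUsed - numBytes)
  }

  def storageMemoryUsed: Long = synchronized { storageUsed }
}

// A fixed split between the two regions, mirroring the rigid pre-existing
// behavior that this first patch deliberately preserves.
class StaticSplitSketch(storageCap: Long, executionCap: Long)
    extends MemoryArbiterSketch {
  val maxStorageMemory: Long = storageCap
  val maxExecutionMemory: Long = executionCap
}
```

Routing both kinds of request through one object is what makes the later unification possible: only the policy inside the arbiter has to change, not its callers.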

Andrew Or added 8 commits September 30, 2015 16:57
This commit does not represent any change in behavior. The
MemoryManagers introduced are not actually used anywhere yet;
that will come in the next commit.
This commit refactors BlockManager and ShuffleMemoryManager to
take in a MemoryManager in their constructors, such that the max
memory is derived from the MemoryManager instead of from a Long
that is passed around everywhere.

Note that no memory acquisition or release goes through the
MemoryManager yet. This will come in future commits.
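The injection pattern described above can be sketched as follows; the class and member names are placeholders, not the real Spark constructors.

```scala
// Simplified sketch: the component derives its limit from the manager it is
// constructed with, instead of receiving a raw Long threaded through every
// constructor. Names are illustrative.
class ManagerHandleSketch(val maxStorageMemory: Long)

class BlockManagerSketch(memoryManager: ManagerHandleSketch) {
  // Derived from the manager, not passed around separately.
  def maxMemory: Long = memoryManager.maxStorageMemory
}
```

Because the limit now has a single owner, a later change to how the maximum is computed touches one class instead of every call site.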
All execution memory is now acquired through the new MemoryManager
interface as of this commit. The next step is to do it for storage
memory.
All storage memory is now acquired through the new MemoryManager
interface. This requires non-trivial refactoring due to the fact
that the unrolling and eviction logic are closely intertwined.
This removes an awkward case class that didn't really do much.
Instead we pass a mutable list into the methods to collect the
dropped blocks.
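A minimal sketch of that caller-supplied-buffer pattern; the function name, the String block IDs, and the oldest-first eviction order are all illustrative assumptions.

```scala
import scala.collection.mutable

// Hypothetical sketch: the caller passes in a mutable buffer and the
// eviction routine appends the IDs of any blocks it drops, instead of
// returning them wrapped in a case class.
def ensureFreeSpace(
    bytesNeeded: Long,
    capacity: Long,
    cached: mutable.LinkedHashMap[String, Long],
    droppedBlocks: mutable.ArrayBuffer[String]): Boolean = {
  var free = capacity - cached.values.sum
  val toDrop = mutable.ArrayBuffer.empty[String]
  val it = cached.iterator
  while (free < bytesNeeded && it.hasNext) {
    val (id, size) = it.next()
    toDrop += id
    free += size
  }
  if (free >= bytesNeeded) {
    toDrop.foreach { id =>
      cached.remove(id)      // evict, oldest insertion first
      droppedBlocks += id    // report the drop to the caller
    }
    true
  } else {
    false                    // not enough evictable space; drop nothing
  }
}
```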
@SparkQA

SparkQA commented Oct 6, 2015

Test build #43293 has finished for PR 9000 at commit 88faede.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
 * sorts and aggregations, while storage memory refers to that used for caching and propagating
 * internal data across the cluster.
 */
private[spark] abstract class MemoryManager {
```
Contributor
can you move this somewhere else that's not at the top level? I'd like to minimize the number of private[spark] that appears in the top level.

Contributor
+1 on moving this out of the top-level and into a new package (maybe spark.memory ?). This will also provide a convenient way to group the memory-related classes together post-refactoring.

Clarify the transfer of memory from unroll to pending unroll
@JoshRosen (Contributor)

LGTM, by the way, pending updates to address comments. The majority of the unaddressed comments are minor and nitpicky, but I think that #9000 (comment) might deserve special attention.

Andrew Or added 5 commits October 8, 2015 17:31
Previously these would always return either 0 or N. Instead we
can express this simply as whether the acquire was successful
so downstream usages don't have to worry about releasing partially
acquired memory.
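The contract change can be sketched like this (illustrative names, not the actual Spark signatures): previously acquire returned a Long anywhere in 0..N, so a caller that didn't get the full amount had to release the partial grant; returning a Boolean makes the grant all-or-nothing.

```scala
// Illustrative all-or-nothing pool. On failure the caller holds nothing,
// so there is no partial grant to clean up.
final class AllOrNothingPoolSketch(capacity: Long) {
  private var used = 0L

  def acquire(numBytes: Long): Boolean = synchronized {
    if (used + numBytes <= capacity) { used += numBytes; true } else false
  }

  def release(numBytes: Long): Unit = synchronized {
    used = math.max(0L, used - numBytes)
  }

  def bytesUsed: Long = synchronized { used }
}
```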
@andrewor14 (Contributor, Author)

@JoshRosen alright I believe I addressed all of your comments. Please let me know if it LGTY.

@JoshRosen (Contributor)

LGTM. Feel free to merge pending Jenkins.

@SparkQA

SparkQA commented Oct 9, 2015

Test build #43437 has finished for PR 9000 at commit adb1764.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor, Author)

Never mind, that passing build was not for the latest commit. Still waiting...

@SparkQA

SparkQA commented Oct 9, 2015

Test build #43445 has finished for PR 9000 at commit 6e2e870.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 9, 2015

Test build #43450 has finished for PR 9000 at commit fc7f9f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen (Contributor)

Woohoo! Tests have passed as of the latest commit, so I'm going to merge this now.

@asfgit asfgit closed this in 67fbecb Oct 9, 2015
@andrewor14 andrewor14 deleted the memory-manager branch October 13, 2015 01:09
asfgit pushed a commit that referenced this pull request Oct 13, 2015
This patch unifies the memory management of the storage and execution regions such that either side can borrow memory from each other. When memory pressure arises, storage will be evicted in favor of execution. To avoid regressions in cases where storage is crucial, we dynamically allocate a fraction of space for storage that execution cannot evict. Several configurations are introduced:

- **spark.memory.fraction (default 0.75)**: fraction of the heap space used for execution and storage. The lower this is, the more frequently spills and cached data eviction occur. The purpose of this config is to set aside memory for internal metadata, user data structures, and imprecise size estimation in the case of sparse, unusually large records.

- **spark.memory.storageFraction (default 0.5)**: size of the storage region within the space set aside by `spark.memory.fraction`. Cached data may only be evicted if total storage exceeds this region.

- **spark.memory.useLegacyMode (default false)**: whether to use the memory management that existed in Spark 1.5 and before. This is mainly for backward compatibility.
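The split these two fractions describe can be worked through numerically; the heap size below is an arbitrary example, and the formulas simply mirror the config descriptions above.

```scala
// Arbitrary example: a 4 GiB executor heap with the default fractions.
val heapBytes = 4L * 1024 * 1024 * 1024
val memoryFraction = 0.75   // spark.memory.fraction
val storageFraction = 0.5   // spark.memory.storageFraction

// Region shared by execution and storage:
val unifiedBytes = (heapBytes * memoryFraction).toLong

// Portion of that region where cached data cannot be evicted by execution:
val protectedStorageBytes = (unifiedBytes * storageFraction).toLong
```

With these defaults, 3 GiB is shared between execution and storage, of which 1.5 GiB of cached data is protected from eviction; the remaining 1 GiB of heap is left for metadata, user data structures, and estimation slack.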

For a detailed description of the design, see [SPARK-10000](https://issues.apache.org/jira/browse/SPARK-10000). This patch builds on top of the `MemoryManager` interface introduced in #9000.

Author: Andrew Or <andrew@databricks.com>

Closes #9084 from andrewor14/unified-memory-manager.
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Oct 14, 2015

Closes apache#9000 from andrewor14/memory-manager.
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Oct 14, 2015

Closes apache#9084 from andrewor14/unified-memory-manager.
asfgit pushed a commit that referenced this pull request Oct 26, 2015
This patch refactors the MemoryManager class structure. After #9000, Spark had the following classes:

- MemoryManager
- StaticMemoryManager
- ExecutorMemoryManager
- TaskMemoryManager
- ShuffleMemoryManager

This is fairly confusing. To simplify things, this patch consolidates several of these classes:

- ShuffleMemoryManager and ExecutorMemoryManager were merged into MemoryManager.
- TaskMemoryManager is moved into Spark Core.
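The consolidated shape can be sketched roughly as below; the structure is an assumption for illustration, and the real classes carry far more state and behavior.

```scala
// Assumed, simplified post-consolidation hierarchy (illustrative only).
abstract class ConsolidatedMemoryManagerSketch {
  // Absorbs the duties formerly split across ShuffleMemoryManager
  // and ExecutorMemoryManager.
  def maxExecutionMemory: Long
}

class StaticManagerSketch(max: Long) extends ConsolidatedMemoryManagerSketch {
  val maxExecutionMemory: Long = max
}

// Per-task view over the shared manager, now living in spark-core
// rather than the spark-unsafe module.
class TaskViewSketch(
    val parent: ConsolidatedMemoryManagerSketch,
    val taskAttemptId: Long)
```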

**Key changes and tasks**:

- [x] Merge ExecutorMemoryManager into MemoryManager.
  - [x] Move pooling logic into Allocator.
- [x] Move TaskMemoryManager from `spark-unsafe` to `spark-core`.
- [x] Refactor the existing Tungsten TaskMemoryManager interactions so that Tungsten code uses only this and not both this and ShuffleMemoryManager.
- [x] Refactor non-Tungsten code to use the TaskMemoryManager instead of ShuffleMemoryManager.
- [x] Merge ShuffleMemoryManager into MemoryManager.
  - [x] Move code
  - [x] ~~Simplify 1/n calculation.~~ **Will defer to followup, since this needs more work.**
- [x] Port ShuffleMemoryManagerSuite tests.
- [x] Move classes from `unsafe` package to `memory` package.
- [ ] Figure out how to handle the hacky use of the memory managers in HashedRelation's broadcast variable construction.
- [x] Test porting and cleanup: several tests relied on mock functionality (such as `TestShuffleMemoryManager.markAsOutOfMemory`) which has been changed or broken during the memory manager consolidation
  - [x] AbstractBytesToBytesMapSuite
  - [x] UnsafeExternalSorterSuite
  - [x] UnsafeFixedWidthAggregationMapSuite
  - [x] UnsafeKVExternalSorterSuite

**Compatibility notes**:

- This patch introduces breaking changes in `ExternalAppendOnlyMap`, which is marked as `DeveloperApi` (likely for legacy reasons): this class now cannot be used outside of a task.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #9127 from JoshRosen/SPARK-10984.