[SPARK-7855] Move bypassMergeSort-handling from ExternalSorter to own component #6397
// Write metrics for current spill
private var curWriteMetrics: ShuffleWriteMetrics = _
In the old code, it was pretty hard to trace through usages of this variable to figure out where it was reset or updated. After removing the hash bypass code, I noticed that this variable's scope can be reduced to a local variable in `spillToMergeableFile()`.
Test build #33475 has finished for PR 6397 at commit
Test build #33476 has finished for PR 6397 at commit
Test build #33478 has started for PR 6397 at commit
Whoa, looks like the test log is being flooded with warnings about being unable to delete spill files:
It looks like I had the
Alright, I think that this is ready for review, so I'm going to remove the
Test build #33597 has finished for PR 6397 at commit
Hmm, interesting test failure:
Ah, I found the problem: we're closing the partition writers at the wrong place. We should be calling
According to BlockObjectWriter's API contract:

/**
 * Returns the file segment of committed data that this Writer has written.
 * This is only valid after commitAndClose() has been called.
 */
def fileSegment(): FileSegment

However, DiskBlockObjectWriter doesn't have assertions to catch if

I also noticed that InputOutputMetricsSuite is one of the only places with an end-to-end test of the shuffle records written metric. This test probably belongs in ShuffleSuite, since it should be tested for each shuffle manager. We should also add another test to cover these metrics for shuffles that perform aggregation, since these shuffles may use a different code path.
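To make the API contract quoted above concrete, here is a minimal Java sketch of the two defensive checks this PR ends up adding: refusing to hand out a file segment before `commitAndClose()` has been called, and rejecting negative offsets or lengths. `SketchSegmentWriter` and `SketchFileSegment` are hypothetical stand-ins, not Spark's `DiskBlockObjectWriter` or `FileSegment`; the sketch counts bytes instead of writing a real file.

```java
import java.io.File;

/** Hypothetical stand-in for a file segment: validates its arguments on construction. */
final class SketchFileSegment {
    final File file;
    final long offset;
    final long length;

    SketchFileSegment(File file, long offset, long length) {
        if (offset < 0) {
            throw new IllegalArgumentException("File segment offset cannot be negative (got " + offset + ")");
        }
        if (length < 0) {
            throw new IllegalArgumentException("File segment length cannot be negative (got " + length + ")");
        }
        this.file = file;
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical writer that only exposes its committed segment after commitAndClose(). */
final class SketchSegmentWriter {
    private final File file;
    private final long initialPosition;
    private long position;            // bytes counted so far; no real I/O happens in this sketch
    private boolean committed = false;

    SketchSegmentWriter(File file, long initialPosition) {
        this.file = file;
        this.initialPosition = initialPosition;
        this.position = initialPosition;
    }

    void write(byte[] bytes) {
        if (committed) {
            throw new IllegalStateException("Writer has already been committed and closed");
        }
        position += bytes.length;
    }

    void commitAndClose() {
        committed = true;
    }

    /** Enforces the documented contract instead of silently returning a bogus segment. */
    SketchFileSegment fileSegment() {
        if (!committed) {
            throw new IllegalStateException("fileSegment() is only valid after commitAndClose() has been called");
        }
        return new SketchFileSegment(file, initialPosition, position - initialPosition);
    }
}
```

Failing fast here turns a misplaced close (the bug described above) into an immediate exception rather than a wrong segment length that only shows up later as a corrupted or mis-sized shuffle block.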
Actually, I spoke slightly too soon: it looks like we do call
I added a bunch more tests for DiskBlockObjectWriter to ensure that all of its close methods are idempotent. I've also moved the metrics tests from InputOutputMetricsSuite into ShuffleSuite so that they're run against all shuffle managers, and I added a test to check that the metrics are correct when shuffles perform aggregation. I also added checks to ensure that shuffleBytesRead == shuffleBytesWritten. /cc @ksakellis, since this touches some of your code in InputOutputMetricsSuite.
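The idempotence requirement matters because closing a writer is what makes its bytes count toward shuffle-write metrics: a second close must not count them again, and reverting an already-closed writer must not undo committed work. The Java sketch below illustrates that property under a simplifying assumption: metrics are published only on commit (the real `DiskBlockObjectWriter` may track bytes differently). `SketchWriteMetrics` and `SketchClosableWriter` are hypothetical names.

```java
/** Hypothetical metrics holder: counts committed bytes and records. */
final class SketchWriteMetrics {
    long bytesWritten = 0;
    long recordsWritten = 0;
}

/** Hypothetical writer whose close methods are idempotent with respect to metrics. */
final class SketchClosableWriter {
    private final SketchWriteMetrics metrics;
    private long pendingBytes = 0;
    private long pendingRecords = 0;
    private boolean closed = false;

    SketchClosableWriter(SketchWriteMetrics metrics) {
        this.metrics = metrics;
    }

    void write(byte[] record) {
        if (closed) {
            throw new IllegalStateException("Cannot write to a closed writer");
        }
        pendingBytes += record.length;
        pendingRecords += 1;
    }

    /** Publishes pending writes to the metrics exactly once; later calls are no-ops. */
    void commitAndClose() {
        if (closed) {
            return;
        }
        metrics.bytesWritten += pendingBytes;
        metrics.recordsWritten += pendingRecords;
        closed = true;
    }

    /** Discards pending writes; has no effect if the writer is already closed. */
    void revertPartialWritesAndClose() {
        if (closed) {
            return;
        }
        pendingBytes = 0;
        pendingRecords = 0;
        closed = true;
    }
}
```

A single `closed` flag guarding every close path is the simplest way to get this behavior; each close method checks it first and sets it last.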
In case reviewers missed this in the updated PR description, here's a better summary of the new changes: This patch also makes several improvements to shuffle-related tests and adds more defensive checks to certain shuffle classes:
@sryza, this might also be of interest to you.
Test build #33681 has finished for PR 6397 at commit
Hmm, looks like this fails nearly all of the HashShuffleSuite tests. I found the problem: after 8b8fb9e, the
Jenkins, retest this please.
Test build #33716 has finished for PR 6397 at commit
Test build #33723 has finished for PR 6397 at commit
while (records.hasNext()) {
  final Product2<K, V> record = records.next();
  final K key = record._1();
  partitionWriters[partitioner.getPartition(key)].write(key, record._2());
`partitioner.getPartition(key)` is different from the previous code. The previous code called `getPartition`, which doesn't call `partitioner.getPartition(key)` if there is only one partition. I'm not sure whether that optimization matters.
Thanks for noticing this. I left this out on purpose, but should have probably commented on the diff to explain why.
I think `ExternalSorter` skips the `partitioner.getPartition(key)` call when there is only one partition because `ExternalSorter` is also used in non-shuffle contexts for which we don't define a partitioner (such as the reduce-side sort in `sortByKey()`). In those cases, we obviously want to avoid unnecessary hashing.
BypassMergeSortShuffleWriter is only used for shuffles, though, and I expect that it's extremely rare to have shuffles that shuffle everything to a single partition (collecting results to the driver is handled by different code). Therefore, I chose to leave out that check.
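The trade-off discussed in this thread fits in a few lines. A minimal Java sketch, assuming hash partitioning similar in spirit to Spark's `HashPartitioner` (a non-negative modulo of `hashCode()`, with null keys sent to partition 0): `shortCircuitPartition` mirrors the ExternalSorter-style check that skips hashing when there is a single partition, and `alwaysPartition` mirrors the unconditional `partitioner.getPartition(key)` call in the write loop above. `PartitionChoice` and its methods are hypothetical names.

```java
/** Hypothetical helpers contrasting the two ways of choosing a partition. */
final class PartitionChoice {
    private PartitionChoice() {}

    /** Hash partitioning: non-negative modulo of the key's hash code; null keys go to partition 0. */
    static int hashPartition(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** ExternalSorter-style: skip hashing entirely when everything lands in one partition. */
    static int shortCircuitPartition(Object key, int numPartitions) {
        return numPartitions > 1 ? hashPartition(key, numPartitions) : 0;
    }

    /** Bypass-writer-style: always ask the partitioner. */
    static int alwaysPartition(Object key, int numPartitions) {
        return hashPartition(key, numPartitions);
    }
}
```

Both variants return the same partition for every key; the short circuit only saves the cost of calling `hashCode()` when there is one partition, which is why it pays off mainly in ExternalSorter's non-shuffle uses.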
I see. Fair enough. LGTM now.
LGTM except the
…ass-cleanup

Conflicts:
core/src/test/scala/org/apache/spark/storage/BlockObjectWriterSuite.scala
core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala
Test build #33775 has finished for PR 6397 at commit
* Spill the current in-memory collection to disk, adding a new file to spills, and clear it.
*/
override protected[this] def spill(collection: WritablePartitionedPairCollection[K, C]): Unit = {
  if (bypassMergeSort) {
To clarify a little further: since we no longer buffer records in memory when `bypassMergeSort` is true, this branch would never be taken.
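For context on when `bypassMergeSort` is true: Spark's configuration documentation describes `spark.shuffle.sort.bypassMergeThreshold` as avoiding merge-sorting when there is no map-side aggregation and there are at most that many reduce partitions. The Java sketch below encodes only that documented rule; `BypassDecision` and `shouldBypassMergeSort` are hypothetical names, and the actual decision logic in this PR may check additional conditions.

```java
/** Hypothetical sketch of the documented bypass rule for sort-based shuffle. */
final class BypassDecision {
    private BypassDecision() {}

    /**
     * Returns true when the shuffle can skip sorting and merging and instead
     * write one file per reduce partition.
     *
     * @param numPartitions        number of reduce partitions
     * @param mapSideCombine       whether the shuffle aggregates records on the map side
     * @param bypassMergeThreshold value of spark.shuffle.sort.bypassMergeThreshold
     */
    static boolean shouldBypassMergeSort(int numPartitions, boolean mapSideCombine, int bypassMergeThreshold) {
        // Map-side aggregation needs records combined by key, so it cannot use the bypass path.
        if (mapSideCombine) {
            return false;
        }
        // The bypass path keeps one writer open per partition, so it is limited to few partitions.
        return numPartitions <= bypassMergeThreshold;
    }
}
```

When this returns true, records go straight to per-partition writers and are never buffered in memory, which is why the `bypassMergeSort` branch inside `spill()` was unreachable.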
LGTM.
… component

Spark's `ExternalSorter` writes shuffle output files during sort-based shuffle. Sort-shuffle contains a configuration, `spark.shuffle.sort.bypassMergeThreshold`, which causes ExternalSorter to skip sorting and merging and simply write separate files per partition, which are then concatenated together to form the final map output file. The code paths used during this bypass are almost completely separate from ExternalSorter's other code paths, so refactoring them into a separate file can significantly simplify the code.

In addition to re-arranging code, this patch deletes a bunch of dead code. The main entry point into ExternalSorter is `insertAll()`, and in SPARK-4479 / apache#3422 this method was modified to completely bypass in-memory buffering of records when `bypassMergeSort` takes effect. As a result, some of the spilling and merging code paths will no longer be called when `bypassMergeSort` is used, so we should be able to safely remove that code.

There's an open JIRA ([SPARK-6026](https://issues.apache.org/jira/browse/SPARK-6026)) for removing the `bypassMergeThreshold` parameter and code paths; I have not done that here, but the changes in this patch will make removing that parameter significantly easier if we ever decide to do that.

This patch also makes several improvements to shuffle-related tests and adds more defensive checks to certain shuffle classes:

- DiskBlockObjectWriter now throws an exception if `fileSegment()` is called before `commitAndClose()` has been called.
- DiskBlockObjectWriter's close methods are now idempotent, so calling any of the close methods twice in a row will no longer result in incorrect shuffle write metrics changes. Calling `revertPartialWritesAndClose()` on a closed DiskBlockObjectWriter now has no effect (before, it might mess up the metrics).
- The end-to-end shuffle record count metrics tests have been moved from InputOutputMetricsSuite to ShuffleSuite. This means that these tests will now be run against all shuffle implementations rather than just the default shuffle configuration.
- The end-to-end metrics tests now include a test of a job which performs aggregation in the shuffle.
- Our tests now check that `shuffleBytesWritten == totalShuffleBytesRead`.
- FileSegment now throws IllegalArgumentException if it is constructed with a negative length or offset.

Author: Josh Rosen <joshrosen@databricks.com>

Closes apache#6397 from JoshRosen/external-sorter-bypass-cleanup and squashes the following commits:

bf3f3f6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into external-sorter-bypass-cleanup
8b216c4 [Josh Rosen] Guard against negative offsets and lengths in FileSegment
03f35a4 [Josh Rosen] Minor fix to cleanup logic.
b5cc35b [Josh Rosen] Move shuffle metrics tests to ShuffleSuite.
8b8fb9e [Josh Rosen] Add more tests + defensive programming to DiskBlockObjectWriter.
16564eb [Josh Rosen] Guard against calling fileSegment() before commitAndClose() has been called.
96811b4 [Josh Rosen] Remove confusing taskMetrics.shuffleWriteMetrics() optional call
8522b6a [Josh Rosen] Do not perform a map-side sort unless we're also doing map-side aggregation
08e40f3 [Josh Rosen] Remove excessively clever (and wrong) implementation of newBuffer()
d7f9938 [Josh Rosen] Add missing overrides; fix compilation
71d76ff [Josh Rosen] Update Javadoc
bf0d98f [Josh Rosen] Add comment to clarify confusing factory code
5197f73 [Josh Rosen] Add missing private[this]
30ef2c8 [Josh Rosen] Convert BypassMergeSortShuffleWriter to Java
bc1a820 [Josh Rosen] Fix bug when aggregator is used but map-side combine is disabled
0d3dcc0 [Josh Rosen] Remove unnecessary overloaded methods
25b964f [Josh Rosen] Rename SortShuffleSorter to SortShuffleFileWriter
0d9848c [Josh Rosen] Make it more clear that curWriteMetrics is now only used for spill metrics
7af7aea [Josh Rosen] Combine spill() and spillToMergeableFile()
6320112 [Josh Rosen] Add missing negation in deletion success check.
d267e0d [Josh Rosen] Fix style issue
7f15f7b [Josh Rosen] Back out extra cleanup-handling code, since this is already covered in stop()
25aa3bd [Josh Rosen] Make sure to delete outputFile after errors.
931ca68 [Josh Rosen] Refactor tests.
6a35716 [Josh Rosen] Refactor logic for deciding when to bypass
4b03539 [Josh Rosen] Move conf prior to first use
1265b25 [Josh Rosen] Fix some style errors and comments.
02355ef [Josh Rosen] More simplification
d4cb536 [Josh Rosen] Delete more unused code
bb96678 [Josh Rosen] Add missing interface file
b6cc1eb [Josh Rosen] Realize that bypass never buffers; proceed to delete tons of code
6185ee2 [Josh Rosen] WIP towards moving bypass code into own file.
8d0678c [Josh Rosen] Move diskBytesSpilled getter next to variable
19bccd6 [Josh Rosen] Remove duplicated buffer creation code.
18959bb [Josh Rosen] Move comparator methods closer together.
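The bypass path described in this commit message (write one file per partition, then concatenate them into the final map output file) can be sketched in Java as follows. This is a simplified illustration, not `BypassMergeSortShuffleWriter` itself: `BypassConcatSketch` is hypothetical, records are string key/value pairs with non-null keys, values are written as UTF-8 lines rather than through a serializer, and partitions are chosen by a non-negative modulo of `hashCode()`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the bypass path: one temp file per partition, then concatenation. */
final class BypassConcatSketch {
    private BypassConcatSketch() {}

    /**
     * Writes each record's value to its partition's temp file, then concatenates the temp
     * files in partition order into outputFile. Returns the byte length of each partition.
     * Keys must be non-null.
     */
    static long[] writePartitioned(List<Map.Entry<String, String>> records, int numPartitions, Path outputFile)
            throws IOException {
        Path[] partitionFiles = new Path[numPartitions];
        OutputStream[] writers = new OutputStream[numPartitions];
        try {
            // Open one writer per partition up front, as the bypass path does.
            for (int p = 0; p < numPartitions; p++) {
                partitionFiles[p] = Files.createTempFile("partition-" + p + "-", ".tmp");
                writers[p] = Files.newOutputStream(partitionFiles[p]);
            }
            // No sorting or in-memory buffering: each record goes straight to its partition's file.
            for (Map.Entry<String, String> record : records) {
                int p = Math.floorMod(record.getKey().hashCode(), numPartitions);
                writers[p].write((record.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } finally {
            for (OutputStream w : writers) {
                if (w != null) {
                    w.close();
                }
            }
        }
        // Concatenate in partition order and record how many bytes each partition contributed.
        long[] lengths = new long[numPartitions];
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            for (int p = 0; p < numPartitions; p++) {
                lengths[p] = Files.copy(partitionFiles[p], out);
                Files.delete(partitionFiles[p]);
            }
        }
        return lengths;
    }
}
```

Because partitions are concatenated in order, partition `p` starts at the sum of the lengths of partitions `0` through `p - 1`, which is what lets a reader locate each reducer's bytes in the single output file.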