[MINOR][DOC] update the condition description of BypassMergeSortShuffle
## What changes were proposed in this pull request?
These three condition descriptions should be updated, following #23228:
<li>no Ordering is specified,</li>
<li>no Aggregator is specified, and</li>
<li>the number of partitions is less than
<code>spark.shuffle.sort.bypassMergeThreshold</code>.
</li>
1. If the shuffle dependency specifies aggregation but only aggregates on the reduce side, BypassMergeSortShuffle can still be used.
2. If the number of output partitions is equal to spark.shuffle.sort.bypassMergeThreshold (e.g. 200), we can still use BypassMergeSortShuffle.
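The corrected selection rule can be sketched as a small standalone check. This is a simplified illustration of the conditions described above, not Spark's actual implementation; the method and parameter names here are assumptions for the sketch.

```java
public class BypassCheck {
    /**
     * Hypothetical sketch of the bypass-merge-sort selection rule:
     * no map-side combine, and numPartitions <= the threshold.
     */
    static boolean shouldBypassMergeSort(boolean mapSideCombine,
                                         int numPartitions,
                                         int bypassMergeThreshold) {
        // Map-side combine needs per-key aggregation before writing,
        // so the bypass path cannot be used.
        if (mapSideCombine) {
            return false;
        }
        // Note "<=": a partition count equal to the threshold still qualifies,
        // which is the second correction made by this patch.
        return numPartitions <= bypassMergeThreshold;
    }

    public static void main(String[] args) {
        // The default spark.shuffle.sort.bypassMergeThreshold is 200.
        System.out.println(shouldBypassMergeSort(false, 200, 200)); // true
        System.out.println(shouldBypassMergeSort(false, 201, 200)); // false
        System.out.println(shouldBypassMergeSort(true, 10, 200));   // false
    }
}
```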

## How was this patch tested?
N/A

Closes #23281 from lcqzte10192193/wid-lcq-1211.

Authored-by: lichaoqun <li.chaoqun@zte.com.cn>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
lcqzte10192193 authored and srowen committed Dec 13, 2018
1 parent f372609 commit f69998a
Showing 1 changed file with 2 additions and 3 deletions.
@@ -58,9 +58,8 @@
* simultaneously opens separate serializers and file streams for all partitions. As a result,
* {@link SortShuffleManager} only selects this write path when
* <ul>
* <li>no Ordering is specified,</li>
* <li>no Aggregator is specified, and</li>
* <li>the number of partitions is less than
* <li>no map-side combine is specified, and</li>
* <li>the number of partitions is less than or equal to
* <code>spark.shuffle.sort.bypassMergeThreshold</code>.</li>
* </ul>
*
