[QST] I need someone to help explain the shuffle read code in rapids (thanks very much) #5384

Answered by revans2
JustPlay asked this question in General

@andygrove please correct me if I get anything wrong in relation to AQE.
@abellina please correct anything I get wrong for the UCX based shuffle.

There are two different shuffle instances in the plugin.

The first one is based off of Spark's SQL shuffle. The SQL shuffle is in turn based off of the RDD shuffle.

The default RDD shuffle manager is the SortShuffleManager. With it, the user is able to control how the data is serialized, whether the data is sorted before it is shuffled, and how the data is partitioned. All of this is controlled by a ShuffleDependency.
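
To make those three knobs concrete, here is a rough toy model in plain Python. It is only a sketch and not Spark's actual code: `ToyShuffleDependency` and `toy_shuffle_write` are made-up names for illustration, and `pickle` stands in for whatever serializer a real job would use.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyShuffleDependency:
    """Hypothetical stand-in for the settings a shuffle dependency carries."""
    num_partitions: int
    partitioner: Callable[[Any], int]                     # key -> partition id
    serializer: Callable[[Any], bytes] = pickle.dumps     # how buckets become bytes
    key_ordering: Optional[Callable[[Any], Any]] = None   # None means "do not sort"


def toy_shuffle_write(records, dep):
    """Map side of the toy shuffle: partition, optionally sort, then serialize."""
    buckets = [[] for _ in range(dep.num_partitions)]
    for key, value in records:
        buckets[dep.partitioner(key) % dep.num_partitions].append((key, value))
    if dep.key_ordering is not None:
        for bucket in buckets:
            bucket.sort(key=lambda kv: dep.key_ordering(kv[0]))
    # One serialized block per output partition.
    return [dep.serializer(bucket) for bucket in buckets]


records = [(3, "c"), (2, "b"), (1, "a"), (4, "d")]
sorted_dep = ToyShuffleDependency(2, partitioner=lambda k: k, key_ordering=lambda k: k)
unsorted_dep = ToyShuffleDependency(2, partitioner=lambda k: k)
sorted_blocks = toy_shuffle_write(records, sorted_dep)
unsorted_blocks = toy_shuffle_write(records, unsorted_dep)
```

The only difference between the two calls is the dependency object, which is the point: the writer itself does not decide how to serialize, sort, or partition, it just follows what the dependency says.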

In the SQL shuffle for Spark this is all wrapped in/controlled by the ShuffleExchangeExec. It will perform the partitioning ahead …

Answer selected by sameerz

This discussion was converted from issue #865 on April 28, 2022 23:11.