-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CELEBORN] Add compression for row-based shuffle #6380
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
if ("none" | ||
.equalsIgnoreCase( | ||
conf.get(SPARK_CELEBORN_COMPRESSION_CODEC_KEY, celebornDefaultCodec))) { | ||
conf.set(SPARK_CELEBORN_COMPRESSION_CODEC_KEY, celebornDefaultCodec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use a new spark conf copy to hold the configs for row-based celeborn shuffle manager ? I think it's risk to share the configs for both row-based and columnar shuffle manager, for example, the first shuffle is row-based then the conf will overwrite, and later the columnar shuffle manager would use the overwritten conf.
@@ -330,14 +351,18 @@ public <K, C> ShuffleReader<K, C> getReader( | |||
if (handle instanceof CelebornShuffleHandle) { | |||
@SuppressWarnings("unchecked") | |||
CelebornShuffleHandle<K, ?, C> h = (CelebornShuffleHandle<K, ?, C>) handle; | |||
CelebornConf readerConf = celebornConf; | |||
if (_vanillaCelebornShuffleManager != null) { | |||
readerConf = rowBasedCelebornConf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the _vanillaCelebornShuffleManager
is null if we get row-based, can it happen ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when gettingReader, vanillaCelebornShuffleManager
is usually null. I am looking to see if there is any way to pass this message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @kerwin-zk
What changes were proposed in this pull request?
When using
CelebornShuffleManager
, columnar shuffle data is typically compressed on the Gluten side and sent directly to Celeborn, so compression is not needed on the Celeborn side. Therefore,spark.celeborn.client.shuffle.compression.codec
is set to none. However, when row-based shuffle occurs, this configuration causes Celeborn to skip compression, leading to data bloat.How was this patch tested?
CI