[SPARK-33916][CORE] Fix fallback storage offset and improve compression codec test coverage #30934
val rdd2 = rdd1.map(x => (x % 2, 1))
val rdd3 = rdd2.reduceByKey(_ + _)
assert(rdd3.collect() === Array((0, 5), (1, 5)))
Seq("lz4", "snappy", "zstd").foreach { codec =>
The doc of IO_COMPRESSION_CODEC says that it supports lz4, lzf, snappy, and zstd. Should we test lzf too?
Sure, I'll add that, @MaxGekk.
"When the user chooses a non-default codec, it causes a failure."
Hm, is the bug caused by the wrong offset calculation or by the non-default codec?
Test build #133395 has finished for PR 30934 at commit
@@ -158,7 +158,7 @@ object FallbackStorage extends Logging {
 val name = ShuffleDataBlockId(shuffleId, mapId, NOOP_REDUCE_ID).name
 val dataFile = new Path(fallbackPath, s"$appId/$shuffleId/$name")
 val f = fallbackFileSystem.open(dataFile)
-val size = nextOffset - 1 - offset
+val size = nextOffset - offset
 logDebug(s"To byte array $size")
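For readers following the one-line change above, here is a minimal sketch of the arithmetic involved. It assumes the index holds cumulative byte offsets, so that entry `i` is where block `i` starts and entry `i + 1` is where it ends (exclusive). The object and method names (`OffsetSketch`, `cumulativeOffsets`, `blockSize`, `readBlock`) are hypothetical and are not taken from the Spark code base.

```scala
// Hypothetical illustration only; these names do not exist in Spark.
object OffsetSketch {
  // Build cumulative offsets from per-block lengths (assumed index layout):
  // entry i is the start of block i, entry i + 1 is its exclusive end.
  def cumulativeOffsets(lengths: Seq[Long]): Seq[Long] =
    lengths.scanLeft(0L)(_ + _)

  // Block size as in the corrected line of the diff: nextOffset - offset.
  def blockSize(offsets: Seq[Long], reduceId: Int): Long = {
    val offset = offsets(reduceId)
    val nextOffset = offsets(reduceId + 1)
    nextOffset - offset
  }

  // Slice one block out of the concatenated data using the computed size.
  def readBlock(data: Array[Byte], offsets: Seq[Long], reduceId: Int): Array[Byte] = {
    val start = offsets(reduceId).toInt
    val size = blockSize(offsets, reduceId).toInt
    java.util.Arrays.copyOfRange(data, start, start + size)
  }
}
```

Under that assumed layout, `nextOffset - offset` recovers each block's full length, whereas `nextOffset - 1 - offset` would come out one byte short for every block.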
Given this bug, I am wondering if we want to refactor IndexShuffleBlockResolver.read such that we can reuse it here as well?
Maybe for Apache Spark 3.2.0, @mridulm? Currently, this PR aims to provide this fix for the Apache Spark 3.1.0 RC before next Monday.
Sounds good, thanks @dongjoon-hyun!
Thank you, @mridulm!
Test build #133400 has finished for PR 30934 at commit
The change LGTM
We can consider refactoring later.
Thank you for the review and approval, @HyukjinKwon. Please let me know if you have other comments, @MaxGekk and @mridulm.
Thank you all!
…on codec test coverage

### What changes were proposed in this pull request?

This PR aims to fix offset bug and improve compression codec test coverage.

### Why are the changes needed?

When the user choose a non-default codec, it causes a failure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the extended test suite.

Closes #30934 from dongjoon-hyun/SPARK-33916.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 6497ccb)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This PR aims to fix the offset bug in the fallback storage and improve compression codec test coverage.
Why are the changes needed?
When the user chooses a non-default codec, it causes a failure.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the extended test suite.