[SPARK-2672] support compressed file in wholeTextFile #3005
Test build #22491 has started for PR 3005 at commit
Test build #22491 has finished for PR 3005 at commit
Test PASSed.
override protected def isSplitable(context: JobContext, file: Path): Boolean = false

private var conf: Configuration = _
def setConf(c: Configuration) = {
what's the return type?
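For readers following the question above: a Scala method defined with `=` and no annotation gets its result type inferred from the body, so a setter whose body is an assignment is inferred as `Unit`. A minimal sketch of the two spellings follows; `Holder`, `setInferred`, and `setExplicit` are hypothetical names for illustration, and a `String` stands in for Hadoop's `Configuration`. It is not the code from this PR.

```scala
object ReturnTypeSketch {
  // Hypothetical holder class; a String stands in for Hadoop's Configuration.
  final class Holder {
    private var conf: Option[String] = None

    // No annotation: the result type is inferred from the body.
    // An assignment evaluates to (), so the inferred type is Unit.
    def setInferred(c: String) = { conf = Some(c) }

    // Explicit annotation: the side-effecting intent is visible in the signature.
    def setExplicit(c: String): Unit = { conf = Some(c) }

    def current: Option[String] = conf
  }
}
```

Both methods behave identically at runtime; the explicit form only makes the signature unambiguous to reviewers.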
Test build #22732 has started for PR 3005 at commit
Test build #22732 has finished for PR 3005 at commit
Test PASSed.
@rxin I have addressed your comments; could you take another look?
@JoshRosen @mateiz Do you have time to review this?
@@ -57,8 +65,16 @@ private[spark] class WholeTextFileRecordReader(

  override def nextKeyValue(): Boolean = {
    if (!processed) {
      val conf = new Configuration
      val factory = new CompressionCodecFactory(conf);
      val codec = factory.getCodec(path); // infers from file ext.
Nit: don't need this semicolon.
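As background for the hunk under review: it asks Hadoop's `CompressionCodecFactory` for a codec chosen from the file extension. The sketch below illustrates that general pattern (pick a decompressor by extension, otherwise read the raw bytes) using only JDK classes, with `GZIPInputStream` standing in for a Hadoop codec. The names `codecFor`, `readWhole`, and `gzip` are hypothetical, and this is not the code merged in this PR.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object CodecInferenceSketch {
  // Hypothetical stand-in for CompressionCodecFactory.getCodec(path):
  // chooses a stream wrapper from the file extension alone.
  def codecFor(fileName: String): Option[InputStream => InputStream] =
    if (fileName.endsWith(".gz")) Some((in: InputStream) => new GZIPInputStream(in))
    else None

  // Read the entire stream as one UTF-8 string, decompressing when a codec matches.
  def readWhole(fileName: String, raw: InputStream): String = {
    val in = codecFor(fileName).map(wrap => wrap(raw)).getOrElse(raw)
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      new String(out.toByteArray, StandardCharsets.UTF_8)
    } finally in.close()
  }

  // Helper for trying the sketch: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bytes)
    gz.write(text.getBytes(StandardCharsets.UTF_8))
    gz.close()
    bytes.toByteArray
  }
}
```

For example, `CodecInferenceSketch.readWhole("a.txt.gz", new ByteArrayInputStream(CodecInferenceSketch.gzip("hello")))` yields the original text, while a name without `.gz` is read as-is. In Hadoop itself, `getCodec` returns `null` rather than an `Option` when no codec matches the extension, so real callers need a null check before wrapping the stream.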
This looks good to me! I'm going to merge this (I'll remove those semicolons on merge). Thanks!
(Actually, let me just re-run Jenkins, just to be safe). Jenkins, retest this please.
Test build #517 has started for PR 3005 at commit
Test build #23282 has started for PR 3005 at commit
Test build #517 has finished for PR 3005 at commit
Test build #23282 has finished for PR 3005 at commit
Test PASSed.
Looks good to me (thanks for fixing the semicolons!). I'm going to merge this into master and 1.2.
wholeFile() cannot read compressed files; it should be able to, just like textFile().

Author: Davies Liu <davies@databricks.com>

Closes #3005 from davies/whole and squashes the following commits:

a43fcfb [Davies Liu] remove semicolon
c83571a [Davies Liu] remove = if return type is Unit
83c844f [Davies Liu] Merge branch 'master' of github.com:apache/spark into whole
22e8b3e [Davies Liu] support compressed file in wholeTextFile

(cherry picked from commit d7d54a4)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>