[SPARK-2672] support compressed file in wholeTextFile #3005

Closed
wants to merge 4 commits into from

Conversation

@davies
Contributor

davies commented Oct 29, 2014

wholeTextFiles() cannot read compressed files; it should be able to, just like textFile().

@SparkQA

SparkQA commented Oct 29, 2014

Test build #22491 has started for PR 3005 at commit 22e8b3e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 30, 2014

Test build #22491 has finished for PR 3005 at commit 22e8b3e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22491/

override protected def isSplitable(context: JobContext, file: Path): Boolean = false

private var conf: Configuration = _
def setConf(c: Configuration) = {

what's the return type?
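
For readers following the thread: the method in the hunk is declared with `=` and no result type, so its type is whatever the last expression of the body yields. A minimal sketch of the difference, assuming a hypothetical `Settings` class in place of Hadoop's `Configuration` and a hypothetical `Holder` class (neither comes from the patch):

```scala
// Hypothetical stand-in for the Hadoop Configuration type used in the diff.
final case class Settings(values: Map[String, String])

class Holder {
  private var conf: Settings = null

  // Inferred result type: the body ends in an assignment, so the type is Unit,
  // but a reader has to work that out from the body.
  def setConfInferred(c: Settings) = {
    conf = c
  }

  // Explicit result type: the signature alone says nothing useful is returned.
  def setConf(c: Settings): Unit = {
    conf = c
  }

  def getConf: Settings = conf
}

object ReturnTypeDemo {
  def main(args: Array[String]): Unit = {
    val h = new Holder
    h.setConfInferred(Settings(Map("k" -> "v")))
    h.setConf(Settings(Map("k" -> "v2")))
    println(h.getConf.values("k")) // prints "v2"
  }
}
```

An assignment evaluates to `Unit`, so both setters behave identically; the annotation only makes the signature self-documenting. The sketch uses the annotated form rather than Scala 2 procedure syntax (dropping the `=`), because later Scala versions deprecated that syntax.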

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22732 has started for PR 3005 at commit c83571a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22732 has finished for PR 3005 at commit c83571a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DecimalType(DataType):
    • case class UnscaledValue(child: Expression) extends UnaryExpression
    • case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression
    • case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)
    • case class PrecisionInfo(precision: Int, scale: Int)
    • case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType
    • final class Decimal extends Ordered[Decimal] with Serializable
    • trait DecimalIsConflicted extends Numeric[Decimal]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22732/

@davies
Contributor Author

davies commented Nov 5, 2014

@rxin I have addressed your comments; could you take another look?

@davies
Contributor Author

davies commented Nov 10, 2014

@JoshRosen @mateiz Do you have time to review this?

@@ -57,8 +65,16 @@ private[spark] class WholeTextFileRecordReader(

  override def nextKeyValue(): Boolean = {
    if (!processed) {
      val conf = new Configuration
      val factory = new CompressionCodecFactory(conf);
      val codec = factory.getCodec(path); // infers from file ext.

Nit: don't need this semicolon.
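
For context, the hunk above picks a codec from the file's extension before the reader consumes the file in one piece. A rough, JDK-only sketch of that idea follows; it is not the patch's code. `WholeFileReader`, `open`, and `readWhole` are hypothetical names, and `java.util.zip.GZIPInputStream` stands in for whatever codec Hadoop's `CompressionCodecFactory.getCodec(path)` would select:

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object WholeFileReader {
  // Hypothetical helper: choose a decompressing wrapper from the file extension.
  // Only ".gz" is handled here; this is an assumption made to keep the sketch small.
  def open(path: Path): InputStream = {
    val raw = Files.newInputStream(path)
    if (path.getFileName.toString.endsWith(".gz")) new GZIPInputStream(raw) else raw
  }

  // Read the entire (possibly compressed) file as a single UTF-8 string.
  def readWhole(path: Path): String = {
    val in = open(path)
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      new String(out.toByteArray, StandardCharsets.UTF_8)
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val bytes = "hello\nworld\n".getBytes(StandardCharsets.UTF_8)
    val dir = Files.createTempDirectory("whole")
    val plain = dir.resolve("a.txt")
    val gz = dir.resolve("a.txt.gz")
    Files.write(plain, bytes)
    val zos = new GZIPOutputStream(Files.newOutputStream(gz))
    try zos.write(bytes) finally zos.close()
    // Both files should yield the same text once the wrapper is chosen by extension.
    println(readWhole(plain) == readWhole(gz)) // prints "true"
  }
}
```

Because a gzip stream has to be decoded from its start, treating each file as a single unsplittable record, as the `isSplitable` override earlier in the diff does, fits this approach.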

@JoshRosen
Contributor

This looks good to me! I'm going to merge this (I'll remove those semicolons on merge). Thanks!

@JoshRosen
Contributor

(Actually, let me just re-run Jenkins, just to be safe).

Jenkins, retest this please.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #517 has started for PR 3005 at commit c83571a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23282 has started for PR 3005 at commit a43fcfb.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #517 has finished for PR 3005 at commit c83571a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23282 has finished for PR 3005 at commit a43fcfb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23282/

@JoshRosen
Contributor

Looks good to me (thanks for fixing the semicolons!). I'm going to merge this into master and 1.2.

asfgit closed this in d7d54a4 Nov 13, 2014
asfgit pushed a commit that referenced this pull request Nov 13, 2014
The wholeFile() can not read compressed files, it should be, just like textFile().

Author: Davies Liu <davies@databricks.com>

Closes #3005 from davies/whole and squashes the following commits:

a43fcfb [Davies Liu] remove semicolon
c83571a [Davies Liu] remove = if return type is Unit
83c844f [Davies Liu] Merge branch 'master' of github.com:apache/spark into whole
22e8b3e [Davies Liu] support compressed file in wholeTextFile

(cherry picked from commit d7d54a4)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>