
[SPARK-4945] [SQL] Add overwrite option support for SchemaRDD.saveAsParquetFile #3780

Closed


chenghao-intel
Contributor

No description provided.

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24755 has started for PR 3780 at commit b776284.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24755 has finished for PR 3780 at commit b776284.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24755/

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24758 has started for PR 3780 at commit 52adf46.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24758 has finished for PR 3780 at commit 52adf46.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24758/

@SparkQA

SparkQA commented Dec 25, 2014

Test build #24802 has started for PR 3780 at commit a7a380f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 25, 2014

Test build #24802 has finished for PR 3780 at commit a7a380f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24802/

ParquetRelation.create(path, child, sparkContext.hadoopConfiguration, sqlContext)
// Note: overwrite=false because otherwise the metadata we just created will be deleted
InsertIntoParquetTable(relation, planLater(child), overwrite = false) :: Nil
ParquetRelation.createEmpty(
Contributor

Is it safe to replace create with createEmpty? I see that create does some checks and then calls createEmpty.

Contributor Author

It should be safe, as we believe the logical plan has already been resolved during the analysis phase.
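A simplified model of the pattern being discussed, assuming (as the author's reply implies) that the check `create` performs is whether the child plan is resolved. `Plan`, `Relation`, `create`, and `createEmpty` here are hypothetical stand-ins, not Spark's `ParquetRelation` signatures. The point it illustrates: once analysis guarantees a resolved plan, the guard in `create` can never fire, so calling `createEmpty` directly gives the same result.

```scala
// Hypothetical stand-ins; not Spark's actual classes or signatures.
final case class Plan(resolved: Boolean, output: Seq[String])
final case class Relation(path: String, attributes: Seq[String])

// Builds the relation from an explicit attribute list; performs no checks.
def createEmpty(path: String, attributes: Seq[String]): Relation =
  Relation(path, attributes)

// Rejects an unresolved plan (its output schema is not known yet),
// then delegates to createEmpty with the plan's output attributes.
def create(path: String, child: Plan): Relation = {
  if (!child.resolved) {
    throw new IllegalStateException(
      "Attempt to create table from unresolved child (schema not available)")
  }
  createEmpty(path, child.output)
}
```

For a resolved plan `p`, `create(path, p)` and `createEmpty(path, p.output)` are interchangeable; the only behavioural difference is the exception on an unresolved plan.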

@marmbrus
Contributor

/cc @rxin

More API questions.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25409 has started for PR 3780 at commit 72c4a4b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 12, 2015

Test build #25409 has finished for PR 3780 at commit 72c4a4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25409/

} catch {
case e: IOException =>
throw new IOException(
s"Unable to clear output directory ${path}")
Contributor

put this on the previous line?
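A sketch of what an overwrite option has to do before writing: remove the existing output directory and report a failure as an `IOException` that names the path. It works on the local file system through `java.nio.file` rather than Hadoop's `FileSystem`, and `deleteRecursively`, `clearOutputDir`, and `prepareOutput` are hypothetical names. Unlike the excerpt under review, it keeps the original exception as the cause, and the message sits on the same line as the `throw`, as the reviewer suggests.

```scala
import java.io.IOException
import java.nio.file.{Files, LinkOption, Path}

// Deletes `p` and everything under it, children before parents.
// Symbolic links are deleted themselves, not followed.
def deleteRecursively(p: Path): Unit = {
  if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
    val children = p.toFile.listFiles()
    if (children != null) children.foreach(c => deleteRecursively(c.toPath))
  }
  Files.delete(p)
}

// Clears an existing output directory; any I/O failure is rethrown with a
// message naming the directory and the original exception as its cause.
def clearOutputDir(path: Path): Unit =
  try deleteRecursively(path)
  catch {
    case e: IOException => throw new IOException(s"Unable to clear output directory $path", e)
  }

// overwrite = true replaces existing output; overwrite = false refuses to touch it.
def prepareOutput(path: Path, overwrite: Boolean): Unit = {
  if (Files.exists(path)) {
    if (overwrite) clearOutputDir(path)
    else throw new IOException(s"Output path $path already exists")
  }
  Files.createDirectories(path)
}
```

Chaining the cause costs nothing and preserves the underlying reason (permissions, a file held open) that the wrapped message alone would hide.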

@rxin
Contributor

rxin commented Jan 20, 2015

The changes LGTM. I left a couple of small comments. Do you mind fixing those?

@marmbrus
Contributor

@rxin what about the question of append, overwrite, error? Do we want to expose all three for these types of interfaces?

@chenghao-intel
Contributor Author

As I understand it, insertInto already supports creating, appending, and overwriting, but it doesn't give the user an option to throw an exception when the file already exists. That is probably why we need saveAsParquetFile, whose semantics are to create the file if it doesn't exist, or to overwrite it, as provided by this PR.
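The three behaviours raised in this thread (error, append, overwrite) can be modelled as a policy that decides what to do with a target path depending on whether it already exists. This is a sketch only; `WriteMode`, `Action`, and `decide` are hypothetical names, not an existing Spark API.

```scala
// Hypothetical write policies: the three options discussed above.
sealed trait WriteMode
case object ErrorIfExists extends WriteMode
case object Append extends WriteMode
case object Overwrite extends WriteMode

// What the writer should do with the target path.
sealed trait Action
case object CreateNew extends Action
case object AppendToExisting extends Action
case object ReplaceExisting extends Action

// Left = refuse to write; Right = the action to take.
def decide(mode: WriteMode, pathExists: Boolean): Either[String, Action] =
  (mode, pathExists) match {
    case (_, false)            => Right(CreateNew) // every mode creates a missing path
    case (ErrorIfExists, true) => Left("path already exists")
    case (Append, true)        => Right(AppendToExisting)
    case (Overwrite, true)     => Right(ReplaceExisting)
  }
```

Read this way, the insertInto behaviour described in the comment corresponds to `Append` and `Overwrite`, neither of which ever returns `Left`; only `ErrorIfExists` with an existing path refuses to write.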

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25860 has started for PR 3780 at commit 8782fef.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

Or perhaps we could clean up those APIs, something like:

trait Mode
object Append extends Mode
object Create extends Mode
object CreateOrOverwrite extends Mode
object CreateOrAppend extends Mode

def outputFile(path: String, format: FileFormat = defaultFormat, mode: Mode = CreateOrOverwrite)
def insertIntoFile(path: String, format: FileFormat) = outputFile(path, format, CreateOrOverwrite)
def toCsv(path: String, mode: Mode) = outputFile(path, CSV, mode)
def toParquet(path: String, mode: Mode) = outputFile(path, Parquet, mode)

Any ideas?
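The proposal above is a signature sketch. Here it is filled in so it compiles and runs, under stated assumptions: `FileFormat`, `CSV`, `Parquet`, and `defaultFormat` get placeholder definitions, the hypothetical `Table` class stands in for the dataset being written, and "writing" only records rows in an in-memory map keyed by path. The behaviour given to each `Mode` is one reading of the names, not an agreed design; `insertIntoFile` keeps the proposal's `CreateOrOverwrite`.

```scala
import scala.collection.mutable

// Placeholder formats; the proposal does not define these.
sealed trait FileFormat
case object CSV extends FileFormat
case object Parquet extends FileFormat
val defaultFormat: FileFormat = Parquet

// The modes from the proposal.
sealed trait Mode
case object Append extends Mode            // path must already exist
case object Create extends Mode            // path must not exist yet
case object CreateOrOverwrite extends Mode
case object CreateOrAppend extends Mode

// In-memory stand-in for the file system: path -> (format, rows written).
val store = mutable.Map.empty[String, (FileFormat, Vector[String])]

// Hypothetical stand-in for the dataset that owns these methods.
final class Table(rows: Seq[String]) {
  def outputFile(path: String, format: FileFormat = defaultFormat,
                 mode: Mode = CreateOrOverwrite): Unit = {
    val updated = (mode, store.get(path)) match {
      case (Append | CreateOrAppend, Some((_, old))) => old ++ rows
      case (Append, None)    => throw new IllegalStateException(s"$path does not exist")
      case (Create, Some(_)) => throw new IllegalStateException(s"$path already exists")
      case _                 => rows.toVector // create a new path, or overwrite
    }
    store(path) = (format, updated)
  }
  def insertIntoFile(path: String, format: FileFormat): Unit = outputFile(path, format, CreateOrOverwrite)
  def toCsv(path: String, mode: Mode): Unit = outputFile(path, CSV, mode)
  def toParquet(path: String, mode: Mode): Unit = outputFile(path, Parquet, mode)
}
```

For example, `new Table(Seq("a")).toParquet("/t", Create)` followed by `new Table(Seq("b")).toParquet("/t", CreateOrAppend)` leaves both rows under `/t`, while a second `Create` on `/t` throws.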

@SparkQA

SparkQA commented Jan 21, 2015

Test build #25860 has finished for PR 3780 at commit 8782fef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25860/

@marmbrus
Contributor

I kind of like a proposal like that, but we need to make it Java-friendly as well.
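One way to approach the Java-friendliness concern, sketched under assumptions: Java callers cannot omit Scala default arguments, and Scala `case object`s are awkward to reference from Java (through the generated `MODULE$` field). The hypothetical `Writer` below therefore replaces default arguments with explicit overloads and also accepts the mode by name as a `String`, which Java can pass directly. Treating `ErrorIfExists` as the default for the one-argument overload is a choice made for this sketch only; a Java enum would be another option.

```scala
import java.util.Locale

sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode
case object ErrorIfExists extends Mode

object Mode {
  // Parses a mode name case-insensitively, so Java callers can pass a plain String.
  def fromName(name: String): Mode = name.toLowerCase(Locale.ROOT) match {
    case "append"        => Append
    case "overwrite"     => Overwrite
    case "errorifexists" => ErrorIfExists
    case other           => throw new IllegalArgumentException(s"Unknown mode: $other")
  }
}

// Hypothetical writer: every overload is callable from Java without default arguments.
class Writer {
  // Records the most recent call so the dispatch is observable.
  var lastCall: Option[(String, Mode)] = None

  def save(path: String, mode: Mode): Unit = lastCall = Some((path, mode))
  def save(path: String, mode: String): Unit = save(path, Mode.fromName(mode))
  def save(path: String): Unit = save(path, ErrorIfExists)
}
```

From Java, `new Writer().save("/t", "overwrite")` and `new Writer().save("/t")` both work without touching any Scala-specific encoding.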

@chenghao-intel
Contributor Author

OK, I will close this.

Labels
None yet
Projects
None yet

6 participants