
[SPARK-8961] [SQL] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row #7331

Closed · wants to merge 2 commits

Conversation

liancheng (Contributor)

This is a follow-up of SPARK-8888, which also aims to optimize writing dynamic partitions.

Three more changes can be made here (a short sketch follows the list):

  1. Using `InternalRow` instead of `Row` in `BaseWriterContainer.outputWriterForRow`.
  2. Using `Cast` expressions to convert partition columns to strings, so that we can leverage code generation.
  3. Replacing the FP-style `zip` and `map` calls with a faster imperative `while` loop.
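
A minimal sketch of the third change (with comments noting how it connects to the first two), using assumed, simplified inputs rather than the actual Spark internals:

```scala
// Sketch only, not the actual patch. In the real code the partition values
// come from evaluating Cast(col, StringType) expressions against an
// InternalRow, so code generation can handle the conversion; here plain
// strings stand in for those results.
object PartitionPathSketch {

  // FP style: concise, but zip/map allocate intermediate collections for
  // every input row routed to an output writer.
  def pathFunctional(names: Seq[String], values: Seq[String]): String =
    names.zip(values).map { case (k, v) => s"$k=$v" }.mkString("/")

  // Imperative style: one pass with a StringBuilder and a while loop,
  // avoiding per-row intermediate tuples and collections.
  def pathImperative(names: Array[String], values: Array[String]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < names.length) {
      if (i > 0) sb.append('/')
      sb.append(names(i)).append('=').append(values(i))
      i += 1
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val names = Array("year", "month")
    val values = Array("2015", "07")
    println(pathFunctional(names, values))  // year=2015/month=07
    println(pathImperative(names, values))  // year=2015/month=07
  }
}
```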


SparkQA commented Jul 9, 2015

Test build #36970 has finished for PR 7331 at commit 719e63b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

liancheng (Contributor Author)

cc @rxin

@@ -19,6 +19,8 @@ package org.apache.spark.sql.sources

import java.util.{Date, UUID}

import scala.collection.JavaConversions._
Contributor

I explicitly didn't include this wildcard implicit import because I didn't want future code to accidentally introduce a scala wrapper on the java hashmap.

Contributor Author

Got it. Converting the iterator explicitly now.
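
A sketch of the kind of explicit conversion being described, assuming the Scala 2.10-era `JavaConversions.asScalaIterator` call that Spark used at the time; whether the patch calls exactly this method on exactly this map is an assumption:

```scala
// Sketch only. Instead of `import scala.collection.JavaConversions._`, which
// silently wraps every Java collection it touches, the single iterator that
// needs wrapping is converted explicitly at the call site.
import java.util.{HashMap => JHashMap}

object ExplicitIteratorConversionSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for a java.util.HashMap of output writers keyed by partition path.
    val outputWriters = new JHashMap[String, String]()
    outputWriters.put("year=2015/month=07", "writer-1")

    // Explicit, one-off conversion; no implicit wrappers leak into the rest of the file.
    // (Scala 2.10/2.11 API; later Scala versions use scala.jdk.CollectionConverters.)
    val values: Iterator[String] =
      scala.collection.JavaConversions.asScalaIterator(outputWriters.values().iterator())

    values.foreach(println)
  }
}
```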


rxin commented Jul 10, 2015

lgtm


SparkQA commented Jul 11, 2015

Test build #37062 has finished for PR 7331 at commit b5ab9ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented Jul 11, 2015

Thanks - merging.


rxin commented Jul 11, 2015

Actually couldn't merge. Not sure what's going on.

Proceed with merging pull request #7331? (y/n): y
git fetch apache-github pull/7331/head:PR_TOOL_MERGE_PR_7331
git fetch apache master:PR_TOOL_MERGE_PR_7331_MASTER
git checkout PR_TOOL_MERGE_PR_7331_MASTER
error: you need to resolve your current index first
Traceback (most recent call last):
  File "./merge_spark_pr.py", line 331, in <module>
    merge_hash = merge_pr(pr_num, target_ref)
  File "./merge_spark_pr.py", line 109, in merge_pr
    run_cmd("git checkout %s" % target_branch_name)
  File "./merge_spark_pr.py", line 80, in run_cmd
    return subprocess.check_output(cmd.split(" "))
  File "/Users/rxin/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '[u'git', u'checkout', u'PR_TOOL_MERGE_PR_7331_MASTER']' returned non-zero exit status 1

liancheng (Contributor Author)

Let me try to merge it.

liancheng (Contributor Author)

Merged to master.

liancheng added a commit that referenced this pull request Jul 11, 2015
[SPARK-8961] [SQL] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row

This is a follow-up of [SPARK-8888] [1], which also aims to optimize writing dynamic partitions.

Three more changes can be made here:

1. Using `InternalRow` instead of `Row` in `BaseWriterContainer.outputWriterForRow`
2. Using `Cast` expressions to convert partition columns to strings, so that we can leverage code generation.
3. Replacing the FP-style `zip` and `map` calls with a faster imperative `while` loop.

[1]: https://issues.apache.org/jira/browse/SPARK-8888

Author: Cheng Lian <lian@databricks.com>

Closes #7331 from liancheng/spark-8961 and squashes the following commits:

b5ab9ae [Cheng Lian] Casts Java iterator to Scala iterator explicitly
719e63b [Cheng Lian] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row
liancheng closed this Jul 12, 2015
liancheng deleted the spark-8961 branch July 12, 2015 22:19