
[SPARK-8961] [SQL] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row #7331

Closed · wants to merge 2 commits

Conversation

liancheng (Contributor)

This is a follow-up of SPARK-8888, which also aims to optimize writing dynamic partitions.

Three more changes can be made here (a short sketch follows the list):

  1. Using `InternalRow` instead of `Row` in `BaseWriterContainer.outputWriterForRow`.
  2. Using `Cast` expressions to convert partition columns to strings, so that we can leverage code generation.
  3. Replacing the FP-style `zip` and `map` calls with a faster imperative `while` loop.
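
A minimal sketch of the third change (with comments noting how it connects to the first two), using assumed, simplified inputs rather than the actual Spark internals:

```scala
// Sketch only, not the actual patch. In the real code the partition values
// come from evaluating Cast(col, StringType) expressions against an
// InternalRow, so code generation can handle the conversion; here plain
// strings stand in for those results.
object PartitionPathSketch {

  // FP style: concise, but zip/map allocate intermediate collections for
  // every input row routed to an output writer.
  def pathFunctional(names: Seq[String], values: Seq[String]): String =
    names.zip(values).map { case (k, v) => s"$k=$v" }.mkString("/")

  // Imperative style: one pass with a StringBuilder and a while loop,
  // avoiding per-row intermediate tuples and collections.
  def pathImperative(names: Array[String], values: Array[String]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < names.length) {
      if (i > 0) sb.append('/')
      sb.append(names(i)).append('=').append(values(i))
      i += 1
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val names = Array("year", "month")
    val values = Array("2015", "07")
    println(pathFunctional(names, values))  // year=2015/month=07
    println(pathImperative(names, values))  // year=2015/month=07
  }
}
```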


SparkQA commented Jul 9, 2015

Test build #36970 has finished for PR 7331 at commit 719e63b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

liancheng (Contributor Author)

cc @rxin

@@ -19,6 +19,8 @@ package org.apache.spark.sql.sources

import java.util.{Date, UUID}

import scala.collection.JavaConversions._
Contributor

I explicitly didn't include this wildcard implicit import because I didn't want future code to accidentally introduce a scala wrapper on the java hashmap.

Contributor Author

Got it. Converting the iterator explicitly now.
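
A sketch of the kind of explicit conversion being described, assuming the Scala 2.10-era `JavaConversions.asScalaIterator` call that Spark used at the time; whether the patch calls exactly this method on exactly this map is an assumption:

```scala
// Sketch only. Instead of `import scala.collection.JavaConversions._`, which
// silently wraps every Java collection it touches, the single iterator that
// needs wrapping is converted explicitly at the call site.
import java.util.{HashMap => JHashMap}

object ExplicitIteratorConversionSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for a java.util.HashMap of output writers keyed by partition path.
    val outputWriters = new JHashMap[String, String]()
    outputWriters.put("year=2015/month=07", "writer-1")

    // Explicit, one-off conversion; no implicit wrappers leak into the rest of the file.
    // (Scala 2.10/2.11 API; later Scala versions use scala.jdk.CollectionConverters.)
    val values: Iterator[String] =
      scala.collection.JavaConversions.asScalaIterator(outputWriters.values().iterator())

    values.foreach(println)
  }
}
```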


rxin commented Jul 10, 2015

lgtm


SparkQA commented Jul 11, 2015

Test build #37062 has finished for PR 7331 at commit b5ab9ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented Jul 11, 2015

Thanks - merging.


rxin commented Jul 11, 2015

Actually couldn't merge. Not sure what's going on.

Proceed with merging pull request #7331? (y/n): y
git fetch apache-github pull/7331/head:PR_TOOL_MERGE_PR_7331
git fetch apache master:PR_TOOL_MERGE_PR_7331_MASTER
git checkout PR_TOOL_MERGE_PR_7331_MASTER
error: you need to resolve your current index first
Traceback (most recent call last):
  File "./merge_spark_pr.py", line 331, in <module>
    merge_hash = merge_pr(pr_num, target_ref)
  File "./merge_spark_pr.py", line 109, in merge_pr
    run_cmd("git checkout %s" % target_branch_name)
  File "./merge_spark_pr.py", line 80, in run_cmd
    return subprocess.check_output(cmd.split(" "))
  File "/Users/rxin/anaconda/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '[u'git', u'checkout', u'PR_TOOL_MERGE_PR_7331_MASTER']' returned non-zero exit status 1

liancheng (Contributor Author)

Let me try to merge it.

liancheng (Contributor Author)

Merged to master.

liancheng added a commit that referenced this pull request Jul 11, 2015
[SPARK-8961] [SQL] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row

This is a follow-up of [SPARK-8888] [1], which also aims to optimize writing dynamic partitions.

Three more changes can be made here:

1. Using `InternalRow` instead of `Row` in `BaseWriterContainer.outputWriterForRow`
2. Using `Cast` expressions to convert partition columns to strings, so that we can leverage code generation.
3. Replacing the FP-style `zip` and `map` calls with a faster imperative `while` loop.

[1]: https://issues.apache.org/jira/browse/SPARK-8888

Author: Cheng Lian <lian@databricks.com>

Closes #7331 from liancheng/spark-8961 and squashes the following commits:

b5ab9ae [Cheng Lian] Casts Java iterator to Scala iterator explicitly
719e63b [Cheng Lian] Makes BaseWriterContainer.outputWriterForRow accepts InternalRow instead of Row
liancheng closed this Jul 12, 2015
liancheng deleted the spark-8961 branch July 12, 2015 22:19