[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema in Parquet issue #22359

Closed

wangyum
Member

@wangyum wangyum commented Sep 7, 2018

What changes were proposed in this pull request?

How to reproduce:

spark.sql("CREATE TABLE tbl(id long)")
spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
spark.sql("INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet' " +
  "STORED AS PARQUET SELECT ID FROM view1")
scala> spark.read.parquet("/tmp/spark/parquet").schema
res10: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,true))

The schema should be StructType(StructField(ID,LongType,true)), because the query is SELECT ID FROM view1 and the written column name should keep the case used in the query.

This PR fixes the issue.
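
The expectation above only shows up if field names are compared exactly rather than case-insensitively. A minimal plain-Scala sketch of such a comparison follows; `SchemaNameCheck` and `caseOnlyMismatches` are hypothetical names introduced for illustration, and the name lists stand in for what `spark.read.parquet(...).schema.fieldNames` would supply. It is not the code changed by this PR.

```scala
object SchemaNameCheck {
  /**
   * Returns (expected, actual) pairs whose names match when case is ignored
   * but differ when compared exactly. Names are paired by position.
   */
  def caseOnlyMismatches(
      expected: Seq[String],
      actual: Seq[String]): Seq[(String, String)] =
    expected.zip(actual).filter { case (e, a) => e.equalsIgnoreCase(a) && e != a }

  def main(args: Array[String]): Unit = {
    val expected = Seq("ID") // column name as written in the SELECT list
    val actual = Seq("id")   // column name reported in the reproduction above
    // A non-empty result means the written schema lost the query's casing.
    println(caseOnlyMismatches(expected, actual))
  }
}
```

A case-insensitive equality check would treat `id` and `ID` as the same column and miss the problem, which is why the comparison here is exact.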

How was this patch tested?

unit tests

@wangyum
Member Author

wangyum commented Sep 7, 2018

cc @gengliangwang

@@ -803,6 +803,23 @@ class HiveDDLSuite
    }
  }

  test("Insert overwrite directory should output correct schema") {
    withSQLConf(CONVERT_METASTORE_PARQUET.key -> "false") {
Member

Add withTable("tbl") { here.
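
The suggestion is presumably about cleanup: wrapping the test body so that the table it creates is dropped when the block exits, even if an assertion fails. A plain-Scala sketch of that loan pattern follows. The `withTable` below is a hypothetical stand-in written for illustration, backed by an in-memory set, and is not the helper from Spark's test utilities.

```scala
import scala.collection.mutable

object WithTableSketch {
  // Hypothetical in-memory stand-in for a table catalog; not Spark's catalog.
  val tables: mutable.Set[String] = mutable.Set.empty

  // Runs `body`, then drops the named tables even if `body` throws.
  def withTable(names: String*)(body: => Unit): Unit =
    try body
    finally names.foreach(n => tables.remove(n))

  def main(args: Array[String]): Unit = {
    withTable("tbl") {
      tables += "tbl" // stands in for CREATE TABLE tbl(...)
      assert(tables.contains("tbl"))
    }
    // The table is gone once the block exits, so later tests start clean.
    assert(!tables.contains("tbl"))
  }
}
```

Putting the cleanup in `finally` means a failed assertion inside the block cannot leave `tbl` behind for the next test.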

Member

@gengliangwang gengliangwang left a comment

LGTM. Thanks for the fix!

@SparkQA

SparkQA commented Sep 7, 2018

Test build #95788 has finished for PR 22359 at commit ff78fdb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 7, 2018

Test build #95791 has finished for PR 22359 at commit 8e60b98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @wangyum.

  • I know SPARK-25313 has some information, but could you make the PR description more complete? The current PR description just repeats the title: "Fix InsertIntoHiveDirCommand output schema issue." :)
  • nit: FOLLOW-UP] -> [FOLLOW-UP]?

@@ -803,6 +803,25 @@ class HiveDDLSuite
    }
  }

  test("Insert overwrite directory should output correct schema") {
Member

Since this is a bug fix, can we add a SPARK-25313 prefix to the test name?

Member Author

Should I also add the prefix to these tests?

test("Insert overwrite Hive table should output correct schema") {

test("Create Hive table as select should output correct schema") {

Member

In this PR, let's handle this test case only.

@wangyum wangyum changed the title from "[SPARK-25313][SQL]FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" to "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" Sep 8, 2018
@dongjoon-hyun
Member

dongjoon-hyun commented Sep 8, 2018

I removed my previous comment. This seems to have been the Parquet behavior since this command was introduced in 2.3.0. I was confused because it differs from the ORC behavior.

@SparkQA

SparkQA commented Sep 8, 2018

Test build #95818 has finished for PR 22359 at commit 71f382b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Since this concerns Parquet behavior only, can we add "in Parquet" at the end of the title?

@wangyum wangyum changed the title from "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema issue" to "[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output schema in Parquet issue" Sep 9, 2018
@wangyum
Member Author

wangyum commented Sep 10, 2018

cc @cloud-fan

@cloud-fan
Contributor

thanks, merging to master/2.4!

asfgit pushed a commit that referenced this pull request Sep 10, 2018
…ema in Parquet issue

## What changes were proposed in this pull request?

How to reproduce:
```scala
spark.sql("CREATE TABLE tbl(id long)")
spark.sql("INSERT OVERWRITE TABLE tbl VALUES 4")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
spark.sql(s"INSERT OVERWRITE LOCAL DIRECTORY '/tmp/spark/parquet' " +
  "STORED AS PARQUET SELECT ID FROM view1")
spark.read.parquet("/tmp/spark/parquet").schema
scala> spark.read.parquet("/tmp/spark/parquet").schema
res10: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,true))
```
The schema should be `StructType(StructField(ID,LongType,true))` as we `SELECT ID FROM view1`.

This PR fixes the issue.

## How was this patch tested?

unit tests

Closes #22359 from wangyum/SPARK-25313-FOLLOW-UP.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit f8b4d5a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in f8b4d5a Sep 10, 2018
asfgit pushed a commit that referenced this pull request Sep 11, 2018
…and output schema in Parquet issue

## What changes were proposed in this pull request?

Backport #22359 to branch-2.3.

## How was this patch tested?

unit tests

Closes #22387 from wangyum/SPARK-25313-FOLLOW-UP-branch-2.3.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>