
[SPARK-23355][SQL] convertMetastore should not ignore table properties #20522

Closed · wants to merge 1 commit

Conversation

dongjoon-hyun (Member) commented Feb 6, 2018

What changes were proposed in this pull request?

Previously, SPARK-22158 fixed this for the USING hive syntax. This PR aims to fix it for the STORED AS syntax. Although the test case covers only the ORC path, the patch handles both convertMetastoreOrc and convertMetastoreParquet.
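For illustration, a minimal sketch of the behavior being fixed (the table name is illustrative, and it assumes the ORC conversion is enabled; see the added test cases for the exact coverage):

```scala
// Assumes spark.sql.hive.convertMetastoreOrc=true, i.e. the conversion path
// this PR touches. Before this patch, the TBLPROPERTIES entry below was
// dropped during conversion, so files were written with the session default
// compression codec instead of ZLIB.
spark.sql(
  """CREATE TABLE t(id INT) STORED AS ORC
    |TBLPROPERTIES ('orc.compress' = 'ZLIB')""".stripMargin)
spark.sql("INSERT INTO t VALUES (1)")
// After this patch, the written ORC files honor 'orc.compress' = 'ZLIB'.
```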

How was this patch tested?

Pass the newly added test cases.

```diff
   sessionCatalog.metastoreCatalog
     .convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
 } else {
-  val options = relation.tableMeta.storage.properties
+  val options = relation.tableMeta.properties ++ relation.tableMeta.storage.properties
```
dongjoon-hyun (Member, Author) commented on the diff, Feb 6, 2018

@cloud-fan and @gatorsmile: unlike the USING hive syntax, the STORED AS syntax saves the properties into tableMeta.properties. This PR considers both kinds of properties; please see the test case for an example (and the sketch below).
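A minimal sketch of where each clause lands in the catalog metadata (the table name is illustrative; the catalog calls are the standard Spark APIs):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// STORED AS puts TBLPROPERTIES entries into CatalogTable.properties,
// which convertMetastore previously ignored; OPTIONS of the USING hive
// syntax land in CatalogTable.storage.properties instead.
spark.sql(
  """CREATE TABLE s(id INT) STORED AS ORC
    |TBLPROPERTIES ('orc.compress' = 'ZLIB')""".stripMargin)

val meta = spark.sessionState.catalog.getTableMetadata(TableIdentifier("s"))
meta.properties.get("orc.compress")          // Some(ZLIB)
meta.storage.properties.get("orc.compress")  // None: not a storage property
```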

gatorsmile (Member) commented:

This is not the right thing to do. The table properties should not always be put into the serde properties.

gatorsmile (Member) commented:

That is the reason why my previous PR #20120 was closed. We need a comprehensive fix that resolves all the DDL issues.

dongjoon-hyun (Member, Author) commented:

This code path is only for convertMetastore. I think we inevitably need this change to prevent a regression.

dongjoon-hyun (Member, Author) commented:

It's surprising that Apache Spark has not respected table properties in convertMetastoreParquet until now, given that the conversion has been enabled by default.

gatorsmile (Member) commented Feb 7, 2018

The issues have been defined in the PR description of #20120:

> Currently, we ignore table-specific compression conf when the Hive serde tables are converted to the data source tables. We also ignore it when users set compression in the TBLPROPERTIES clause instead of the OPTIONS clause, even if the tables are native data source tables.

#20087 is also trying to resolve related issues, so we might still be missing multiple critical bugs.

Simply adding the table properties to the options will introduce new bugs. For example, users might provide conflicting serde properties, or change the properties later through DDLs (see the sketch below).
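A hypothetical illustration of such a conflict (the table name and codec values are made up for the example):

```scala
// The same key can end up in both property maps with different values.
spark.sql(
  """CREATE TABLE c(id INT) STORED AS ORC
    |TBLPROPERTIES ('orc.compress' = 'ZLIB')""".stripMargin)
spark.sql("ALTER TABLE c SET SERDEPROPERTIES ('orc.compress' = 'SNAPPY')")

// With a blind `tableMeta.properties ++ tableMeta.storage.properties`,
// the serde value (SNAPPY) silently wins, because `m1 ++ m2` lets m2's
// entries override m1's on key collisions.
```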

gatorsmile (Member) commented:

We can't blindly add the table properties to the options; that also introduces behavior changes. For example, some table properties might have names identical to serde properties but different semantics.

Thus, what we can do is add only the table properties we actually need.

Conceptually, the table properties should not carry serde-related confs. However, Hive basically does not follow the rule it sets, which is pretty strange. No idea why Hive did it.

SparkQA commented Feb 6, 2018

Test build #87130 has finished for PR 20522 at commit 23d8205.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 7, 2018

Test build #87131 has finished for PR 20522 at commit 0f65eb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile (Member) commented:

Just FYI, #19382 solves the issue of storage/serde properties, but the issue here is table properties; they are not related. Please create a new JIRA instead of treating this as a follow-up.

gatorsmile (Member) commented:

Also cc @cloud-fan @gengliangwang

dongjoon-hyun (Member, Author) commented Feb 7, 2018

I agree with you on all the issues. BTW, how should we describe these existing Spark bugs in the ORC migration guide?

Given that users already experience this for STORED AS PARQUET Hive tables, is it okay to mention that ORC tables will behave the same way as Parquet tables?

dongjoon-hyun changed the title from "[SPARK-22158][SQL][FOLLOWUP] convertMetastore should not ignore table properties" to "[SPARK-23355][SQL] convertMetastore should not ignore table properties" on Feb 7, 2018
tgravescs (Contributor) commented:

Hey, sorry, it's unclear to me exactly what isn't supported right now in 2.3. The JIRA and this PR mention that table properties are not supported, but #19382 seems to have fixed that. The reason I'm asking is to know what is and is not supported in 2.3, so I can decide whether to turn the convertMetastoreOrc config on or not.

dongjoon-hyun (Member, Author) commented Feb 27, 2018

@tgravescs: yep, the title of SPARK-22158 was changed recently because it only supported table SerDe properties. If you set the ORC property in the table's storage SerDe properties (not in the table properties), it will work. This PR additionally covers table properties.

gatorsmile (Member) commented:

@dongjoon-hyun Could you submit the PR to resolve all the related issues?

dongjoon-hyun (Member, Author) commented:

Sure. I'll try, @gatorsmile . It'll take some time for me.

dongjoon-hyun (Member, Author) commented:

@gatorsmile, sorry for the late update. I updated this PR by narrowing the scope to configuration key names specific to ORC and Parquet (see the sketch below). The test coverage is reading and writing non-partitioned tables.
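A hedged sketch of the narrowed merge, assuming key-prefix predicates like the ones in the merged commit (check 8aa1d7b for the exact patterns; the sample maps here are made up):

```scala
// Keep only ORC-specific keys from the table properties; storage properties
// still supersede table properties on key collisions.
def isOrcProperty(key: String): Boolean =
  key.startsWith("orc.") || key.contains(".orc.")

val tableProps   = Map("orc.compress" -> "ZLIB", "comment" -> "not a serde conf")
val storageProps = Map("serialization.format" -> "1")

val options = tableProps.filterKeys(isOrcProperty).toMap ++ storageProps
// options == Map("orc.compress" -> "ZLIB", "serialization.format" -> "1")
```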

SparkQA commented Apr 26, 2018

Test build #89901 has finished for PR 20522 at commit 06a9a45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented:

LGTM, merging to master!

asfgit closed this in 8aa1d7b on Apr 27, 2018
dongjoon-hyun (Member, Author) commented:

Thank you for the review and merge, @cloud-fan!

dongjoon-hyun deleted the SPARK-22158-2 branch on April 27, 2018 03:59
asfgit pushed a commit that referenced this pull request Sep 7, 2018
## What changes were proposed in this pull request?
Before Apache Spark 2.3, table properties were ignored when writing data to a Hive table (created with the STORED AS PARQUET/ORC syntax), because the compression configurations were not passed to the FileFormatWriter in hadoopConf. That was fixed in #20087. But for CTAS with the USING PARQUET/ORC syntax, table properties were also ignored during convertMetastore, so the test cases for CTAS could not be enabled.

Now that this has been fixed in #20522, the test cases should be enabled too.

## How was this patch tested?
This only re-enables the test cases from the previous PR.

Closes #22302 from fjh100456/compressionCodec.

Authored-by: fjh100456 <fu.jinhua6@zte.com.cn>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 473f2fb)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>