[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013

chenghao-intel · 2014-10-30T08:25:53Z

CREATE TABLE t1 (a String);
CREATE TABLE t1 AS SELECT key FROM src; – throw exception
CREATE TABLE if not exists t1 AS SELECT key FROM src; – expect do nothing, currently it will overwrite the t1, which is incorrect.

SparkQA · 2014-10-30T08:30:00Z

Test build #22526 has started for PR 3013 at commit 6d18aa7.

This patch merges cleanly.

SparkQA · 2014-10-30T08:31:11Z

Test build #22526 has finished for PR 3013 at commit 6d18aa7.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-30T08:31:12Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22526/
Test FAILed.

SparkQA · 2014-10-30T11:09:48Z

Test build #22538 has started for PR 3013 at commit c1ea850.

This patch merges cleanly.

SparkQA · 2014-10-30T11:42:39Z

Test build #22538 has finished for PR 3013 at commit c1ea850.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-30T11:42:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22538/
Test FAILed.

marmbrus · 2014-10-30T19:44:10Z

I'm confused by the failure. Do you possibly have HIVE_DEV_HOME set to Hive 12 or something such that its generating the wrong golden files?

scwf · 2014-10-30T21:02:46Z

I think @chenghao-intel used hive 0.12 to generate the golden files, while Jenkins test with hive 0.13

marmbrus · 2014-10-30T21:04:10Z

Yeah, sorry. We have switched everything to Hive 13 (though we should still pass the tests when running in Hive 12 mode, otherwise they should be added to the HiveShim blacklist).

SparkQA · 2014-10-31T18:34:50Z

Test build #22619 has started for PR 3013 at commit 1acc914.

This patch does not merge cleanly.

chenghao-intel · 2014-10-31T18:45:42Z

Thank you @scwf @marmbrus , I've updated the code for unit testing based on Hive 0.13, and it passed the test locally.

SparkQA · 2014-10-31T18:47:34Z

Test build #22622 has started for PR 3013 at commit ec72555.

This patch merges cleanly.

chenghao-intel · 2014-10-31T18:49:10Z

I also noticed that the golden files changed when switching from Hive 0.12 to 0.13, probably due to the decimal type incompatible. see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals

marmbrus · 2014-10-31T18:50:25Z

Yes, blacklist any tests the rely on fixed decimals. This will be fixed by #2983.

chenghao-intel · 2014-10-31T18:55:49Z

I don't think we need to blacklist the tests here. I've added TODO in the code. it's about the string output format for decimal.

SparkQA · 2014-10-31T19:21:42Z

Test build #22622 has finished for PR 3013 at commit ec72555.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-31T19:21:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22622/
Test FAILed.

SparkQA · 2014-10-31T19:53:03Z

Test build #22619 has finished for PR 3013 at commit 1acc914.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-31T19:53:06Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22619/
Test FAILed.

SparkQA · 2014-11-01T04:10:05Z

Test build #22678 has started for PR 3013 at commit 4085c67.

This patch merges cleanly.

SparkQA · 2014-11-01T05:01:17Z

Test build #22678 has finished for PR 3013 at commit 4085c67.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-01T05:01:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22678/
Test PASSed.

scwf · 2014-11-01T05:34:37Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala

-    "boolean" ^^^ BooleanType |
-    HiveShim.metastoreDecimal ^^^ DecimalType |
+    "boolean" ^^^ BooleanType | // TODO decimal Hive 0.12.0
+    "decimal\\((\\d+),(\\d+)\\)".r ^^^ DecimalType | // TODO decimal Hive 0.13.1


we need these todos here? this is both ok for hive 12 and 13, right?

The ".q" files contains the create table like "CREATE TABLE DECIMAL_4_1(key decimal(35,25), value int)", however, the Jenkins seems compile spark sql with Hive 0.12, hence I have to put the both patterns here.

chenghao-intel · 2014-11-02T14:52:36Z

Thank you @scwf @marmbrus for reviewing the code. Actually I am a little confused with the hive versions in Spark SQL. Seems the golden files are based on Hive 0.13, but Jenkins compiles the code with Hive 0.12. This why I have to put some of the work around code with TODOs.

marmbrus · 2014-11-02T22:54:00Z

Jenkins compiles with both versions just to make sure that we aren't breaking backwards compatibility (Hive 12 first). Ideally, we'll set up another job to run the test for Hive 12 in parallel or at least periodically, but for now running both would take too much time.

In terms of semantics I think it is too much overhead to try to faithfully mimic both versions since the primary goal here is metastore compatibility. Thus, the query tests are based on Hive13 and the golden answers are too. It is possible to run nearly of the tests with the Hive12 library too, though in places we act like 13 even though we are compiling with the 12 library. In the few cases where we can't run a given test with both versions of the library there is a special blacklist in the shim.

Full Hive 13 decimal support is now merged, so hopefully we can remove all the special cases from this PR.

chenghao-intel · 2014-11-03T01:02:16Z

Thank you @marmbrus . I've updated the code just for the bug fixing. And will create another PRs for the Hive compatibility testing.

SparkQA · 2014-11-03T01:05:11Z

Test build #22785 has started for PR 3013 at commit 194113e.

This patch merges cleanly.

SparkQA · 2014-11-03T02:04:17Z

Test build #22785 has finished for PR 3013 at commit 194113e.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging
- trait RunnableCommand extends logical.Command
- case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan
- protected case class Keyword(str: String)
- sys.error(s"Failed to load class for data source: $provider")
- case class EqualTo(attribute: String, value: Any) extends Filter
- case class GreaterThan(attribute: String, value: Any) extends Filter
- case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
- case class LessThan(attribute: String, value: Any) extends Filter
- case class LessThanOrEqual(attribute: String, value: Any) extends Filter
- trait RelationProvider
- abstract class BaseRelation
- abstract class TableScan extends BaseRelation
- abstract class PrunedScan extends BaseRelation
- abstract class PrunedFilteredScan extends BaseRelation

AmplabJenkins · 2014-11-03T02:04:20Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22785/
Test PASSed.

marmbrus · 2014-11-03T22:00:28Z

Thanks! Merged to master.

CREATE TABLE t1 (a String); CREATE TABLE t1 AS SELECT key FROM src; – throw exception CREATE TABLE if not exists t1 AS SELECT key FROM src; – expect do nothing, currently it will overwrite the t1, which is incorrect. Author: Cheng Hao <hao.cheng@intel.com> Closes #3013 from chenghao-intel/ctas_unittest and squashes the following commits: 194113e [Cheng Hao] fix bug in CTAS when table already existed (cherry picked from commit e83f13e) Signed-off-by: Michael Armbrust <michael@databricks.com>

chenghao-intel force-pushed the ctas_unittest branch from c1ea850 to 1acc914 Compare October 31, 2014 18:28

chenghao-intel force-pushed the ctas_unittest branch from 1acc914 to ec72555 Compare October 31, 2014 18:44

scwf reviewed Nov 1, 2014
View reviewed changes

fix bug in CTAS when table already existed

194113e

chenghao-intel force-pushed the ctas_unittest branch from 4085c67 to 194113e Compare November 3, 2014 00:59

asfgit closed this in e83f13e Nov 3, 2014

chenghao-intel deleted the ctas_unittest branch June 9, 2015 01:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013

[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013

chenghao-intel commented Oct 30, 2014

SparkQA commented Oct 30, 2014

SparkQA commented Oct 30, 2014

AmplabJenkins commented Oct 30, 2014

SparkQA commented Oct 30, 2014

SparkQA commented Oct 30, 2014

AmplabJenkins commented Oct 30, 2014

marmbrus commented Oct 30, 2014

scwf commented Oct 30, 2014

marmbrus commented Oct 30, 2014

SparkQA commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

SparkQA commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

marmbrus commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

SparkQA commented Oct 31, 2014

AmplabJenkins commented Oct 31, 2014

SparkQA commented Oct 31, 2014

AmplabJenkins commented Oct 31, 2014

SparkQA commented Nov 1, 2014

SparkQA commented Nov 1, 2014

AmplabJenkins commented Nov 1, 2014

scwf Nov 1, 2014

chenghao-intel Nov 2, 2014

chenghao-intel commented Nov 2, 2014

marmbrus commented Nov 2, 2014

chenghao-intel commented Nov 3, 2014

SparkQA commented Nov 3, 2014

SparkQA commented Nov 3, 2014

AmplabJenkins commented Nov 3, 2014

marmbrus commented Nov 3, 2014

[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013

[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013

Conversation

chenghao-intel commented Oct 30, 2014

SparkQA commented Oct 30, 2014

SparkQA commented Oct 30, 2014

AmplabJenkins commented Oct 30, 2014

SparkQA commented Oct 30, 2014

SparkQA commented Oct 30, 2014

AmplabJenkins commented Oct 30, 2014

marmbrus commented Oct 30, 2014

scwf commented Oct 30, 2014

marmbrus commented Oct 30, 2014

SparkQA commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

SparkQA commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

marmbrus commented Oct 31, 2014

chenghao-intel commented Oct 31, 2014

SparkQA commented Oct 31, 2014

AmplabJenkins commented Oct 31, 2014

SparkQA commented Oct 31, 2014

AmplabJenkins commented Oct 31, 2014

SparkQA commented Nov 1, 2014

SparkQA commented Nov 1, 2014

AmplabJenkins commented Nov 1, 2014

scwf Nov 1, 2014

Choose a reason for hiding this comment

chenghao-intel Nov 2, 2014

Choose a reason for hiding this comment

chenghao-intel commented Nov 2, 2014

marmbrus commented Nov 2, 2014

chenghao-intel commented Nov 3, 2014

SparkQA commented Nov 3, 2014

SparkQA commented Nov 3, 2014

AmplabJenkins commented Nov 3, 2014

marmbrus commented Nov 3, 2014