[SPARK-4152] [SQL] Avoid data change in CTAS while table already existed #3013
Test build #22526 has started for PR 3013 at commit
Test build #22526 has finished for PR 3013 at commit
Test FAILed.
Test build #22538 has started for PR 3013 at commit
Test build #22538 has finished for PR 3013 at commit
Test FAILed.
I'm confused by the failure. Do you possibly have HIVE_DEV_HOME set to Hive 12 or something, such that it's generating the wrong golden files?
I think @chenghao-intel used Hive 0.12 to generate the golden files, while Jenkins tests with Hive 0.13.
Yeah, sorry. We have switched everything to Hive 13 (though we should still pass the tests when running in Hive 12 mode; otherwise, they should be added to the HiveShim blacklist).
Updated from c1ea850 to 1acc914.
Test build #22619 has started for PR 3013 at commit
Updated from 1acc914 to ec72555.
Test build #22622 has started for PR 3013 at commit
I also noticed that the golden files changed when switching from Hive 0.12 to 0.13, probably due to the decimal type incompatibility. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals
Yes, blacklist any tests that rely on fixed decimals. This will be fixed by #2983.
I don't think we need to blacklist the tests here. I've added a TODO in the code; it's about the string output format for decimals.
Test build #22622 has finished for PR 3013 at commit
Test FAILed.
Test build #22619 has finished for PR 3013 at commit
Test FAILed.
Test build #22678 has started for PR 3013 at commit
Test build #22678 has finished for PR 3013 at commit
Test PASSed.
    "boolean" ^^^ BooleanType |
    HiveShim.metastoreDecimal ^^^ DecimalType |
    "boolean" ^^^ BooleanType | // TODO decimal Hive 0.12.0
    "decimal\\((\\d+),(\\d+)\\)".r ^^^ DecimalType | // TODO decimal Hive 0.13.1
Do we need these TODOs here? This works for both Hive 12 and 13, right?
The ".q" files contain CREATE TABLE statements like "CREATE TABLE DECIMAL_4_1(key decimal(35,25), value int)"; however, Jenkins seems to compile Spark SQL with Hive 0.12, so I have to include both patterns here.
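A minimal Python sketch of the dual-pattern idea: a type-string check that accepts both the bare "decimal" written by Hive 0.12 and the parameterized "decimal(p,s)" form that Hive 0.13 introduced. The helper parse_decimal_type is hypothetical and is not part of Spark; the actual parser is the Scala combinator snippet quoted in the review.

```python
import re

# Same pattern as the Scala regex in the review snippet:
# "decimal\\((\\d+),(\\d+)\\)" (no whitespace allowed after the comma).
_PARAMETERIZED_DECIMAL = re.compile(r"decimal\((\d+),(\d+)\)")


def parse_decimal_type(type_string):
    """Hypothetical helper: parse a metastore decimal type string.

    Returns (precision, scale). Both are None for the bare "decimal"
    form (Hive 0.12 style). Raises ValueError for anything else.
    """
    if type_string == "decimal":
        return (None, None)
    match = _PARAMETERIZED_DECIMAL.fullmatch(type_string)
    if match:
        return (int(match.group(1)), int(match.group(2)))
    raise ValueError(f"not a decimal type: {type_string!r}")
```

With both branches in place, a column declared as decimal(35,25) in a .q file and a plain decimal column read from an older metastore are both recognized as decimals.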
Jenkins compiles with both versions just to make sure that we aren't breaking backwards compatibility (Hive 12 first). Ideally, we'll set up another job to run the tests for Hive 12 in parallel, or at least periodically, but for now running both would take too much time. In terms of semantics, I think it is too much overhead to try to faithfully mimic both versions, since the primary goal here is metastore compatibility. Thus, the query tests are based on Hive 13, and so are the golden answers. It is possible to run nearly all of the tests with the Hive 12 library too, though in places we act like 13 even though we are compiling with the 12 library. In the few cases where we can't run a given test with both versions of the library, there is a special blacklist in the shim. Full Hive 13 decimal support is now merged, so hopefully we can remove all the special cases from this PR.
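The version-specific blacklist described here amounts to filtering the test list by the Hive version in use. This is a hypothetical Python sketch; runnable_tests and the dict-of-sets blacklist layout are assumptions for illustration, not the actual HiveShim code.

```python
def runnable_tests(all_tests, hive_version, blacklist):
    """Hypothetical filter: drop tests blacklisted for hive_version.

    blacklist maps a Hive version string to a set of test names that
    cannot pass against that version; order of all_tests is preserved.
    """
    skipped = blacklist.get(hive_version, set())
    return [test for test in all_tests if test not in skipped]
```

Keeping the blacklist keyed by version means the Hive 13 run executes every test, while the Hive 12 run skips only the few known-incompatible ones.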
Updated from 4085c67 to 194113e.
Thank you @marmbrus. I've updated the code to contain only the bug fix, and I will create separate PRs for the Hive compatibility testing.
Test build #22785 has started for PR 3013 at commit
Test build #22785 has finished for PR 3013 at commit
Test PASSed.
Thanks! Merged to master.
CREATE TABLE t1 (a String);
CREATE TABLE t1 AS SELECT key FROM src; -- throws an exception
CREATE TABLE if not exists t1 AS SELECT key FROM src; -- expected to do nothing; currently it overwrites t1, which is incorrect.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #3013 from chenghao-intel/ctas_unittest and squashes the following commits:

194113e [Cheng Hao] fix bug in CTAS when table already existed

(cherry picked from commit e83f13e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
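The intended CTAS semantics from the commit message reduce to a guard in front of table creation: fail if the table exists, unless IF NOT EXISTS was given, in which case do nothing. This is a hypothetical Python illustration over a dict-based catalog; create_table_as_select and TableAlreadyExistsError are illustrative names, not Spark's API.

```python
class TableAlreadyExistsError(Exception):
    """Hypothetical error type used only in this sketch."""


def create_table_as_select(catalog, name, rows, if_not_exists=False):
    """Hypothetical CTAS over a dict catalog (table name -> list of rows).

    Returns True if a table was created, False if the statement was a
    no-op because the table already existed and if_not_exists was set.
    """
    if name in catalog:
        if if_not_exists:
            # IF NOT EXISTS: leave the existing table and its data untouched.
            return False
        # Plain CTAS on an existing table is an error, never an overwrite.
        raise TableAlreadyExistsError(f"table {name} already exists")
    catalog[name] = list(rows)
    return True
```

The bug described above corresponds to the if_not_exists branch falling through to the write instead of returning early.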
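For comparison, SQLite already behaves the way the commit message expects for these three statements. The sketch below runs them against an in-memory database using Python's standard sqlite3 module; it demonstrates SQLite semantics, not Spark SQL or Hive, and it assumes a src table with a single key column populated with a few sample rows.

```python
import sqlite3


def demo_ctas_semantics():
    """Run the three statements from the commit message on SQLite."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Assumed source table for the example.
    cur.execute("CREATE TABLE src (key INTEGER)")
    cur.executemany("INSERT INTO src VALUES (?)", [(1,), (2,), (3,)])

    cur.execute("CREATE TABLE t1 (a TEXT)")

    # Plain CTAS on an existing table must fail.
    plain_ctas_failed = False
    try:
        cur.execute("CREATE TABLE t1 AS SELECT key FROM src")
    except sqlite3.OperationalError:
        plain_ctas_failed = True

    # CTAS with IF NOT EXISTS must leave the existing table untouched.
    cur.execute("CREATE TABLE IF NOT EXISTS t1 AS SELECT key FROM src")
    columns = [row[1] for row in cur.execute("PRAGMA table_info(t1)")]
    row_count = cur.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    conn.close()
    return plain_ctas_failed, columns, row_count


if __name__ == "__main__":
    print(demo_ctas_semantics())  # (True, ['a'], 0)
```

The result shows t1 keeps its original single column a and no rows, i.e. the IF NOT EXISTS form did not overwrite it.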