-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set the schema case sensitivity to false for Hive #234
Conversation
cc @wmoustafa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the PR description with an example query (preferably simplest one), and the behavior before and after the patch?
Can we add a Hive to Rel conversion test?
run(driver, "create table table_same_column_name_a (some_id string)"); | ||
run(driver, "create table table_same_column_name_b (some_id string)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please follow the same casing convention as the rest of the test cases? (I see some are lowercase, but these seem to have slipped).
run(driver, "create table table_same_column_name_a (some_id string)"); | ||
run(driver, "create table table_same_column_name_b (some_id string)"); | ||
run(driver, | ||
"create view if not exists view_column_name_case_insensitive as\n" + " select a.some_id\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you simplify the view to the minimal one that satisfies the test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create table hive.test.table_a (
some_id string
);
create table hive.test.table_b (
some_id string
);
These 2 views will have the same issue:
-- Hive view version 1
create or replace view test.view_ab as
select a.some_id
from test.table_a a
left join
(
select trim(some_id) AS SOME_ID
from test.table_b
) b
on a.some_id = b.some_id
where a.some_id != ''
-- Hive view version 2
create or replace view test.view_ab2 as
select a.some_id
from test.table_a a
left join
(
select some_id AS SOME_ID
from test.table_b
) b
on a.some_id = trim(b.some_id)
where a.some_id != ''
The WHERE
clause is needed in order to enforce the usage of the inner select. I don't have at hand a sample view easier to grasp than the ones above.
@@ -114,6 +114,14 @@ public static void initializeViews(HiveConf conf) throws HiveException, MetaExce | |||
run(driver, String.join("\n", "", "CREATE VIEW IF NOT EXISTS named_struct_view", "AS", | |||
"SELECT named_struct('abc', 123, 'def', 'xyz') AS named_struc", "FROM bar")); | |||
|
|||
run(driver, "create table table_same_column_name_a (some_id string)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table_same_column_name_a
-> duplicate_column_name.tableA
(same for the other table)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the update for the spark tests.
However the other tests follow a slightly different naming scheme for the test tables. Do you want me to adjust the naming for the other tests (hive/trino) ?
66733bd
to
1baab42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findinpath Thanks for this PR!
I have kicked off the i-test and will update once it finishes.
Could you please run ./gradlew spotlessApply
to fix the code format issue? Please ensure that ./gradlew clean build
succeeds before pushing.
1baab42
to
f1527ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and all the i-tests passed.
Thanks @findinpath @findepi !
@wmoustafa @funcheetah do you want to take another look?
In the translation of views, when joining tables that have the namesake column names ,irrespective of their case, make sure that the resulting relation is using names of the columns which have unique names in order to avoid the situation where the SQL statement created contains ambiguous column names.
f1527ef
to
6ca047f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @findinpath for the PR. LGTM.
Problem description
Assume that we have two tables which contain a column with the same name:
Assume also that we have a view doing a select from the result of these table joined together and the name of the common column is used in a different case (some_id, SOME_ID):
When translating in Coral from Hive, the following result is being obtained:
As it can be seen from the query above this leads to having an ambiguous column, because there is both "some_id" and "SOME_ID" which in Hive point to the same column.
Solution description
Pointing out to coral that Hive is a case insensitive system, solves the problem by creating unique names for the columns in the selection.
When joining tables that have the same column name
,irrespective of their case, make sure that the
names of the columns have unique names to avoid
the situation where the SQL statement created
contains ambiguous column names.