-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DECIMAL to VARCHAR coercion to hive #14172
Add DECIMAL to VARCHAR coercion to hive #14172
Conversation
dc419fe
to
8daacdb
Compare
nit: lowercase "UNBOUNDED" and "BOUNDED" we sometimes writes type names in uppercase, but we don't write adjectives, that describe the these types, in uppercase. |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSource.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
75a0a7e
to
5a2f4fd
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
protected LongDecimalToVarcharCoercer(DecimalType fromType, int currentLength) | ||
{ | ||
super(fromType, VARCHAR); | ||
this.currentLength = currentLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lengthLimit ?
5a2f4fd
to
6880a11
Compare
6880a11
to
5d239a9
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveDecimalToVarcharCoercion.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveDecimalToVarcharCoercion.java
Outdated
Show resolved
Hide resolved
public void testShortDecimalToVarcharCoercion() | ||
{ | ||
String tableName = "test_short_decimal_to_varchar_coercion_" + randomTableSuffix(); | ||
assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(10,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried your test with a different file format
assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'JSON', partitioned_by = ARRAY['col3'])");
assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1);
query("ALTER TABLE " + tableName + " DROP COLUMN col1");
query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR");
assertQuery("SELECT col1, col2, col3 FROM " + tableName, "VALUES ('100.13', 1, 1)");
and got a different result from PARQUET
testing:
String tableName = "test_long_decimal_to_varchar_coercion_" + randomTableSuffix();
assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])");
assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1);
query("ALTER TABLE " + tableName + " DROP COLUMN col1");
query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR");
assertQuery("SELECT col1, col2, col3 FROM " + tableName, "VALUES ('100.130', 1, 1)");
Notice 100.13
for JSON and 100.130
for Parquet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a partition has decimal and table has varchar,
does Hive's eventual result depend on the partition's file format? 🤯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some extra checking and realised that for JSON
data, the HiveRecordCursor
is employed and not the HivePageSource
.
There is no coercing done as far as I see in the HiveRecordCursor
.
assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])"); | ||
assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1); | ||
query("ALTER TABLE " + tableName + " DROP COLUMN col1"); | ||
query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR(5)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some file formats (eg CSV) are positional. Dropping column other than the last and re-adding it may have weird results (like shifting data).
Let's maybe make the interesting column the last one.
Also, do we need 3 columns, or would 2 columns be enough?
Can the columns have names more distinct than col1, col2, col3?
like dummy integer, evolved_column decimal(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will drop this class entirely, these test cases are covered by product test anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and BTW it cannot be the last one because the last one is a partitioning column, and partitioning column has to be the last one. Also there need to be 3 columns at least as otherwise I won't be able to drop the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will drop this class entirely
let me know whether you want to drop it or should i continue with review of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After writing that comment I left it so you can review it ;)
I am trying to add other file formats here but some don't work because of the problem with dropping/adding columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Trino doesn't have proper support for changing column type, let's run with product tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just wanted to have a test that will check that coercion to too small varchar will fail with expected exception. If you think this is not needed I will delete it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's fine not to have it
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveDecimalToVarcharCoercion.java
Outdated
Show resolved
Hide resolved
5d239a9
to
4c715c9
Compare
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveCoercion.java
Show resolved
Hide resolved
96e34c0
to
3ae5fba
Compare
@homar please see the checks failure |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveCoercion.java
Outdated
Show resolved
Hide resolved
3ae5fba
to
4882f10
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/DecimalCoercers.java
Outdated
Show resolved
Hide resolved
Add a conversion support for the case when partition has `decimal` type for a column and table has `varchar` type for same column. The implementation is consistent with Hive except for truncation. Hive truncates values when they do not fit the target type, and in this commit we just fail.
4882f10
to
1a59bf7
Compare
Description
Add DECIMAL to UNBOUNDED VARCHAR coercion to hive
Non-technical explanation
This is intended to allow reading data from columns which type changed from decimal to string in hive.
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
() Release notes are required, with the following suggested text: