Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DECIMAL to VARCHAR coercion to hive #14172

Merged

Conversation

homar
Copy link
Member

@homar homar commented Sep 17, 2022

Description

Add DECIMAL to UNBOUNDED VARCHAR coercion to hive

Non-technical explanation

This is intended to allow reading data from columns which type changed from decimal to string in hive.

Release notes

( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
() Release notes are required, with the following suggested text:

# Section
* Allows reading columns which type was changed from decimal to string in hive. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 17, 2022
@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from dc419fe to 8daacdb Compare September 17, 2022 10:25
@findepi
Copy link
Member

findepi commented Sep 17, 2022

nit: lowercase "UNBOUNDED" and "BOUNDED"

we sometimes writes type names in uppercase, but we don't write adjectives, that describe the these types, in uppercase.

@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch 2 times, most recently from 75a0a7e to 5a2f4fd Compare September 18, 2022 11:13
@homar homar changed the title Add DECIMAL to UNBOUNDED VARCHAR coercion to hive Add DECIMAL to VARCHAR coercion to hive Sep 18, 2022
protected LongDecimalToVarcharCoercer(DecimalType fromType, int currentLength)
{
super(fromType, VARCHAR);
this.currentLength = currentLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lengthLimit ?

@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from 5a2f4fd to 6880a11 Compare September 19, 2022 08:50
@findepi findepi marked this pull request as ready for review September 19, 2022 09:16
@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from 6880a11 to 5d239a9 Compare September 19, 2022 09:19
public void testShortDecimalToVarcharCoercion()
{
String tableName = "test_short_decimal_to_varchar_coercion_" + randomTableSuffix();
assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(10,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried your test with a different file format

        assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'JSON', partitioned_by = ARRAY['col3'])");
        assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1);
        query("ALTER TABLE " + tableName + " DROP COLUMN col1");
        query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR");
        assertQuery("SELECT col1, col2, col3 FROM " + tableName, "VALUES ('100.13', 1, 1)");

and got a different result from PARQUET testing:

        String tableName = "test_long_decimal_to_varchar_coercion_" + randomTableSuffix();
        assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])");
        assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1);
        query("ALTER TABLE " + tableName + " DROP COLUMN col1");
        query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR");
        assertQuery("SELECT col1, col2, col3 FROM " + tableName, "VALUES ('100.130', 1, 1)");

Notice 100.13 for JSON and 100.130 for Parquet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a partition has decimal and table has varchar,
does Hive's eventual result depend on the partition's file format? 🤯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some extra checking and realised that for JSON data, the HiveRecordCursor is employed and not the HivePageSource.

There is no coercing done as far as I see in the HiveRecordCursor.

assertUpdate("CREATE TABLE " + tableName + " (col1 DECIMAL(23,3), col2 int, col3 int) with ( FORMAT = 'PARQUET', partitioned_by = ARRAY['col3'])");
assertUpdate("INSERT INTO " + tableName + "(col1, col2, col3) VALUES(DECIMAL '100.13', 1, 1)", 1);
query("ALTER TABLE " + tableName + " DROP COLUMN col1");
query("ALTER TABLE " + tableName + " ADD COLUMN col1 VARCHAR(5)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some file formats (eg CSV) are positional. Dropping column other than the last and re-adding it may have weird results (like shifting data).

Let's maybe make the interesting column the last one.

Also, do we need 3 columns, or would 2 columns be enough?
Can the columns have names more distinct than col1, col2, col3?

like dummy integer, evolved_column decimal(...) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will drop this class entirely, these test cases are covered by product test anyway

Copy link
Member Author

@homar homar Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and BTW it cannot be the last one because the last one is a partitioning column, and partitioning column has to be the last one. Also there need to be 3 columns at least as otherwise I won't be able to drop the first one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will drop this class entirely

let me know whether you want to drop it or should i continue with review of it

Copy link
Member Author

@homar homar Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing that comment I left it so you can review it ;)
I am trying to add other file formats here but some don't work because of the problem with dropping/adding columns

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Trino doesn't have proper support for changing column type, let's run with product tests.

Copy link
Member Author

@homar homar Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just wanted to have a test that will check that coercion to too small varchar will fail with expected exception. If you think this is not needed I will delete it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's fine not to have it

@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from 5d239a9 to 4c715c9 Compare September 19, 2022 12:35
@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch 2 times, most recently from 96e34c0 to 3ae5fba Compare September 20, 2022 06:50
@findepi
Copy link
Member

findepi commented Sep 20, 2022

@homar please see the checks failure

@homar homar force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from 3ae5fba to 4882f10 Compare September 20, 2022 08:04
Add a conversion support for the case when partition has `decimal` type
for a column and table has `varchar` type for same column.

The implementation is consistent with Hive except for truncation. Hive
truncates values when they do not fit the target type, and in this
commit we just fail.
@findepi findepi force-pushed the homar/add_coercion_for_hive_varchar_to_decimal branch from 4882f10 to 1a59bf7 Compare September 20, 2022 08:11
@findepi findepi merged commit 8fe8a91 into trinodb:master Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants