Fix varchar to numeric coercion for hive tables #20766

Merged

Conversation

Member

@Praveen2112 Praveen2112 commented Feb 20, 2024

Description

Fixes varchar-to-numeric coercion for non-ORC tables and adds support for this coercion on partitioned tables.

For non-ORC file formats, when coercing from string/varchar:

  • Tinyint/Smallint - We parse the string as an Integer and then cast it to byte or short, depending on the target type. If the number is outside the Integer range, we treat it as null.
  • Integer/Bigint - We parse the string as an Integer/Long; if the number is outside that type's range, we treat it as null.

For the ORC file format, when coercing from string/varchar:
If the number is outside the range of the target type, we treat it as null.

For any non-varchar representation, we treat the value as null.
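
The rules above can be sketched in plain Java. This is only an illustration that follows the description literally, not the code from this PR: the class `VarcharToIntegralSketch` and all of its method names are hypothetical, and it assumes that a string which cannot be parsed as a number also becomes null. Note that under this reading the non-ORC tinyint/smallint path uses a narrowing cast, so an in-Integer-range value such as "300" wraps instead of becoming null.

```java
// Hypothetical sketch of the coercion rules described above (not Trino code).
final class VarcharToIntegralSketch {
    private VarcharToIntegralSketch() {}

    // Non-ORC, tinyint: parse as Integer, then narrow to byte (may wrap).
    static Byte nonOrcToTinyint(String value) {
        Integer parsed = parseIntOrNull(value);
        return parsed == null ? null : Byte.valueOf((byte) parsed.intValue());
    }

    // Non-ORC, smallint: parse as Integer, then narrow to short (may wrap).
    static Short nonOrcToSmallint(String value) {
        Integer parsed = parseIntOrNull(value);
        return parsed == null ? null : Short.valueOf((short) parsed.intValue());
    }

    // Non-ORC, integer: null when the value does not fit in an Integer.
    static Integer nonOrcToInteger(String value) {
        return parseIntOrNull(value);
    }

    // Non-ORC, bigint: null when the value does not fit in a Long.
    static Long nonOrcToBigint(String value) {
        return parseLongOrNull(value);
    }

    // ORC: null whenever the value falls outside the target type's range.
    static Long orcToIntegral(String value, long min, long max) {
        Long parsed = parseLongOrNull(value);
        if (parsed == null || parsed < min || parsed > max) {
            return null;
        }
        return parsed;
    }

    private static Integer parseIntOrNull(String value) {
        try {
            return Integer.parseInt(value);
        }
        catch (NumberFormatException e) {
            // Assumption: unparsable or out-of-range input is treated as null.
            return null;
        }
    }

    private static Long parseLongOrNull(String value) {
        try {
            return Long.parseLong(value);
        }
        catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(nonOrcToTinyint("300"));
        System.out.println(orcToIntegral("300", Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println(nonOrcToBigint("9223372036854775808"));
    }
}
```

In this sketch the two paths differ only for values that fit in an Integer but not in the narrower target type: the non-ORC path wraps them, while the ORC path returns null.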

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix varchar to numeric coercion for hive tables

@cla-bot cla-bot bot added the cla-signed label Feb 20, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/varchar_to_numeric_coercer_2 branch 3 times, most recently from 928b247 to 258b683 Compare February 21, 2024 10:54
@Praveen2112 Praveen2112 marked this pull request as ready for review February 21, 2024 10:55
toType.writeLong(blockBuilder, Long.parseLong(valueToBeCoerced));
}
else {
throw new TrinoException(NOT_SUPPORTED, format("Could not create Coercer from varchar to %s", toType));
Contributor

nit: the message is a bit misleading because it may suggest to the user that an internal error happened, while the coercion is actually just not available.

Contributor

Actually, we can fail early in the constructor if toType is not one of tinyint, smallint, integer, or bigint.
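
A minimal sketch of what failing early could look like, kept self-contained so it compiles without Trino on the classpath. Everything here is hypothetical: the class `FailEarlyCoercerSketch`, its `TargetType` enum, and the `coerce` method are made up for illustration, and `UnsupportedOperationException` stands in for the `TrinoException(NOT_SUPPORTED, ...)` used in the quoted diff. The message is worded to say that the coercion is unsupported rather than that something failed internally.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: validate the target type once, in the constructor.
final class FailEarlyCoercerSketch {
    // Stand-in for the real type objects; DOUBLE exists only to show rejection.
    enum TargetType {
        TINYINT(Byte.MIN_VALUE, Byte.MAX_VALUE),
        SMALLINT(Short.MIN_VALUE, Short.MAX_VALUE),
        INTEGER(Integer.MIN_VALUE, Integer.MAX_VALUE),
        BIGINT(Long.MIN_VALUE, Long.MAX_VALUE),
        DOUBLE(0, 0);

        final long min;
        final long max;

        TargetType(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    private static final Set<TargetType> SUPPORTED = EnumSet.of(
            TargetType.TINYINT, TargetType.SMALLINT, TargetType.INTEGER, TargetType.BIGINT);

    private final TargetType toType;

    FailEarlyCoercerSketch(TargetType toType) {
        // Reject unsupported targets here, so coerce() needs no fallback branch.
        if (!SUPPORTED.contains(toType)) {
            throw new UnsupportedOperationException(
                    "Coercion from varchar to " + toType + " is not supported");
        }
        this.toType = toType;
    }

    // Returns null for unparsable input or values outside the target range.
    Long coerce(String value) {
        long parsed;
        try {
            parsed = Long.parseLong(value);
        }
        catch (NumberFormatException e) {
            return null;
        }
        if (parsed < toType.min || parsed > toType.max) {
            return null;
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(new FailEarlyCoercerSketch(TargetType.SMALLINT).coerce("123"));
    }
}
```

With the check in the constructor, an unsupported target type is reported when the coercer is created rather than on the first value it is asked to coerce.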

assertVarcharToIntegralCoercion("9223372036854775807", BIGINT, 9223372036854775807L);
assertVarcharToIntegralCoercion("-9223372036854775808", BIGINT, -9223372036854775808L);
assertVarcharToIntegralCoercion("9223372036854775808", BIGINT, null); // Greater than Long.MAX_VALUE
assertVarcharToIntegralCoercion("-9223372036854775809", BIGINT, null); // Lesser than Short.MIN_VALUE
Contributor

// Lesser than Short.MIN_VALUE

please adapt

@@ -152,6 +152,14 @@ protected void doTestHiveCoercion(HiveTableDefinition tableDefinition)
"long_decimal_to_varchar",
"short_decimal_to_bounded_varchar",
"long_decimal_to_bounded_varchar",
"varchar_to_tinyint",
"string_to_tinyint",
Contributor

hmmm... we've unfortunately missed covering these use-cases previously :(

good catch Praveen.

@Praveen2112 Praveen2112 force-pushed the praveen/varchar_to_numeric_coercer_2 branch from 258b683 to d3eaee1 Compare February 26, 2024 08:42
@Praveen2112
Member Author

@findinpath Thanks for your review. AC

@Praveen2112 Praveen2112 merged commit d0e16c6 into trinodb:master Feb 26, 2024
57 checks passed
@github-actions github-actions bot added this to the 440 milestone Feb 26, 2024
@colebow
Member

colebow commented Mar 6, 2024

Does this need a release note? I think it's fixing the PRs that were included in this release already and users never had it "unfixed," right?

@Praveen2112
Member Author

There are two parts here:
One is that we have fixed the coercion for partitioned tables in Hive.
The other is that we have added this coercion for unpartitioned tables in ORC format.
