-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add varchar to boolean coercion for hive tables #20741
Add varchar to boolean coercion for hive tables #20741
Conversation
701b4ba
to
049241a
Compare
049241a
to
c1f1ae8
Compare
assertVarcharToBooleanCoercion("OFF", false); | ||
assertVarcharToBooleanCoercion("NO", false); | ||
assertVarcharToBooleanCoercion("0", false); | ||
assertVarcharToBooleanCoercion("", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behaviour for space " "
in Hive? Can we add test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced 'ON` with ' ' so as to verify the behavior
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/CoercionUtils.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/coercions/TestDecimalCoercers.java
Show resolved
Hide resolved
Unless the test requires additional modification by default we can set HiveTimestampPrecision to DEFAULT and not treating NaN as null.
Rename `treatNaNAsNull` to `isOrcFile` as the changes in coercion are seen only in case of ORC files
c1f1ae8
to
e1a6872
Compare
@@ -73,6 +75,9 @@ private OrcTypeTranslator() {} | |||
return Optional.of(new DateToVarcharCoercer(varcharType)); | |||
} | |||
if (isVarcharType(fromOrcType)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scaffolding with isOrcFile
is done for ORC only.
OrcPageSourceFactory
is using though OrcTypeTranslator
(for both partitioned and unpartitioned tables) where we know that we're dealing with ORC.
The CoercionUtils
class (which makes use of CoercionContext
) has actually nothing to do with ORC changes at the moment.
Instead of adding special logic for ORC in the XToYCoercer
classes, I'd be in favor of having separate coercer classes:
XToYCoercer
XToYOrcCoercer
This change would keep the XToYCoercer
classes easier to follow.
I don't feel strongly about this chnage because the coercer classes are simple at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scaffolding with isOrcFile is done for ORC only.
True
OrcPageSourceFactory is using though OrcTypeTranslator (for both partitioned and unpartitioned tables) where we know that we're dealing with ORC.
This one is partially true - Though OrcTypeTranslator is well equipped with handling the coercions for partitioned table as well - We still rely on CoercionPolicy
or CoercionUtil
to control the coercion for partitioned layer - Today the coercion are applied at a layer above the file format specfic PageSourceProvider and we don't want to split it for each file format. So still for ORC files in case of partitioned tables we rely on CoercionUtils
I'd be in favor of having separate coercer classes:
We can handle them but for boolean, double or float the changes are fairly simple so we have constrained them within a class but for numeric coercions #20766 we have implemented something like a seperate coercer for ORC
assertVarcharToBooleanCoercion("TR", true); | ||
assertVarcharToBooleanCoercion("T", true); | ||
assertVarcharToBooleanCoercion("Y", true); | ||
assertVarcharToBooleanCoercion("1", true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test for 0
in ORC as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have added it in testVarcharToBooleanForOrc
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/BooleanCoercer.java
Outdated
Show resolved
Hide resolved
e1a6872
to
03262c4
Compare
@findinpath Based on this comment - I have experimented to extract a dedicated coercer for ORC |
03262c4
to
e5e30f9
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/coercions/BooleanCoercer.java
Show resolved
Hide resolved
} | ||
} | ||
|
||
public static class OrcVarcharToBooleanCoercer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is is sad that ORC is special :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True
e5e30f9
to
a4dfcf6
Compare
@kokosing AC |
Description
Add varchar to boolean coercion for hive tables.
Enable the Hive users to retrieve the information from their tables after performing queries which change varchar/ string columns to boolean.
Hive sample DDL query:
ALTER TABLE mytable CHANGE COLUMN mycolumn mycolumn boolean;
The coercion logic for non ORC files is available here - https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L559
The coercion logic for ORC files is available here - https://github.com/apache/orc/blob/fb1c4cb9461d207db652fc253396e57640ed805b/java/core/src/java/org/apache/orc/impl/ConvertTreeReaderFactory.java#L567
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: