Add IntLogicalTypeAnnotation in parquet writer #18301

Merged 1 commit on Jul 17, 2023
@@ -53,6 +53,7 @@
import static java.lang.String.format;
import static java.util.Objects.requireNonNull;
import static org.apache.parquet.schema.LogicalTypeAnnotation.decimalType;
+ import static org.apache.parquet.schema.LogicalTypeAnnotation.intType;
import static org.apache.parquet.schema.Type.Repetition.OPTIONAL;
import static org.apache.parquet.schema.Type.Repetition.REQUIRED;

@@ -144,8 +145,23 @@ private static org.apache.parquet.schema.Type getPrimitiveType(
if (BOOLEAN.equals(type)) {
return Types.primitive(PrimitiveType.PrimitiveTypeName.BOOLEAN, repetition).named(name);
}
- if (INTEGER.equals(type) || SMALLINT.equals(type) || TINYINT.equals(type)) {
- return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition).named(name);
+ // https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#signed-integers
+ // INT(32, true) and INT(64, true) are implied by the int32 and int64 primitive types if no other annotation is present.
Member
Do we need to test it against Databricks or Hive?

Member Author
TestHiveCompatibility and TestHiveSparkCompatibility already cover Hive and Spark reading files written by Trino.
In the worst case, other readers should ignore this annotation and behave as before.
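
For illustration only, here is a minimal sketch of how a reader might act on these annotations. It is not Trino or parquet-mr code: `IntAnnotationSketch` and `inMemoryType` are hypothetical names, and the annotation is modelled as an optional signed bit width on an int32 column. A column without the annotation falls back to the 32-bit default, which matches the "behave as before" case described above.

```java
import java.util.OptionalInt;

// Hypothetical, dependency-free model of a reader choosing an in-memory
// representation for an int32 column based on its INT(bitWidth, true) annotation.
final class IntAnnotationSketch {
    private IntAnnotationSketch() {}

    // signedBitWidth is empty when the file carries no integer annotation.
    static String inMemoryType(OptionalInt signedBitWidth) {
        if (!signedBitWidth.isPresent()) {
            // No annotation: a plain int32 column is treated like INT(32, true).
            return "int";
        }
        switch (signedBitWidth.getAsInt()) {
            case 8:
                return "byte";
            case 16:
                return "short";
            case 32:
                return "int";
            default:
                // Assumption of this sketch: wider or unknown widths are not
                // valid for an int32 column and are rejected.
                throw new IllegalArgumentException(
                        "Unsupported bit width for int32: " + signedBitWidth.getAsInt());
        }
    }

    public static void main(String[] args) {
        System.out.println(inMemoryType(OptionalInt.of(8)));
        System.out.println(inMemoryType(OptionalInt.of(16)));
        System.out.println(inMemoryType(OptionalInt.empty()));
    }
}
```

A reader that does not understand the annotation corresponds to always taking the empty branch, so it still reads the values correctly, only without the narrower representation.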

+ // Implementations may use these annotations to produce smaller in-memory representations when reading data.
+ if (TINYINT.equals(type)) {
+ return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition)
+ .as(intType(8, true))
+ .named(name);
+ }
+ if (SMALLINT.equals(type)) {
+ return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition)
+ .as(intType(16, true))
+ .named(name);
+ }
+ if (INTEGER.equals(type)) {
+ return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition)
+ .as(intType(32, true))
+ .named(name);
}
if (type instanceof DecimalType decimalType) {
// Apache Hive version 3 or lower does not support reading decimals encoded as INT32/INT64
@@ -170,7 +186,9 @@ private static org.apache.parquet.schema.Type getPrimitiveType(
return Types.primitive(PrimitiveType.PrimitiveTypeName.INT32, repetition).as(LogicalTypeAnnotation.dateType()).named(name);
}
if (BIGINT.equals(type)) {
- return Types.primitive(PrimitiveType.PrimitiveTypeName.INT64, repetition).named(name);
+ return Types.primitive(PrimitiveType.PrimitiveTypeName.INT64, repetition)
+ .as(intType(64, true))
+ .named(name);
}

if (type instanceof TimestampType timestampType) {