-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce IcebergPageSource and IcebergColumnHandle #1675
Conversation
22f1062
to
e314705
Compare
10c214a
to
ecaba2f
Compare
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergColumnHandle.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergColumnHandle.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSourceProvider.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergSplitSource.java
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSource.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergSplitSource.java
Show resolved
Hide resolved
} | ||
partitionKeys.add(new HivePartitionKey(name, partitionValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about storing a deserialized presto type and value object here itself? it reduces the propagation of untyped values a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea but for now the value stored in IcebergPartitionKey
is not technically a "presto" value. It's more an "Iceberg" value. The deserializer (IcebergPageSource#deserializePartitionValue
) not only deserializes but also converts values from Iceberg representation to Presto representation.
I think we should do the conversion firstly and then serialize/deserialize Presto values. But I prefer to do it in a future separate PR. Because the value conversion also relates to DomainConverter
and ExpressionConverter
. We should do a uniform refactoring in the future.
ecaba2f
to
46f8387
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, otherwise looks good. Thanks for cleaning this up and fixing partition handling.
Note that I merged #1561 but that should be a one line change in IcebergMetadata.applyFilter
when you rebase.
@@ -35,7 +35,7 @@ protected boolean supportsViews() | |||
@Override | |||
public void testDelete() | |||
{ | |||
assertQueryFails("DELETE FROM orders WHERE orderkey % 2 = 0", "This connector only supports delete where one or more partitions are deleted entirely"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IcebergMetadata
still implements delete. What happens if you run it after this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws an exception This connector does not support updates or deletes
from the default ConnectorMetadata##getUpdateRowIdColumnHandle
.
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergColumnHandle.java
Outdated
Show resolved
Hide resolved
@@ -93,6 +95,7 @@ | |||
public class IcebergPageSink | |||
implements ConnectorPageSink | |||
{ | |||
private static final TypeTranslator TYPE_TRANSLATOR = new HiveTypeTranslator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fork this into Iceberg so that we can remove the TIMESTAMP_WITH_TIME_ZONE
hack. It can be a static utility method. We should also be able to remove the binding in IcebergModule
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I've added a static utility method to TypeConverter
.
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPartitionKey.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSourceProvider.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSource.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSource.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSource.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/io/prestosql/plugin/iceberg/IcebergPageSource.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
private static Object deserializePartitionValue(Type type, String valueString, String name, TimeZoneKey timeZoneKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there should be a better way to do this, especially the decimal part, but this seems like the best we can do here for now.
The parameter constraint in PartitionTable#cursor is targetted at columns in the final partition table, rather than the original data table.
46f8387
to
7b29cd8
Compare
@electrum Thank you for the feedback. I've addressed the comments. |
This commit simplifies logic and fixing bugs in case of Iceberg Partition Spec evolution.
7b29cd8
to
2ea9bce
Compare
Merged, thanks! |
This PR is ready for review. Below are some explanations upon it.
Read side
IcebergPageSource
prefills partition key columns and gets regular column values from a delegate.IcebergPageSource
.IcebergPageSource
deserializes them into Presto objects. Note that in Iceberg, only columns with primitive types can be used as the source of partitioning.Write side
HiveColumnHandle
. But on the write side we have to still do this since we use Hive's Parquet writer in Presto.testDelete
test inTestIcebergDistributed
. (Originally Presto throws an exception saying "This connector only supports delete where one or more partitions are deleted entirely" for a delete operation upon Iceberg tables. But actually, even partition delete is not supported)`Issue: #1655
Umbrella issue: #1324
cc: @wagnermarkd @electrum @Parth-Brahmbhatt