Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support updating Iceberg table partitioning #12259

Merged
merged 1 commit into from
May 23, 2022

Conversation

alexjo2144
Copy link
Member

@alexjo2144 alexjo2144 commented May 5, 2022

Description

Add support for adding and removing partition fields from Iceberg tables using ALTER TABLE ... SET PROPERTIES

Is this change a fix, improvement, new feature, refactoring, or other?

New feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

Related issues, pull requests, and links

Fixes: #12174
Fixes: #7580

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 5, 2022
@alexjo2144
Copy link
Member Author

alexjo2144 commented May 5, 2022

Need to add a test for changing a partition transform from year(<col>) -> month(<col>) for example

@alexjo2144 alexjo2144 requested review from findepi and findinpath May 5, 2022 13:29
@findinpath
Copy link
Contributor

        assertUpdate("CREATE TABLE " + tableName + " WITH (partitioning = ARRAY['regionkey']) AS SELECT * FROM nation WHERE nationkey < 10", 10);
        assertUpdate("ALTER TABLE " + tableName + " SET PROPERTIES partitioning = ARRAY['bogus', 'truncate(name, 1)']");
	Suppressed: java.lang.Exception: SQL: ALTER TABLE test_add_partition_column_1cpvaellfz SET PROPERTIES partitioning = ARRAY['bogus', 'truncate(name, 1)']
		at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:529)
		... 28 more
Caused by: io.trino.spi.TrinoException: Failed to commit new table properties

is this error message friendly enough for the end user?

@findinpath
Copy link
Contributor

This new feature should land in the iceberg.rst file.

@findinpath
Copy link
Contributor

trino> CREATE TABLE iceberg.default.testp10  WITH (partitioning = ARRAY['truncate(name,1)']) AS SELECT * FROM tpch.sf1.nation WHERE nationkey < 10;
CREATE TABLE: 10 rows

trino> alter table iceberg.default.testp10 set properties partitioning = ARRAY['truncate(name,1)', 'regionkey'];
SET PROPERTIES
trino> show create table iceberg.default.testp10;
                              Create Table                              
------------------------------------------------------------------------
 CREATE TABLE iceberg.default.testp10 (                                 
    nationkey bigint,                                                   
    name varchar,                                                       
    regionkey bigint,                                                   
    comment varchar                                                     
 )                                                                      
 WITH (                                                                 
    format = 'ORC',                                                     
    format_version = 2,                                                 
    location = 'hdfs://hadoop-master:9000/user/hive/warehouse/testp10', 
    partitioning = ARRAY['regionkey']                                   
 )                                                                      
(1 row)

It seems that while updating the partitioning, the partitioning field truncate(name,1) gets lost.

This looks like a bug to me.

@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch 2 times, most recently from 95d9c80 to f25be73 Compare May 5, 2022 18:19
@alexjo2144 alexjo2144 requested a review from findinpath May 5, 2022 18:20
@alexjo2144
Copy link
Member Author

Thanks for the review @findinpath

@github-actions github-actions bot added the docs label May 5, 2022
@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from f25be73 to 6aa5d3e Compare May 5, 2022 19:27
@findinpath
Copy link
Contributor

CREATE TABLE iceberg.default.testp1  WITH (partitioning = ARRAY['Truncate(name,1)']) AS SELECT * FROM tpch.sf1.nation WHERE nationkey < 10;

alter table iceberg.default.testp1  set properties partitioning = ARRAY['truncate(name   , 1 )', 'regionKey'];
Query 20220506_135716_00005_wqaig failed: Failed to commit new table properties
io.trino.spi.TrinoException: Failed to commit new table properties
	at io.trino.plugin.iceberg.IcebergMetadata.setTableProperties(IcebergMetadata.java:1196)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.setTableProperties(ClassLoaderSafeConnectorMetadata.java:438)
	at io.trino.metadata.MetadataManager.setTableProperties(MetadataManager.java:696)
	at io.trino.execution.SetPropertiesTask.setTableProperties(SetPropertiesTask.java:124)
	at io.trino.execution.SetPropertiesTask.execute(SetPropertiesTask.java:88)
	at io.trino.execution.SetPropertiesTask.execute(SetPropertiesTask.java:46)
	at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
	at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:243)
	at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
	at io.trino.$gen.Trino_dev____20220506_135121_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Invalid partition field declaration: truncate(name   , 1 )
	at io.trino.plugin.iceberg.PartitionFields.parsePartitionField(PartitionFields.java:73)
	at io.trino.plugin.iceberg.PartitionFields.parsePartitionFields(PartitionFields.java:54)
	at io.trino.plugin.iceberg.IcebergMetadata.updatePartitioning(IcebergMetadata.java:1223)
	at io.trino.plugin.iceberg.IcebergMetadata.setTableProperties(IcebergMetadata.java:1191)
	... 12 more

This is unrelated to this PR

 CREATE TABLE iceberg.default.testp3  WITH (partitioning = ARRAY['truncate(name   , 1)']) AS SELECT * FROM tpch.sf1.nation WHERE nationkey < 10;
Query 20220506_135829_00008_wqaig failed: Invalid partition field declaration: truncate(name   , 1)
java.lang.IllegalArgumentException: Invalid partition field declaration: truncate(name   , 1)
	at io.trino.plugin.iceberg.PartitionFields.parsePartitionField(PartitionFields.java:73)
	at io.trino.plugin.iceberg.PartitionFields.parsePartitionFields(PartitionFields.java:54)
	at io.trino.plugin.iceberg.IcebergMetadata.getNewTableLayout(IcebergMetadata.java:542)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getNewTableLayout(ClassLoaderSafeConnectorMetadata.java:118)

@alexjo2144
Copy link
Member Author

That last point relates to #12227 I think

@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from 6aa5d3e to 1665a2e Compare May 6, 2022 19:42
@alexjo2144 alexjo2144 requested a review from findinpath May 6, 2022 19:42
@findepi
Copy link
Member

findepi commented May 10, 2022

This fixes #7580 too.
@alexjo2144 please ack the conversation there.

also, please rebase.

@alexjo2144
Copy link
Member Author

Added a limited test on $partitions but also created this Issue as follow up. #12323

@alexjo2144
Copy link
Member Author

Unless you think that's enough of an issue to try and tackle that here.

@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from f440f36 to 61b2820 Compare May 10, 2022 18:11
@alexjo2144 alexjo2144 requested a review from findepi May 10, 2022 18:12
@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from 61b2820 to 928892c Compare May 10, 2022 18:13
ALTER TABLE table_name SET PROPERTIES partitioning = ARRAY[<existing partition columns>, 'my_new_partition_column'];

The current values of a table's properties can be shown using :doc:`SHOW CREATE TABLE </sql/show-create-table>`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mosabua please review doc changes


private static Term toIcebergTerm(Schema schema, PartitionField partitionField)
{
return Expressions.transform(schema.findColumnName(partitionField.sourceId()), partitionField.transform());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of weird that PartitionSpec cannot be used directly in UpdatePartitionSpec and we need to go thru expressions like this.

Do you think it's something we could change in Iceberg API?
cc @rdblue @danielcweeks

(that would be a follow-up for this PR, no change requested here)

@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch 2 times, most recently from a943919 to de61719 Compare May 11, 2022 18:00
@alexjo2144 alexjo2144 requested a review from findepi May 11, 2022 18:01
@alexjo2144
Copy link
Member Author

AC, thanks @findepi

@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from 6b473fd to 75a7d6c Compare May 16, 2022 15:49
@alexjo2144 alexjo2144 force-pushed the iceberg/update-partitioning branch from 75a7d6c to c94eb0e Compare May 20, 2022 14:41
@alexjo2144 alexjo2144 requested a review from findepi May 20, 2022 14:56
@findepi findepi merged commit 86db89c into trinodb:master May 23, 2022
@github-actions github-actions bot added this to the 382 milestone May 23, 2022
@findepi findepi mentioned this pull request May 23, 2022
@alexjo2144 alexjo2144 deleted the iceberg/update-partitioning branch May 23, 2022 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Support updating an Iceberg table's partition scheme Add support for partition evolution in Iceberg.
3 participants