Creating Iceberg tables via CloudFormation without using the Athena API #1827

Closed
dmschauer opened this issue Oct 25, 2023 · 10 comments

@dmschauer

Name of the resource

AWS::Glue::Table

Resource name

No response

Description

Iceberg format has been available in Athena for a year now, but CloudFormation still doesn't support creating an Iceberg table (https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-athena-acid-transactions-powered-apache-iceberg/). The only available option for creating an Iceberg table is to run a DDL query directly in Athena (https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html), which is not very convenient in large production environments where all cloud infrastructure is maintained in CloudFormation.

This issue #1595 already pointed out the same but unfortunately it was closed without an actual solution being implemented.

Please take a look at this response #1595 (comment) which was liked by at least 7 others as well who are still facing the original problem.

There is still no direct CloudFormation support for creating Iceberg tables, and you have to go via the Athena API route, which is inconvenient and unexpected.

Other Details

We are working with AWS CDK to generate our CloudFormation specifications. As a workaround, we currently use a CustomResource that calls a Lambda function, which in turn calls the Athena API to execute an Iceberg CREATE TABLE statement. We don't consider this a long-term solution, though, only a workaround until support is added to CloudFormation.
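
For readers unfamiliar with this kind of workaround, a minimal sketch of such a Lambda handler might look like the following. It is not the code used in our project: the function names, the resource property names (`Database`, `Table`, `Columns`, `Location`, `OutputLocation`), and the handler shape (modeled on the CDK custom-resource provider convention of returning a `PhysicalResourceId`) are all assumptions made for illustration. The sketch also does not wait for the query to finish, which a real handler would likely need to do.

```python
def build_iceberg_ddl(database, table, columns, location):
    """Build an Athena CREATE TABLE statement for an Iceberg table.

    `columns` is a list of (name, type) tuples. All names are hypothetical.
    """
    column_sql = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE TABLE {database}.{table} ({column_sql}) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type'='ICEBERG')"
    )


def on_event(event, context, athena_client=None):
    """Hypothetical custom-resource handler: submits the DDL on stack creation."""
    if event.get("RequestType") != "Create":
        # Updates and deletes are ignored in this sketch.
        return {"PhysicalResourceId": event.get("PhysicalResourceId", "iceberg-table")}

    props = event["ResourceProperties"]
    if athena_client is None:
        import boto3  # imported lazily so the DDL builder works without boto3

        athena_client = boto3.client("athena")

    ddl = build_iceberg_ddl(
        props["Database"],
        props["Table"],
        [(c["Name"], c["Type"]) for c in props["Columns"]],
        props["Location"],
    )
    # start_query_execution is asynchronous: it only submits the query.
    response = athena_client.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": props["Database"]},
        ResultConfiguration={"OutputLocation": props["OutputLocation"]},
    )
    return {"PhysicalResourceId": response["QueryExecutionId"]}
```

Passing `athena_client` explicitly makes the handler easy to exercise with a stub, without AWS credentials.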

@milashenko

milashenko commented Nov 1, 2023

@dmschauer Does the following chunk of CloudFormation code from https://aws.amazon.com/blogs/big-data/introducing-aws-glue-crawler-and-create-table-support-for-apache-iceberg-format/ solve the issue for you?

OpenTableFormatInput:
  IcebergInput:
    MetadataOperation: "CREATE"
    Version: "2"
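
To show where this fragment sits in a full resource, here is a rough sketch that assembles an `AWS::Glue::Table` resource as a plain Python dictionary and prints it as a JSON template. The helper name, the logical ID, and the database, table, bucket, and column values are placeholders chosen for this example, not taken from the blog post.

```python
import json


def iceberg_table_resource(database, table, location, columns):
    """Return an AWS::Glue::Table resource dict that creates an Iceberg table.

    `columns` is a list of (name, type) tuples; all argument values used
    below are placeholders chosen for this example.
    """
    return {
        "Type": "AWS::Glue::Table",
        "Properties": {
            "CatalogId": {"Ref": "AWS::AccountId"},
            "DatabaseName": database,
            "TableInput": {
                "Name": table,
                "TableType": "EXTERNAL_TABLE",
                "StorageDescriptor": {
                    "Location": location,
                    "Columns": [{"Name": n, "Type": t} for n, t in columns],
                },
            },
            # The fragment quoted above: asks Glue to create Iceberg metadata.
            "OpenTableFormatInput": {
                "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
            },
        },
    }


template = {
    "Resources": {
        "IcebergExampleTable": iceberg_table_resource(
            "my_database",
            "iceberg_example_table",
            "s3://my-bucket/iceberg_example_table/",
            [("mycol1", "date"), ("mycol2", "string")],
        )
    }
}
print(json.dumps(template, indent=2))
```

Note that `OpenTableFormatInput` is a sibling of `TableInput`, not nested inside it.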

@dmschauer
Author

@milashenko Thanks for the reply. Do you know how to specify this in AWS CDK? So far, in our project we only use the CDK to specify CF templates.

@milashenko

@dmschauer
Author

dmschauer commented Nov 7, 2023

Hi @milashenko thanks for the reply.

This didn't help directly in our case as we're using AWS CDK to generate the CF templates, but it gave me hope that CF actually does support Iceberg tables and it does! My bad! This issue can be closed.

For anyone else wondering and stumbling upon this issue, the AWS CDK (Python) code below can be used to construct Iceberg tables via CloudFormation without the need for any weird workarounds. The trick is specifying the open_table_format_input:

from aws_cdk import (
    Stack,
    aws_glue as glue,
)
from constructs import Construct

class IcebergtabletestStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        iceberg_table = glue.CfnTable(
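            scope=self,
            id="iceberg_example_table",
            # CatalogId is required by AWS::Glue::Table; here, the stack's account ID.
            catalog_id=self.account,
            database_name="my_database",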
            table_input=glue.CfnTable.TableInputProperty(
                table_type="EXTERNAL_TABLE",
                description="Enter description here",
                name="iceberg_example_table",
                storage_descriptor=glue.CfnTable.StorageDescriptorProperty(
                    location="s3://<my_bucket>/iceberg_example_table/",
                    columns=[
                        glue.CfnTable.ColumnProperty(name="mycol1", type="date"),
                        glue.CfnTable.ColumnProperty(name="mycol2", type="string"),
                        glue.CfnTable.ColumnProperty(name="mycol3", type="timestamp"),
                    ],
                )
            ),
            open_table_format_input=glue.CfnTable.OpenTableFormatInputProperty(
                iceberg_input=glue.CfnTable.IcebergInputProperty(
                    metadata_operation="CREATE",
                    version="2"
                )
            )
        )

@aws-jeffrey-yang

Closing issue, can create Iceberg tables via CF.

@oleksiiburov

Hi @milashenko, could you please assist in creating an Iceberg table with partitions?
I am using the snippet provided in the article you shared (https://aws.amazon.com/blogs/big-data/introducing-aws-glue-crawler-and-create-table-support-for-apache-iceberg-format/), but I also added partition keys:

        PartitionKeys:
          - Name: year
            Type: int
          - Name: month
            Type: int
          - Name: day
            Type: int

During the CF stack deployment I got:

Cannot create partitions in an iceberg table (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 8e0a6c4f-c48e-4ddf-adc3-b3763d812d76; Proxy: null)
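
Since this error only surfaces at deploy time, a small pre-deployment check could catch the combination earlier. The sketch below is based solely on the error reported above; the function name and the sample template are hypothetical. It scans a template (already loaded as a dictionary) for `AWS::Glue::Table` resources that set both `IcebergInput` and non-empty `PartitionKeys`.

```python
def find_partitioned_iceberg_tables(template):
    """Return logical IDs of Glue tables that combine IcebergInput with PartitionKeys."""
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::Glue::Table":
            continue
        props = resource.get("Properties", {})
        is_iceberg = "IcebergInput" in props.get("OpenTableFormatInput", {})
        has_partitions = bool(props.get("TableInput", {}).get("PartitionKeys"))
        if is_iceberg and has_partitions:
            offenders.append(logical_id)
    return offenders


# Hypothetical template mirroring the failing setup described above.
sample_template = {
    "Resources": {
        "PartitionedIcebergTable": {
            "Type": "AWS::Glue::Table",
            "Properties": {
                "TableInput": {
                    "Name": "events",
                    "PartitionKeys": [{"Name": "year", "Type": "int"}],
                },
                "OpenTableFormatInput": {
                    "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
                },
            },
        }
    }
}
print(find_partitioned_iceberg_tables(sample_template))
```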

@milashenko

@oleksiiburov Unfortunately, I was also unable to add partitions as part of the template. I could only add them later from an Athena Spark notebook, like this:

ALTER TABLE telemetry_iceberg ADD PARTITION FIELD deviceid AS deviceid
ALTER TABLE telemetry_iceberg ADD PARTITION FIELD months(date_field) AS month

More can be found here: https://iceberg.apache.org/docs/latest/spark-ddl/#partitioned-by
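
If the same statements need to be issued for several tables, they could be generated from Python rather than typed by hand. This is only a sketch: the helper name is hypothetical, and it merely builds the SQL strings shown above. Running them would require a Spark session with the Iceberg SQL extensions enabled, for example by passing each string to `spark.sql(...)`.

```python
def add_partition_field_sql(table, transform, alias=None):
    """Build an Iceberg 'ALTER TABLE ... ADD PARTITION FIELD' statement."""
    statement = f"ALTER TABLE {table} ADD PARTITION FIELD {transform}"
    if alias:
        statement += f" AS {alias}"
    return statement


statements = [
    add_partition_field_sql("telemetry_iceberg", "deviceid", "deviceid"),
    add_partition_field_sql("telemetry_iceberg", "months(date_field)", "month"),
]

for statement in statements:
    # In a notebook, this is where spark.sql(statement) would go.
    print(statement)
```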

@sfgarcia

Hi @dmschauer. I think the issue with Iceberg tables in CDK is not totally solved, as for now the only allowed metadata operation is CREATE (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-iceberginput.html). It would still be necessary to run queries in Athena if you want to add columns or change column or table names, so not all operations can be managed through AWS CDK.

@dmschauer
Author

@sfgarcia I'm aware of that. This issue I opened here is merely about support for CREATE being in place at all and as it turned out it is (although partitioning via CloudFormation isn't supported, that's a separate issue #1866)
"Iceberg tables in CDK" being "totally solved" isn't what this issue here is supposed to be about.
I agree with what you say but I'm not sure why this information is directed at me. I'm a user as well, I don't work for AWS.

Regarding the actual problem, someone else has already opened another issue about updates to Iceberg tables not being supported by CloudFormation. I see that, coincidentally, both of us have been tagged there (#1919 (comment)).

@Smotrov

Smotrov commented Aug 6, 2024

So far, you can do it like this to create an Iceberg table:

  const myTable = new glue.S3Table(props.scope, 'IcebergTest2', {
    database: props.database,
    tableName: 'iceberg_test2',
    bucket: props.bucket,
    s3Prefix: 'iceberg_test2',
    dataFormat: glue.DataFormat.PARQUET,
    columns: [{
      name: 'col1',
      type: glue.Schema.STRING,
    }],
  });


  // Hack starts here to make the table Iceberg.
  // Assumption: `glue` is '@aws-cdk/aws-glue-alpha' and `mainGlue` is 'aws-cdk-lib/aws-glue'.
  const cfnTable = myTable.node.defaultChild as mainGlue.CfnTable;

  cfnTable.openTableFormatInput = {
    icebergInput: {
      metadataOperation: 'CREATE',
      version: '2',
    }
  };

Status: Shipped

7 participants