
Add support for creating an Iceberg table from existing table content #13552

Closed
findinpath opened this issue Aug 8, 2022 · 17 comments · Fixed by #14375
Labels: enhancement (New feature or request)
@findinpath (Contributor)

Use case

The content (data and metadata directories) of an Iceberg table still exists on object storage, but the table has been removed from the metastore.
Offer a way to recreate the table.

Existing workaround:

  • create the table in Hive as an external table, pointing external_location to the table's data directory (I'm not sure if this plays well with Iceberg delete files)
  • execute a CTAS statement to create a new Iceberg table

The downside of this workaround is that all existing content of the table is copied into the newly created Iceberg table, and that the new Iceberg table lacks any history information.
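Sketched as SQL, the workaround looks roughly like the following (table names, columns, and the bucket path are hypothetical; the Hive external-table step has the delete-file caveat noted above):

```sql
-- 1. Expose the orphaned Iceberg data directory through the Hive connector
CREATE TABLE hive.default.my_table_recovered (
    id bigint,
    value varchar
)
WITH (
    format = 'PARQUET',
    external_location = 's3://my-bucket/my-table/data'
);

-- 2. CTAS into a brand-new Iceberg table; all data is copied and history is lost
CREATE TABLE iceberg.default.my_table AS
SELECT * FROM hive.default.my_table_recovered;
```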

Request

Provide a way to create an Iceberg table in Trino from existing content.

Feedback from @electrum:

Maybe we need an explicit external flag (to indicate in the CREATE TABLE statement that we're creating the table out of existing content).

@alexjo2144 (Member)

Related Iceberg API: apache/iceberg#3851

@tnatssb commented Aug 9, 2022

I'm sure you know this, but this is how you add an existing Delta table. It's a very nice feature to have. https://trino.io/docs/current/connector/delta-lake.html#creating-tables

@alexjo2144 (Member)

create the table in Hive as an external table and point the external_location to the data directory of the table (I'm not sure if this plays well with Iceberg delete files )

This works if you run expire_snapshots and remove_orphan_files with an expiration window of 0s first.
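For reference, Trino exposes both of those maintenance operations as table procedures; a sketch with a zero-second window (the table name is hypothetical, and the connector's minimum-retention safety settings normally have to be lowered before a '0s' threshold is accepted):

```sql
ALTER TABLE iceberg.default.my_table EXECUTE expire_snapshots(retention_threshold => '0s');
ALTER TABLE iceberg.default.my_table EXECUTE remove_orphan_files(retention_threshold => '0s');
```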

One case we may need to handle that's different from Delta is that there are situations where we won't be able to derive everything we need to know from the file system layer. That's why the Iceberg API also takes metadataFileLocation as a parameter. For example, let's say the last snapshot in the table is #6, and two writers tried committing #6 at the same time. It's valid for the writer that loses the commit to leave a #6 snapshot file around, expecting it to get cleaned up later by GC.

@findepi (Member) commented Sep 19, 2022

  • create the table in Hive as an external table and point the external_location to the data directory of the table (I'm not sure if this plays well with Iceberg delete files )

It won't work with non-deletion files either, since the Hive connector won't map columns by ID correctly.

@findepi (Member) commented Sep 19, 2022

A table needs to be registered with the current metadata file path (and perhaps with the previous one too).

Design question: metadata path

  • Should the user be responsible for providing the path to the "current" metadata file?
    • that's more explicit
    • but it's also a burden on the user; they may not actually know it and would just take the last metadata file from the table directory
  • Or should the operation pick what looks like the last metadata file in the table?
    • assuming the files follow the naming convention (with sequence numbers), this would work well, and we may initially fail when some files don't follow the convention
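To illustrate the naming convention in question: metadata files in a table's metadata/ directory carry a leading sequence number, so "pick the last one" amounts to sorting by that numeric prefix (the file names below are made up, with UUIDs truncated):

```
s3://my-bucket/my-table/metadata/00000-0e5a1d2f-....metadata.json
s3://my-bucket/my-table/metadata/00001-7c3b9a40-....metadata.json
s3://my-bucket/my-table/metadata/00002-f14d86be-....metadata.json   <- looks like the current one
```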

Design question: user's interface in SQL
What should be the SQL statement that does that job?

  • eg CREATE TABLE WITH ... + some new table attribute, like existing_location?
  • a procedure?

@tnatssb commented Sep 19, 2022

but it's also a burden on the user; they may not actually know it and would just take the last metadata file from the table directory

To me, this is an administrator type of function and wouldn't be done by someone who doesn't understand the underlying components of Iceberg. It would be very similar to the Delta Lake create table for existing data: create table (dummy) with (location=metadata file);

@krvikash krvikash self-assigned this Sep 19, 2022
@alexjo2144 (Member) commented Sep 19, 2022

Design question: user's interface in SQL
What should be the SQL statement that does that job?

Some users have complained about the CREATE TABLE WITH (...) syntax that we used for Delta Lake. Specifically, if you get the path wrong, the operation still succeeds, but you've created a table with one dummy column. #13568

@krvikash (Contributor)

I have a couple of questions:

  1. Can two tables point to the same metadata location at the same time?
  2. If point 1 is true, what happens if we insert data into the old table? Will the newly created table see that new data too?
  3. What happens if the location provided by the user does not contain a metadata file, or the provided location itself is wrong/invalid?

@findinpath (Contributor, Author) commented Sep 23, 2022

There are a few possible ways to implement this feature from a user's perspective:

  • create the table with the base location of the table
CREATE TABLE iceberg.default.my_table (
  dummy bigint
)
WITH (
  location = 's3://my-bucket/my-table/'
)
  • create the table with the location of the metadata file to be used for the current version of the table
CREATE TABLE iceberg.default.my_table (
  dummy bigint
)
WITH (
  table_current_metadata_location = 's3://my-bucket/my-table/metadata/xxx-xxxx-xxxx.json'
)

This option comes at the cost of introducing a new parameter for the CREATE TABLE statement that can be used only in this rather exotic use case.

  • register the new table through a stored procedure
CALL iceberg.system.register_table('schema-name', 'table-name', metadata-location/table-base-directory-location)

This option comes with the freedom to choose (if necessary) meaningful parameter names for the procedure.
However, this approach differs from the one used in Delta Lake for creating tables, which may confuse Trino users (see https://trino.io/docs/current/connector/delta-lake.html#creating-tables).

@krvikash (Contributor)

  • create table with the location of the metadata file to be used for the current version of the table

For Option 2, the user needs to go look for the latest metadata file and provide it in the CREATE TABLE statement. But it gives the user the flexibility to choose any (even an outdated) metadata file (not sure if that is a valid use case).

For Options 2 and 3, we might need to change the way Delta tables get created using existing metadata, to keep it in sync with Iceberg.

@findepi (Member) commented Sep 23, 2022

@electrum @martint @alexjo2144 @phd3 @losipiuk please see #13552 (comment) and newer comments

@krvikash (Contributor)

  • Can two tables point to the same metadata location at the same time?
  • If point 1 is true, what happens if we insert data into the old table? Will the newly created table see that new data too?
  • What happens if the location provided by the user does not contain a metadata file, or the provided location itself is wrong/invalid?

[Conclusion]
Point 1 -> Yes, two tables can point to the same metadata location
Point 2 -> The newly created table won't see the new data
Point 3 -> An exception is thrown

@alexjo2144 (Member)

register the new table through a stored procedure

I'm strongly in favor of this option, mostly because it makes creating a new table at a specific location unambiguously different from registering an existing table with the catalog. This is the problem users have had with Delta: a small typo in the path definition does not result in a failure, but instead has unexpected consequences.

The other big one is that it makes it easy to add more parameters to the register procedure later if we need to. For example, if we can't programmatically decide which is the most recent snapshot file, the user could provide it. That is much easier to do with a procedure.

@electrum (Member)

I agree with @alexjo2144 on this. The behavior is entirely different and none of the properties are relevant, so reusing CREATE TABLE doesn't make sense.

@findepi (Member) commented Sep 27, 2022

I agree with @alexjo2144 too. Seems we have agreement, awesome.
Let's call this new procedure: iceberg.system.register_table.
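A hypothetical invocation of such a procedure could look like the following (parameter names and the table location are illustrative, not necessarily what eventually shipped in #14375):

```sql
CALL iceberg.system.register_table(
    schema_name => 'default',
    table_name => 'my_table',
    table_location => 's3://my-bucket/my-table');
```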

@krvikash (Contributor)

Thanks @alexjo2144, @electrum, @findepi — I will proceed with the new procedure: iceberg.system.register_table.

@findepi (Member) commented Oct 6, 2022

#14489 shows how useful this feature is.
Without register_table being added in #14375, it would be pretty much impossible to verify query plans for large datasets on CI.

@findepi findepi added this to the 403 milestone Nov 9, 2022