Add support for creating an Iceberg table from existing table content #13552
Related Iceberg API: apache/iceberg#3851 |
I'm sure you know this, but this is how you add an existing Delta table. It's a very nice feature to have. https://trino.io/docs/current/connector/delta-lake.html#creating-tables |
This works if you run One case we may need to handle that's different from delta is that there are situations where we won't be able to derive everything we need to know from the file system layer. That's why the Iceberg API also takes |
It won't work with non-deletion files either, since the Hive connector won't map columns by ID correctly |
A table needs to be registered with the current metadata file path (and perhaps with the previous one too).
Design question: metadata path
Design question: user's interface in SQL
|
To me, this is an administrator-type function that wouldn't be used by someone who doesn't understand the underlying components of Iceberg. It would be very similar to the Delta Lake create table for existing data. create table (dummy) with (location=metadata file); |
Some users have complained about the |
I have a couple of questions
|
There are a few possible ways to implement this feature from a user's perspective:
This option comes at the cost of introducing a new parameter for the
This option comes with the freedom to choose (if necessary) meaningful parameter names for the procedure. |
For Option 2, the user needs to look up the latest metastore file and provide it in the CREATE TABLE statement. However, it gives the user the flexibility to choose any metastore file, including an outdated one (not sure whether this is a valid use case). For Option 2 or 3, we might need to change how a Delta table gets created from existing metadata to keep it in sync with Iceberg. |
@electrum @martint @alexjo2144 @phd3 @losipiuk please see #13552 (comment) and newer comments |
[Conclusion] |
I'm strongly in favor of this option, mostly because it makes creating a new table at a specific location unambiguously different from registering an existing table with the catalog. This is the problem users have had with Delta: a small typo in the path definition does not result in a failure but instead has unexpected consequences. The other big reason is that it makes it easy to add more parameters to the register procedure if we need to later. For example, if we can't programmatically decide what the most recent snapshot file is, a user could provide it. That is much easier to do with the procedure. |
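To illustrate what "programmatically decide what the most recent file is" could involve, here is a minimal sketch, not what Trino actually does. It assumes the metadata directory listing is already available as a list of file names, and that names follow one of two conventions: `<version>-<uuid>.metadata.json` or `v<version>.metadata.json` (optionally with `.gz` before `.metadata.json`). The function names `metadata_version` and `latest_metadata_file` are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed naming conventions (not confirmed by this thread):
#   00003-<uuid>.metadata.json   e.g. catalog-managed tables
#   v3.metadata.json             e.g. Hadoop-style tables
_METADATA_NAME = re.compile(
    r"^(?:v(\d+)|(\d+)-[0-9a-fA-F-]+)(?:\.gz)?\.metadata\.json$"
)


def metadata_version(file_name: str) -> Optional[int]:
    """Return the version encoded in a metadata file name, or None."""
    match = _METADATA_NAME.match(file_name)
    if match is None:
        return None
    # Exactly one of the two capture groups is set, depending on the convention.
    return int(match.group(1) or match.group(2))


def latest_metadata_file(file_names: Iterable[str]) -> Optional[str]:
    """Pick the file with the highest version; None if nothing matches.

    If two files share the highest version, the first one seen wins,
    which is exactly the ambiguous case where a user-supplied file
    name would be the safer input.
    """
    best_name, best_version = None, -1
    for name in file_names:
        version = metadata_version(name)
        if version is not None and version > best_version:
            best_name, best_version = name, version
    return best_name
```

When the listing is ambiguous or does not match the expected pattern, the sketch returns `None` or an arbitrary winner, which is the situation where an explicit, user-provided parameter on the procedure would help.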
I agree with @alexjo2144 on this. The behavior is entirely different and none of the properties are relevant, so reusing |
I agree with @alexjo2144 too. Seems we have agreement, awesome. |
Thanks @alexjo2144 | @electrum | @findepi, I will proceed with the new procedure: |
Use case
The content directory (data & metadata) corresponding to an Iceberg table exists in object storage, but the table has been removed from the metastore.
Offer a way to recreate the table.
Existing workaround:
external_location to the data directory of the table (I'm not sure if this plays well with Iceberg delete files)
The outcome of the workaround is that all the existing content of the table has been copied to the newly created Iceberg table and that the new Iceberg table lacks any history information.
Request
Provide a way to create an Iceberg table in Trino from existing content.
Feedback from @electrum: