Build new `TableMetadata` without reassigning field IDs #919
Comments
It makes sense to be able to create an How we can do so is another question. in pyiceberg, we have a function that will reassign field-ids and return a |
Yeah, I missed the link into the main issue description, but I noticed this is also the case within the Java implementation too. I assume this carries over to all of the other language implementations as well.
Can you point me to where this is done? I can only find the regular |
For Python, you can just pass the Schema to the TableMetadata constructor. We had some discussions around this topic too |
This is in part a question and open for discussion.

When building `TableMetadata` through the `TableMetadataBuilder`, all options for building "from scratch" force a reassignment of field IDs:

- `TableMetadataBuilder::new`
- `TableMetadataBuilder::from_table_creation`, as this is a wrapper over `TableMetadataBuilder::new` using the `TableCreation` struct.

I noticed that it would be possible to construct any desired `TableMetadata` by using the object directly, but all of its fields are restricted to `pub(crate)` scope. I suspect the reason for this is safety, i.e. ensuring that creation goes through the builder pattern, where the relevant checks are performed on the call to `build()`.

Questions:

- `TableMetadata` fields to be `pub` [1] or allow the creation of `TableMetadata` without reassigning field IDs?

For extra context, we're currently constructing Iceberg metadata around pre-existing Parquet files written by another system; however, there is no Iceberg catalog or prior metadata JSON. I noticed there is also a `StaticTable`; however, this requires either pre-existing JSON from FileIO or an input `TableMetadata`, and this second option brings us back to the above issue.

This reassignment leads to a mismatch between what is shown in the table metadata JSON and the actual Parquet file:

- parquet schema
- iceberg metadata JSON schema snippet

The reassignment follows the order in which the fields appear within the Parquet/Arrow `Schema`, rather than the given field IDs.

This is also referenced by a question in the Iceberg Slack.
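To make the mismatch concrete, here is a minimal std-only sketch of the effect described above. It does not use the iceberg-rust API: `Field`, `reassign_ids_by_position`, `mismatched_fields`, and `sample_file_schema` are hypothetical names, the schema is flat, and numbering from 1 is an assumption made only for illustration.

```rust
// Hypothetical, simplified stand-in for a schema field.
// This is NOT the iceberg-rust `NestedField` type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

/// Mimics the behaviour described in this issue: IDs are handed out by
/// position (assumed here to start at 1), ignoring the IDs the input carried.
fn reassign_ids_by_position(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .enumerate()
        .map(|(idx, f)| Field {
            id: idx as i32 + 1,
            name: f.name.clone(),
        })
        .collect()
}

/// Returns the names of fields whose metadata ID no longer matches the ID
/// stored in the data file. Both slices are compared position by position.
fn mismatched_fields(file: &[Field], metadata: &[Field]) -> Vec<String> {
    file.iter()
        .zip(metadata.iter())
        .filter(|(in_file, in_meta)| in_file.id != in_meta.id)
        .map(|(in_file, _)| in_file.name.clone())
        .collect()
}

/// Made-up example of field IDs as another system might have written them.
fn sample_file_schema() -> Vec<Field> {
    vec![
        Field { id: 5, name: "event_time".to_string() },
        Field { id: 2, name: "user_id".to_string() },
        Field { id: 7, name: "payload".to_string() },
    ]
}
```

With this sample, `user_id` happens to keep its ID because it sits in second position, while `event_time` and `payload` end up with IDs that differ from those in the file. Any reader that resolves columns by field ID would then look for IDs the file does not contain.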
Footnotes
[1] Considering this conflicts with the native Java implementation, I suspect it would also be problematic in the Rust version.