iceberg: add missing field to manifest_entry avro #21481
Merged
Adds the file_sequence_number field to the manifest_entry schema, as indicated by the spec[1].
It's unclear why the schema didn't include it in the first place (the schema was taken from DuckDB), but the field is clearly defined upstream[2] and is important for efficiently writing metadata[3].
There is another field, distinct_counts, in the spec that is not in the manifest_entry schema. Interestingly enough, it doesn't appear to be defined at all in upstream Java code[4] (it is defined in iceberg-go, but not iceberg-python or iceberg-rust), so I'm leaving it out.
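For reviewers who want a picture of the resulting schema shape, here is a minimal sketch of the manifest_entry Avro record with file_sequence_number included. The field names and field IDs follow the spec's manifest entry table; the Python dict layout, the `field_ids` helper, and the empty placeholder record for data_file are illustrative assumptions, not the code in this PR.

```python
import json

# Sketch of the manifest_entry Avro record (Iceberg v2), based on the spec's
# manifest entry table. Optional fields are modelled as ["null", "long"]
# unions with a null default.
MANIFEST_ENTRY_SCHEMA = {
    "type": "record",
    "name": "manifest_entry",
    "fields": [
        {"name": "status", "type": "int", "field-id": 0},
        {"name": "snapshot_id", "type": ["null", "long"],
         "default": None, "field-id": 1},
        {"name": "sequence_number", "type": ["null", "long"],
         "default": None, "field-id": 3},
        # The field this PR adds.
        {"name": "file_sequence_number", "type": ["null", "long"],
         "default": None, "field-id": 4},
        # Placeholder only: the real data_file struct has many fields.
        {"name": "data_file",
         "type": {"type": "record", "name": "data_file_placeholder",
                  "fields": []},
         "field-id": 2},
    ],
}


def field_ids(schema):
    """Hypothetical helper: map each top-level field name to its field-id."""
    return {f["name"]: f["field-id"] for f in schema["fields"]}


if __name__ == "__main__":
    # Serialize to JSON, as an Avro schema would be embedded in a file header.
    print(json.dumps(MANIFEST_ENTRY_SCHEMA, indent=2))
```

Note that distinct_counts would live inside the data_file struct, which is why it does not appear at this level of the sketch.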
[1] https://iceberg.apache.org/spec/#manifests
[2] https://github.com/apache/iceberg/blob/319f29ea860e42e7cc21cda8c05d882134e6431f/core/src/main/java/org/apache/iceberg/ManifestEntry.java#L48-L49
[3] https://iceberg.apache.org/spec/#manifest-entry-fields
[4] https://github.com/apache/iceberg/blob/319f29ea860e42e7cc21cda8c05d882134e6431f/api/src/main/java/org/apache/iceberg/DataFile.java#L37-L105
Backports Required
Release Notes