Support Metrics mode when creating/writing Iceberg tables #9791
Comments
This is done for ORC; it is not yet done for Parquet.
This sounds generally correct, but some data volumes will suffer from such a change. I wonder whether we could come up with an automated decision-making process for this. Knowing the data between files would be even better. That could be statistical information that we gather from time to time on a table.
We don't want the presence of column-level metrics to be user configurable.
By default, all metrics (null value counts, NaN value counts, lower and upper bounds, plus more) are persisted in manifest files. This leads to manifest bloat, with the metrics occupying around 80% or more of the manifest contents. We should make the persisted metrics customizable and keep only the ones that customers usually query by.
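As a rough illustration of what a metrics mode selects, here is a minimal Python sketch. It assumes the table properties documented by Apache Iceberg (`write.metadata.metrics.default` and `write.metadata.metrics.column.<name>`) and the modes `none`, `counts`, `truncate(N)` and `full`. The helper names `parse_mode`, `mode_for_column` and `kept_metrics` are hypothetical and are not part of Trino or Iceberg; this is not how the connector implements it.

```python
import re

# Property keys as documented for Apache Iceberg tables.
# Every function below is a hypothetical helper for illustration only.
DEFAULT_KEY = "write.metadata.metrics.default"
COLUMN_PREFIX = "write.metadata.metrics.column."
ICEBERG_DEFAULT_MODE = "truncate(16)"

_TRUNCATE = re.compile(r"^truncate\((\d+)\)$")


def parse_mode(value):
    """Return (mode, truncate_length) for a metrics mode string."""
    text = value.strip().lower()
    if text in ("none", "counts", "full"):
        return text, None
    match = _TRUNCATE.match(text)
    if match and int(match.group(1)) > 0:
        return "truncate", int(match.group(1))
    raise ValueError(f"unsupported metrics mode: {value!r}")


def mode_for_column(table_properties, column):
    """A per-column setting wins; otherwise fall back to the table default."""
    raw = table_properties.get(
        COLUMN_PREFIX + column,
        table_properties.get(DEFAULT_KEY, ICEBERG_DEFAULT_MODE),
    )
    return parse_mode(raw)


def kept_metrics(mode):
    """Simplified view of the per-column statistics kept under each mode."""
    if mode == "none":
        return []
    counts = ["value_counts", "null_value_counts", "nan_value_counts"]
    if mode == "counts":
        return counts
    # "truncate" keeps bounds truncated to N characters, "full" keeps them whole.
    return counts + ["lower_bounds", "upper_bounds"]


if __name__ == "__main__":
    props = {DEFAULT_KEY: "counts", COLUMN_PREFIX + "event_time": "truncate(8)"}
    for name in ("event_time", "payload"):
        mode, length = mode_for_column(props, name)
        print(name, mode, length, kept_metrics(mode))
```

With a default of `counts` and bounds enabled only for the columns that are actually filtered on, the lower and upper bounds for all other columns would no longer be written to the manifests.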
We should support: