From 28ad0fbedd0b2d3a9e3169e5fb09ca694cc710b7 Mon Sep 17 00:00:00 2001
From: Marius Grama
Date: Mon, 14 Oct 2024 17:20:20 +0200
Subject: [PATCH 1/5] Document `iceberg.glue.cache-table-metadata` property

---
 docs/src/main/sphinx/object-storage/metastores.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/src/main/sphinx/object-storage/metastores.md b/docs/src/main/sphinx/object-storage/metastores.md
index 331a85e80822..7d149c666786 100644
--- a/docs/src/main/sphinx/object-storage/metastores.md
+++ b/docs/src/main/sphinx/object-storage/metastores.md
@@ -435,7 +435,12 @@ described with the following additional property:
     See [AWS Glue Skip Archive](https://iceberg.apache.org/docs/latest/aws/#skip-archive).
   - `true`
-:::
+* - `iceberg.glue.cache-table-metadata`
+  - While updating the table in AWS Glue, store the table metadata with the
+    purpose of accelerating `information_schema.columns` and
+    `system.metadata.table_comments` queries.
+  - `true`
+ :::
 ## Iceberg-specific metastores

From 923d9256a040a74386cc9f36be6fdf277922f11b Mon Sep 17 00:00:00 2001
From: Marius Grama
Date: Mon, 14 Oct 2024 17:28:50 +0200
Subject: [PATCH 2/5] Document iceberg JDBC catalog properties

---
 .../main/sphinx/object-storage/metastores.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/src/main/sphinx/object-storage/metastores.md b/docs/src/main/sphinx/object-storage/metastores.md
index 7d149c666786..ebb286a86446 100644
--- a/docs/src/main/sphinx/object-storage/metastores.md
+++ b/docs/src/main/sphinx/object-storage/metastores.md
@@ -517,6 +517,29 @@ The Iceberg JDBC catalog is supported for the Iceberg connector. At a minimum,
 database besides PostgreSQL, a JDBC driver jar file must be placed in the plugin directory.
+:::{list-table} JDBC catalog configuration properties
+:widths: 40, 60
+:header-rows: 1
+
+* - Property name
+  - Description
+* - `iceberg.jdbc-catalog.driver-class`
+  - JDBC driver class name.
+* - `iceberg.jdbc-catalog.connection-url`
+  - The URI to connect to the JDBC server.
+* - `iceberg.jdbc-catalog.connection-user`
+  - User name for JDBC client.
+* - `iceberg.jdbc-catalog.connection-password`
+  - Password for JDBC client.
+* - `iceberg.jdbc-catalog.catalog-name`
+  - Iceberg JDBC metastore catalog name.
+* - `iceberg.jdbc-catalog.default-warehouse-dir`
+  - The default warehouse directory to use for JDBC.
+* - `iceberg.jdbc-catalog.schema-version`
+  - JDBC catalog schema version.
+    Valid values are `V0` or `V1`. Defaults to `V1`.
+:::
+
 :::{warning}
 The JDBC catalog may have compatibility issues if Iceberg introduces breaking changes in the future. Consider the {ref}`REST catalog

From 9a275a82dbeb46eafc2dde52903cd61dbd020ab6 Mon Sep 17 00:00:00 2001
From: Marius Grama
Date: Mon, 14 Oct 2024 20:23:19 +0200
Subject: [PATCH 3/5] Document `iceberg.rest-catalog.vended-credentials-enabled` property

---
 docs/src/main/sphinx/object-storage/metastores.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/src/main/sphinx/object-storage/metastores.md b/docs/src/main/sphinx/object-storage/metastores.md
index ebb286a86446..0953d306f49f 100644
--- a/docs/src/main/sphinx/object-storage/metastores.md
+++ b/docs/src/main/sphinx/object-storage/metastores.md
@@ -491,7 +494,10 @@ following properties:
 * - `iceberg.rest-catalog.oauth2.scope`
   - Scope to be used when communicating with the REST Catalog. Applicable only when using `credential`.
-:::
+* - `iceberg.rest-catalog.vended-credentials-enabled`
+  - Use credentials provided by the REST backend for file system access.
+    Defaults to `false`.
+ :::
 The following example shows a minimal catalog configuration using an Iceberg REST metadata catalog:

From a58f5a73fa45582cf960d9a4abbbdc26ac99436d Mon Sep 17 00:00:00 2001
From: Marius Grama
Date: Mon, 14 Oct 2024 20:41:04 +0200
Subject: [PATCH 4/5] Document missing Iceberg properties

---
 docs/src/main/sphinx/connector/iceberg.md | 26 ++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/src/main/sphinx/connector/iceberg.md b/docs/src/main/sphinx/connector/iceberg.md
index 261352682673..ecf520bedc0e 100644
--- a/docs/src/main/sphinx/connector/iceberg.md
+++ b/docs/src/main/sphinx/connector/iceberg.md
@@ -197,7 +197,28 @@ implementation is used:
   - Set to `false` to disable in-memory caching of metadata files on the coordinator. This cache is not used when `fs.cache.enabled` is set to true.
   - `true`
-
+* - `iceberg.expire_snapshots.min-retention`
+  - Minimal retention period for the
+    [`expire_snapshots` command](iceberg-expire-snapshots).
+    Equivalent session property is `expire_snapshots_min_retention`.
+  - `7d`
+* - `iceberg.remove_orphan_files.min-retention`
+  - Minimal retention period for the
+    [`remove_orphan_files` command](iceberg-remove-orphan-files).
+    Equivalent session property is `remove_orphan_files_min_retention`.
+  - `7d`
+* - `iceberg.idle-writer-min-file-size`
+  - Minimum data written by a single partition writer before it can
+    be considered as idle and can be closed by the engine. Equivalent
+    session property is `idle_writer_min_file_size`.
+  - `16MB`
+* - `iceberg.sorted-writing-enabled`
+  - Enable [sorted writing](iceberg-sorted-files) to tables with a specified sort order. Equivalent
+    session property is `sorted_writing_enabled`.
+  - `true`
+* - `iceberg.split-manager-threads`
+  - Number of threads to use for generating splits.
+  - Double the number of processors on the coordinator node.
 :::
 (iceberg-fte-support)=
@@ -707,6 +728,7 @@ EXECUTE `.
 ```{include} optimize.fragment
 ```
+(iceberg-expire-snapshots)=
 ##### expire_snapshots
 The `expire_snapshots` command removes all snapshots and all related metadata
@@ -727,6 +749,7 @@ procedure fails with a similar message: `Retention specified (1.00d) is shorter
 than the minimum retention configured in the system (7.00d)`. The default value for this property is `7d`.
+(iceberg-remove-orphan-files)=
 ##### remove_orphan_files
 The `remove_orphan_files` command removes all files from a table's data
@@ -1409,6 +1432,7 @@ CREATE TABLE example.testdb.customer_orders (
 WITH (partitioning = ARRAY['month(order_date)', 'bucket(account_number, 10)', 'country'])
 ```
+(iceberg-sorted-files)=
 #### Sorted tables
 The connector supports sorted files as a performance improvement. Data is sorted

From 2f2e9ae3156bc28a34a49c5dc2bc53a781aefb8f Mon Sep 17 00:00:00 2001
From: Marius Grama
Date: Wed, 16 Oct 2024 13:06:44 +0200
Subject: [PATCH 5/5] Add `add_files` procedure link

---
 docs/src/main/sphinx/connector/iceberg.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/main/sphinx/connector/iceberg.md b/docs/src/main/sphinx/connector/iceberg.md
index ecf520bedc0e..9c7cc8de286e 100644
--- a/docs/src/main/sphinx/connector/iceberg.md
+++ b/docs/src/main/sphinx/connector/iceberg.md
@@ -169,7 +169,7 @@ implementation is used:
   - Enable to allow user to call [`register_table` procedure](iceberg-register-table).
   - `false`
 * - `iceberg.add_files-procedure.enabled`
-  - Enable to allow user to call `add_files` procedure.
+  - Enable to allow user to call [`add_files` procedure](iceberg-add-files).
   - `false`
 * - `iceberg.query-partition-filter-required`
   - Set to `true` to force a query to use a partition filter for schemas