Document missing Iceberg properties #23777

Merged 5 commits on Oct 18, 2024
28 changes: 26 additions & 2 deletions docs/src/main/sphinx/connector/iceberg.md
@@ -169,7 +169,7 @@ implementation is used:
- Enable to allow user to call [`register_table` procedure](iceberg-register-table).
- `false`
* - `iceberg.add_files-procedure.enabled`
- Enable to allow user to call `add_files` procedure.
- Enable to allow user to call [`add_files` procedure](iceberg-add-files).
- `false`
* - `iceberg.query-partition-filter-required`
- Set to `true` to force a query to use a partition filter for schemas
@@ -197,7 +197,28 @@ implementation is used:
- Set to `false` to disable in-memory caching of metadata files on the
coordinator. This cache is not used when `fs.cache.enabled` is set to true.
- `true`

* - `iceberg.expire_snapshots.min-retention`
- Minimal retention period for the
[`expire_snapshots` command](iceberg-expire-snapshots).
Equivalent session property is `expire_snapshots_min_retention`.
- `7d`
* - `iceberg.remove_orphan_files.min-retention`
- Minimal retention period for the
[`remove_orphan_files` command](iceberg-remove-orphan-files).
Equivalent session property is `remove_orphan_files_min_retention`.
- `7d`
* - `iceberg.idle-writer-min-file-size`
- Minimum data written by a single partition writer before it can

Member:
That's an implementation detail. What is the effect for the user, and why would they configure this? Also link to https://trino.io/docs/current/admin/properties.html#data-size

Contributor Author:
#19649

@raunaqmorarka can you please help me summarize what this config does?

Member:
It potentially helps to reduce memory usage for writes to partitioned tables (#19649).
It's a very low-level detail; I'm not sure there is a nicer way to explain it.

be considered as idle and can be closed by the engine. Equivalent
session property is `idle_writer_min_file_size`.
- `16MB`
* - `iceberg.sorted-writing-enabled`
- Enable [sorted writing](iceberg-sorted-files) to tables with a specified sort order. Equivalent
session property is `sorted_writing_enabled`.
- `true`
* - `iceberg.split-manager-threads`
- Number of threads to use for generating splits.
- Double the number of processors on the coordinator node.
:::
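
As a rough sketch of how the properties added in this hunk fit together, the fragment below sets each one to the default listed in the table. The file name and the catalog name `example` are assumptions for illustration, the metastore settings a real catalog needs are omitted, and `iceberg.split-manager-threads` is left out because its default depends on the coordinator's processor count.

```properties
# etc/catalog/example.properties (hypothetical catalog file name)
connector.name=iceberg

# Shortest retention the expire_snapshots and remove_orphan_files
# commands accept; both default to 7d.
iceberg.expire_snapshots.min-retention=7d
iceberg.remove_orphan_files.min-retention=7d

# Minimum data a single partition writer must have written before it
# can be considered idle and closed by the engine.
iceberg.idle-writer-min-file-size=16MB

# Sorted writing for tables with a specified sort order.
iceberg.sorted-writing-enabled=true
```

Each of the first four properties has an equivalent session property named in the table, so a value set here acts as the catalog-wide default.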

(iceberg-fte-support)=
@@ -707,6 +728,7 @@ EXECUTE <alter-table-execute>`.
```{include} optimize.fragment
```

(iceberg-expire-snapshots)=
##### expire_snapshots

The `expire_snapshots` command removes all snapshots and all related metadata
@@ -727,6 +749,7 @@ procedure fails with a similar message: `Retention specified (1.00d) is shorter
than the minimum retention configured in the system (7.00d)`. The default value
for this property is `7d`.

(iceberg-remove-orphan-files)=
##### remove_orphan_files

The `remove_orphan_files` command removes all files from a table's data
@@ -1409,6 +1432,7 @@ CREATE TABLE example.testdb.customer_orders (
WITH (partitioning = ARRAY['month(order_date)', 'bucket(account_number, 10)', 'country'])
```

(iceberg-sorted-files)=
#### Sorted tables

The connector supports sorted files as a performance improvement. Data is sorted
35 changes: 33 additions & 2 deletions docs/src/main/sphinx/object-storage/metastores.md
@@ -435,7 +435,12 @@ described with the following additional property:
See [AWS Glue Skip
Archive](https://iceberg.apache.org/docs/latest/aws/#skip-archive).
- `true`
:::
* - `iceberg.glue.cache-table-metadata`
- When updating a table in AWS Glue, store the table metadata to accelerate
`information_schema.columns` and `system.metadata.table_comments` queries.
- `true`
:::
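
A minimal sketch of a Glue-backed Iceberg catalog that turns the metadata caching off, assuming the standard `iceberg.catalog.type=glue` setting. Region, credentials, and other Glue settings are omitted, so this is a fragment rather than a complete catalog file.

```properties
connector.name=iceberg
iceberg.catalog.type=glue
# Defaults to true. Setting it to false stops storing table metadata
# in Glue when the table is updated.
iceberg.glue.cache-table-metadata=false
```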

## Iceberg-specific metastores

@@ -486,7 +491,10 @@ following properties:
* - `iceberg.rest-catalog.oauth2.scope`
- Scope to be used when communicating with the REST Catalog. Applicable only
when using `credential`.
:::
* - `iceberg.rest-catalog.vended-credentials-enabled`
- Use credentials provided by the REST backend for file system access.
Defaults to `false`.
:::

The following example shows a minimal catalog configuration using an Iceberg
REST metadata catalog:
@@ -512,6 +520,29 @@ The Iceberg JDBC catalog is supported for the Iceberg connector. At a minimum,
database besides PostgreSQL, a JDBC driver jar file must be placed in the plugin
directory.

:::{list-table} JDBC catalog configuration properties
:widths: 40, 60
:header-rows: 1

* - Property name
- Description
* - `iceberg.jdbc-catalog.driver-class`
- JDBC driver class name.
* - `iceberg.jdbc-catalog.connection-url`
- The URI to connect to the JDBC server.
* - `iceberg.jdbc-catalog.connection-user`
- User name for JDBC client.
* - `iceberg.jdbc-catalog.connection-password`
- Password for JDBC client.
* - `iceberg.jdbc-catalog.catalog-name`
- Iceberg JDBC metastore catalog name.
* - `iceberg.jdbc-catalog.default-warehouse-dir`
- The default warehouse directory to use for JDBC.
* - `iceberg.jdbc-catalog.schema-version`
- JDBC catalog schema version.
Valid values are `V0` or `V1`. Defaults to `V1`.
:::
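
The properties in this table could be combined roughly as follows for a PostgreSQL-backed JDBC catalog. Host, database, user, password, catalog name, and warehouse location are all placeholder values chosen for illustration, not values taken from this change.

```properties
connector.name=iceberg
iceberg.catalog.type=jdbc
# Placeholder catalog name and connection details; replace with real values.
iceberg.jdbc-catalog.catalog-name=example_catalog
iceberg.jdbc-catalog.driver-class=org.postgresql.Driver
iceberg.jdbc-catalog.connection-url=jdbc:postgresql://db.example.net:5432/iceberg
iceberg.jdbc-catalog.connection-user=example_user
iceberg.jdbc-catalog.connection-password=example_password
iceberg.jdbc-catalog.default-warehouse-dir=s3://example-bucket/warehouse
# V1 is the documented default; shown here only for completeness.
iceberg.jdbc-catalog.schema-version=V1
```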

:::{warning}
The JDBC catalog may have compatibility issues if Iceberg introduces breaking
changes in the future. Consider the {ref}`REST catalog