Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Iceberg time travel & snapshot queries #10515

Merged
merged 3 commits into from
Sep 13, 2022

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Jan 10, 2022

Description

Document Iceberg time travel & snapshot queries

Related issues, pull requests, and links

Fixes #12745

There are 2 ongoing PRs both trying to offer (with different approaches) the ability to do time travel on metadata tables:

If any of the above mentioned PRs land, we'll need to update the documentation.

Documentation

( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

@cla-bot cla-bot bot added the cla-signed label Jan 10, 2022
@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch 2 times, most recently from 4df43ec to 4a9ae82 Compare January 10, 2022 10:40
@findepi findepi requested a review from phd3 January 10, 2022 12:34
Copy link
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, % nits.

IIRC @electrum had an opinion on whether to document snapshot querying at all. but can't find relevant PR/issue now.

docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

The snapshot queries can be also used with special tables::


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we avoid double newlines for uniformity in this file?

docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@phd3 phd3 requested a review from electrum January 10, 2022 14:54
@hashhar
Copy link
Member

hashhar commented Jan 10, 2022

If we have plans to use the new AS OF syntax for querying snapshots then I believe we should hold on documenting this for now since it'll likely require users to change queries on their end eventually.

docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@findinpath
Copy link
Contributor Author

If we have plans to use the new AS OF syntax for querying snapshots then I believe we should hold on documenting this for now since it'll likely require users to change queries on their end eventually.

The functionality is already provided in Trino. I see currently no harm in documenting it.

@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch 4 times, most recently from 445b69d to a944b98 Compare January 11, 2022 11:10
@electrum
Copy link
Member

I agree with @hashhar. We should implement the new AS OF functionality for Iceberg (it is already supported in the engine) and deprecate this existing functionality. Ideally, we can remove it in the future. Documenting it now will cause more people to use it and make it harder to remove later.

@electrum
Copy link
Member

electrum commented Jan 12, 2022

The existing Iceberg syntax is problematic since the number can mean both a specific snapshot ID or an "as of" timestamp. Due to the snapshot IDs being random and overlapping with the timestamp space, a query relying on it as a snapshot may return incorrect results if the timestamp value happens to be a snapshot ID.

It should be easy to implement the AS OF functionality for Iceberg. See the new ConnectorMetadata.isSupportedVersionType() and ConnectorMetadata.getTableHandle(ConnectorSession, SchemaTableName, Optional<ConnectorTableVersion>, Optional<ConnectorTableVersion>) APIs.

@phd3
Copy link
Member

phd3 commented Jan 12, 2022

sounds good, seems like this is close: #10258 @findinpath we'd need docs after the above gets merged, so we can update this one afterwards or just create a new one.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Found my way over to some of these Iceberg documentation PRs based on questions asked in the Iceberg slack workspace. 👋

Rolling back to a previous snapshot
-----------------------------------
Snapshots
---------

Iceberg supports a "snapshot" model of data, where table snapshots are
identified by an snapshot IDs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: an snapshot -> a snapshot.

This might be a good place to move the introduction of the hidden metadata table "$snapshots" as well.

Though given you're not touching that paragraph, it's totally up to you 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to snapshots are identified by a snapshot ID

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I missed the comment from @phd3 above. I found my way here as there was a lot of discussion in the Iceberg slack workspace on Trino docs recently. But agreed with the comments above about potential collisions with snapshot ID and timestamps.

@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch from a944b98 to d9ec944 Compare January 19, 2022 05:29
@findinpath findinpath changed the title Docs/iceberg snapshot queries Document Iceberg time travel & snapshot queries Aug 30, 2022
@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch from d9ec944 to 5ade27d Compare August 30, 2022 06:44
@findinpath findinpath requested a review from colebow August 30, 2022 06:47
@colebow colebow requested a review from mosabua August 31, 2022 19:09
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch from 5ade27d to 5bece28 Compare September 1, 2022 09:58
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved
@findinpath findinpath force-pushed the docs/iceberg-snapshot-queries branch from 5bece28 to 4210d73 Compare September 8, 2022 03:49
@findinpath findinpath requested a review from losipiuk September 12, 2022 18:38
@losipiuk losipiuk merged commit 2ecb1a6 into trinodb:master Sep 13, 2022
@github-actions github-actions bot added this to the 396 milestone Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Document Iceberg time travel
8 participants