Iceberg Time Travel Support #2400

Closed
sungwy opened this issue Jun 18, 2024 · 2 comments · Fixed by #2426
sungwy commented Jun 18, 2024

The IcebergScanOperator currently doesn't support time travel by snapshot ID.

This issue tracks the work of adding a snapshot ID argument to the Daft Iceberg APIs so that a table can be scanned as of a specific snapshot.

Contributor

jaychia commented Jun 18, 2024

Seems simple enough - we just need to forward the argument to the PyIceberg table during the scan?

self._table.scan(limit=limit, snapshot_id=...)

How does this interact with .schema? Is it possible that Iceberg tables might have different schemas at different snapshots?
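The forwarding suggested above could look roughly like the sketch below. It is not Daft's actual implementation: `SnapshotAwareScan` and `FakeTable` are hypothetical names introduced only for illustration, and the sole assumption about the wrapped table is that it exposes a PyIceberg-style `scan(limit=..., snapshot_id=...)` method where `snapshot_id=None` means the current snapshot.

```python
from typing import Any, Dict, List, Optional


class SnapshotAwareScan:
    """Hypothetical stand-in for a scan operator wrapping a PyIceberg-style table."""

    def __init__(self, table: Any, snapshot_id: Optional[int] = None) -> None:
        self._table = table
        self._snapshot_id = snapshot_id

    def plan(self, limit: Optional[int] = None) -> Any:
        # Forward the snapshot ID unchanged; None is assumed to mean
        # "scan the table's current snapshot".
        return self._table.scan(limit=limit, snapshot_id=self._snapshot_id)


class FakeTable:
    """Test double (not a real PyIceberg table) that records scan() arguments."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def scan(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return kwargs
```

Keeping the snapshot ID on the operator rather than on each call means every scan planned by the same operator reads the same snapshot, which is probably what a time-travel read wants.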

Author

sungwy commented Jun 19, 2024

Yes, that's right. The schema ID associated with a specific snapshot is stored in the snapshot metadata. When we time travel, we will project the schema associated with that snapshot ID (instead of the table's current schema ID).
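The schema lookup described here can be sketched with a small in-memory model of the table metadata. This is a minimal illustration, not PyIceberg's or Daft's code: the `Snapshot` and `TableMetadata` classes and the `schema_for_snapshot` helper are hypothetical, schemas are reduced to tuples of column names, and falling back to the current schema when a snapshot records no schema ID is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Schema = Tuple[str, ...]  # simplified: a schema is just its column names


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    schema_id: Optional[int] = None  # may be absent in the metadata


@dataclass
class TableMetadata:
    current_schema_id: int
    schemas: Dict[int, Schema]
    snapshots: List[Snapshot] = field(default_factory=list)


def schema_for_snapshot(metadata: TableMetadata, snapshot_id: Optional[int] = None) -> Schema:
    """Return the schema to project for a scan at `snapshot_id`."""
    if snapshot_id is None:
        # No time travel requested: use the table's current schema.
        return metadata.schemas[metadata.current_schema_id]

    snapshot = next((s for s in metadata.snapshots if s.snapshot_id == snapshot_id), None)
    if snapshot is None:
        raise ValueError(f"Snapshot not found: {snapshot_id}")

    # Assumption: fall back to the current schema if the snapshot has no schema ID.
    schema_id = snapshot.schema_id if snapshot.schema_id is not None else metadata.current_schema_id
    return metadata.schemas[schema_id]


# Example metadata: the table gained a "name" column between snapshots 100 and 200.
example = TableMetadata(
    current_schema_id=1,
    schemas={0: ("id",), 1: ("id", "name")},
    snapshots=[Snapshot(100, schema_id=0), Snapshot(200, schema_id=1), Snapshot(300)],
)
```

With this metadata, a scan at snapshot 100 would project only `id`, while a scan at snapshot 200 or at the current state would project both columns.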
