Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement incremental MV refresh for Iceberg connector #20813

Closed
marton-bod opened this issue Feb 22, 2024 · 6 comments
Closed

Implement incremental MV refresh for Iceberg connector #20813

marton-bod opened this issue Feb 22, 2024 · 6 comments

Comments

@marton-bod
Copy link
Contributor

marton-bod commented Feb 22, 2024

In Trino Iceberg, currently we do a full refresh of MVs in all cases, meaning we drop all the existing data from the MV storage table and then insert all the data again (essentially a DROP + CTAS operation combined).

We could try to be smarter during refresh. While incrementally refreshing a MV can be tricky for certain operation types such as join, aggregations or window functions, there are some easy wins such as single-table, predicate-only queries. More complicated cases can be tackled in subsequent implementation phases.

For the first phase, I am preparing a patch to make use of the IncrementalAppendScan class from iceberg to only process the delta changes during refresh, for the single-table, predicate-only scenario.

@mosabua
Copy link
Member

mosabua commented Feb 23, 2024

Some of this might already be in place .. at least I remember that we had some of this but it was isolated to Iceberg tables only. It is still reflected in the doc which I am currently updating in #20665 with @martint and @findepi

Also @anjalinorwood might know more.

@marton-bod
Copy link
Contributor Author

Thanks @mosabua. I don't think incremental refresh is in place, as you can see here, the MV storage table is always truncated and then all data is reinserted (i.e. full refresh). The doc you linked is related to the grace period. Please let me know if I missed something else.

@mosabua
Copy link
Member

mosabua commented Feb 23, 2024

Yeah.. maybe you are right and the mention in the docs is just about detecting outdated data vs incrementally refreshing. The PR changes that whole section a bit since that is related to the grace period. Check out the paragraph "Updating the data in the materialized view..." in https://trino.io/docs/current/connector/iceberg.html#materialized-views and let me know if that makes sense.

@marton-bod
Copy link
Contributor Author

Yes, looking at the docs and the source code, they both suggest that there is no incremental refresh currently.

@findepi
Copy link
Member

findepi commented Feb 26, 2024

Forgive me if i close this issue as a duplicate of a pre-existing #18673

@findepi findepi closed this as completed Feb 26, 2024
@findepi findepi closed this as not planned Won't fix, can't repro, duplicate, stale Feb 26, 2024
@marton-bod
Copy link
Contributor Author

No worries @findepi, makes sense, I missed the earlier issue.
If I open a PR for the first phase implementation, are the maintainers open to reviewing it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants