-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow reading a stale materialized view #13484
Conversation
@@ -98,6 +100,7 @@ | |||
private boolean legacyCatalogRoles; | |||
private boolean incrementalHashArrayLoadFactorEnabled = true; | |||
private boolean allowSetViewAuthorization; | |||
private Duration materializedViewRequiredFreshness = new Duration(0, SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that most likely should be a MV property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should definitely be allowed as an MV property as it is much easier to think about how much staleness is acceptable for specific MVs rather than defining it globally.
In fact, I would lean against making it configurable globally at all as it seems very tricky for an admin to come up with an acceptable staleness value for all MVs. It also creates the possibility of end users unexpectedly receiving stale results without having opted into it explicitly through MV property or session property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data will likely have different retention policy, therefore it has to be defined per MV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sopel39 @raunaqmorarka @hashhar
mistakenly replied in a different (related) thread here: #13484 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great enhancement. Thank you for adding it. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests as well
@@ -2025,8 +2025,7 @@ public Optional<ConnectorOutputMetadata> finishRefreshMaterializedView( | |||
|
|||
String dependencies = sourceTableHandles.stream() | |||
.map(handle -> (IcebergTableHandle) handle) | |||
.filter(handle -> handle.getSnapshotId().isPresent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about this commit, if tables created by trino always have a snapshot id and CREATE MV cannot be pointed to pre-existing tables, then can this scenario arise in practise ?
Could we require that iceberg table always have a snapshot id here instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about this commit, if tables created by trino always have a snapshot id and CREATE MV cannot be pointed to pre-existing tables, then can this scenario arise in practise ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks
can we have a test which creates MV in trino on source iceberg table created by spark ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should (as part of #4832 ...)
@@ -98,6 +100,7 @@ | |||
private boolean legacyCatalogRoles; | |||
private boolean incrementalHashArrayLoadFactorEnabled = true; | |||
private boolean allowSetViewAuthorization; | |||
private Duration materializedViewRequiredFreshness = new Duration(0, SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should definitely be allowed as an MV property as it is much easier to think about how much staleness is acceptable for specific MVs rather than defining it globally.
In fact, I would lean against making it configurable globally at all as it seems very tricky for an admin to come up with an acceptable staleness value for all MVs. It also creates the possibility of end users unexpectedly receiving stale results without having opted into it explicitly through MV property or session property.
@@ -1783,8 +1788,8 @@ protected Scope visitTable(Table table, Optional<Scope> scope) | |||
|
|||
Optional<MaterializedViewDefinition> optionalMaterializedView = metadata.getMaterializedView(session, name); | |||
if (optionalMaterializedView.isPresent()) { | |||
if (metadata.getMaterializedViewFreshness(session, name).isMaterializedViewFresh()) { | |||
// If materialized view is current, answer the query using the storage table | |||
if (isMaterializedViewSufficientlyFresh(name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should let this be an implementation detail of the the connector's metadata.getMaterializedViewFreshness
code. Iceberg MV code could figure out the "sufficiently fresh" condition and indicate freshness to the engine through existing boolean flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to differentiate two use-cases
- staleness for explicit querying
- i didn't yet grasp why we need this, but Trino MVs accept no staleness today, unlike eg PostgreSQL's or Oracle's. In those systems, the user has control over when the view is refreshed (as in Trino), so it feels OK to query the MV directly if the user asks to, irrespective of the staleness. We chose hybrid approach, but -- for performance and cost control reasons -- users want to query stale MVs directly.
i think it's fine to have this configurable on per view basis, but I expect we will later need to add session-level controls (overrides) as well.
The question is how to we model this for per-MV. I can easily make this a connector property, but this is something we will want to define for all connectors, which calls for some unification (syntax?).
If we agree that we want to have both per-MV and per-session controls, does it matter which one we start with?
- i didn't yet grasp why we need this, but Trino MVs accept no staleness today, unlike eg PostgreSQL's or Oracle's. In those systems, the user has control over when the view is refreshed (as in Trino), so it feels OK to query the MV directly if the user asks to, irrespective of the staleness. We chose hybrid approach, but -- for performance and cost control reasons -- users want to query stale MVs directly.
- staleness for query rewrites
- this one is more tricky. I believe our MVs' "fresh or inline" philosophy was picked with query rewrites as the primary use-case (so departing from apparent experience of other systems). The query rewrites will also want to accept certain staleness, but this use-case is much more delicate, as the logic is implicit. I am sure we will want per-MV and per-session controls for this feature when we build it (but building it is not a goal for this PR)
@Override | ||
public boolean equals(Object obj) | ||
public boolean equals(Object o) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we skip variable rename or make it separate commit ?
sb.append("materializedViewFresh=").append(materializedViewFresh); | ||
sb.append('}'); | ||
return sb.toString(); | ||
return new StringJoiner(", ", MaterializedViewFreshness.class.getSimpleName() + "[", "]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringBuilder -> StringJoiner change in separate commit ?
I’m not sure a global setting for this is appropriate. There’s usually not a one-size-fits-all value that captures the requirements for different use cases. Also, it’s a departure from the model we’ve been working towards, which is that materialized views have the semantics of a view. I.e., always fresh, either because it’s already up to date, refreshed on the fly, or inlined and computed like a normal view. We also discussed possible syntax extensions to allow a user to indicate how much staleness they are willing to tolerate on a case by case basis. |
This is the ideal situation from query engine perspective, but it's too strict in practice for end-users. For example, PostgreSQL materialized view feature https://www.postgresql.org/docs/14/rules-materializedviews.html started off from a different angle: one has explicit refreshes and implicitly accepts any amount of staleness. While our fresh-or-inline approach looks nicer, I don't think PostgreSQL's approach is absurd either. Both have their strengths and weaknesses. BTW PostgreSQL's approach seems to be directionally aligned with Oracle's. |
@@ -479,6 +482,20 @@ public FeaturesConfig setAllowSetViewAuthorization(boolean allowSetViewAuthoriza | |||
return this; | |||
} | |||
|
|||
@NotNull | |||
public Duration getMaterializedViewRequiredFreshness() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very much property of data, not a system-wide property. I also think MV refresh scheduler should be outside of core engine and engine should only be responsible for executing REFRESH or SELECT queries.
I don't think that engine needs to know freshness duration if it cannot do much about if without a scheduler.
High level question is:
- Do we need resolution as MV as view at all?
- If no, then the whole concept of freshness doesn't make sense.
- If yes, then IMO it's the connector that should be responsible for business (time-based) logic whether MV is fresh or not. For example MV freshness might be tightly coupled with refresh interval and MV can be refreshed based on regex (e.g. every day on midnight).
ef0055a
to
f8d0177
Compare
(just rebased) |
@romanvainb will follow up sooner or later |
Fixes #10606