REFRESH docs #27521
@ggevay: in the end, the content of the pattern page was reference documentation, so I split the base you provided across multiple existing pages instead (
Can you or @sthm articulate this in a clearer way? We should document what happens when:
1. Users try to query a materialized view outside the refresh time, when the scheduled cluster is turned off.
Also noting that I left out this bit, though I think it's important and will work it in as a follow-up:
Eventually, we should add a SQL pattern that has a practical walkthrough of a real example, and describes the logic of partitioning the data and all the hard bits (in line with this musing from @chuck-alt-delete).
Would it be possible to include a very simple example of a hot/warm split? A full "common patterns" guide would go into more detail, but it would be nice if the examples section of the reference documentation had a very simple implementation just to remind the user of how we'd like them to use this feature.
I don't have more time to work on this, and think any practical example (simple as it might be) should live under SQL Patterns, not reference documentation.
This explains what happens and why (and when) all these options are required. It's much longer than I want it to be, but it's quite nuanced and I don't know how to shorten it without making it harder to understand:

For some use cases, it makes sense to trade off data freshness against cost. For instance, it can make sense to keep results on the most recent data (say, the last week) as fresh as possible: changes to the inputs should be reflected in the outputs as quickly as possible. But once the data is older than a week, it's tolerable for changes in the inputs to take up to 24 hours until they are reflected in the outputs.

This pattern can be realized by creating a materialized view with an

When queries to materialized views with

For instance, the following view refreshes once a day at midnight UTC.

CREATE MATERIALIZED VIEW mv_refresh_every
WITH (
  -- Refresh at creation, so the view is populated ahead of
  -- the first scheduled refresh on Jun 18
  REFRESH AT '2024-06-17 00:00:00',
  -- Refresh every day at midnight UTC
  REFRESH EVERY '1 day' ALIGNED TO '2024-04-17 00:00:00'
)
...

Assuming the last refresh happened on Jun 19 at midnight, queries will return between Jun 19 midnight and Jun 20 midnight, even if the cluster maintaining the view is turned off. Queries will start to hang at Jun 20 midnight, until the next refresh completes.

If the cluster is running continuously, the refresh happens promptly at midnight, minimizing the time that queries hang. But the whole purpose of refresh strategies is to remove resources from clusters in between refreshes. To avoid hanging queries when the cluster is turned off, the cluster can be configured to automatically add resources at the next scheduled refresh of any of its materialized views.

ALTER CLUSTER my_refresh_cluster
SET (SCHEDULE = ON REFRESH);

Note, however, that a considerable amount of time may pass between when the refresh starts and when it actually completes. Let's assume that in the above example it takes 23 min to complete the refresh of the materialized view

To avoid hanging queries as much as possible, the cluster can be configured to start the refresh before it is actually due.

ALTER CLUSTER my_refresh_cluster
SET (SCHEDULE = ON REFRESH (REHYDRATION TIME ESTIMATE = '30 min'));

With the preceding configuration, resources will already be added 30 minutes before midnight. In this way, the bulk of the required work (the so-called hydration of the materialized view) can already be done before midnight (while queries can still return). Only right after midnight may queries hang for a brief moment, until the actual refresh is completed.
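To make the timing in the comment above concrete, here is a rough Python sketch of the arithmetic. It is not how Materialize implements scheduling; the function names `next_refresh` and `estimated_hang` are hypothetical, and the model makes two simplifying assumptions: queries wait from the moment the refresh is due until the work is finished, and the brief final step right after midnight is ignored.

```python
from datetime import datetime, timedelta


def next_refresh(aligned_to: datetime, every: timedelta, now: datetime) -> datetime:
    """Return the first scheduled refresh strictly after `now` (hypothetical helper)."""
    # Whole periods elapsed since the alignment point; floor division on
    # timedeltas also handles a `now` that lies before `aligned_to`.
    periods_elapsed = (now - aligned_to) // every
    return aligned_to + (periods_elapsed + 1) * every


def estimated_hang(due: datetime, work: timedelta,
                   head_start: timedelta = timedelta(0)) -> timedelta:
    """Estimate how long queries wait after `due` (simplified model, an assumption).

    `work` is the time needed to bring the view up to date, and `head_start`
    is how long before `due` the cluster gets its resources.
    """
    cluster_on = due - head_start
    work_done = cluster_on + work
    return max(work_done - due, timedelta(0))


# Values taken from the example in the comment above.
aligned_to = datetime(2024, 4, 17)
now = datetime(2024, 6, 19, 15, 0)
due = next_refresh(aligned_to, timedelta(days=1), now)
print(due)  # 2024-06-20 00:00:00

# Cluster only starts at the due time: queries wait for the full 23 minutes.
print(estimated_hang(due, timedelta(minutes=23)))

# Cluster starts 30 minutes early: the work fits before midnight.
print(estimated_hang(due, timedelta(minutes=23), timedelta(minutes=30)))
```

Under these assumptions, the estimate only needs to exceed the typical duration of the work for the wait to shrink to roughly zero; a larger estimate buys no further freshness and keeps resources on for longer.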
I did a few minor fixes:
- There was this "ahead of the first scheduled refresh". Since the AT CREATION is also a scheduled refresh, this was not entirely accurate. I've changed this at several places to things like "ahead of the first EVERY refresh".
- There was "We recommend always using the REFRESH AT CREATION strategy with REFRESH EVERY".
  - I moved this to REFRESH EVERY, because users of REFRESH EVERY are the ones who should definitely see it.
  - I rephrased it a bit, because the wording was symmetric between REFRESH EVERY and REFRESH AT CREATION, but actually only one of these makes the other recommended.
- There was "and any indexes built on these views". I modified this to "and any indexes supporting these views". This is because it's ok to have indexed views that REFRESH materialized views read from. (In fact, https://github.com/MaterializeInc/accounts/issues/3 does have them, for CSE purposes.) (Also note that REFRESH materialized views are typically not indexed on the refresh cluster, and even if they are, it's only in support of other REFRESH materialized views. This is because these clusters are not always on, so these indexes are not good for serving.)
- And a few even more minor things.
One more question: Where should we work in Steffen's text?
@morsapaes
Thank you very much for all the feedback and improvements! Merging!
As discussed with @morsapaes, this is a draft of the docs for REFRESH materialized views. It's based on https://www.notion.so/materialize/REFRESH-user-docs-draft-4a8f30b737a94619ac9f645abc9f84ce
I added a separate page under "Common patterns", as discussed here. However, I couldn't figure out how to actually make a link appear under that menu item. @morsapaes, could you please help with that?
I haven't yet updated create-materialized-view.md. I'd like to do that after #27325 is merged.