-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] REFRESH Materialized view (non-concurrent) #10510
Comments
fizaaluthra
pushed a commit
that referenced
this issue
Dec 10, 2021
…TERIALIZED VIEW Summary: `CREATE MATERIALIZED VIEW`: - Created like any other YB relation. `REFRESH MATERIALIZED VIEW`: - In vanilla PG, a non-concurrent refresh on the materialized view entails the following: # Load the new matview data into a transient relation. # Refresh the matview by swapping it with the transient relation. The swap is done by swapping the `relfilenode` of the two relations. (Therefore the `relfilenode` of the matview is now the `OID` of the transient relation). # Reindex the new matview. - In YB mode we do the following: # Load the new matview data into a transient relation, and swap the `relfilenode` (as in vanilla PG). # We implement a `YBGetStorageRelid` utility function that returns, if valid, the `relfilenode` (which will be different from the `OID` if there has been a swap) or the `OID` itself. # Any subsequent scans/drops on the matview will now use `YBGetStorageRelid` to retrieve the `OID` of the appropriate relation in DocDB. # We reindex the matview by dropping and recreating the matview's indexes. `REFRESH MATERIALIZED VIEW CONCURRENTLY`: - In PG, a concurrent refresh entails the following: # Ensure there's a unique index on the matview. # Load the new data into a temporary table. # Ensure there are no duplicate rows without any null values (i.e., duplicate rows are only allowed when at least one column value is null). # Create a temporary diff table which tells us what rows are present in the old data and not in the new data and vice versa. - The diff table is computed as a full join of the old matview and the new temporary table on the unique index. # Delete rows in the matview where row.ctid = any old matview ctid in diff where the newdata is null. (i,e,, if the join did not find a match for this row amidst the new data, this row has been deleted). # Insert rows from the new data in the diff table where old matview ctid is null. (i,e., if the join did not find a match amidst the old data, this row has been newly inserted). - In YB mode: # We do essentially the same thing, except that since YB doesn't have ctids we use all the column values instead. # Additionally, since there aren't any ctids, if the new data has rows with **all** null columns, the joined diff table will have duplicate entries for such rows. # Therefore, in YB mode, we disallow rows with all null columns. `DROP MATERIALIZED VIEW`: - The syscatalog tuples are dropped like any other YB relation. - `YBGetStorageRelid` is used to drop the appropriate relation in DocDB. Test Plan: Jenkins: urgent, test regex: .*TestPgRegressFeature.*|.*TestPgRegressExtension.*|.*TestPgRegressPartitions.* Run the `yb_feature_matview` test using the `TestPgRegressFeature` test suite. Reviewers: myang, mihnea Reviewed By: myang, mihnea Subscribers: bogdan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D13792
fizaaluthra
pushed a commit
that referenced
this issue
Jan 6, 2022
Summary: In YB mode, we restrict matviews from having rows with all null values when performing concurrent refreshes. Previously, we performed a full scan of the data to check for completely null rows, which leads to a performance hit. This diff performs the check for completely null rows on the computed diff table instead (which is a much smaller table). Performance test: Tested on centOS 7 machine with Intel Broadwell CPU. ``` CREATE TABLE base ... ; // 100 MB dataset CREATE MATERIALIZED VIEW test AS SELECT * FROM base; CREATE UNIQUE INDEX ON test ... // Unique index required for concurrent refreshes INSERT a few rows INTO base; REFRESH MATERIALIZED VIEW CONCURRENTLY base; ``` The above snippet gets stuck on the scan of the temporary table with the new data for at least 25 minutes in the current version of the code. With this diff, it takes about 3 minutes for the refresh to execute. Test Plan: Run yb_feature_matview in TestPgRegressFeature Jenkins: urgent Reviewers: mihnea Reviewed By: mihnea Subscribers: zyu, smishra, yql Differential Revision: https://phabricator.dev.yugabyte.com/D14596
Implemented in commit f69988a |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Description
Issue for tracking
REFRESH MATERIALIZED VIEW
. By default, the refresh is non-concurrent.The text was updated successfully, but these errors were encountered: