[YSQL] REFRESH Materialized view (non-concurrent) #10510

fizaaluthra · 2021-11-05T18:55:23Z

Description

Issue for tracking REFRESH MATERIALIZED VIEW. By default, the refresh is non-concurrent.

The text was updated successfully, but these errors were encountered:

…TERIALIZED VIEW Summary: `CREATE MATERIALIZED VIEW`: - Created like any other YB relation. `REFRESH MATERIALIZED VIEW`: - In vanilla PG, a non-concurrent refresh on the materialized view entails the following: # Load the new matview data into a transient relation. # Refresh the matview by swapping it with the transient relation. The swap is done by swapping the `relfilenode` of the two relations. (Therefore the `relfilenode` of the matview is now the `OID` of the transient relation). # Reindex the new matview. - In YB mode we do the following: # Load the new matview data into a transient relation, and swap the `relfilenode` (as in vanilla PG). # We implement a `YBGetStorageRelid` utility function that returns, if valid, the `relfilenode` (which will be different from the `OID` if there has been a swap) or the `OID` itself. # Any subsequent scans/drops on the matview will now use `YBGetStorageRelid` to retrieve the `OID` of the appropriate relation in DocDB. # We reindex the matview by dropping and recreating the matview's indexes. `REFRESH MATERIALIZED VIEW CONCURRENTLY`: - In PG, a concurrent refresh entails the following: # Ensure there's a unique index on the matview. # Load the new data into a temporary table. # Ensure there are no duplicate rows without any null values (i.e., duplicate rows are only allowed when at least one column value is null). # Create a temporary diff table which tells us what rows are present in the old data and not in the new data and vice versa. - The diff table is computed as a full join of the old matview and the new temporary table on the unique index. # Delete rows in the matview where row.ctid = any old matview ctid in diff where the newdata is null. (i,e,, if the join did not find a match for this row amidst the new data, this row has been deleted). # Insert rows from the new data in the diff table where old matview ctid is null. (i,e., if the join did not find a match amidst the old data, this row has been newly inserted). - In YB mode: # We do essentially the same thing, except that since YB doesn't have ctids we use all the column values instead. # Additionally, since there aren't any ctids, if the new data has rows with **all** null columns, the joined diff table will have duplicate entries for such rows. # Therefore, in YB mode, we disallow rows with all null columns. `DROP MATERIALIZED VIEW`: - The syscatalog tuples are dropped like any other YB relation. - `YBGetStorageRelid` is used to drop the appropriate relation in DocDB. Test Plan: Jenkins: urgent, test regex: .*TestPgRegressFeature.*|.*TestPgRegressExtension.*|.*TestPgRegressPartitions.* Run the `yb_feature_matview` test using the `TestPgRegressFeature` test suite. Reviewers: myang, mihnea Reviewed By: myang, mihnea Subscribers: bogdan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D13792

Summary: In YB mode, we restrict matviews from having rows with all null values when performing concurrent refreshes. Previously, we performed a full scan of the data to check for completely null rows, which leads to a performance hit. This diff performs the check for completely null rows on the computed diff table instead (which is a much smaller table). Performance test: Tested on centOS 7 machine with Intel Broadwell CPU. ``` CREATE TABLE base ... ; // 100 MB dataset CREATE MATERIALIZED VIEW test AS SELECT * FROM base; CREATE UNIQUE INDEX ON test ... // Unique index required for concurrent refreshes INSERT a few rows INTO base; REFRESH MATERIALIZED VIEW CONCURRENTLY base; ``` The above snippet gets stuck on the scan of the temporary table with the new data for at least 25 minutes in the current version of the code. With this diff, it takes about 3 minutes for the refresh to execute. Test Plan: Run yb_feature_matview in TestPgRegressFeature Jenkins: urgent Reviewers: mihnea Reviewed By: mihnea Subscribers: zyu, smishra, yql Differential Revision: https://phabricator.dev.yugabyte.com/D14596

fizaaluthra · 2022-01-12T15:50:05Z

Implemented in commit f69988a

fizaaluthra added the area/ysql Yugabyte SQL (YSQL) label Nov 5, 2021

fizaaluthra self-assigned this Nov 5, 2021

fizaaluthra mentioned this issue Nov 5, 2021

[YSQL] Support for Materialized Views #10102

Closed

sushantrmishra added the pgcm label Jan 12, 2022

fizaaluthra closed this as completed Jan 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] REFRESH Materialized view (non-concurrent) #10510

[YSQL] REFRESH Materialized view (non-concurrent) #10510

fizaaluthra commented Nov 5, 2021

fizaaluthra commented Jan 12, 2022

[YSQL] REFRESH Materialized view (non-concurrent) #10510

[YSQL] REFRESH Materialized view (non-concurrent) #10510

Comments

fizaaluthra commented Nov 5, 2021

Description

fizaaluthra commented Jan 12, 2022