[#4415] YSQL: Use default value for existing rows after ALTER TABLE ADD COLUMN ... DEFAULT

Summary:
Presently, in YB, we disallow this operation for volatile default values:
```
yugabyte=# ALTER TABLE test ADD COLUMN y float DEFAULT random();
ERROR: Rewriting of YB table is not yet implemented
```
Meanwhile, for non-volatile values, we do not backfill existing rows. The new column’s value for the existing rows is null. This departs from PG’s semantics and can also lead to a constraint violation if the new column has a NOT NULL constraint.
Note: For new rows, the default value is inserted correctly.
```
yugabyte=# CREATE TABLE test (t int);
CREATE TABLE
yugabyte=# INSERT INTO test VALUES (1);
INSERT 0 1
yugabyte=# ALTER TABLE test ADD COLUMN y int DEFAULT 2 NOT NULL;
ALTER TABLE
yugabyte=# SELECT * FROM test;
 t | y
---+---
 1 |    -- in PG, y would have value of 2 here. This is also a constraint violation.
(1 row)

yugabyte=# INSERT INTO test(t) VALUES (2);
INSERT 0 1
yugabyte=# SELECT * FROM test;
 t | y
---+---
 1 |    -- in PG, y would have value of 2 here
 2 | 2  -- Newly inserted rows correctly use the default value.
(2 rows)
```
This diff fixes this issue by introducing a new field (missing_val) which will be stored in the column's metadata (ColumnSchema and for packed rows in ColumnPackingData as well). We do **not** backfill the existing rows. The default value will be filled in on-the-fly if the column value is missing for a row.

Implementation details:

High-level design:
- Evaluate the non-volatile default expression in the PG layer.
- Augment the existing Alter Table Add Column flow to take an optional missing value, and pass it down to DocDB.
- Store the missing value in the newly added column’s schema (`ColumnSchema`).
- When doc_reader reads a row (using the latest schema) and encounters a missing column for the row, it will look at the column schema to see if there is a missing default value for the column. If there is, it will fill in the missing default value (instead of null as it currently does).
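The read-path behavior in the high-level design above can be illustrated with a small model. This is only a sketch under stated assumptions, not the DocDB code: the Python `ColumnSchema` class, the `EXPLICIT_NULL` marker, and `read_row` are hypothetical stand-ins, and a stored row is modeled as a plain dict in which an absent key means "no entry for this column".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class _ExplicitNull:
    """Hypothetical marker for a null that was written explicitly by the user."""

    def __repr__(self) -> str:
        return "EXPLICIT_NULL"


EXPLICIT_NULL = _ExplicitNull()


@dataclass(frozen=True)
class ColumnSchema:
    """Toy stand-in for a column's schema entry (not the real C++ class)."""

    name: str
    # Value to return for rows that have no entry for this column.
    # None models "no missing value recorded".
    missing_val: Optional[Any] = None


def read_row(stored: Dict[str, Any], schema: List[ColumnSchema]) -> Dict[str, Any]:
    """Materialize a row against the latest schema.

    - entry absent        -> use the column's missing_val (None if there is none)
    - entry EXPLICIT_NULL -> null, regardless of missing_val
    - otherwise           -> the stored value
    """
    result: Dict[str, Any] = {}
    for col in schema:
        if col.name not in stored:
            result[col.name] = col.missing_val
        elif stored[col.name] is EXPLICIT_NULL:
            result[col.name] = None
        else:
            result[col.name] = stored[col.name]
    return result


# Schema after: ALTER TABLE test ADD COLUMN y int DEFAULT 2 NOT NULL;
schema = [ColumnSchema("t"), ColumnSchema("y", missing_val=2)]

old_row = {"t": 1}           # written before the ALTER, no entry for y
new_row = {"t": 2, "y": 2}   # written after the ALTER, default applied on insert

print(read_row(old_row, schema))  # y is filled in from missing_val
print(read_row(new_row, schema))  # y comes from the stored value
```

In this model the existing row is never rewritten; the value 2 for `y` is supplied at read time from the schema, which mirrors the "no backfill" choice described in the summary.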

Regular storage:
- For this feature, we need a way to distinguish whether the column entry is missing, or if a null was explicitly inserted by the user.
- Presently, null entries are stored as kTombstone records. These tombstone records are compacted away after the history retention period has elapsed, at which point an explicit null would be indistinguishable from a missing entry.
- Therefore, for columns that have a missing value, we will store null entries as kNullLow.
- Note: It does not matter what we store the column nulls as. All of kNullLow, kNullHigh, kTombstone are interpreted as null by docdb.
- Although we can fill in missing values on a full compaction, it is not required for correctness, and can be handled later as an optimization. Therefore, we will NOT backfill the rows on compaction in this case. We will rely on the stored missing value in ColumnSchema instead.

Packed storage:
- Explicitly inserted nulls are stored as columns with 0 length in packed storage format. This behavior will not change.
- On compaction, we write the packed rows in accordance with the latest schema packing.
- Presently, for missing values we write an explicit null. This behavior will be altered to write the missing value instead (if any).
- Therefore, packed rows will be backfilled with the missing value on compaction.

Note: This feature is under the `Persisted` autoflag, as we are adding a new field to the `ColumnSchema` which is used in backups. The feature will only be turned on after upgrade is complete. Once enabled, it is not safe to downgrade/rollback.
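
As a rough illustration of the packed-storage compaction rule described above, the following sketch models a packed row as a list of byte strings aligned with a list of column names. It is a simplified model, not the actual packed-row format: `repack_row` and its parameters are hypothetical names, and the only detail taken from the description is that an explicit null is a zero-length column.

```python
from typing import Dict, List


def repack_row(
    old_columns: List[str],
    old_values: List[bytes],
    new_columns: List[str],
    missing_vals: Dict[str, bytes],
) -> List[bytes]:
    """Rewrite one packed row according to the latest schema packing.

    Columns present in the old packing keep their stored bytes, including
    zero-length values, which model explicitly inserted nulls. Columns that
    did not exist in the old packing get the column's missing value if one
    is recorded, otherwise an explicit null (zero length).
    """
    by_name = dict(zip(old_columns, old_values))
    repacked: List[bytes] = []
    for name in new_columns:
        if name in by_name:
            repacked.append(by_name[name])
        else:
            repacked.append(missing_vals.get(name, b""))
    return repacked


# Row packed before column y was added; y has a recorded missing value.
print(repack_row(["t"], [b"\x01"], ["t", "y"], {"y": b"\x02"}))

# Row packed after the ALTER with an explicit null in y; the null survives.
print(repack_row(["t", "y"], [b"\x03", b""], ["t", "y"], {"y": b"\x02"}))
```

The second call shows why the zero-length encoding matters in this model: an explicit null is an existing entry, so compaction does not replace it with the missing value.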

Jira: DB-1463

Test Plan:
- Imported PG test `yb_pg_fast_default`
- Tests in `yb_alter_table`: test different default value types and expressions (UDTs, functions etc.), test explicit null values for the new column, test `SET DEFAULT`, test to verify that we don't set `attmissingval` and `atthasmissing` in YB, test default values for partitioned tables
- `YBAddColumnDefaultBackupTest.TestYSQLDefaultMissingValues` -- verify that missing default values are backed up and restored
- `YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlDeleteColumnWithMissingDefault` -- verify that missing default values are restored after PITR
- `PgDdlAtomicitySanityTest.DropColumnWithMissingDefaultTest` -- verify that we are able to roll back `DROP COLUMN` on a column with a missing default value
- `PgAddColumnDefaultTest.AddColumnDefaultConcurrency` -- verify that concurrently inserted rows use the missing default value, and that the missing default values are read even after compaction.
- `PgAddColumnDefaultTest.AddColumnDefaultCompactionAfterUpdate` -- verify that compaction after updates on columns with missing default values works correctly.
- `PgAddColumnDefaultTest.AddColumnDefaultCopy` -- verify that `COPY FROM` command works correctly on a table with columns that have missing default values.
- `CDCYsqlAddColumnBeforeImageTest.TestAddColumnBeforeImage/0` -- verify that the `BeforeImage` in CDC uses missing default value

Reviewers: sergei, timur, dsrinivasan, abharadwaj

Reviewed By: sergei, dsrinivasan

Subscribers: hsunder, ycdcxcluster, ybase, yql, sergei, rthallam, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D25297