[YSQL] Backfill existing rows when new column is added with default value #4415

d-uspenskiy · 2020-05-07T19:21:07Z

Jira Link: DB-1463

yugabyte=# create table t(k int PRIMARY KEY);
yugabyte=# insert into t values(1), (2), (3);
yugabyte=# alter table t add v int not null default 5;
yugabyte=# select * from t;
 k | v 
---+---
 1 |  
 2 |  
 3 |  
(3 rows)

expected (vanilla postgresql) result:

postgres=# select * from t;
 k | v 
---+---
 1 | 5
 2 | 5
 3 | 5
(3 rows)

The text was updated successfully, but these errors were encountered:

ndeodhar · 2020-05-07T19:30:05Z

This is expected behavior for now @d-uspenskiy and is being tracked under the umbrella of online schema changes: #4192

jaki · 2021-04-27T00:07:55Z

With D10666 (relaxing conflicts on ALTER), I found a bug with default not null:

session A                     | session B
CREATE TABLE t (i int);
\c
BEGIN;
SELECT 1;
                              | ALTER TABLE t ADD j int default 44 NOT NULL;
INSERT INTO t VALUES (7);
COMMIT;
SELECT * FROM t;

i | j 
---+---
 7 |  
(1 row)

A possible explanation is that this bug was previously masked by catalog version mismatch. An other explanation is that our transaction model is broken (insert after alter should fail).

lukaseder · 2021-09-10T15:00:22Z

Another way to look at this bug: #9970

lukaseder · 2021-09-10T15:01:34Z

For the record, SQL Server and Sybase ASE don't fill default values with this statement:

alter table t add v int default 5;

But they do with this one:

alter table t add v int not null default 5;

lukaseder · 2021-09-10T15:04:26Z

A workaround for this is:

alter table t add v int not null default 5;
update t set v = default;

…DD COLUMN ... DEFAULT Summary: Presently, in YB, we disallow this operation for volatile default values: ``` yugabyte=# ALTER TABLE test ADD COLUMN y float DEFAULT random(); ERROR: Rewriting of YB table is not yet implemented ``` Meanwhile, for non-volatile values, we do not backfill existing rows. The new column’s value for the existing rows is null. This departs from PG’s semantics and can also lead to a constraint violation if the new column has a NOT NULL constraint. Note: For new rows, the default value is inserted correctly. ``` yugabyte=# CREATE TABLE test (t int); CREATE TABLE yugabyte=# INSERT INTO test VALUES (1); INSERT 0 1 yugabyte=# ALTER TABLE test ADD COLUMN y int DEFAULT 2 NOT NULL; ALTER TABLE yugabyte=# SELECT * FROM test; t | y ---+--- 1 | -- in PG, y would have value of 2 here. This is also a constraint violation. (1 row) yugabyte=# INSERT INTO test(t) VALUES (2); INSERT 0 1 yugabyte=# SELECT * FROM test; t | y ---+--- 1 | -- in PG, y would have value of 2 here 2 | 2 -- Newly inserted rows correctly use the default value. (2 rows) ``` This diff fixes this issue by introducing a new field (missing_val) which will be stored in the column's metadata (ColumnSchema and for packed rows in ColumnPackingData as well). We do **not** backfill the existing rows. The default value will be filled in on-the-fly if the column value is missing for a row. Implementation details: High-level design: - Evaluate the non-volatile default expression in the PG layer. - Augment the existing Alter Table Add Column flow to take an optional missing value, and pass it down to DocDB. - Store the missing value in the newly added column’s schema (`ColumnSchema`). - When doc_reader reads a row (using the latest schema) and encounters a missing column for the row, it will look at the column schema to see if there is a missing default value for the column. If there is, it will fill in the missing default value (instead of null as it currently does). Regular storage: - For this feature, we need a way to distinguish whether the column entry is missing, or if a null was explicitly inserted by the user. - Presently, null entries are stored as kTombstone records. These tombstone records are compacted away after the retention history period has exceeded. - Therefore, for columns that have a missing value, we will store null entries as kNullLow. - Note: It does not matter what we store the column nulls as. All of kNullLow, kNullHigh, kTombstone are interpreted as null by docdb. - Although we can fill in missing values on a full compaction, it is not required for correctness, and can be handled later as an optimization. Therefore, we will NOT backfill the rows on compaction in this case. We will rely on the stored missing value in ColumnSchema instead. Packed storage: - Explicitly inserted nulls are stored as columns with 0 length in packed storage format. This behavior will not change. - On compaction, we write the packed rows in accordance with the latest schema packing. - Presently, for missing values we write an explicit null. This behavior will be altered to write the missing value instead (if any). - Therefore, packed rows will be backfilled with the missing value on compaction. Note: This feature is under the `Persisted` autoflag, as we are adding a new field to the `ColumnSchema` which is used in backups. The feature will only be turned on after upgrade is complete. Once enabled, it is not safe to downgrade/rollback. Jira: DB-1463 Test Plan: - Imported PG test `yb_pg_fast_default` - Tests in `yb_alter_table`: test different default value types and expressions (UDTs, functions etc.), test explicit null values for the new column, test `SET DEFAULT`, test to verify that we don't set `attmissingval` and `atthasmissing` in YB, test default values for partitioned tables - `YBAddColumnDefaultBackupTest.TestYSQLDefaultMissingValues` -- verify that missing default values are backed up and restored - `YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlDeleteColumnWithMissingDefault` -- verify that missing default values are restored after PITR - `PgDdlAtomicitySanityTest.DropColumnWithMissingDefaultTest` -- verify that we are able to roll back `DROP COLUMN` on a column with a missing default value - `PgAddColumnDefaultTest.AddColumnDefaultConcurrency ` -- verify that concurrently inserted rows use the missing default value, and that the missing default values are read even after compaction. - `PgAddColumnDefaultTest.AddColumnDefaultCompactionAfterUpdate` -- verify that compaction after updates on columns with missing default values works correctly. - `PgAddColumnDefaultTest.AddColumnDefaultCopy` -- verify that `COPY FROM` command works correctly on a table with columns that have missing default values. - `CDCYsqlAddColumnBeforeImageTest.TestAddColumnBeforeImage/0` -- verify that the `BeforeImage` in CDC uses missing default value Reviewers: sergei, timur, dsrinivasan, abharadwaj Reviewed By: sergei, dsrinivasan Subscribers: hsunder, ycdcxcluster, ybase, yql, sergei, rthallam, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D25297

…perations Summary: NOTE: This backport diverges from the original commit. It does not include a `NOTICE` for `ALTER TYPE` since the operation is not supported in 2.18. It also adds a `NOTICE` for `ADD COLUMN ... DEFAULT` to indicate that existing rows are not backfilled with the default value. This was not included in the original master commit/2.20 backport as the issue (#4415) was resolved in 2.20. Original commit's summary: Table rewrite operations (ALTER primary key, ALTER TYPE) currently do not acquire a table lock and are therefore unsafe (concurrent writes may be dropped). This diff adds a `NOTICE` to explicitly indicate the same when executing these commands.' The `NOTICE` can be suppressed by setting the `ysql_suppress_unsafe_alter_notice` tserver gflag to true. Jira: DB-8160 Original commit: 0baeed2 / D29430 Test Plan: Jenkins: java only Reviewers: jason Reviewed By: jason Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31136

d-uspenskiy added the kind/bug This issue is a bug label May 7, 2020

d-uspenskiy assigned ndeodhar May 7, 2020

d-uspenskiy added this to To do in YSQL via automation May 7, 2020

ndeodhar changed the title ~~[YSQL] Rows for new column with DEFAULT value contains NULLs~~ [YSQL] Backfill existing rows when new column is added with default value May 7, 2020

rkarthik007 mentioned this issue May 7, 2020

Online schema migrations #4192

Open

ndeodhar assigned m-iancu and unassigned ndeodhar May 7, 2020

ndeodhar added area/ysql Yugabyte SQL (YSQL) and removed kind/bug This issue is a bug labels May 7, 2020

hbhanawat mentioned this issue Mar 23, 2021

Django test suite issues with YB as backend #7760

Closed

lukaseder mentioned this issue Sep 10, 2021

[YSQL] Table is in inconsistent state when adding column that is DEFAULT NOT NULL #9970

Closed

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022

yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 5, 2022

yugabyte-ci assigned fizaaluthra and unassigned m-iancu Jan 30, 2023

yugabyte-ci closed this as completed Aug 15, 2023

YSQL automation moved this from To do to Done Aug 15, 2023

Sfurti-yb mentioned this issue May 7, 2024

Updating test configuration for expected failures yugabyte/yb-django#6

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] Backfill existing rows when new column is added with default value #4415

[YSQL] Backfill existing rows when new column is added with default value #4415

d-uspenskiy commented May 7, 2020 •

edited by yugabyte-ci

Loading

ndeodhar commented May 7, 2020

jaki commented Apr 27, 2021

lukaseder commented Sep 10, 2021

lukaseder commented Sep 10, 2021

lukaseder commented Sep 10, 2021

[YSQL] Backfill existing rows when new column is added with default value #4415

[YSQL] Backfill existing rows when new column is added with default value #4415

Comments

d-uspenskiy commented May 7, 2020 • edited by yugabyte-ci Loading

ndeodhar commented May 7, 2020

jaki commented Apr 27, 2021

lukaseder commented Sep 10, 2021

lukaseder commented Sep 10, 2021

lukaseder commented Sep 10, 2021

d-uspenskiy commented May 7, 2020 •

edited by yugabyte-ci

Loading