Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Backfill existing rows when new column is added with default value #4415

Closed
d-uspenskiy opened this issue May 7, 2020 · 5 comments
Closed
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects

Comments

@d-uspenskiy
Copy link
Contributor

d-uspenskiy commented May 7, 2020

Jira Link: DB-1463

yugabyte=# create table t(k int PRIMARY KEY);
yugabyte=# insert into t values(1), (2), (3);
yugabyte=# alter table t add v int not null default 5;
yugabyte=# select * from t;
 k | v 
---+---
 1 |  
 2 |  
 3 |  
(3 rows)

expected (vanilla postgresql) result:

postgres=# select * from t;
 k | v 
---+---
 1 | 5
 2 | 5
 3 | 5
(3 rows)
@d-uspenskiy d-uspenskiy added the kind/bug This issue is a bug label May 7, 2020
@d-uspenskiy d-uspenskiy added this to To do in YSQL via automation May 7, 2020
@ndeodhar
Copy link
Contributor

ndeodhar commented May 7, 2020

This is expected behavior for now @d-uspenskiy and is being tracked under the umbrella of online schema changes: #4192

@ndeodhar ndeodhar changed the title [YSQL] Rows for new column with DEFAULT value contains NULLs [YSQL] Backfill existing rows when new column is added with default value May 7, 2020
@ndeodhar ndeodhar assigned m-iancu and unassigned ndeodhar May 7, 2020
@ndeodhar ndeodhar added area/ysql Yugabyte SQL (YSQL) and removed kind/bug This issue is a bug labels May 7, 2020
@jaki
Copy link
Contributor

jaki commented Apr 27, 2021

With D10666 (relaxing conflicts on ALTER), I found a bug with default not null:

session A                     | session B
CREATE TABLE t (i int);
\c
BEGIN;
SELECT 1;
                              | ALTER TABLE t ADD j int default 44 NOT NULL;
INSERT INTO t VALUES (7);
COMMIT;
SELECT * FROM t;
i | j 
---+---
 7 |  
(1 row)

A possible explanation is that this bug was previously masked by catalog version mismatch. An other explanation is that our transaction model is broken (insert after alter should fail).

@lukaseder
Copy link

Another way to look at this bug: #9970

@lukaseder
Copy link

For the record, SQL Server and Sybase ASE don't fill default values with this statement:

alter table t add v int default 5;

But they do with this one:

alter table t add v int not null default 5;

@lukaseder
Copy link

A workaround for this is:

alter table t add v int not null default 5;
update t set v = default;

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 8, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Oct 5, 2022
@yugabyte-ci yugabyte-ci assigned fizaaluthra and unassigned m-iancu Jan 30, 2023
fizaaluthra added a commit that referenced this issue Aug 15, 2023
…DD COLUMN ... DEFAULT

Summary:
Presently, in YB, we disallow this operation for volatile default values:
```
yugabyte=# ALTER TABLE test ADD COLUMN y float DEFAULT random();
ERROR:  Rewriting of YB table is not yet implemented
```

Meanwhile, for non-volatile values, we do not backfill existing rows. The new column’s value for the existing rows is null. This departs from PG’s semantics and can also lead to a constraint violation if the new column has a NOT NULL constraint.
Note: For new rows, the default value is inserted correctly.
```
yugabyte=# CREATE TABLE test (t int);
CREATE TABLE
yugabyte=# INSERT INTO test VALUES (1);
INSERT 0 1
yugabyte=# ALTER TABLE test ADD COLUMN y int DEFAULT 2 NOT NULL;
ALTER TABLE
yugabyte=# SELECT * FROM test;
 t | y
---+---
 1 |     -- in PG, y would have value of 2 here. This is also a constraint violation.
(1 row)
yugabyte=# INSERT INTO test(t) VALUES (2);
INSERT 0 1
yugabyte=# SELECT * FROM test;
 t | y
---+---
 1 |      -- in PG, y would have value of 2 here
 2 | 2    -- Newly inserted rows correctly use the default value.
(2 rows)
```

This diff fixes this issue by introducing a new field (missing_val) which will be stored in the column's metadata (ColumnSchema and for packed rows in ColumnPackingData as well). We do **not** backfill the existing rows. The default value will be filled in on-the-fly if the column value is missing for a row.

Implementation details:

High-level design:

  - Evaluate the non-volatile default expression in the PG layer.
  - Augment the existing Alter Table Add Column flow to take an optional missing value, and pass it down to DocDB.
  - Store the missing value in the newly added column’s schema (`ColumnSchema`).
  - When doc_reader reads a row (using the latest schema) and encounters a missing column for the row, it will look at the column schema to see if there is a missing default value for the column. If there is, it will fill in the missing default value (instead of null as it currently does).

Regular storage:

  - For this feature, we need a way to distinguish whether the column entry is missing, or if a null was explicitly inserted by the user.
  - Presently, null entries are stored as kTombstone records. These tombstone records are compacted away after the retention history period has exceeded.
  - Therefore, for columns that have a missing value, we will store null entries as kNullLow.
  - Note: It does not matter what we store the column nulls as. All of kNullLow, kNullHigh, kTombstone are interpreted as null by docdb.
  - Although we can fill in missing values on a full compaction, it is not required for correctness, and can be handled later as an optimization. Therefore, we will NOT backfill the rows on compaction in this case. We will rely on the stored missing value in ColumnSchema instead.

Packed storage:

  - Explicitly inserted nulls are stored as columns with 0 length in packed storage format. This behavior will not change.
  - On compaction, we write the packed rows in accordance with the latest schema packing.
  - Presently, for missing values we write an explicit null. This behavior will be altered to write the missing value instead (if any).
  - Therefore, packed rows will be backfilled with the missing value on compaction.

Note: This feature is under the `Persisted` autoflag, as we are adding a new field to the `ColumnSchema` which is used in backups. The feature will only be turned on after upgrade is complete. Once enabled, it is not safe to downgrade/rollback.
Jira: DB-1463

Test Plan:
  - Imported PG test `yb_pg_fast_default`
  - Tests in `yb_alter_table`: test different default value types and expressions (UDTs, functions etc.), test explicit null values for the new column, test `SET DEFAULT`, test to verify that we don't set `attmissingval` and `atthasmissing` in YB, test default values for partitioned tables
  - `YBAddColumnDefaultBackupTest.TestYSQLDefaultMissingValues` -- verify that missing default values are backed up and restored
  - `YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlDeleteColumnWithMissingDefault` -- verify that missing default values are restored after PITR
  - `PgDdlAtomicitySanityTest.DropColumnWithMissingDefaultTest` -- verify that we are able to roll back `DROP COLUMN` on a column with a missing default value
  - `PgAddColumnDefaultTest.AddColumnDefaultConcurrency ` -- verify that concurrently inserted rows use the missing default value, and that the missing default values are read even after compaction.
  - `PgAddColumnDefaultTest.AddColumnDefaultCompactionAfterUpdate` -- verify that compaction after updates on columns with missing default values works correctly.
  - `PgAddColumnDefaultTest.AddColumnDefaultCopy` -- verify that `COPY FROM` command works correctly on a table with columns that have missing default values.
  - `CDCYsqlAddColumnBeforeImageTest.TestAddColumnBeforeImage/0` -- verify that the `BeforeImage` in CDC uses missing default value

Reviewers: sergei, timur, dsrinivasan, abharadwaj

Reviewed By: sergei, dsrinivasan

Subscribers: hsunder, ycdcxcluster, ybase, yql, sergei, rthallam, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D25297
YSQL automation moved this from To do to Done Aug 15, 2023
fizaaluthra added a commit that referenced this issue Dec 22, 2023
…perations

Summary:
NOTE: This backport diverges from the original commit. It does not include a `NOTICE` for `ALTER TYPE` since the operation is not supported in 2.18. It also adds a `NOTICE` for `ADD COLUMN ... DEFAULT` to indicate that existing rows are not backfilled with the default value. This was not included in the original master commit/2.20 backport as the issue (#4415) was resolved in 2.20.

Original commit's summary:

Table rewrite operations (ALTER primary key, ALTER TYPE) currently do not acquire a table lock
and are therefore unsafe (concurrent writes may be dropped). This diff adds a `NOTICE` to
explicitly indicate the same when executing these commands.'

The `NOTICE` can be suppressed by setting the `ysql_suppress_unsafe_alter_notice` tserver gflag to true.
Jira: DB-8160

Original commit: 0baeed2 / D29430

Test Plan: Jenkins: java only

Reviewers: jason

Reviewed By: jason

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31136
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
Status: Done
YSQL
  
Done
Development

No branches or pull requests

7 participants