Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL][backup] Backup-restoring issue with OID double using for colocated tables. #7378

Closed
OlegLoginov opened this issue Feb 25, 2021 · 4 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug
Milestone

Comments

@OlegLoginov
Copy link
Contributor

OlegLoginov commented Feb 25, 2021

Problem: backup-restore procedure calls 'ysqlsh' tool to apply YSQL dump (generated by 'ysql_dump' tool). The dump includes permanent OID for tables, but there is a chance that the OID has been already used by an internal PG object. As result the table (with the conflicting OID) cannot be created, and we can get the error:
"ERROR: Oid 16431 is in use."
Later 'yb-admin import_snapshot' step cannot find the table and the backup-restoring failed.

To reproduce the issue - run on a new cluster the dump:
./bin/ysqlsh -f YSQLDump1.txt

@OlegLoginov OlegLoginov added the area/ysql Yugabyte SQL (YSQL) label Feb 25, 2021
@OlegLoginov OlegLoginov self-assigned this Feb 25, 2021
@OlegLoginov
Copy link
Contributor Author

@OlegLoginov OlegLoginov added the kind/bug This issue is a bug label Feb 25, 2021
@OlegLoginov
Copy link
Contributor Author

OlegLoginov commented Feb 25, 2021

Related internal doc: https://docs.google.com/document/d/1M_y8OMSJJ9k4mp5vzeOqhmP6qYI21wzLKliB5Jd_RJc
Solution 2 (new mapping: table-oid -> colocated-oid) is preferable for now. (In SchemaPB protobuf. See the doc for details!!!)

@hulien22
Copy link
Contributor

Not sure if this work-around is known, so thought I'd share this here:
One potential work-around while this issue is still being fixed is to artificially increase the next oids to be generated (since postgres shares the same oid generator for all dbs). For example, I created and dropped a bunch of tables on a new fresh cluster, and afterwards the ysqldump (from Oleg's example above) import was able to succeed:

# On a fresh cluster, run:
yugabyte=# do $$
begin
   for counter in 0..2000 loop
    EXECUTE format('CREATE TABLE IF NOT EXISTS test_%s (i int primary key, v int)', counter);
    EXECUTE format('DROP TABLE test_%s', counter);
   end loop;
end; $$;
DO
# This creates and drops 2000 tables (a bit overkill perhaps)

# Verify that we are past the highest table oid required in the ysqldump (around 22000)
yugabyte=# create table test (i int primary key);
CREATE TABLE
yugabyte=# select relname, oid from pg_class where oid >= 16384 and relkind='r';
 relname |  oid
---------+-------
 test    | 26389
(1 row)

# 26389 > 22000, so we're good
yugabyte=# drop table test;
yugabyte=# \q

# Now import the ysql_dump, which will succeed
~/code/yugabyte-db >>> ./bin/ysqlsh -f ~/Downloads/YSQLDump1.txt

Since all the objects that will be created will now have oid > 26389, we won't encounter any oid conflicts when creating the colocated tables with the set oids.

cc @oleg @m-iancu

@skorobogatydmitry
Copy link
Contributor

frozenspider added a commit that referenced this issue Mar 17, 2022
…or colocation

Summary:
# Background

Colocated tables are YSQL tables backed by the same singular tablet. To disambiguate different tables in that tablet's RocksDB, previously we used table OIDs as row prefix. However, when it comes to data backup/restore, this leads to a following problem.
The way `yb-admin` backup works is by backing up actual SST storage files, and relying on table schema exported through `ysqldump` to recreate YSQL metadata. Usually this works, but colocated tables have their OID used as a RocksDB prefix, so they should match exactly - but we cannot guarantee that those OIDs are free and can be chosen in YSQL.

# Solution

Instead of using table OID, we introduce a special colocation ID of the same internal type (`Oid` a.k.a. `uint32`).
- When creating a colocated table, unique colocation ID will be randomly assigned to it.
- Alternatively, an explicit `colocation_id` reloption may optionally be specified, this is used in `CREATE TABLE` statements generated by `ysqldump` with YB metadata enabled.
- This colocation ID is stored in `SchemaPB`. For backward compatibility, if this field is missing, we use table OID in its place.
- Colocation ID is freed when the table is deleted (safe because data is tombstoned: see 6286e50).

# Additional changes

- Colocation ID column has been added to the Master UI.
- Renamed `tablegroup` reloption to `tablegroup_oid`.
- Modified `yb_table_properties` function, added `tablegroup_oid` and `colocation_id` columns to it. Added a relevant migration.
- Added a new reloption kind `RELOPT_KIND_YB_LSM` for YB-specific indexes, to get an actual parsed `StdRdOptions` out of `ybcinoptions`.
- Partially reverted D13402 / #10259, `CREATE TABLE ... WITH (tablegroup_oid = ...)` is disallowed once again.
- Changed mutable tablegroup access guard from `SharedLock` to `LockGuard`
- Fixed `flatten_reloptions` (used in `pg_get_indexdef` and `pg_get_constraintdef`) to not append a leading comma when the first reloption is `tablegroup_oid`.
- Few minor internal naming fixes.
- Left a bunch of TODOs to clean up and disambiguate colocation a bit.

# Note

This diff actually introduces an (extremely unlikely) possibility for a data corruption to happen during yb-master rolling upgrade. Imagine the following sequence of events:

- Upgraded master becomes a leader
- User creates a new colocated table, leader randomly assigns a colocation_id of X
- Old master becomes a leader
- User creates a new colocated table with the OID=X, leader adds it to a colocation group

We'll get into the situation where two tables have the same colocation IDs, leading to either data corruption or consistency breaks.
This situation is almost impossible in practice for two reasons:

- There are 4 billion possibilities for colocation ID
- Our rolling upgrade procedure involves upgrading non-leader masters first in order to minimize a chance that old master would take over the leadership

# Backport note

Original commit: D15799 / 71d527c

Additionally, this:
- Grabs a piece of f598333 that simplifies `entity_ids.cc` YSQL ID logic.
- Takes `GetYbTableIdFromPB` from c5f5125
- Takes `IsTablegroupParentTableId` as a dependency
- Incorporates D16009 / 8474a0b that fixes an issue introduced here (which would prevent existing YB DB from working properly)

---

Resolves #7378

Test Plan:
Jenkins: rebase: 2.12

ybd --java-test 'org.yb.pgsql.TestPgRegressTablegroup'

Existing colocation tests are modified accordingly and they also partially cover the feature.

See test notes in D15799

Reviewers: mihnea, jason

Reviewed By: jason

Subscribers: bogdan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15992
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug
Projects
Status: Done
Development

No branches or pull requests

5 participants