[YSQL] asan build test failure: YbAdminSnapshotScheduleTest.SysCatalogRetention #23459

Closed
1 task done
myang2021 opened this issue Aug 9, 2024 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

myang2021 commented Aug 9, 2024

Jira Link: DB-12379

Description

YbAdminSnapshotScheduleTest.SysCatalogRetention seems to be failing consistently on the asan build. Example:

[m-3] F0805 18:54:37.246038 218452 operation_driver.cc:423] T 00000000000000000000000000000000 P 0deac4231641432685e6356797ea60b4 S RD-P Ts { days: 19940 time: 18:54:36.777167 } kSnapshot (0x000050e00004cf60): Apply failed: Not found (yb/master/restore_sys_catalog_state.cc:404): Restore sys catalog failed: Determine restoring entries failed: Not found restoring table: 000040000000300080000000000040c3, request: operation: RESTORE_SYS_CATALOG snapshot_id: "7FD5714E4B99435C8ED1CBDCA67C1A0B" snapshot_hybrid_time: 7056991983307026432 restoration_id: "C81E641E7EE74F7EAAD967F031B19689"

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@myang2021 myang2021 added the area/ysql Yugabyte SQL (YSQL) label Aug 9, 2024
@myang2021 myang2021 self-assigned this Aug 9, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 9, 2024
myang2021 added a commit that referenced this issue Aug 13, 2024
…on failure

Summary:
The test YbAdminSnapshotScheduleTest.SysCatalogRetention has been failing consistently on the asan build. Example:

```
[m-3] F0805 18:54:37.246038 218452 operation_driver.cc:423] T 00000000000000000000000000000000 P 0deac4231641432685e6356797ea60b4 S RD-P Ts { days: 19940 time: 18:54:36.777167 } kSnapshot (0x000050e00004cf60): Apply failed: Not found (yb/master/restore_sys_catalog_state.cc:404): Restore sys catalog failed: Determine restoring entries failed: Not found restoring table: 000040000000300080000000000040c3, request: operation: RESTORE_SYS_CATALOG snapshot_id: "7FD5714E4B99435C8ED1CBDCA67C1A0B" snapshot_hybrid_time: 7056991983307026432 restoration_id: "C81E641E7EE74F7EAAD967F031B19689"
```

I found that `DeleteTableInternal` can be called more than once in the DDL
atomicity helper function `CatalogManager::YsqlDdlTxnDropTableHelper`.
DDL atomicity handling is async and can be triggered repeatedly by client
queries checking whether the DDL txn is done. Re-processing a table should
normally be fine, because we expect duplicate processing to be idempotent and
therefore logically a no-op. `DeleteTableInternal` is an exception when PITR is
involved:

- The first call to `DeleteTableInternal` marks the table HIDDEN rather than
  DELETED when there is a PITR schedule to retain the table.
- The second call to `DeleteTableInternal` marks the table DELETED, and the
  PITR schedule no longer retains it.

As a result, DELETED tables are not restored later when a PITR restore operation
is performed.

To work around the problem, I made an exception for `DeleteTableInternal` to
detect duplicate calls.
Jira: DB-12379
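
The HIDDEN/DELETED behavior described in this summary can be illustrated with a minimal, self-contained sketch. It is not the actual catalog manager code: `TableState`, `Table`, `NaiveDelete`, and `GuardedDelete` are hypothetical names invented for illustration, and the guard shown is only one plausible way to detect a duplicate call. The sketch also ignores retention expiry, where a hidden table is legitimately deleted later.

```cpp
#include <cassert>

// Hypothetical, simplified table states (assumption: the real code tracks more).
enum class TableState { kRunning, kHidden, kDeleted };

struct Table {
  TableState state = TableState::kRunning;
};

// Non-idempotent delete: with a PITR schedule retaining the table, the first
// call hides the table, but a repeated call moves it on to deleted.
void NaiveDelete(Table& table, bool retained_by_pitr) {
  if (retained_by_pitr && table.state == TableState::kRunning) {
    table.state = TableState::kHidden;
  } else {
    table.state = TableState::kDeleted;
  }
}

// Guarded delete: a call on a table that is no longer running is treated as a
// duplicate and left as a no-op, so repeated processing is idempotent.
void GuardedDelete(Table& table, bool retained_by_pitr) {
  if (table.state != TableState::kRunning) {
    return;  // Duplicate call detected; keep the current state.
  }
  table.state = retained_by_pitr ? TableState::kHidden : TableState::kDeleted;
}
```

Calling `NaiveDelete` twice on a retained table leaves it in `kDeleted`, which mirrors the failure mode above, while calling `GuardedDelete` twice leaves it in `kHidden` so a later restore could still find it.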

Test Plan:
(1)
./yb_build.sh asan --cxx-test tools_yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTest.SysCatalogRetention --clang17 -n 20 --tp 1

(2)
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19  --clang17 -n 20 --tp 1

Reviewers: hsunder, fizaa

Reviewed By: hsunder

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D37221
myang2021 added a commit that referenced this issue Aug 14, 2024
…SysCatalogRetention failure

Summary:
The test YbAdminSnapshotScheduleTest.SysCatalogRetention has been failing consistently on the asan build. Example:

```
[m-3] F0805 18:54:37.246038 218452 operation_driver.cc:423] T 00000000000000000000000000000000 P 0deac4231641432685e6356797ea60b4 S RD-P Ts { days: 19940 time: 18:54:36.777167 } kSnapshot (0x000050e00004cf60): Apply failed: Not found (yb/master/restore_sys_catalog_state.cc:404): Restore sys catalog failed: Determine restoring entries failed: Not found restoring table: 000040000000300080000000000040c3, request: operation: RESTORE_SYS_CATALOG snapshot_id: "7FD5714E4B99435C8ED1CBDCA67C1A0B" snapshot_hybrid_time: 7056991983307026432 restoration_id: "C81E641E7EE74F7EAAD967F031B19689"
```

I found that `DeleteTableInternal` can be called more than once in the DDL
atomicity helper function `CatalogManager::YsqlDdlTxnDropTableHelper`.
DDL atomicity handling is async and can be triggered repeatedly by client
queries checking whether the DDL txn is done. Re-processing a table should
normally be fine, because we expect duplicate processing to be idempotent and
therefore logically a no-op. `DeleteTableInternal` is an exception when PITR is
involved:

- The first call to `DeleteTableInternal` marks the table HIDDEN rather than
  DELETED when there is a PITR schedule to retain the table.
- The second call to `DeleteTableInternal` marks the table DELETED, and the
  PITR schedule no longer retains it.

As a result, DELETED tables are not restored later when a PITR restore operation
is performed.

To work around the problem, I made an exception for `DeleteTableInternal` to
detect duplicate calls.
Jira: DB-12379

Original commit: e6c2ee0 / D37221

Test Plan:
(1)
./yb_build.sh asan --cxx-test tools_yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTest.SysCatalogRetention --clang17 -n 20 --tp 1

(2)
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18  --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19  --clang17 -n 20 --tp 1

Reviewers: hsunder, fizaa

Reviewed By: hsunder

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37275
jasonyb pushed a commit that referenced this issue Aug 15, 2024
Summary:
 adca727 [PLAT-14226][PLAT-14227] update login flow and RBAC logic to include group memberships
 631592d [PLAT-14144]add retry policy for GcpProjectApiClient
 d1ac140 [#23418] docdb: Fix clone seq_no not being persisted
 Excluded: ade3a0e [#23461] YSQL: Fix memory corruption in UPDATE pushdown
 fd0c1e0 [PLAT-5874]: Expose state of db master "load balancing" process in Platform UI
 e6c2ee0 [#23459] YSQL: Fix asan YbAdminSnapshotScheduleTest.SysCatalogRetention failure
 43261b7 [#23421] YSQL: Reset catalog read time after table prefetching
 dd123d4 [doc] CLI name tidyups (#23477)
 Excluded: dc871f9 [#22843] YSQL: Change the CatalogCacheMisses prometheus name to have a label
 9d2b83a [doc][ybm] Slow query latency histogram (#23486)
 d50171b [PLAT-14900] Exclude empty string from regex validation for AZUoptional fields
 adf992d [#23237 ] DocDB: Master side DDL locking  - Part 1/happy path
 49039ec [PLAT-14620][YBA CLI] CLI gives invalid JSON output
 db993c6 [#23140] docdb: Clear table txn verifier state after drop
 0b93c5d [PLAT-14925] Should not allow turning off DB audit logging runtime config if enabled on a universe
 d0500a0 [#22989] YSQL: Fix regular expression pushdown
 6a4f23b [#23353] CDCSDK: Send null old tuples for updates without before image

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37309