[YSQL] asan build test failure: YbAdminSnapshotScheduleTest.SysCatalogRetention #23459
Closed
1 task done
yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels on Aug 9, 2024
myang2021 added a commit that referenced this issue on Aug 13, 2024
…on failure

Summary:
The test YbAdminSnapshotScheduleTest.SysCatalogRetention has been failing consistently on the asan build. Example:
```
[m-3] F0805 18:54:37.246038 218452 operation_driver.cc:423] T 00000000000000000000000000000000 P 0deac4231641432685e6356797ea60b4 S RD-P Ts { days: 19940 time: 18:54:36.777167 } kSnapshot (0x000050e00004cf60): Apply failed: Not found (yb/master/restore_sys_catalog_state.cc:404): Restore sys catalog failed: Determine restoring entries failed: Not found restoring table: 000040000000300080000000000040c3, request: operation: RESTORE_SYS_CATALOG snapshot_id: "7FD5714E4B99435C8ED1CBDCA67C1A0B" snapshot_hybrid_time: 7056991983307026432 restoration_id: "C81E641E7EE74F7EAAD967F031B19689"
```
I found that `DeleteTableInternal` can be called more than once in the DDL atomicity helper function `CatalogManager::YsqlDdlTxnDropTableHelper`. DDL atomicity handling is asynchronous and can be triggered repeatedly by client queries asking whether the DDL txn is done. Re-processing a table should normally be fine, because duplicate processing is expected to be idempotent and therefore logically a no-op. `DeleteTableInternal` is an exception when PITR is involved: the first call leaves the table HIDDEN rather than DELETED when a PITR schedule retains the table, but the second call marks the table DELETED, and the PITR schedule no longer retains it. As a result, DELETED tables are not restored later when a PITR restore operation is performed. To work around the problem, I special-cased `DeleteTableInternal` so that duplicate calls are detected.

Jira: DB-12379

Test Plan:
(1) ./yb_build.sh asan --cxx-test tools_yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTest.SysCatalogRetention --clang17 -n 20 --tp 1
(2) ./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17 -n 20 --tp 1

Reviewers: hsunder, fizaa
Reviewed By: hsunder
Subscribers: ybase, yql
Differential Revision: https://phorge.dev.yugabyte.com/D37221
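The non-idempotent behaviour described in the commit summary can be illustrated with a toy state model. This is only a sketch in Python, not the YugabyteDB code: the `Table` class, the `RUNNING` state, and both functions are hypothetical names invented for illustration, and the guard shown is one possible way to detect a duplicate call, not necessarily what the actual patch does.

```python
from enum import Enum


class TableState(Enum):
    RUNNING = "RUNNING"   # hypothetical name for a live table
    HIDDEN = "HIDDEN"     # dropped, but retained for a PITR schedule
    DELETED = "DELETED"   # dropped and no longer retained


class Table:
    """Hypothetical stand-in for a catalog table entry."""

    def __init__(self, table_id, retained_by_schedule):
        self.table_id = table_id
        self.retained_by_schedule = retained_by_schedule
        self.state = TableState.RUNNING


def delete_table_unguarded(table):
    """Toy model of the reported behaviour: the second call changes the outcome."""
    if table.state is TableState.RUNNING and table.retained_by_schedule:
        # First call: a PITR schedule covers the table, so it is only hidden.
        table.state = TableState.HIDDEN
    else:
        # Any later call: the table becomes DELETED and loses its retention.
        table.state = TableState.DELETED
        table.retained_by_schedule = False


def delete_table_guarded(table):
    """Same transition, but a repeated call on a dropped table is a no-op."""
    if table.state is not TableState.RUNNING:
        return False  # duplicate call detected; state is left untouched
    delete_table_unguarded(table)
    return True
```

Calling `delete_table_unguarded` twice on a retained table ends in `DELETED`, which mirrors why the restore could not find the table; with `delete_table_guarded`, the second call returns `False` and the table stays `HIDDEN`.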
myang2021 added a commit that referenced this issue on Aug 14, 2024
…SysCatalogRetention failure

Summary:
The test YbAdminSnapshotScheduleTest.SysCatalogRetention has been failing consistently on the asan build. Example:
```
[m-3] F0805 18:54:37.246038 218452 operation_driver.cc:423] T 00000000000000000000000000000000 P 0deac4231641432685e6356797ea60b4 S RD-P Ts { days: 19940 time: 18:54:36.777167 } kSnapshot (0x000050e00004cf60): Apply failed: Not found (yb/master/restore_sys_catalog_state.cc:404): Restore sys catalog failed: Determine restoring entries failed: Not found restoring table: 000040000000300080000000000040c3, request: operation: RESTORE_SYS_CATALOG snapshot_id: "7FD5714E4B99435C8ED1CBDCA67C1A0B" snapshot_hybrid_time: 7056991983307026432 restoration_id: "C81E641E7EE74F7EAAD967F031B19689"
```
I found that `DeleteTableInternal` can be called more than once in the DDL atomicity helper function `CatalogManager::YsqlDdlTxnDropTableHelper`. DDL atomicity handling is asynchronous and can be triggered repeatedly by client queries asking whether the DDL txn is done. Re-processing a table should normally be fine, because duplicate processing is expected to be idempotent and therefore logically a no-op. `DeleteTableInternal` is an exception when PITR is involved: the first call leaves the table HIDDEN rather than DELETED when a PITR schedule retains the table, but the second call marks the table DELETED, and the PITR schedule no longer retains it. As a result, DELETED tables are not restored later when a PITR restore operation is performed. To work around the problem, I special-cased `DeleteTableInternal` so that duplicate calls are detected.

Jira: DB-12379

Original commit: e6c2ee0 / D37221

Test Plan:
(1) ./yb_build.sh asan --cxx-test tools_yb-admin-snapshot-schedule-test --gtest_filter YbAdminSnapshotScheduleTest.SysCatalogRetention --clang17 -n 20 --tp 1
(2) ./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17 -n 20 --tp 1
./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17 -n 20 --tp 1

Reviewers: hsunder, fizaa
Reviewed By: hsunder
Subscribers: yql, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37275
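The twenty stress-test invocations in the test plan differ only in the trailing case index (0 through 19). As a convenience sketch, the list can be generated rather than typed out. The helper name `stress_test_commands` is hypothetical; the command text and flags are copied from the test plan and nothing else is assumed about `yb_build.sh`.

```python
def stress_test_commands(num_cases=20):
    """Build one yb_build.sh command line per StressTest case index."""
    template = (
        "./yb_build.sh asan --sj --cxx-test pg_ddl_atomicity_stress-test "
        "--gtest_filter=PgDdlAtomicityStressTest/"
        "PgDdlAtomicityStressTest.StressTest/{index} "
        "--clang17 -n 20 --tp 1"
    )
    return [template.format(index=i) for i in range(num_cases)]
```

Printing each element of `stress_test_commands()` yields the same command lines as part (2) of the test plan, which can then be pasted into a shell or run one by one.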
jasonyb pushed a commit that referenced this issue on Aug 15, 2024
Summary:
adca727 [PLAT-14226][PLAT-14227] update login flow and RBAC logic to include group memberships
631592d [PLAT-14144]add retry policy for GcpProjectApiClient
d1ac140 [#23418] docdb: Fix clone seq_no not being persisted
Excluded: ade3a0e [#23461] YSQL: Fix memory corruption in UPDATE pushdown
fd0c1e0 [PLAT-5874]: Expose state of db master "load balancing" process in Platform UI
e6c2ee0 [#23459] YSQL: Fix asan YbAdminSnapshotScheduleTest.SysCatalogRetention failure
43261b7 [#23421] YSQL: Reset catalog read time after table prefetching
dd123d4 [doc] CLI name tidyups (#23477)
Excluded: dc871f9 [#22843] YSQL: Change the CatalogCacheMisses prometheus name to have a label
9d2b83a [doc][ybm] Slow query latency histogram (#23486)
d50171b [PLAT-14900] Exclude empty string from regex validation for AZUoptional fields
adf992d [#23237 ] DocDB: Master side DDL locking - Part 1/happy path
49039ec [PLAT-14620][YBA CLI] CLI gives invalid JSON output
db993c6 [#23140] docdb: Clear table txn verifier state after drop
0b93c5d [PLAT-14925] Should not allow turning off DB audit logging runtime config if enabled on a universe
d0500a0 [#22989] YSQL: Fix regular expression pushdown
6a4f23b [#23353] CDCSDK: Send null old tuples for updates without before image

Test Plan: Jenkins: rebase: pg15-cherrypicks
Reviewers: jason, tfoucher
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37309
Jira Link: DB-12379
Description
YbAdminSnapshotScheduleTest.SysCatalogRetention seems to be failing consistently on asan build. Example:
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information