[#23869] YSQL: Fix one type of ddl atomicity stress test flakiness
Summary:
Sometimes the DDL atomicity stress test fails with the following kind of error:
```
Bad status: Corruption (yb/yql/pgwrapper/libpq_utils.cc:841): Unexpected NULL value at row: 0, column: 0
```
On the pg15 branch, this error happened more frequently than on the master branch.

The reason for this failure is that the table is initialized with a NULL value in one of its columns. The test runs multiple concurrent threads. All but one of them execute DDL statements; the remaining thread executes DML statements (UPDATE) that set the NULL column value to a string value. Both the DDL statements and the UPDATE statements are retried whenever an expected error is encountered, until they finally execute successfully. Once all the DDL statements and the UPDATE statements complete, the test does post-run verification. For the UPDATE, it expects a string value that is not NULL.

The relevant code for this test bug is:
```
// In some cases, when "Unknown transaction, could be recently aborted" is returned, we don't know
// whether the transaction failed or succeeded. In such cases, we retry the statement. However,
// if the original transaction was not aborted, the retry could fail with "already exists" in case
// ADD COLUMN or CREATE INDEX and "does not exist" in case of DROP COLUMN or DROP INDEX. Thus in
// such cases, we consider this statement to be a success.
static const auto failed_retry_msgs = {
  "does not exist"sv,
  "already exists"sv
};
if (HasSubstring(msg, failed_retry_msgs)) {
  LOG(INFO) << "Execution of stmt " << stmt << " considered a success: " << s;
  return true;
}
```
This code is fine for a DDL statement that fails with one of these two expected errors. For an UPDATE statement, however, failing with one of these errors does not mean the UPDATE has succeeded, so the column can still be NULL at verification time. It is wrong to consider the UPDATE successful. We should simply retry the UPDATE until it no longer sees an error.

I fixed this by adding a new is_ddl argument: only a DDL statement is considered successful when it sees one of the above two errors. A DML statement is not considered successful in that case, but the error is still treated as expected so that the DML statement keeps being retried.
Jira: DB-12775

Test Plan:
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 -n 20 --tp 1

Reviewers: fizaa
Reviewed By: fizaa
Subscribers: yql
Differential Revision: https://phorge.dev.yugabyte.com/D37951
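
Illustration (not part of the change itself): the DDL/DML distinction described in the summary could be sketched roughly as below. Only the `is_ddl` flag and the two error substrings come from this commit message; `StmtOutcome`, `HasAnySubstring`, and `ClassifyError` are hypothetical names, and the real test also recognizes other expected errors that this sketch lumps into `kUnexpected`.

```cpp
#include <initializer_list>
#include <string_view>

// Hypothetical outcome type, used only for this illustration.
enum class StmtOutcome { kSuccess, kRetry, kUnexpected };

// Returns true if msg contains any of the given substrings.
inline bool HasAnySubstring(std::string_view msg,
                            std::initializer_list<std::string_view> needles) {
  for (std::string_view needle : needles) {
    if (msg.find(needle) != std::string_view::npos) {
      return true;
    }
  }
  return false;
}

// Classifies an error message for a statement.
// - DDL: "already exists" / "does not exist" means an earlier attempt took
//   effect, so the statement counts as a success.
// - DML: the same errors say nothing about whether the UPDATE was applied,
//   so the statement is retried instead.
inline StmtOutcome ClassifyError(std::string_view msg, bool is_ddl) {
  using namespace std::literals;
  if (HasAnySubstring(msg, {"does not exist"sv, "already exists"sv})) {
    return is_ddl ? StmtOutcome::kSuccess : StmtOutcome::kRetry;
  }
  // Simplification: every other message is treated as unexpected here.
  return StmtOutcome::kUnexpected;
}
```

Under these assumptions, a caller would loop on `kRetry` until the UPDATE runs without error, which is what lets the post-run verification see a non-NULL value.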