Skip to content

Commit

Permalink
[#23869] YSQL: Fix one type of ddl atomicity stress test flakiness
Browse files Browse the repository at this point in the history
Summary:
Sometimes DDL atomicity stress test can fail with the following kind of error:

```
Bad status: Corruption (yb/yql/pgwrapper/libpq_utils.cc:841): Unexpected NULL value at row: 0, column: 0
```

On pg15 branch, this error happened more frequently than on master branch.

The reason for this failure is that the table was initialized to have a NULL
value for one of its columns. During the test, it has multi concurrently
executing threads. All but one of these concurrent threads execute DDL
statements. The other one thread executes DML statements (UPDATE) to set the NULL
column value to a string value. Both the DDL statements and the UPDATE
statements are executed repeatedly when an expected error is encountered until
they finally execute successfully.

Once all the DDL statements and the UPDATE statements complete, the test does
post-run verification. For the UPDATE it expects a string value that is not NULL.

The relevant code for this test bug is:

```
  // In some cases, when "Unknown transaction, could be recently aborted" is returned, we don't know
  // whether the transaction failed or succeeded. In such cases, we retry the statement. However,
  // if the original transaction was not aborted, the retry could fail with "already exists" in case
  // ADD COLUMN or CREATE INDEX and "does not exist" in case of DROP COLUMN or DROP INDEX. Thus in
  // such cases, we consider this statement to be a success.
  static const auto failed_retry_msgs = {
    "does not exist"sv,
    "already exists"sv
  };
  if (HasSubstring(msg, failed_retry_msgs)) {
    LOG(INFO) << "Execution of stmt " << stmt << " considered a success: " << s;
    return true;
  }
```

While this code is fine for a DDL statement to fail with these two expected error, for
UPDATE statement failing with these two expected error does not mean the UPDATE
has succeeded. It is wrong to consider the UPDATE successful. We should simply retry
the UPDATE until it no longer sees an error.

I made a fix by adding a new is_ddl argument, we only consider a DDL statement
successful when seeing the above two errors, and for DML we will not consider it
as successful. But we will consider the error as expected so that we can keep
retrying the DML statement.
Jira: DB-12775

Test Plan:
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 -n 20 --tp 1
./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 -n 20 --tp 1

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D37951
  • Loading branch information
myang2021 committed Sep 11, 2024
1 parent fe8890d commit 7d9b57b
Showing 1 changed file with 19 additions and 12 deletions.
31 changes: 19 additions & 12 deletions src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -99,11 +99,12 @@ class PgDdlAtomicityStressTest
Status TestDml(PGConn* conn, const int num_iterations);

template<class... Args>
Result<bool> ExecuteFormatWithRetry(PGConn* conn, const std::string& format, Args&&... args) {
return DoExecuteWithRetry(conn, Format(format, std::forward<Args>(args)...));
Result<bool> ExecuteFormatWithRetry(
PGConn* conn, bool is_ddl, const std::string& format, Args&&... args) {
return DoExecuteWithRetry(conn, is_ddl, Format(format, std::forward<Args>(args)...));
}

Result<bool> DoExecuteWithRetry(PGConn* conn, const std::string& stmt);
Result<bool> DoExecuteWithRetry(PGConn* conn, bool is_ddl, const std::string& stmt);

Status InsertTestData(const int num_rows);

Expand Down Expand Up @@ -166,15 +167,16 @@ Status PgDdlAtomicityStressTest::TestDdl(
for (const auto& ddl : ddls) {
auto stmt = Format(ddl, kTable, i);
LOG(INFO) << "Executing stmt " << stmt;
while (!VERIFY_RESULT(DoExecuteWithRetry(conn, stmt))) {
while (!VERIFY_RESULT(DoExecuteWithRetry(conn, true /* is_ddl */, stmt))) {
LOG(INFO) << "Retry executing stmt " << stmt;
}
}
}
return Status::OK();
}

Result<bool> PgDdlAtomicityStressTest::DoExecuteWithRetry(PGConn* conn, const std::string& stmt) {
Result<bool> PgDdlAtomicityStressTest::DoExecuteWithRetry(
PGConn* conn, bool is_ddl, const std::string& stmt) {
auto s = conn->Execute(stmt);
if (s.ok()) {
LOG(INFO) << "Execution of stmt " << stmt << " succeeded";
Expand Down Expand Up @@ -214,8 +216,12 @@ Result<bool> PgDdlAtomicityStressTest::DoExecuteWithRetry(PGConn* conn, const st
"already exists"sv
};
if (HasSubstring(msg, failed_retry_msgs)) {
LOG(INFO) << "Execution of stmt " << stmt << " considered a success: " << s;
return true;
if (is_ddl) {
LOG(INFO) << "Execution of stmt " << stmt << " considered a success: " << s;
return true;
}
LOG(INFO) << "Execution of stmt " << stmt << " failed: " << s;
return false;
}

// Unexpected error
Expand All @@ -224,22 +230,23 @@ Result<bool> PgDdlAtomicityStressTest::DoExecuteWithRetry(PGConn* conn, const st
}

Status PgDdlAtomicityStressTest::TestConcurrentIndex(PGConn* conn, const int num_iterations) {
const bool is_ddl = true;
for (int i = 0; i < num_iterations; ++i) {
bool index_created = false;
while (!index_created) {
// If concurrent index creation fails, it does not clean up the invalid index. Thus to
// make the statement idempotent, drop the index if the create index failed before retrying.
index_created = VERIFY_RESULT(ExecuteFormatWithRetry(
conn, "CREATE INDEX idx_$0 ON $1(key)", i, kTable));
conn, is_ddl, "CREATE INDEX idx_$0 ON $1(key)", i, kTable));
if (!index_created) {
auto stmt = Format("DROP INDEX IF EXISTS idx_$0", i);
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, stmt))) {
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, is_ddl, stmt))) {
LOG(INFO) << "Retry executing stmt " << stmt;
}
}
}
auto stmt = Format("DROP INDEX idx_$0", i);
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, stmt))) {
while (!VERIFY_RESULT(ExecuteFormatWithRetry(conn, is_ddl, stmt))) {
LOG(INFO) << "Retry executing stmt " << stmt;
}
}
Expand All @@ -248,8 +255,8 @@ Status PgDdlAtomicityStressTest::TestConcurrentIndex(PGConn* conn, const int num

Status PgDdlAtomicityStressTest::TestDml(PGConn* conn, const int num_iterations) {
for (int i = 1; i <= num_iterations;) {
if (VERIFY_RESULT(ExecuteFormatWithRetry(
conn, "UPDATE $0 SET value = 'value_$1' WHERE key = $1", kTable, i))) {
auto stmt = Format("UPDATE $0 SET value = 'value_$1' WHERE key = $1", kTable, i);
if (VERIFY_RESULT(ExecuteFormatWithRetry(conn, false /* is_ddl */, stmt))) {
++i;
}
}
Expand Down

0 comments on commit 7d9b57b

Please sign in to comment.