Add roachtest that simulates system crash and sync failures
There is an existing synctest that verifies the database is correct and usable after a crash triggered by an I/O error. The charybdefs dependency it uses does error injection by manipulating return values. When it injects an error into a sync operation, that sync does no work and returns an error, but unsynced writes still survive in the page cache. Then, after process crash-recovery, the DB's state is the same as if the failed sync had succeeded.

This new test attempts to simulate the effects of a failed sync more completely, in particular by ensuring unsynced writes are dropped. The approach taken in this new test is to buffer unsynced writes in process memory. This is achieved by providing our own implementation of a few C syscall wrappers via `LD_PRELOAD`. By buffering in process memory instead of the page cache, we can easily drop unsynced writes.

In this new test, sync failure injection (`system-crash/sync-errors=true`) involves both returning an error and deleting unsynced data. Assuming error handling is correct, the process will crash itself shortly afterwards. There is also some logic in the failure injector to force a crash a little while later in case there's ever a bug in RocksDB or Cockroach where we ignore the failure.

We can also use this approach to simulate a machine crash (`system-crash/sync-errors=false`). Simply killing the process will drop writes that aren't yet synced, which is the same as what would happen if a machine crashed.

Right now the test relies on frequent consistency checks to find errors like missing writes. It hits the DB heavily with KV queries to try to trigger enough flushes/WAL changes/compactions in case there are bugs in those code paths. But I am open to suggestions for alternative workloads/verification mechanisms.

Release note: None
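The key observation above, that killing the process becomes equivalent to a machine crash once unsynced data lives only in process memory, can be modeled without RocksDB. The sketch below is a hypothetical, simplified stand-in: `BufferedFile`, `durable_`, and `CrashDropsUnsyncedWrites` are illustrative names, not part of this commit, and `durable_` merely plays the role of data that has reached the page cache and would survive a process crash.

```cpp
#include <cassert>
#include <string>

// Hypothetical model (not the commit's code): Append() only touches process
// memory, Sync() moves the buffered bytes to `durable_` (standing in for data
// that survives a process crash), and Crash() discards anything never synced.
class BufferedFile {
 public:
  void Append(const std::string& data) { buffer_ += data; }

  void Sync() {
    durable_ += buffer_;
    buffer_.clear();
  }

  // Killing the process frees its memory, so unsynced bytes are simply gone,
  // just as they would be after a power loss.
  void Crash() { buffer_.clear(); }

  const std::string& durable() const { return durable_; }

 private:
  std::string buffer_;   // written but unsynced
  std::string durable_;  // synced
};

// Usage: only the synced prefix survives a simulated crash.
inline void CrashDropsUnsyncedWrites() {
  BufferedFile f;
  f.Append("synced;");
  f.Sync();
  f.Append("unsynced;");
  f.Crash();
  assert(f.durable() == "synced;");
}
```

With ordinary files, the "unsynced;" bytes would already sit in the page cache and survive the kill, which is exactly the gap the commit message describes.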
Showing 10 changed files with 438 additions and 4 deletions.
c-deps/libroach/rocksdbutils/env_sync_fault_injection.cc (132 additions, 0 deletions)
```cpp
// Copyright 2019 The Cockroach Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
// implied. See the License for the specific language governing
// permissions and limitations under the License.

#include "env_sync_fault_injection.h"

#include <cstdlib>  // exit(), random()

#include "rocksdb/utilities/object_registry.h"

namespace rocksdb_utils {

// See comment above `SyncFaultInjectionEnv` class definition.
class SyncFaultInjectionWritableFile : public rocksdb::WritableFileWrapper {
 public:
  SyncFaultInjectionWritableFile(std::unique_ptr<rocksdb::WritableFile> target,
                                 int crash_failure_one_in,
                                 int sync_failure_one_in,
                                 bool crash_after_sync_failure);

  rocksdb::Status Append(const rocksdb::Slice& data) override;
  rocksdb::Status Sync() override;

 private:
  std::unique_ptr<rocksdb::WritableFile> target_;
  const int crash_failure_one_in_;
  const int sync_failure_one_in_;
  const bool crash_after_sync_failure_;
  // Countdown until crash if a sync failure already happened.
  int num_syncs_until_crash_;
  // Lock needed to handle concurrent writes and syncs.
  std::mutex mu_;
  // A buffer of written but unsynced data.
  std::string buffer_;

  // Some constants for use with `num_syncs_until_crash_`.
  const static int kNoCountdown = -1;
  const static int kStartCountdown = 10;
};

SyncFaultInjectionWritableFile::SyncFaultInjectionWritableFile(
    std::unique_ptr<rocksdb::WritableFile> target,
    int crash_failure_one_in,
    int sync_failure_one_in,
    bool crash_after_sync_failure)
    : rocksdb::WritableFileWrapper(target.get()),
      target_(std::move(target)),
      crash_failure_one_in_(crash_failure_one_in),
      sync_failure_one_in_(sync_failure_one_in),
      crash_after_sync_failure_(crash_after_sync_failure),
      num_syncs_until_crash_(kNoCountdown) {}

rocksdb::Status SyncFaultInjectionWritableFile::Append(
    const rocksdb::Slice& data) {
  std::unique_lock<std::mutex> lock(mu_);
  buffer_.append(data.data(), data.size());
  return rocksdb::Status::OK();
}

// We are using process crash to simulate system crash for tests and don't
// expect these tests to face actual system crashes. So for "syncing" it is
// sufficient to push data into the page cache via the underlying
// `WritableFile`'s `Append()`. That should be enough for the file data to
// survive a process crash.
rocksdb::Status SyncFaultInjectionWritableFile::Sync() {
  std::unique_lock<std::mutex> lock(mu_);
  if (num_syncs_until_crash_ > kNoCountdown) {
    --num_syncs_until_crash_;
    if (num_syncs_until_crash_ == 0) {
      exit(0);
    }
    // On Linux the behavior after a sync failure is to clear the error state
    // and continue accepting writes/syncs. To simulate that behavior, we do
    // not return early here, even though the file is known to have lost
    // writes.
  }

  if (crash_failure_one_in_ > 0 && random() % crash_failure_one_in_ == 0) {
    exit(0);
  } else if (sync_failure_one_in_ > 0 &&
             random() % sync_failure_one_in_ == 0) {
    if (num_syncs_until_crash_ == kNoCountdown && crash_after_sync_failure_) {
      // This was the first failure. Start the countdown.
      num_syncs_until_crash_ = kStartCountdown;
    }
    // As mentioned above, after a sync failure we allow continued writes and
    // syncs to the same file. To make sure those new writes are written at the
    // proper offset, we cannot drop unsynced writes simply by clearing the
    // buffer. Instead we drop unsynced writes by overwriting the buffer with
    // all zeros (this assumes the buffer didn't contain all zeros to begin
    // with).
    buffer_.replace(0, buffer_.size(), buffer_.size(), '\0');
    return rocksdb::Status::IOError();
  }
  std::string old_buffer;
  buffer_.swap(old_buffer);
  // It should be fine to buffer new writes while we're syncing old ones, so
  // unlock.
  lock.unlock();
  return target_->Append(old_buffer);
}

SyncFaultInjectionEnv::SyncFaultInjectionEnv(
    Env* target,
    int crash_failure_one_in,
    int sync_failure_one_in,
    bool crash_after_sync_failure)
    : rocksdb::EnvWrapper(target),
      crash_failure_one_in_(crash_failure_one_in),
      sync_failure_one_in_(sync_failure_one_in),
      crash_after_sync_failure_(crash_after_sync_failure) {}

rocksdb::Status SyncFaultInjectionEnv::NewWritableFile(
    const std::string& filename,
    std::unique_ptr<rocksdb::WritableFile>* result,
    const rocksdb::EnvOptions& env_options) {
  std::unique_ptr<rocksdb::WritableFile> underlying_file;
  rocksdb::Status s =
      EnvWrapper::NewWritableFile(filename, &underlying_file, env_options);
  if (s.ok()) {
    result->reset(new SyncFaultInjectionWritableFile(
        std::move(underlying_file),
        crash_failure_one_in_,
        sync_failure_one_in_,
        crash_after_sync_failure_));
  }
  return s;
}

} // rocksdb_utils
```
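The comment in `Sync()` explains why an injected failure zero-fills the buffer instead of clearing it. A hypothetical model makes the offset argument concrete; `FileModel` and its methods are illustrative names, not part of this commit, and `on_disk` simply stands in for the underlying file that `target_->Append()` writes to. The point is that the writer has already been told earlier `Append()` calls succeeded, so it expects later data at offsets past them.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the wrapper: `on_disk` stands in for the underlying
// file, and Sync() appends the buffered bytes to it, as the wrapper does with
// target_->Append().
struct FileModel {
  std::string on_disk;
  std::string buffer;

  void Append(const std::string& s) { buffer += s; }

  void Sync() {
    on_disk += buffer;
    buffer.clear();
  }

  // What the wrapper does on an injected failure: keep the length, lose the
  // contents. Later appends still land at the offsets the writer expects.
  void FailSyncByZeroFill() {
    buffer.replace(0, buffer.size(), buffer.size(), '\0');
  }

  // The rejected alternative: drop the bytes outright. Later appends shift
  // toward the start of the file.
  void FailSyncByClear() { buffer.clear(); }
};

// Usage: compare where "xy" ends up under each failure mode.
inline void CompareFailureModes() {
  FileModel zeroed;
  zeroed.Append("abc");
  zeroed.FailSyncByZeroFill();
  zeroed.Append("xy");
  zeroed.Sync();
  // "xy" stays at offset 3, where the writer put it; bytes 0-2 are '\0'.
  assert(zeroed.on_disk == std::string("\0\0\0xy", 5));

  FileModel cleared;
  cleared.Append("abc");
  cleared.FailSyncByClear();
  cleared.Append("xy");
  cleared.Sync();
  // "xy" moved to offset 0, so any offset the writer recorded is now wrong.
  assert(cleared.on_disk == "xy");
}
```

As the original comment notes, zero-filling is only distinguishable from success if the lost bytes were not already all zeros.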
Header declaring `SyncFaultInjectionEnv` (61 additions, 0 deletions)

```cpp
// Copyright 2019 The Cockroach Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
// implied. See the License for the specific language governing
// permissions and limitations under the License.

#pragma once

#include <mutex>
#include <string>

#include "rocksdb/env.h"

namespace rocksdb_utils {

// `SyncFaultInjectionEnv` creates files that buffer `Append()`s in process
// memory until `Sync()` is called. Such files enable us to simulate machine
// crashes by only crashing the process. This works since, unlike normal files
// whose writes survive a process crash in the page cache, these files'
// unsynced writes are dropped on the floor.
//
// Such files also enable us to simulate sync failure by dropping unsynced
// writes at the same time we inject a sync error. This is more comprehensive
// than the available fault injection tools I looked at (like libfiu and
// charybdefs), which only inject errors without dropping unsynced writes.
class SyncFaultInjectionEnv : public rocksdb::EnvWrapper {
 public:
  // - `target`: A pointer to the underlying `Env`.
  // - `crash_failure_one_in`: During a sync operation, crash the process
  //   immediately with a probability of 1/n. All unsynced writes are lost
  //   since they are buffered in process memory. A non-positive value
  //   disables crash injection.
  // - `sync_failure_one_in`: A sync operation will return failure with a
  //   probability of 1/n. All unsynced writes for the file are dropped to
  //   simulate the failure. A non-positive value disables failure injection.
  // - `crash_after_sync_failure`: If set to true, the program will crash
  //   itself some time after the first simulated sync failure (currently on
  //   the 10th subsequent sync of the file that failed). It does not happen
  //   immediately, to allow the system to get itself into a weird state in
  //   case it doesn't handle sync failures properly.
  SyncFaultInjectionEnv(
      Env* target,
      int crash_failure_one_in,
      int sync_failure_one_in,
      bool crash_after_sync_failure);

  rocksdb::Status NewWritableFile(const std::string& filename,
                                  std::unique_ptr<rocksdb::WritableFile>* result,
                                  const rocksdb::EnvOptions& env_options) override;

 private:
  const int crash_failure_one_in_;
  const int sync_failure_one_in_;
  const bool crash_after_sync_failure_;
};

} // rocksdb_utils
```
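The three knobs documented in the header interact inside `SyncFaultInjectionWritableFile::Sync()`. The following is a hypothetical, side-effect-free model of that decision, not the commit's code: `SyncOutcome`, `SyncDecider`, `one_in`, and `CountdownDemo` are illustrative names. It reports `kCrash` where the real wrapper calls `exit(0)`, and it takes the dice as a parameter where the real wrapper calls `random()`, so the countdown can be exercised deterministically.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// What a single Sync() call ends up doing in this model.
enum class SyncOutcome { kOk, kFail, kCrash };

// Hypothetical model of the decision made in Sync(): reports kCrash instead
// of calling exit(), and uses injected dice instead of random().
class SyncDecider {
 public:
  // `one_in(n)` should return true with probability 1/n; it is only consulted
  // for positive n, mirroring the `> 0` checks in the real wrapper.
  SyncDecider(int crash_failure_one_in, int sync_failure_one_in,
              bool crash_after_sync_failure, std::function<bool(int)> one_in)
      : crash_failure_one_in_(crash_failure_one_in),
        sync_failure_one_in_(sync_failure_one_in),
        crash_after_sync_failure_(crash_after_sync_failure),
        one_in_(std::move(one_in)) {}

  SyncOutcome Next() {
    // After the first injected failure, count down and crash at zero.
    if (countdown_ > kNoCountdown && --countdown_ == 0) {
      return SyncOutcome::kCrash;
    }
    if (crash_failure_one_in_ > 0 && one_in_(crash_failure_one_in_)) {
      return SyncOutcome::kCrash;
    }
    if (sync_failure_one_in_ > 0 && one_in_(sync_failure_one_in_)) {
      if (countdown_ == kNoCountdown && crash_after_sync_failure_) {
        countdown_ = kStartCountdown;  // the first failure starts the countdown
      }
      return SyncOutcome::kFail;
    }
    return SyncOutcome::kOk;
  }

 private:
  static constexpr int kNoCountdown = -1;
  static constexpr int kStartCountdown = 10;

  const int crash_failure_one_in_;
  const int sync_failure_one_in_;
  const bool crash_after_sync_failure_;
  std::function<bool(int)> one_in_;
  int countdown_ = kNoCountdown;
};

// Usage: with scripted dice where only the first failure roll hits, the
// decider fails once, lets the next nine syncs succeed, then crashes on the
// tenth sync after the failure.
inline void CountdownDemo() {
  int failure_rolls = 0;
  SyncDecider d(/*crash_failure_one_in=*/0, /*sync_failure_one_in=*/5,
                /*crash_after_sync_failure=*/true,
                [&failure_rolls](int) { return failure_rolls++ == 0; });
  assert(d.Next() == SyncOutcome::kFail);
  for (int i = 0; i < 9; ++i) assert(d.Next() == SyncOutcome::kOk);
  assert(d.Next() == SyncOutcome::kCrash);
}
```

Injecting the dice is a choice made only for this sketch; it lets a test script an exact failure sequence, which calling `random()` directly, as the wrapper does, does not.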
Submodule rocksdb updated from 47e4f2 to 90a02d.