Add roachtest that simulates system crash and sync failures
There is an existing synctest that verifies the database is correct and
usable after a crash triggered by an I/O error. The charybdefs
dependency it uses does error injection by manipulating return values.
When it injects an error into a sync operation, that sync does no work
and returns an error, but unsynced writes still survive in page cache.
Then after process crash-recovery, the DB's state is the same as if the
failed sync had succeeded. This new test attempts to simulate the
effects of a failed sync more completely, in particular by ensuring
unsynced writes are dropped.

The approach taken in this new test is to buffer unsynced writes in
process memory. This is achieved by providing our own implementation of
a few C syscall wrappers via `LD_PRELOAD`. By buffering in process
memory instead of page cache, we can easily drop unsynced writes.
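
The buffering idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this commit: `FakeDisk` and `BufferedFile` are hypothetical names, and a plain string stands in for data that has reached page cache and would survive a process crash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the file system: `durable` models bytes that
// have reached page cache (or disk) and would survive a process crash.
struct FakeDisk {
  std::string durable;
};

// Hypothetical file wrapper that keeps appended bytes in process memory
// until Sync() is called.
class BufferedFile {
 public:
  explicit BufferedFile(FakeDisk* disk) : disk_(disk) {
    assert(disk != nullptr);
  }

  // Appends only touch the in-process buffer.
  void Append(const std::string& data) { buffer_ += data; }

  // A successful sync hands the buffered bytes to the durable store.
  void Sync() {
    disk_->durable += buffer_;
    buffer_.clear();
  }

  // Models what a crash does to unsynced writes: they simply disappear,
  // because they never left process memory.
  void DropUnsynced() { buffer_.clear(); }

  std::size_t UnsyncedBytes() const { return buffer_.size(); }

 private:
  FakeDisk* disk_;
  std::string buffer_;
};
```

With this shape, anything appended after the last `Sync()` is lost when the process dies, which is the behavior the test wants to observe.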

In this new test, sync failure injection
(`system-crash/sync-errors=true`) involves both returning an error and
deleting unsynced data. Assuming error handling is correct, the process
will crash itself shortly afterwards. The failure injector also forces a
crash a little while later, in case a bug in RocksDB or Cockroach ever
causes the failure to be ignored.
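
A rough sketch of that failure path, assuming a hypothetical `FaultyFile` class with an injectable `crash` callback so the behavior can be observed without exiting the process (the real injector is the `SyncFaultInjectionWritableFile` in the diff below, which calls `exit`). Like the diff, it zero-fills the lost bytes rather than truncating, so later writes keep their offsets:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of a file whose syncs can be made to fail.
class FaultyFile {
 public:
  // `syncs_until_crash`: how many syncs after the first failure to wait
  // before forcing a crash. `crash`: invoked when that countdown expires.
  FaultyFile(int syncs_until_crash, std::function<void()> crash)
      : start_countdown_(syncs_until_crash), crash_(std::move(crash)) {
    assert(syncs_until_crash > 0);
  }

  void Append(const std::string& data) { buffer_ += data; }

  // Returns true on success. When `inject_failure` is set, the sync reports
  // an error and the unsynced bytes are overwritten with zeros.
  bool Sync(bool inject_failure) {
    if (countdown_ > 0 && --countdown_ == 0) {
      crash_();  // The caller ignored an earlier failure for too long.
      return false;
    }
    if (inject_failure) {
      if (countdown_ == kNoCountdown) {
        countdown_ = start_countdown_;  // First failure starts the countdown.
      }
      buffer_.assign(buffer_.size(), '\0');
      return false;
    }
    synced_ += buffer_;
    buffer_.clear();
    return true;
  }

  const std::string& synced() const { return synced_; }

 private:
  static constexpr int kNoCountdown = -1;
  const int start_countdown_;
  std::function<void()> crash_;
  int countdown_ = kNoCountdown;
  std::string buffer_;
  std::string synced_;
};
```

A caller that handles the error correctly stops using the file before the countdown expires; one that ignores it is crashed after `syncs_until_crash` further syncs.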

We can also use this approach to simulate machine crash
(`system-crash/sync-errors=false`). Simply killing the process will drop
writes that aren't yet synced, which is the same as what would happen if
a machine crashed.
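
Both modes come down to a decision made at each sync. The sketch below shows one way to express the one-in-N choice used by the `crash_failure_one_in` and `sync_failure_one_in` parameters in the diff. The helper names `OneIn`, `SyncOutcome` and `DecideSyncOutcome` are hypothetical, and it uses `<random>` where the diff uses `random()`.

```cpp
#include <cassert>
#include <random>

// Returns true with probability 1/one_in. A value of 0 disables injection.
template <typename Rng>
bool OneIn(int one_in, Rng& rng) {
  assert(one_in >= 0);
  if (one_in == 0) {
    return false;
  }
  std::uniform_int_distribution<int> dist(0, one_in - 1);
  return dist(rng) == 0;
}

enum class SyncOutcome { kOk, kCrash, kSyncError };

// Decides what a single sync should do. A crash takes precedence over a
// sync error, matching the order of the checks in the diff.
template <typename Rng>
SyncOutcome DecideSyncOutcome(int crash_one_in, int sync_failure_one_in,
                              Rng& rng) {
  if (OneIn(crash_one_in, rng)) {
    return SyncOutcome::kCrash;
  }
  if (OneIn(sync_failure_one_in, rng)) {
    return SyncOutcome::kSyncError;
  }
  return SyncOutcome::kOk;
}
```

Setting one parameter to 0 and the other to a large N selects a single mode, which is how the two registered envs in the diff are configured.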

Right now the test relies on frequent consistency checks to find errors
like missing writes. It hits the DB heavily with KV queries to try to
trigger enough flushes/WAL changes/compactions in case there are bugs in
those code paths. But I am open to suggestions for alternative
workloads/verification mechanisms.

Release note: None
ajkr committed Apr 24, 2019
1 parent 165f452 commit 1c54042
Showing 10 changed files with 438 additions and 4 deletions.
1 change: 1 addition & 0 deletions c-deps/libroach/CMakeLists.txt
@@ -54,6 +54,7 @@ add_library(roach
protos/util/log/log.pb.cc
protos/util/unresolved_addr.pb.cc
rocksdbutils/env_encryption.cc
rocksdbutils/env_sync_fault_injection.cc
)
target_include_directories(roach
PUBLIC ./include
36 changes: 36 additions & 0 deletions c-deps/libroach/db.cc
@@ -19,6 +19,7 @@
#include <rocksdb/sst_file_writer.h>
#include <rocksdb/table.h>
#include <rocksdb/utilities/checkpoint.h>
#include <rocksdb/utilities/object_registry.h>
#include <stdarg.h>
#include "batch.h"
#include "cache.h"
@@ -34,6 +35,7 @@
#include "iterator.h"
#include "merge.h"
#include "options.h"
#include "rocksdbutils/env_sync_fault_injection.h"
#include "snapshot.h"
#include "status.h"
#include "table_props.h"
@@ -151,7 +153,41 @@ static DBOpenHook* db_open_hook = DBOpenHookOSS;

void DBSetOpenHook(void* hook) { db_open_hook = (DBOpenHook*)hook; }

void DBRegisterTestingEnvs() {
// Have a couple `SyncFaultInjectionEnv`s registered with RocksDB for our test
// cases to use, which they can select using any of RocksDB's text-based options
// mechanisms. Currently the only one Cockroach exposes is options string which
// is passed the value from `DBOptions::rocksdb_options`.
static rocksdb::Registrar<rocksdb::Env> sync_failure_env_reg(
"sync-failure-injection-wrapping-default-env",
[](const std::string& /* name */,
std::unique_ptr<rocksdb::Env>* /* guard */) {
static rocksdb_utils::SyncFaultInjectionEnv env(
rocksdb::Env::Default(),
0 /* crash_failure_one_in */,
500000 /* sync_failure_one_in */,
true /* crash_after_sync_failure */);
return &env;
}
);

static rocksdb::Registrar<rocksdb::Env> crash_failure_env_reg(
"crash-failure-injection-wrapping-default-env",
[](const std::string& /* name */,
std::unique_ptr<rocksdb::Env>* /* guard */) {
static rocksdb_utils::SyncFaultInjectionEnv env(
rocksdb::Env::Default(),
500000 /* crash_failure_one_in */,
0 /* sync_failure_one_in */,
false /* crash_after_sync_failure */);
return &env;
}
);
}

DBStatus DBOpen(DBEngine** db, DBSlice dir, DBOptions db_opts) {
DBRegisterTestingEnvs();

rocksdb::Options options = DBMakeOptions(db_opts);

const std::string additional_options = ToString(db_opts.rocksdb_options);
132 changes: 132 additions & 0 deletions c-deps/libroach/rocksdbutils/env_sync_fault_injection.cc
@@ -0,0 +1,132 @@
// Copyright 2019 The Cockroach Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
// implied. See the License for the specific language governing
// permissions and limitations under the License.

#include "env_sync_fault_injection.h"
#include "rocksdb/utilities/object_registry.h"

namespace rocksdb_utils {

// See comment above `SyncFaultInjectionEnv` class definition.
class SyncFaultInjectionWritableFile : public rocksdb::WritableFileWrapper {
public:
SyncFaultInjectionWritableFile(std::unique_ptr<rocksdb::WritableFile> target,
int crash_failure_one_in,
int sync_failure_one_in,
bool crash_after_sync_failure);

rocksdb::Status Append(const rocksdb::Slice& data) override;
rocksdb::Status Sync() override;

private:
std::unique_ptr<rocksdb::WritableFile> target_;
const int crash_failure_one_in_;
const int sync_failure_one_in_;
const bool crash_after_sync_failure_;
// Countdown until crash if a sync failure already happened.
int num_syncs_until_crash_;
// Lock needed to handle concurrent writes and syncs.
std::mutex mu_;
// A buffer of written but unsynced data.
std::string buffer_;

// Some constants for use with `num_syncs_until_crash_`.
const static int kNoCountdown = -1;
const static int kStartCountdown = 10;
};

SyncFaultInjectionWritableFile::SyncFaultInjectionWritableFile(
std::unique_ptr<rocksdb::WritableFile> target,
int crash_failure_one_in,
int sync_failure_one_in,
bool crash_after_sync_failure) :
rocksdb::WritableFileWrapper(target.get()),
target_(std::move(target)),
crash_failure_one_in_(crash_failure_one_in),
sync_failure_one_in_(sync_failure_one_in),
crash_after_sync_failure_(crash_after_sync_failure),
num_syncs_until_crash_(kNoCountdown) {}

rocksdb::Status SyncFaultInjectionWritableFile::Append(
const rocksdb::Slice& data) {
std::unique_lock<std::mutex> lock(mu_);
buffer_.append(data.data(), data.size());
return rocksdb::Status::OK();
}

// We are using process crash to simulate system crash for tests and don't
// expect these tests to face actual system crashes. So for "syncing" it is
// sufficient to push data into page cache via the underlying `WritableFile`'s
// `Append()`. That should be enough for the file data to survive a process
// crash.
rocksdb::Status SyncFaultInjectionWritableFile::Sync() {
std::unique_lock<std::mutex> lock(mu_);
if (num_syncs_until_crash_ > kNoCountdown) {
--num_syncs_until_crash_;
if (num_syncs_until_crash_ == 0) {
exit(0);
}
// On Linux the behavior after a sync failure occurred is to clear the error
// state and continue accepting writes/syncs. To simulate that behavior, we
// do not return early here, even though the file is known to have lost writes.
}

if (crash_failure_one_in_ > 0 && random() % crash_failure_one_in_ == 0) {
exit(0);
} else if (sync_failure_one_in_ > 0 && random() % sync_failure_one_in_ == 0) {
if (num_syncs_until_crash_ == kNoCountdown && crash_after_sync_failure_) {
// This was the first failure. Start the countdown.
num_syncs_until_crash_ = kStartCountdown;
}
// As mentioned above, after a sync failure we allow continued writes and syncs
// to the same file. To make sure those new writes are written at the proper offset,
// we cannot drop unsynced writes simply by clearing the buffer. Instead we drop
// unsynced writes by overwriting the buffer with all zeros (well, this assumes
// the buffer didn't have all zeros to begin with).
buffer_.replace(0, buffer_.size(), buffer_.size(), '\0');
return rocksdb::Status::IOError();
}
std::string old_buffer;
buffer_.swap(old_buffer);
// It should be fine to buffer new writes while we're syncing old ones, so unlock.
lock.unlock();
return target_->Append(old_buffer);
}

SyncFaultInjectionEnv::SyncFaultInjectionEnv(
Env* target,
int crash_failure_one_in,
int sync_failure_one_in,
bool crash_after_sync_failure) :
rocksdb::EnvWrapper(target),
crash_failure_one_in_(crash_failure_one_in),
sync_failure_one_in_(sync_failure_one_in),
crash_after_sync_failure_(crash_after_sync_failure) {}

rocksdb::Status SyncFaultInjectionEnv::NewWritableFile(
const std::string& filename,
std::unique_ptr<rocksdb::WritableFile>* result,
const rocksdb::EnvOptions& env_options) {
std::unique_ptr<rocksdb::WritableFile> underlying_file;
rocksdb::Status s = EnvWrapper::NewWritableFile(filename, &underlying_file, env_options);
if (s.ok()) {
result->reset(new SyncFaultInjectionWritableFile(
std::move(underlying_file),
crash_failure_one_in_,
sync_failure_one_in_,
crash_after_sync_failure_));
}
return s;
}

} // rocksdb_utils
61 changes: 61 additions & 0 deletions c-deps/libroach/rocksdbutils/env_sync_fault_injection.h
@@ -0,0 +1,61 @@
// Copyright 2019 The Cockroach Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
// implied. See the License for the specific language governing
// permissions and limitations under the License.

#pragma once

#include <mutex>
#include <string>

#include "rocksdb/env.h"

namespace rocksdb_utils {

// `SyncFaultInjectionEnv` creates files that buffer `Append()`s in process memory
// until `Sync()` is called. Such files enable us to simulate machine crashes by only
// crashing the process. This works since, unlike normal files whose writes survive
// process crash in page cache, these files' unsynced writes are dropped on the floor.
//
// Such files also enable us to simulate sync failure by dropping unsynced writes at
// the same time we inject a sync error. This is more comprehensive than the available
// fault injection tools I looked at (like libfiu and charybdefs), as those ones only
// inject errors without dropping unsynced writes.
class SyncFaultInjectionEnv : public rocksdb::EnvWrapper {
public:
// - `target`: A pointer to the underlying `Env`.
// - `crash_failure_one_in`: During a sync operation, crash the process immediately
// with a probability of 1/n. All unsynced writes are lost since they are buffered
// in process memory.
// - `sync_failure_one_in`: A sync operation will return failure with a probability
// of 1/n. All unsynced writes for the file are dropped to simulate the failure.
// - `crash_after_sync_failure`: If set to true, the program will crash itself some
// time after the first simulated sync failure. It does not happen immediately to
// allow the system to get itself into a weird state in case it doesn't handle sync
// failures properly.
SyncFaultInjectionEnv(
Env* target,
int crash_failure_one_in,
int sync_failure_one_in,
bool crash_after_sync_failure);

rocksdb::Status NewWritableFile(const std::string& filename,
std::unique_ptr<rocksdb::WritableFile>* result,
const rocksdb::EnvOptions& env_options) override;

private:
const int crash_failure_one_in_;
const int sync_failure_one_in_;
const bool crash_after_sync_failure_;
};

} // rocksdb_utils
2 changes: 1 addition & 1 deletion c-deps/rocksdb
4 changes: 4 additions & 0 deletions pkg/cmd/roachprod/install/cockroach.go
@@ -386,6 +386,10 @@ tar cvf certs.tar certs
cmd += `echo ">>> roachprod start: $(date)" >> ` + logDir + "/roachprod.log; " +
`ps axeww -o pid -o command >> ` + logDir + "/roachprod.log; " +
`[ -x /usr/bin/lslocks ] && /usr/bin/lslocks >> ` + logDir + "/roachprod.log; "
if c.IsLocal() {
// This is consistent with the working directory used by `roachprod run`.
cmd += fmt.Sprintf("cd ${HOME}/local/%d ; ", nodes[i])
}
cmd += keyCmd +
fmt.Sprintf(" export ROACHPROD=%d%s && ", nodes[i], c.Tag) +
"GOTRACEBACK=crash " +
2 changes: 1 addition & 1 deletion pkg/cmd/roachprod/vm/gce/gcloud.go
@@ -51,7 +51,7 @@ var projectsWithGC = []string{defaultProject, "andrei-jepsen"}

// init will inject the GCE provider into vm.Providers, but only if the gcloud tool is available on the local path.
func init() {
-var p vm.Provider
+var p vm.Provider = &Provider{}
if _, err := exec.LookPath("gcloud"); err != nil {
p = flagstub.New(p, "please install the gcloud CLI utilities "+
"(https://cloud.google.com/sdk/downloads)")
4 changes: 2 additions & 2 deletions pkg/cmd/roachtest/cluster.go
@@ -925,7 +925,7 @@ func (c *cluster) All() nodeListOption {
return c.Range(1, c.nodes)
}

-// All returns a node list containing the nodes [begin,end].
+// Range returns a node list containing the nodes [begin,end].
func (c *cluster) Range(begin, end int) nodeListOption {
if begin < 1 || end > c.nodes {
c.t.Fatalf("invalid node range: %d-%d (1-%d)", begin, end, c.nodes)
@@ -937,7 +937,7 @@ func (c *cluster) Range(begin, end int) nodeListOption {
return r
}

-// All returns a node list containing only the node i.
+// Node returns a node list containing only the node i.
func (c *cluster) Node(i int) nodeListOption {
return c.Range(i, i)
}
1 change: 1 addition & 0 deletions pkg/cmd/roachtest/registry.go
@@ -69,6 +69,7 @@ func registerTests(r *registry) {
registerSQLsmith(r)
registerSyncTest(r)
registerSysbench(r)
registerSystemCrashTest(r)
registerTPCC(r)
registerTypeORM(r)
registerLoadSplits(r)