acceptance: introduce reference scenario test configs
This patch introduces a couple of example configuration files to use the
`shakespeare` tool with CockroachDB.

Release note: None
knz committed Sep 15, 2019
1 parent 9583e2a commit 489bcca
Showing 56 changed files with 1,794 additions and 0 deletions.
1 change: 1 addition & 0 deletions pkg/acceptance/testdata/scenarios/.gitignore
@@ -0,0 +1 @@
latest
6 changes: 6 additions & 0 deletions pkg/acceptance/testdata/scenarios/35890r/README.md
@@ -0,0 +1,6 @@
Repro config for
https://github.com/cockroachdb/cockroach/issues/35890

- heavy workload 4000/5000 wh
- medium homogeneous topology 6 nodes
- Deco scenario
5 changes: 5 additions & 0 deletions pkg/acceptance/testdata/scenarios/35890r/conf.cfg
@@ -0,0 +1,5 @@
title issue 35890
parameter workload_size defaults to heavy
include ../tpcc/common.cfg
include ../tpcc/deco.cfg
include ../tpcc/cycle-script.cfg
6 changes: 6 additions & 0 deletions pkg/acceptance/testdata/scenarios/35938r/README.md
@@ -0,0 +1,6 @@
Repro config for
https://github.com/cockroachdb/cockroach/issues/35938

- heavy workload 4000/5000 wh
- medium homogeneous topology 6 nodes
- quit scenario
5 changes: 5 additions & 0 deletions pkg/acceptance/testdata/scenarios/35938r/conf.cfg
@@ -0,0 +1,5 @@
title issue 35938
parameter workload_size defaults to heavy
include ../tpcc/common.cfg
include ../tpcc/quit.cfg
include ../tpcc/cycle-script.cfg
7 changes: 7 additions & 0 deletions pkg/acceptance/testdata/scenarios/36583r/README.md
@@ -0,0 +1,7 @@
Repro config for
https://github.com/cockroachdb/cockroach/issues/36583

- using GCE not AWS
- heavy workload 4000/5000 wh
- large geo-dist topology 9 nodes
- DecoQ scenario
5 changes: 5 additions & 0 deletions pkg/acceptance/testdata/scenarios/36583r/conf.cfg
@@ -0,0 +1,5 @@
title issue 36583r
parameter workload_size defaults to heavy
include ../tpcc/common.cfg
include ../tpcc/decoq.cfg
include ../tpcc/cycle-script.cfg
9 changes: 9 additions & 0 deletions pkg/acceptance/testdata/scenarios/36623r/README.md
@@ -0,0 +1,9 @@
Repro config for
https://github.com/cockroachdb/cockroach/issues/36623

- heavy workload 4000/5000 wh
- large geo-dist topology 9 nodes
- initialize but don't start workload
- decommission + quit node
- observe range imbalance

10 changes: 10 additions & 0 deletions pkg/acceptance/testdata/scenarios/36623r/scenario.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
set -eux

echo "While the decommission is ongoing, monitor for range imbalance."

time \
$ROACHPROD run $CLUSTER:1 --tag run -- \
"./cockroach node decommission $CENTER_MAPPING --insecure --wait all"
time \
$ROACHPROD stop $CLUSTER:$CENTER_MAPPING --tag node
7 changes: 7 additions & 0 deletions pkg/acceptance/testdata/scenarios/36676r/README.md
@@ -0,0 +1,7 @@
Repro config for
https://github.com/cockroachdb/cockroach/issues/36676

- using AWS not GCE
- heavy workload 4000/5000 wh
- large homogeneous topology 9 nodes
- DecoQ scenario
5 changes: 5 additions & 0 deletions pkg/acceptance/testdata/scenarios/36676r/conf.cfg
@@ -0,0 +1,5 @@
title issue 36676
parameter workload_size defaults to heavy
include ../tpcc/common.cfg
include ../tpcc/decoq.cfg
include ../tpcc/cycle-script.cfg
102 changes: 102 additions & 0 deletions pkg/acceptance/testdata/scenarios/README.md
@@ -0,0 +1,102 @@
# Scenario-based testing for CockroachDB

This directory contains example configurations to test CockroachDB
operational scenarios using the
[`shakespeare`](https://github.com/knz/shakespeare) tool.

## Setting up the environment

When using scenario tests that run only on the local machine,
set the env var `COCKROACH` to the path of the CockroachDB binary to use.

When using `roachprod`-based tests, the scenarios assume
that a cluster already exists. For the TPC-C tests,
the scenarios also assume TPC-C is already loaded.

This requires several pre-test manual steps, but scripts are provided
to facilitate these steps.

- To generate the scripts, perform the following steps once:

  1. Set the env vars `ROACHPROD` and `COCKROACH` to the paths of
     the `roachprod` and `cockroach` binaries, respectively.
  2. Set the env var `ROACHPROD_HOME` to the roachprod home directory,
     typically `$HOME/.roachprod`.
  3. Set the env var `COCKROACH_DEV_LICENSE` to an enterprise license
     string.
  4. Optional: set the env var `ROACHPROD_USER` if the local username
     differs from the cloud username.
  5. Select a cluster size (`small`, `medium`, `large`) and a topology
     (`-h` for homogeneous, `-g` for geo-distributed).
  6. Run `mkenv.sh <config>` (e.g. `mkenv.sh small-g`).

  This generates cluster configuration scripts in
  `$ROACHPROD_HOME/envs/<config>`, which can be reused from then on.
  The values of all the environment variables are preserved inside
  this directory and thus persist across shell sessions.

- Initialize a VM pool, a CockroachDB cluster and the TPC-C workload by
  running the following commands:

      eval $(~/.roachprod/envs/<config>/bin/setenv.sh)
      export PATH=$CLUSTER_ENV/bin:$PATH
      1-create-vms.sh
      2-stage-cockroach-binary.sh
      3-initial-upreplication.sh
      4-initial-zone-config.sh
      # for TPC-C tests using up to 100 warehouses:
      5-import-tpcc-fixtures.sh 100
      # if planning to run multiple scenario tests, or the same test multiple times:
      6-snapshot-data.sh
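
Before running `mkenv.sh`, it can help to confirm that the variables from the one-time setup are set and that the configuration name is well formed. The following is a minimal sketch and not part of this commit: the helper names `missing_vars` and `valid_config` are hypothetical, and the accepted names are only inferred from the sizes, topologies and the `small-g` example listed above.

```shell
# Hypothetical preflight helpers; not part of the scenario scripts.

# Print the names of the given variables that are unset or empty.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup that works in any POSIX shell.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Succeed if the name looks like <size>-<topology>, e.g. small-g.
# Assumption: names combine the sizes and topology suffixes listed above.
valid_config() {
    case "$1" in
        small-h|small-g|medium-h|medium-g|large-h|large-g) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use before running mkenv.sh:
#   missing_vars ROACHPROD COCKROACH ROACHPROD_HOME COCKROACH_DEV_LICENSE
#   valid_config small-g && mkenv.sh small-g
```

`missing_vars` prints nothing when every variable is set, so an empty result means the environment is ready.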

Notes:

- All the scripts keep a log of which script was run and when,
  in the file `~/.roachprod/envs/<config>/action-log.txt`.

- If step 1 ("create VMs") times out while attempting to create VMs on
  AWS, this may be a symptom of a misconfigured SSH key. `roachprod`
  is unfortunately silent in that case.

- If step 3 ("initial upreplication") fails or takes too long,
  this is a known issue. The easiest workaround is to:

  1. interrupt the wait (e.g. with Ctrl+C),
  2. run `quit-crdb.sh` to stop the nodes,
  3. retry `3-initial-upreplication.sh`.

- If step 5 ("import") fails with an error, optionally use
  `get-logs.sh` to investigate, then retry the import with:

      wipe-crdb.sh
      3-initial-upreplication.sh
      4-initial-zone-config.sh
      5-import-tpcc-fixtures.sh NNN

- Step 6 (snapshotting) is strongly recommended. However, it requires
  more than 50% free space in the data store directory on every
  node. With the default VM configuration, this space is
  available up to about 5000 TPC-C warehouses.
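
The free-space requirement for step 6 can be checked ahead of time on each node. The following is a minimal sketch, not part of the provided scripts: the helper names are hypothetical, and the store path in the usage comment is an assumption that must be adapted to the actual VM layout. It relies on `df -P`, whose fifth column reports the percentage of space in use.

```shell
# Hypothetical helpers; not part of the scenario scripts.

# Print the percentage of used space on the filesystem holding directory $1.
used_percent() {
    # df -P: line 2, column 5 is the capacity in use, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if a filesystem that is $1 percent full has more than 50% free.
enough_space_for_snapshot() {
    [ "$1" -lt 50 ]
}

# Example use on a node (the store path is an assumption):
#   enough_space_for_snapshot "$(used_percent /mnt/data1)" || echo "not enough free space"
```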

## Running scenario tests

Each directory defines one test scenario. The directory name is
arbitrary, but an attempt was made to name it after the main traits
of the scenario.

In each directory, the scenario can be executed with:

    # Choose an execution environment.
    eval $(~/.roachprod/envs/xxxx/bin/setenv.sh)
    # Run the scenario.
    .../shakespeare conf.cfg -I../include
    # (or alternatively `go run .../shakespeare.go`)

To upload artifacts at the end of "interesting" tests, use:

    --upload-url gs://shakespeare-artifacts.crdb.io/public/yourname

(You can then replace `gs://` with `http://` to access the reports in a
browser.)
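
The URL rewrite described above is easy to script. A minimal sketch, assuming only the `gs://` to `http://` substitution stated in this README; the helper name `report_url` is hypothetical and is not provided by `shakespeare`.

```shell
# Hypothetical helper; not part of shakespeare or the scenario scripts.

# Turn a gs:// upload URL into the matching http:// URL for a browser.
# Any other URL is printed unchanged.
report_url() {
    case "$1" in
        gs://*) printf 'http://%s\n' "${1#gs://}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Example:
#   report_url gs://shakespeare-artifacts.crdb.io/public/yourname
```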

@@ -0,0 +1,18 @@
# define what a "client" is
include workers/workload_kv.cfg
# define what a "nemesis" is
include nemeses/decommission.cfg

# define 4 nodes, 2 clients, 1 nemesis
include servers.cfg
include workers.cfg
include nemesis_n1.cfg

# simplest scenario: initialize, then activate the nemesis for a short time
include scripts/short.cfg

# verify "good behavior"
include workers/check_stable_behavior_kv.cfg

# ensure that initialization also waits for upreplication
include wait_upreplication.cfg
@@ -0,0 +1,18 @@
# define what a "client" is
include workers/workload_tpcc.cfg
# define what a "nemesis" is
include nemeses/random_decommission.cfg

# define 4 nodes, 2 clients, 1 nemesis
include servers.cfg
include workers.cfg
include nemesis_n1.cfg

# simplest scenario: initialize, then activate the nemesis for a short time
include scripts/short.cfg

# verify "good behavior"
include workers/check_stable_behavior_tpcc.cfg

# ensure that initialization also waits for upreplication
include wait_upreplication.cfg
40 changes: 40 additions & 0 deletions pkg/acceptance/testdata/scenarios/simple-example/include/README.md
@@ -0,0 +1,40 @@
# Common test definitions for CockroachDB test scenarios

This directory contains the "building blocks" to define scenario
tests.

The following conventions are used:

- servers are named `n1`, `n2`, ...
- clients (workers) are named `w1`, `w2`, ...
- acts 1 to 3 of the storyline are used for initialization:
  - act 1 for server initialization.
  - act 2 for client initialization.
  - act 3 for ramp-up.
- acts 4 and later are used to exercise test clusters.
- the mood "blue" is used to identify ramp-up periods.
- for tests that define only one nemesis (operational event) at a time:
  - mood "orange" identifies the period where the nemesis activates on the affected node
  - mood "red" identifies the period during which the effects of the nemesis are active
  - mood "yellow" identifies the period where the nemesis deactivates on the affected node

General template for a test:

```
# tweak this to change the server/client combination
include workers/xxx.cfg
include nemeses/yyy.cfg
# common cast + scenes
include servers.cfg
include workers.cfg
include nemesis_n1.cfg
# short/long/cycle or define your own
include scripts/zzz.cfg
# desired audience
include workers/check_xxx.cfg
# ...additional definitions here...
```
@@ -0,0 +1,6 @@
title decommission nemesis
role nemesis_decommission
:stop $COCKROACH node decommission $node_id --insecure --url $(cat ../$node/pgurl) --wait all >deco.log 2>&1
:restart $COCKROACH node recommission $node_id --insecure --url $(cat ../$node/pgurl) >reco.log 2>&1
end
parameter nemesis defaults to nemesis_decommission
@@ -0,0 +1,6 @@
title kill nemesis
role nemesis_kill
:stop kill -9 $(cat ../$node/pid)
:restart ../$node/actions/start.sh
end
parameter nemesis defaults to nemesis_kill
@@ -0,0 +1,6 @@
title quit nemesis
role nemesis_quit
:stop ../$node/actions/quit.sh
:restart ../$node/actions/start.sh
end
parameter nemesis defaults to nemesis_quit
@@ -0,0 +1,7 @@
title decommission nemesis
role nemesis_decommission
:stop echo $(($RANDOM%$cluster_size+1)) > deco_node && \\
$COCKROACH node decommission $(cat deco_node) --insecure --url $(cat ../$node/pgurl) --wait all >deco.log 2>&1
:restart $COCKROACH node recommission $(cat deco_node) --insecure --url $(cat ../$node/pgurl) >reco.log 2>&1
end
parameter nemesis defaults to nemesis_decommission
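
The `:stop` action above picks the node to decommission with `$(($RANDOM%$cluster_size+1))`, which maps any value of `$RANDOM` to a node ID between 1 and `cluster_size` inclusive. A small sketch of that arithmetic, with the random value passed in explicitly so that the result is reproducible; the function name `pick_node` is hypothetical.

```shell
# Map a raw random value ($1) and a cluster size ($2) to a node ID
# in the range 1..cluster_size, using the same arithmetic as the nemesis.
pick_node() {
    echo $(( $1 % $2 + 1 ))
}

# In bash, the nemesis expression is equivalent to:
#   pick_node "$RANDOM" "$cluster_size"
```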
@@ -0,0 +1,8 @@
title roachprod kill/restart nemesis
role nemesis_rp_kill
:stop HOME=$ROACHPROD_HOME/.. \
$ROACHPROD stop $(cat ../$node/roachprod_target) >>kill.log 2>&1
:restart HOME=$ROACHPROD_HOME/.. TMPDIR=/tmp \
$ROACHPROD start $(cat ../$node/roachprod_target) >>restart.log 2>&1
end
parameter nemesis defaults to nemesis_rp_kill
@@ -0,0 +1,14 @@
title pain on n1

cast
badguy plays ~nemesis~ with node=n1 p=26257 h=8080 node_id=1 cluster_size=~cluster_size~
end

script
scene F entails for badguy: stop
scene F mood starts orange
scene F mood ends red
scene R entails for badguy: restart
scene R mood starts yellow
scene R mood ends clear
end
@@ -0,0 +1,5 @@
script
# let acts 1 to 3 alone, these are used to initialize the server and
# calibrate expected values.
storyline . . . ... FR ......
end
@@ -0,0 +1,5 @@
script
# let acts 1 to 3 alone, these are used to initialize the server and
# calibrate expected values.
storyline . . . ... F ............... R ........
end
@@ -0,0 +1,5 @@
script
# let acts 1 to 3 alone, these are used to initialize the server and
# calibrate expected values.
storyline . . . F ..... R .....
end
@@ -0,0 +1,25 @@
parameter cluster_size defaults to 4
title ~cluster_size~ nodes

parameter serverconf defaults to local
include servers/cockroach_~serverconf~.cfg

cast
n* play ~cluster_size~ nodes \
with nodei=$(($i+1)) \
pbase=$i \
peers=localhost:$((26257+($i+~cluster_size~-1)%~cluster_size~)),localhost:$((26257+($i+~cluster_size~+1)%~cluster_size~))
end

script
scene s entails for every node: start
scene i entails for n1: init
storyline si...
end

audience
server_events watches every node warning
server_events watches every node error
server_events watches every node fatal
server_events measures occurrences (y value is source index)
end
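
The `peers` expression in the `cast` block above gives each node its two neighbours on a ring: node index `$i` joins the nodes at indexes `(i-1) mod N` and `(i+1) mod N`, and adding `~cluster_size~` before taking the remainder keeps the left operand of `%` non-negative. The index appears to be zero-based, since `pbase=$i` and `n1` listens on port 26257. A sketch of the same computation as a standalone function; the name `ring_peers` is hypothetical.

```shell
# Print the join peers for node index $1 (zero-based) in a cluster of $2 nodes,
# using the same base port (26257) and arithmetic as the config above.
ring_peers() {
    i=$1
    n=$2
    prev=$(( 26257 + (i + n - 1) % n ))
    next=$(( 26257 + (i + n + 1) % n ))
    printf 'localhost:%s,localhost:%s\n' "$prev" "$next"
}

# Example: list the peers of every node in a 4-node cluster.
#   for i in 0 1 2 3; do ring_peers "$i" 4; done
```

With four nodes, the first node (index 0) is paired with the last node and the second node, which closes the ring.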
@@ -0,0 +1,19 @@
role node
:start export p=$((26257+$pbase)) h=$((8080+$pbase)); \
export TMPDIR=/tmp; \
echo "postgresql://root@localhost:$p/?sslmode=disable">pgurl; \
$COCKROACH start \\
--join=$peers --listen-addr=localhost:$p --http-addr=localhost:$h --insecure \\
--background --pid-file=pid
:init $COCKROACH init --url $(cat pgurl)
:quit $COCKROACH quit --url $(cat pgurl)
:reset kill $(cat pid) && sleep 1 && kill -9 $(cat pid) || true; rm -rf cockroach-data
cleanup if test -e pid; then kill $(cat pid) && sleep 1 && kill -9 $(cat pid) || true; fi; \
mkdir -p cockroach-data/logs; \
touch cockroach-data/logs/cockroach.log
spotlight tail -F cockroach-data/logs/cockroach.log | \\
stdbuf -oL grep -v 'server is not accepting clients'
signal warning event at ^W(?P<ts_log>)\s+\d+\s+\S+\s+(?P<event>.*)$
signal error event at ^E(?P<ts_log>)\s+\d+\s+\S+\s+(?P<event>.*)$
signal fatal event at ^F(?P<ts_log>)\s+\d+\s+\S+\s+(?P<event>.*)$
end
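
The three `signal` lines above sort server log lines into warning, error and fatal events based on the leading letter (`W`, `E` or `F`). A minimal sketch of that classification, assuming log lines start with a one-letter severity as the patterns imply; the function name `severity_of` is hypothetical. Unlike the patterns above, it only inspects the first character and does not check the remaining fields.

```shell
# Classify a log line by its leading severity letter.
severity_of() {
    case "$1" in
        W*) echo warning ;;
        E*) echo error ;;
        F*) echo fatal ;;
        *)  echo other ;;
    esac
}

# Example (the log line is made up for illustration):
#   severity_of "W190915 12:34:56.789012 123 server/node.go:45  slow heartbeat"
```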