Add network test in CI and run demo command (#1552)

This PR enhances the [demo
tutorial](https://hydra.family/head-protocol/docs/getting-started/) by
enabling `hydra-cluster` benchmarks to run on an active Hydra cluster.

**usage**

See the newly introduced `network-test.yaml` for the related invocations of pumba and the hydra clients. Assuming they are running, you simply run:

```sh
nix run .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e -- \
          demo \
          --output-directory=$(pwd)/benchmarks \
          --scaling-factor=100 \
          --timeout=1000s \
          --testnet-magic 42 \
          --node-socket=${NETWORK_DIR}/node.socket \
          --hydra-client=localhost:4001 \
          --hydra-client=localhost:4002 \
          --hydra-client=localhost:4003
```

and you will get some statistics on txns confirmed, time taken, etc.

**prerequisites**
- A Cardano node must be running on the specified `node-socket`.
- Hydra nodes must be operational on the provided `hydra-client` hosts.
- There’s no need to pre-seed the keys, as the bench-demo script will
automatically fund them using the faucet.
- Note that the reference scripts should already be published, and the
Hydra nodes must be running with those scripts.
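
The first two prerequisites can be checked before starting a benchmark run. The following is a minimal sketch and not part of this PR: the helper names `check_socket`, `check_client` and `preflight` are hypothetical, and `check_client` assumes that `bash` (for `/dev/tcp`) and the coreutils `timeout` command are available.

```sh
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the hydra repository.

# Succeeds if the given path exists and is a Unix domain socket.
check_socket() {
  [ -S "$1" ]
}

# Succeeds if a TCP connection to HOST:PORT can be opened within 2 seconds.
check_client() {
  local host="${1%%:*}"
  local port="${1##*:}"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage: preflight NODE_SOCKET [HYDRA_CLIENT...]
# Prints one line per failed check and returns non-zero if any check failed.
preflight() {
  local socket="$1"
  local client
  local failed=0
  shift
  check_socket "$socket" || { echo "missing node socket: $socket"; failed=1; }
  for client in "$@"; do
    check_client "$client" || { echo "unreachable hydra client: $client"; failed=1; }
  done
  return "$failed"
}
```

For the setup used in this PR it could be called as `preflight "${NETWORK_DIR}/node.socket" localhost:4001 localhost:4002 localhost:4003`. It does not verify that the reference scripts have been published.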


**Todo**

- [x] Fix the `FIXME` about `> 33`
- [x] Remove duplicate seeding
- [x] Make sure the entire CI process doesn't fail when pumba causes
the network to fail
- [x] Make it so that if it _fails_ the head is closed.
- [x] Quick little matrix to run a few different scenarios
- [x] Make the bench-e2e fail if it didn't submit all the txns (ideally
we would also be able to see this visually in the job list, but GitHub
is missing a feature; see also actions/runner#2347)
- [x] Get docker info via `docker inspect` instead of parsing yaml (!)
- [x] Make sure `results.csv` is written to the `outputDirectory` not
the tmp directory
- [x] Upload the results as part of the artifacts
- [x] Write the summary out even when it failed

---

* [x] CHANGELOG updated or not needed
* [x] Documentation updated or not needed
* [x] Haddocks updated or not needed
* [x] No new TODOs introduced or explained hereafter

---------

Co-authored-by: Noon van der Silk <noon.vandersilk@iohk.io>
Co-authored-by: Sebastian Nagel <sebastian.nagel@ncoding.at>
3 people authored Aug 29, 2024
1 parent a51e04c commit b03fa32
Showing 21 changed files with 771 additions and 269 deletions.
132 changes: 132 additions & 0 deletions .github/workflows/network-test.yaml
@@ -0,0 +1,132 @@
name: "Network fault tolerance"

on:
  pull_request:
  workflow_dispatch:
    inputs:
      debug_enabled:
        type: boolean
        description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
        required: false
        default: false

jobs:
  network-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Note: At present we can only run for 3 peers; to configure this for
        # more we need to make the docker-compose spin-up dynamic across
        # however many we would like to configure here.
        # Currently this is just a label and does not have any functional impact.
        peers: [3]
        scaling_factor: [10, 50]
        netem_loss: [0, 1, 2, 3, 4, 5, 10, 20]
    name: "Peers: ${{ matrix.peers }}, scaling: ${{ matrix.scaling_factor }}, loss: ${{ matrix.netem_loss }}"
    steps:
    - uses: actions/checkout@v4
      with:
        submodules: true

    - name: ❄ Prepare nix
      uses: cachix/install-nix-action@V27
      with:
        extra_nix_config: |
          accept-flake-config = true
          log-lines = 1000
    - name: ❄ Cachix cache of nix derivations
      uses: cachix/cachix-action@v15
      with:
        name: cardano-scaling
        authToken: '${{ secrets.CACHIX_CARDANO_SCALING_AUTH_TOKEN }}'

    - name: Build docker images for netem specifically
      run: |
        nix build .#docker-hydra-node-for-netem
        ./result | docker load
    - name: Setup containers for network testing
      run: |
        cd demo
        ./prepare-devnet.sh
        docker compose up -d cardano-node
        sleep 5
        # :tear: socket permissions.
        sudo chown runner:docker devnet/node.socket
        ./export-tx-id-and-pparams.sh
        # Specify two docker compose yamls; the second one overrides the
        # images to use the netem ones specifically
        docker compose -f docker-compose.yaml -f docker-compose-netem.yaml up -d hydra-node-{1,2,3}
        sleep 3
        docker ps
    - name: Build required nix and docker derivations
      run: |
        nix build .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e
        nix build github:noonio/pumba/noon/add-flake
    # Use tmate to get a shell onto the runner to do some temporary hacking
    #
    # <https://github.com/mxschmitt/action-tmate>
    #
    - name: Setup tmate session
      uses: mxschmitt/action-tmate@v3
      if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled }}
      with:
        limit-access-to-actor: true

    - name: Run pumba and the benchmarks
      # Note: We're going to allow everything to fail. In the job on GitHub,
      # we will be able to see which ones _did_, in fact, fail. Originally,
      # we were keeping track of our expectations with 'include' and
      # 'exclude' directives here, but I think it's best to leave those out,
      # as some of the tests (say 5%) fail, and overall the conditions of
      # failure depend on the scaling factor, the peers, etc, and it becomes
      # too complicated to track here.
      continue-on-error: true
      run: |
        # Extract inputs with defaults for non-workflow_dispatch events
        percent="${{ matrix.netem_loss }}"
        scaling_factor="${{ matrix.scaling_factor }}"
        target_peer="hydra-node-1"
        other_peers="172.16.238.20 172.16.238.30"
        .github/workflows/network/run_pumba.sh $target_peer $percent $other_peers
        # Run benchmark on demo
        mkdir benchmarks
        touch benchmarks/test.log
        nix run .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e -- \
          demo \
          --output-directory=benchmarks \
          --scaling-factor="$scaling_factor" \
          --timeout=1000s \
          --testnet-magic 42 \
          --node-socket=demo/devnet/node.socket \
          --hydra-client=localhost:4001 \
          --hydra-client=localhost:4002 \
          --hydra-client=localhost:4003
    - name: Acquire logs
      if: always()
      run: |
        cd demo
        docker compose logs > docker-logs
    - name: 💾 Upload logs
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: "docker-logs-netem-loss=${{ matrix.netem_loss }},scaling_factor=${{ matrix.scaling_factor }},peers=${{ matrix.peers }}"
        path: demo/docker-logs
        if-no-files-found: ignore

    - name: 💾 Upload build & test artifacts
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: "benchmarks-netem-loss=${{ matrix.netem_loss }},scaling_factor=${{ matrix.scaling_factor }},peers=${{ matrix.peers }}"
        path: benchmarks
        if-no-files-found: ignore
23 changes: 23 additions & 0 deletions .github/workflows/network/run_pumba.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

target_node_name=$1

percent=$2

rest_node_names=$3

# Build Pumba netem command
# Note: We leave it for 20 minutes; but really it's effectively unlimited. We don't
# expect any of our tests to run longer than that.
nix_command="nix run github:noonio/pumba/noon/add-flake -- -l debug netem --duration 20m"

while IFS= read -r network; do
  nix_command+=" --target $network"
done <<< "$rest_node_names"

nix_command+=" loss --percent \"$percent\" \"re2:$target_node_name\" &"

echo "$nix_command"

# Run Pumba netem command
eval "$nix_command"
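
The script above assembles the command as one string and runs it through `eval`. It also reads `$3` line by line, so several peer addresses would have to arrive as a single newline-separated argument; as invoked in the workflow, the unquoted `$other_peers` appears to be split so that `$3` holds only the first address. A minimal sketch of an alternative that takes each peer as its own argument and avoids `eval` follows. It is not part of this PR: the function name `run_pumba_loss` and the `PUMBA_RUNNER` variable are hypothetical, and the pumba flags are copied verbatim from the script above.

```sh
#!/usr/bin/env bash
# Hypothetical variant of run_pumba.sh; not part of the hydra repository.

# Usage: run_pumba_loss TARGET_NODE PERCENT PEER_IP...
# PUMBA_RUNNER (hypothetical) overrides the program that receives the
# arguments, which allows a dry run that only prints them.
run_pumba_loss() {
  local target_node_name="$1"
  local percent="$2"
  local peer
  local peer_count
  shift 2
  peer_count=$#
  # Append "--target IP" for every peer, then drop the original peer list.
  for peer in "$@"; do
    set -- "$@" --target "$peer"
  done
  shift "$peer_count"
  # Same flags as the script above, with every argument kept as its own word.
  set -- -l debug netem --duration 20m "$@" \
    loss --percent "$percent" "re2:$target_node_name"
  # Intentionally unquoted so that the runner splits into command and arguments.
  ${PUMBA_RUNNER:-nix run github:noonio/pumba/noon/add-flake --} "$@"
}
```

A call such as `run_pumba_loss hydra-node-1 5 172.16.238.20 172.16.238.30 &` would keep the background behaviour of the original script, while setting `PUMBA_RUNNER='printf %s\n'` prints the arguments one per line instead of running pumba.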
2 changes: 2 additions & 0 deletions demo/.gitignore
@@ -0,0 +1,2 @@
/benchmarks
/datasets
9 changes: 9 additions & 0 deletions demo/docker-compose-netem.yaml
@@ -0,0 +1,9 @@
services:
  hydra-node-1:
    image: hydra-node-for-netem

  hydra-node-2:
    image: hydra-node-for-netem

  hydra-node-3:
    image: hydra-node-for-netem
7 changes: 6 additions & 1 deletion demo/docker-compose.yaml
@@ -48,6 +48,8 @@ services:
, "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
, "--testnet-magic", "42"
, "--node-socket", "/devnet/node.socket"
, "--persistence-dir", "/devnet/persistence/alice"
, "--contestation-period", "3"
]
networks:
hydra_net:
@@ -83,6 +85,8 @@ services:
, "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
, "--testnet-magic", "42"
, "--node-socket", "/devnet/node.socket"
, "--persistence-dir", "/devnet/persistence/bob"
, "--contestation-period", "3"
]
networks:
hydra_net:
@@ -118,6 +122,8 @@ services:
, "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
, "--testnet-magic", "42"
, "--node-socket", "/devnet/node.socket"
, "--persistence-dir", "/devnet/persistence/carol"
, "--contestation-period", "3"
]
networks:
hydra_net:
@@ -188,7 +194,6 @@ services:
hydra_net:
ipv4_address: 172.16.238.5


networks:
hydra_net:
driver: bridge
71 changes: 71 additions & 0 deletions demo/export-tx-id-and-pparams.sh
@@ -0,0 +1,71 @@
#!/usr/bin/env bash

set -eo pipefail

SCRIPT_DIR=${SCRIPT_DIR:-$(realpath $(dirname $(realpath $0)))}
NETWORK_ID=42

CCLI_CMD=
DEVNET_DIR=/devnet
if [[ -n ${1} ]]; then
  echo >&2 "Using provided cardano-cli command: ${1}"
  $(${1} version > /dev/null)
  CCLI_CMD=${1}
  DEVNET_DIR=${SCRIPT_DIR}/devnet
fi

HYDRA_NODE_CMD=
if [[ -n ${2} ]]; then
  echo >&2 "Using provided hydra-node command: ${2}"
  ${2} --version > /dev/null
  HYDRA_NODE_CMD=${2}
fi

# Invoke hydra-node in a container or via provided executable
function hnode() {
  if [[ -n ${HYDRA_NODE_CMD} ]]; then
    ${HYDRA_NODE_CMD} ${@}
  else
    docker run --rm \
      --pull always \
      -v ${SCRIPT_DIR}/devnet:/devnet \
      ghcr.io/cardano-scaling/hydra-node:0.18.1 -- ${@}
  fi
}

function publishReferenceScripts() {
  echo >&2 "Publishing reference scripts..."
  hnode publish-scripts \
    --testnet-magic ${NETWORK_ID} \
    --node-socket ${DEVNET_DIR}/node.socket \
    --cardano-signing-key devnet/credentials/faucet.sk
}

# Invoke cardano-cli in running cardano-node container or via provided cardano-cli
function ccli() {
  ccli_ ${@} --testnet-magic ${NETWORK_ID}
}
function ccli_() {
  if [[ -x ${CCLI_CMD} ]]; then
    ${CCLI_CMD} ${@}
  else
    ${DOCKER_COMPOSE_CMD} exec cardano-node cardano-cli ${@}
  fi
}

function queryPParams() {
  echo >&2 "Query Protocol parameters"
  if [[ -x ${CCLI_CMD} ]]; then
    ccli query protocol-parameters --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
  else
    docker exec demo-cardano-node-1 cardano-cli query protocol-parameters --testnet-magic ${NETWORK_ID} --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
  fi
  echo >&2 "Saved in protocol-parameters.json"
}

queryPParams
echo "HYDRA_SCRIPTS_TX_ID=$(publishReferenceScripts)" > .env
echo >&2 "Environment variable stored in '.env'"
echo >&2 -e "\n\t$(cat .env)\n"
38 changes: 2 additions & 36 deletions demo/seed-devnet.sh
@@ -43,18 +43,6 @@ function ccli_() {
fi
}

# Invoke hydra-node in a container or via provided executable
function hnode() {
if [[ -n ${HYDRA_NODE_CMD} ]]; then
${HYDRA_NODE_CMD} ${@}
else
docker run --rm -it \
--pull always \
-v ${SCRIPT_DIR}/devnet:/devnet \
ghcr.io/cardano-scaling/hydra-node:0.18.1 -- ${@}
fi
}

# Retrieve some lovelace from faucet
function seedFaucet() {
ACTOR=${1}
@@ -89,26 +77,6 @@ function seedFaucet() {
echo >&2 "Done"
}

function publishReferenceScripts() {
echo >&2 "Publishing reference scripts..."
hnode publish-scripts \
--testnet-magic ${NETWORK_ID} \
--node-socket ${DEVNET_DIR}/node.socket \
--cardano-signing-key devnet/credentials/faucet.sk
}

function queryPParams() {
echo >&2 "Query Protocol parameters"
if [[ -x ${CCLI_CMD} ]]; then
ccli query protocol-parameters --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
| jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
else
docker exec demo-cardano-node-1 cardano-cli query protocol-parameters --testnet-magic ${NETWORK_ID} --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
| jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
fi
echo >&2 "Saved in protocol-parameters.json"
}

echo >&2 "Fueling up hydra nodes of alice, bob and carol..."
seedFaucet "alice" 30000000 # 30 Ada to the node
seedFaucet "bob" 30000000 # 30 Ada to the node
@@ -117,7 +85,5 @@ echo >&2 "Distributing funds to alice, bob and carol..."
seedFaucet "alice-funds" 100000000 # 100 Ada to commit
seedFaucet "bob-funds" 50000000 # 50 Ada to commit
seedFaucet "carol-funds" 25000000 # 25 Ada to commit
queryPParams
echo "HYDRA_SCRIPTS_TX_ID=$(publishReferenceScripts)" > .env
echo >&2 "Environment variable stored in '.env'"
echo >&2 -e "\n\t$(cat .env)\n"

./export-tx-id-and-pparams.sh
11 changes: 11 additions & 0 deletions hydra-cluster/README.md
@@ -140,3 +140,14 @@ The benchmark can be run in two modes corresponding to two different commands:
* `datasets`: Runs one or more preexisting _datasets_ in sequence and collect their results in a single markdown formatted file. This is useful to track the evolution of hydra-node's performance over some well-known datasets over time and produce a human-readable summary.
Check out `cabal bench --benchmark-options --help` for more details.
# Network Testing
The benchmark can also be run against a running `demo` hydra-cluster using `cabal bench`; it produces a
`results.csv` file in a work directory. As with the other benchmark results, you can use the `bench/plot.sh` script to plot the transaction confirmation times.
To run the benchmark in this mode, the command is:
* `demo`: Runs a single freshly generated _dataset_ and collects its results in a markdown formatted file. The purpose of this setup is to facilitate a variety of network-resilience scenarios, such as packet loss or node failures. This is useful to demonstrate the robustness and performance of the hydra-node's network over time and produce a human-readable summary.
For instance, we make use of this in our [CI](https://github.com/cardano-scaling/hydra/blob/master/.github/workflows/network-test.yaml) to keep track of scenarios that we care about.
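
A minimal sketch of how a `demo` run through `cabal bench` might be assembled. It assumes the benchmark target is named `bench-e2e` (as in the nix attribute used in CI) and reuses the flag values from the CI workflow. The helper `demo_bench_command` is hypothetical; it only prints the command so that it can be inspected before running it.

```sh
#!/usr/bin/env bash
# Hypothetical helper; not part of the hydra repository.

# Usage: demo_bench_command NODE_SOCKET HYDRA_CLIENT...
# Prints a `cabal bench` command line for the `demo` mode, adding one
# --hydra-client flag per HOST:PORT argument.
demo_bench_command() {
  local node_socket="$1"
  local client
  local options
  shift
  # Flag values copied from the CI workflow (scaling factor 10 is one of its matrix values).
  options="demo --output-directory=benchmarks --scaling-factor=10 --timeout=1000s --testnet-magic 42 --node-socket=$node_socket"
  for client in "$@"; do
    options="$options --hydra-client=$client"
  done
  printf "cabal bench bench-e2e --benchmark-options '%s'\n" "$options"
}
```

For the three-node demo this could be called as `demo_bench_command demo/devnet/node.socket localhost:4001 localhost:4002 localhost:4003`.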