feat(hole-punch): add hole-punch interoperability test suite #304

Merged
merged 88 commits into from
Oct 16, 2023
b704bab
Make a copy pasta version of multidim interop for holepunching
thomaseizinger Sep 14, 2023
c69694e
Rename directory
thomaseizinger Sep 14, 2023
2d56a3f
Add CI testing infra
thomaseizinger Sep 14, 2023
3ea3798
Fix cache generation for hole-punch tests
thomaseizinger Sep 14, 2023
c96a615
Add `make clean` command
thomaseizinger Sep 14, 2023
3624ccd
Prune networks before starting test
thomaseizinger Sep 14, 2023
090fc23
Activate result output
thomaseizinger Sep 14, 2023
1420758
Change subnet
thomaseizinger Sep 14, 2023
a7710a3
Print ip addresses on GitHub runner
thomaseizinger Sep 14, 2023
4bbe09b
`docker compose down` should remove all networks
thomaseizinger Sep 14, 2023
da6cc3d
Let the control network be compose managed
thomaseizinger Sep 14, 2023
a6399c0
Do not run in parallel because networks overlap
thomaseizinger Sep 15, 2023
4f9b2b5
Use built-in timeout functionality
thomaseizinger Sep 15, 2023
96e4937
Always capture logs for test runs
thomaseizinger Sep 15, 2023
4e67e77
Always upload logs and results
thomaseizinger Sep 15, 2023
157affc
Make more networks compose managed by resolving routers by hostname
thomaseizinger Sep 15, 2023
43a6132
Make all networks compose managed
thomaseizinger Sep 15, 2023
8f142d7
Restore parallelism
thomaseizinger Sep 15, 2023
e4f653e
Extract startupScriptFn
thomaseizinger Sep 15, 2023
649f3fe
Add redis dependency and increase latency
thomaseizinger Sep 15, 2023
3831d3d
Add docs for running locally
thomaseizinger Sep 15, 2023
8ac871d
Wait for interfaces in router to be up and running
thomaseizinger Sep 15, 2023
ee9beac
Wait for `eth0` to be online in clients
thomaseizinger Sep 15, 2023
783baa0
FIx syntax error in shell
thomaseizinger Sep 15, 2023
449f4ed
Run things sequentially
thomaseizinger Sep 15, 2023
9892ca4
Remove printing of ip conf
thomaseizinger Sep 15, 2023
3d97309
Add a healthcheck to the router
thomaseizinger Sep 15, 2023
4f1621e
Remove healthcheck in favor of pinging router
thomaseizinger Sep 15, 2023
c9b756a
Use endpoint directly
thomaseizinger Sep 15, 2023
f76b5b8
Set delays on all network interfaces
thomaseizinger Sep 15, 2023
e084d6d
Wait for redis to be online
thomaseizinger Sep 15, 2023
d3c4a16
Wait for redis to be online
thomaseizinger Sep 19, 2023
a53f3bc
Increase timeouts and add healthchecks
thomaseizinger Sep 19, 2023
3f6aabe
Progress towards asserting RTT
thomaseizinger Sep 19, 2023
310dd89
Add back `nslookup` check
thomaseizinger Sep 19, 2023
46a9a0a
Make delays configurable and assert against RTT
thomaseizinger Sep 19, 2023
d6faacb
Rename alice -> dialer and bob -> listener
thomaseizinger Sep 19, 2023
1b8461e
Rename networks to have them in the correct order again
thomaseizinger Sep 19, 2023
a2dbc1f
Capture tcpdumps of traffic
thomaseizinger Sep 19, 2023
233176d
Use `V1Lazy` for relay
thomaseizinger Sep 20, 2023
c4fd3ea
Reorder functions to be at the bottom
thomaseizinger Sep 20, 2023
6b8fd05
Remove unused parameter
thomaseizinger Sep 20, 2023
bb33456
Improve formatting of SQL statement
thomaseizinger Sep 20, 2023
c24a9fb
Make transports type safe
thomaseizinger Sep 20, 2023
61b41ea
fixup! Improve formatting of SQL statement
thomaseizinger Sep 20, 2023
b05056a
Simplify container image ID handling
thomaseizinger Sep 20, 2023
ffad8e1
Remove no longer existing timeout override
thomaseizinger Sep 20, 2023
101647d
Simplify calling DB
thomaseizinger Sep 20, 2023
ab1ec3c
Flatten out arguments
thomaseizinger Sep 20, 2023
87bb037
Re-order arguments
thomaseizinger Sep 20, 2023
c7d84b8
Extract filtering out of `buildSpec`
thomaseizinger Sep 20, 2023
88a3b6a
Rename to `testCase`
thomaseizinger Sep 20, 2023
5824dfb
WIP set log level in binaries
thomaseizinger Sep 20, 2023
6ea9d7d
Don't fail for high RTTs
thomaseizinger Sep 25, 2023
3557966
Set default log for relay
thomaseizinger Sep 25, 2023
257514a
Update rust-libp2p version and simplify logger
thomaseizinger Sep 25, 2023
10c1b99
Bump to image with `tcpdump` installed
thomaseizinger Sep 25, 2023
7a634d4
Print stdout and stderr of failed runs
thomaseizinger Sep 25, 2023
c55f4b6
Update README
thomaseizinger Sep 25, 2023
573af81
Add some docs
thomaseizinger Sep 25, 2023
d47c957
Don't print stdout
thomaseizinger Sep 25, 2023
a7b6d98
Update to latest version
thomaseizinger Sep 25, 2023
8000949
Disable connection logs for relay
thomaseizinger Sep 25, 2023
555270b
Wait for redis to be online in relay
thomaseizinger Sep 25, 2023
7d80c28
Remove `test` target from makefile
thomaseizinger Sep 25, 2023
78fae1e
Change nslookup's for `ping`s
thomaseizinger Sep 25, 2023
8017e57
Add `ping` to relay
thomaseizinger Sep 25, 2023
a8a35cd
Bump to latest `rust-libp2p` version
thomaseizinger Sep 25, 2023
aa3dc1d
Expand docs
thomaseizinger Sep 25, 2023
1729eae
Install more recent version of docker-compose
thomaseizinger Sep 25, 2023
b19e46a
Install compose before setting up buildx
thomaseizinger Sep 25, 2023
a9eddb1
Use sudo
thomaseizinger Sep 25, 2023
aec9623
Move `docker-compose` before downloading
thomaseizinger Sep 25, 2023
623ce78
Download to $HOME instead
thomaseizinger Sep 25, 2023
b3e2c15
Use correct CPU arch
thomaseizinger Sep 25, 2023
be9119f
Go back to 16 workers
thomaseizinger Sep 25, 2023
ffd5ca2
Remove all "healthchecks"
thomaseizinger Sep 25, 2023
b21e0c1
Actually make `--dry-run` work
thomaseizinger Sep 25, 2023
9d86898
Merge branch 'master' into feat/hole-punch-tests
thomaseizinger Sep 25, 2023
364354c
Fix workflow name
thomaseizinger Sep 26, 2023
9484407
Use image IDs in docker-compose yml
thomaseizinger Sep 26, 2023
4c843bf
Add column to table
thomaseizinger Sep 26, 2023
01e5b26
Fix typo
thomaseizinger Sep 26, 2023
14e33b5
Thomas writes bad go code - Vol. 1
thomaseizinger Sep 29, 2023
c4c6241
Merge branch 'master' into feat/hole-punch-tests
thomaseizinger Oct 9, 2023
ba1beb8
Don't include go until it works
thomaseizinger Oct 9, 2023
4354ec4
Fix typo
thomaseizinger Oct 9, 2023
d4b6c58
Delete go code entirely
thomaseizinger Oct 9, 2023
155 changes: 155 additions & 0 deletions .github/actions/run-interop-hole-punch-test/action.yml
@@ -0,0 +1,155 @@
name: "libp2p hole-punch interop test"
description: "Run the libp2p hole-punch interoperability test suite"
inputs:
  test-filter:
    description: "Filter which tests to run out of the created matrix"
    required: false
    default: ""
  test-ignore:
    description: "Exclude tests from the created matrix that include this string in their name"
    required: false
    default: ""
  extra-versions:
    description: "Space-separated paths to JSON files describing additional images"
    required: false
    default: ""
  s3-cache-bucket:
    description: "Which S3 bucket to use for container layer caching"
    required: false
    default: ""
  s3-access-key-id:
    description: "S3 Access key id for the cache"
    required: false
    default: ""
  s3-secret-access-key:
    description: "S3 secret access key for the cache"
    required: false
    default: ""
  aws-region:
    description: "Which AWS region to use"
    required: false
    default: "us-east-1"
  worker-count:
    description: "How many workers to use for the test"
    required: false
    default: "2"
runs:
  using: "composite"
  steps:
    - name: Configure AWS credentials for S3 build cache
      if: inputs.s3-access-key-id != '' && inputs.s3-secret-access-key != ''
      run: |
        echo "PUSH_CACHE=true" >> $GITHUB_ENV
      shell: bash

    # This depends on where this file is within this repository. This walks up
    # from here to the hole-punch-interop folder
    - run: |
        WORK_DIR=$(realpath "$GITHUB_ACTION_PATH/../../../hole-punch-interop")
        echo "WORK_DIR=$WORK_DIR" >> $GITHUB_OUTPUT
      shell: bash
      id: find-workdir

    - uses: actions/setup-node@v3
      with:
        node-version: 18

    # Existence of /etc/buildkit/buildkitd.toml indicates that this is a
    # self-hosted runner. If so, we need to pass the config to the buildx
    # action. The config enables docker.io proxy which is required to
    # work around docker hub rate limiting.
    - run: |
        if test -f /etc/buildkit/buildkitd.toml; then
          echo "config=/etc/buildkit/buildkitd.toml" >> $GITHUB_OUTPUT
        fi
      shell: bash
      id: buildkit

    - name: Install more recent docker-compose version # https://stackoverflow.com/questions/54331949/having-networking-issues-with-docker-compose
      shell: bash
      run: |
        mkdir -p $HOME/.docker/cli-plugins
        wget -q -O- https://github.com/docker/compose/releases/download/v2.21.0/docker-compose-linux-x86_64 > $HOME/.docker/cli-plugins/docker-compose
        chmod +x $HOME/.docker/cli-plugins/docker-compose
        docker compose version

    - name: Set up Docker Buildx
      id: buildx
      uses: docker/setup-buildx-action@v2
      with:
        config: ${{ steps.buildkit.outputs.config }}

    - name: Install deps
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: npm ci
      shell: bash

    - name: Load cache and build
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: npm run cache -- load
      shell: bash

    - name: Assert Git tree is clean.
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      shell: bash
      run: |
        if [[ -n "$(git status --porcelain)" ]]; then
          echo "Git tree is dirty. This means that building an impl generated something that should probably be .gitignore'd"
          git status
          exit 1
        fi

    - name: Push the image cache
      if: env.PUSH_CACHE == 'true'
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      env:
        AWS_BUCKET: ${{ inputs.s3-cache-bucket }}
        AWS_REGION: ${{ inputs.aws-region }}
        AWS_ACCESS_KEY_ID: ${{ inputs.s3-access-key-id }}
        AWS_SECRET_ACCESS_KEY: ${{ inputs.s3-secret-access-key }}
      run: npm run cache -- push
      shell: bash

    - name: Run the test
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      env:
        WORKER_COUNT: ${{ inputs.worker-count }}
        EXTRA_VERSION: ${{ inputs.extra-versions }}
        NAME_FILTER: ${{ inputs.test-filter }}
        NAME_IGNORE: ${{ inputs.test-ignore }}
      run: npm run test -- --extra-version=$EXTRA_VERSION --name-filter=$NAME_FILTER --name-ignore=$NAME_IGNORE
      shell: bash

    - name: Print the results
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: cat results.csv
      shell: bash

    - name: Render results
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: npm run renderResults > ./dashboard.md
      shell: bash

    - name: Show Dashboard Output
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: cat ./dashboard.md >> $GITHUB_STEP_SUMMARY
      shell: bash

    - name: Exit with Error
      working-directory: ${{ steps.find-workdir.outputs.WORK_DIR }}
      run: |
        if grep -q ":red_circle:" ./dashboard.md; then
          exit 1
        else
          exit 0
        fi
      shell: bash

    - uses: actions/upload-artifact@v3
      if: ${{ always() }}
      with:
        name: test-plans-output
        path: |
          ${{ steps.find-workdir.outputs.WORK_DIR }}/results.csv
          ${{ steps.find-workdir.outputs.WORK_DIR }}/dashboard.md
          ${{ steps.find-workdir.outputs.WORK_DIR }}/runs
24 changes: 24 additions & 0 deletions .github/workflows/hole-punch-interop.yml
@@ -0,0 +1,24 @@
on:
  workflow_dispatch:
  pull_request:
    paths:
      - 'hole-punch-interop/**'
  push:
    branches:
      - "master"
    paths:
      - 'hole-punch-interop/**'

name: libp2p holepunching interop test

jobs:
  run-hole-punch-interop:
    runs-on: ['self-hosted', 'linux', 'x64', '4xlarge'] # https://github.com/pl-strflt/tf-aws-gh-runner/blob/main/runners.tf
    steps:
      - uses: actions/checkout@v3
      - uses: ./.github/actions/run-interop-hole-punch-test
        with:
          s3-cache-bucket: libp2p-by-tf-aws-bootstrap
          s3-access-key-id: ${{ vars.S3_AWS_ACCESS_KEY_ID }}
          s3-secret-access-key: ${{ secrets.S3_AWS_SECRET_ACCESS_KEY }}
          worker-count: 16
7 changes: 7 additions & 0 deletions hole-punch-interop/.gitignore
@@ -0,0 +1,7 @@
# For now, not committing image.json files
image.json

results.csv
runs/

node_modules/
19 changes: 19 additions & 0 deletions hole-punch-interop/Makefile
@@ -0,0 +1,19 @@
RUST_SUBDIRS := $(wildcard impl/rust/*/.)
GO_SUBDIRS := $(wildcard impl/go/*/.)

all: rust-relay router $(RUST_SUBDIRS) $(GO_SUBDIRS)
rust-relay:
	$(MAKE) -C rust-relay
router:
	$(MAKE) -C router
$(RUST_SUBDIRS):
	$(MAKE) -C $@
$(GO_SUBDIRS):
	$(MAKE) -C $@
clean:
	$(MAKE) -C rust-relay clean
	$(MAKE) -C router clean
	# Clean each implementation directory separately; a single `-C` breaks
	# when the wildcard matches several directories or none at all.
	for dir in $(RUST_SUBDIRS) $(GO_SUBDIRS); do $(MAKE) -C $$dir clean || exit 1; done

.PHONY: rust-relay router all clean $(RUST_SUBDIRS) $(GO_SUBDIRS)
84 changes: 84 additions & 0 deletions hole-punch-interop/README.md
@@ -0,0 +1,84 @@
# Hole punch tests

## How to run locally

1. `npm install`
2. `make`
3. `npm run test`

## Client configuration

| env variable | possible values |
|--------------|-----------------|
| MODE | listen \| dial |
| TRANSPORT | tcp \| quic |

- For TCP, the client MUST use noise + yamux to upgrade the connection.
- The relayed connection MUST use noise + yamux.

## Test flow

1. The relay starts and pushes its address to the following redis keys:
   - `RELAY_TCP_ADDRESS` for the TCP test
   - `RELAY_QUIC_ADDRESS` for the QUIC test
1. Upon start-up, clients connect to a redis server at `redis:6379` and block until the relay key for their transport becomes available.
   They then dial the relay on the provided address.
1. The relay supports identify.
   Implementations SHOULD use that to figure out their external address next.
1. Once connected to the relay, a client in `MODE=listen` should listen on the relay and make a reservation.
   Once the reservation is made, it pushes its `PeerId` to the redis key `LISTEN_CLIENT_PEER_ID`.
1. A client in `MODE=dial` blocks on the availability of `LISTEN_CLIENT_PEER_ID`.
   Once available, it dials `<relay_addr>/p2p-circuit/<listen-client-peer-id>`.
1. Upon a successful hole-punch, the peer in `MODE=dial` measures the RTT across the newly established connection.
1. The RTT MUST be printed to stdout in the following format:
   ```json
   { "rtt_to_holepunched_peer_millis": 12 }
   ```
1. Once printed, the dialer MUST exit with `0`.
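
The ordering of the redis hand-offs above can be sketched as follows. This is a minimal illustration, not the test runner or any implementation: `FakeRedis`, `relay_key`, `listener` and `dialer` are hypothetical names, the in-memory store only stands in for the server at `redis:6379`, and the libp2p steps (dialing, identify, reservation, the hole punch itself) are replaced by a callback.

```python
import threading


class FakeRedis:
    """In-memory stand-in for the redis server (hypothetical, illustration only)."""

    def __init__(self):
        self._values = {}
        self._changed = threading.Condition()

    def publish(self, key, value):
        with self._changed:
            self._values[key] = value
            self._changed.notify_all()

    def wait_for(self, key, timeout=5.0):
        # Block until `key` is set; the timeout only keeps this sketch from hanging.
        with self._changed:
            if not self._changed.wait_for(lambda: key in self._values, timeout):
                raise TimeoutError(f"{key} never became available")
            return self._values[key]


def relay_key(transport):
    # RELAY_TCP_ADDRESS or RELAY_QUIC_ADDRESS, as named in the test flow.
    return f"RELAY_{transport.upper()}_ADDRESS"


def listener(redis, transport, peer_id):
    relay_addr = redis.wait_for(relay_key(transport))
    # Dialing the relay, identify and the reservation are omitted here.
    redis.publish("LISTEN_CLIENT_PEER_ID", peer_id)
    return relay_addr


def dialer(redis, transport, measure_rtt_millis):
    relay_addr = redis.wait_for(relay_key(transport))
    listener_id = redis.wait_for("LISTEN_CLIENT_PEER_ID")
    # Follows the address template from the test flow literally.
    target = f"{relay_addr}/p2p-circuit/{listener_id}"
    return target, measure_rtt_millis(target)


if __name__ == "__main__":
    redis = FakeRedis()
    result = {}

    def run_dialer_thread():
        result["target"], result["rtt"] = dialer(redis, "tcp", lambda _target: 12)

    # The dialer starts first and blocks until both keys exist.
    thread = threading.Thread(target=run_dialer_thread)
    thread.start()
    redis.publish("RELAY_TCP_ADDRESS", "/ip4/10.0.0.2/tcp/4001")  # the relay's part
    listener(redis, "tcp", "listener-peer-id")
    thread.join()
    print(result)
```

The point of the sketch is the dependency order: neither client can proceed before the relay has published its address, and the dialer cannot proceed before the listener has published its `PeerId`.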

## Requirements for implementations

- Docker containers MUST have a binary called `hole-punch-client` in their `$PATH`
- Docker containers MUST have `dig`, `curl`, `jq` and `tcpdump` installed
- Listener MUST NOT early-exit but wait to be killed by test runner
- Logs MUST go to stderr, RTT json MUST go to stdout
- Dialer and listener both MUST use 0RTT negotiation for protocols
- Implementations SHOULD disable timeouts on the redis client, i.e. use `0`
- Implementations SHOULD exit early with a non-zero exit code if anything goes wrong
- Implementations MUST set `TCP_NODELAY` for the TCP transport
- Implementations MUST make sure connections are kept alive
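
As a rough illustration of the stream and exit-code rules above, the following sketch shows a dialer entry point that validates `MODE` and `TRANSPORT`, logs to stderr, prints only the RTT JSON to stdout and returns a non-zero code on any failure. The names `parse_config` and `run_dialer` and the `hole_punch` callback are assumptions made for this example; they are not part of any implementation in this PR.

```python
import io
import json
import sys

MODES = ("listen", "dial")
TRANSPORTS = ("tcp", "quic")


def parse_config(env):
    """Read and validate the two environment variables from the table above."""
    mode, transport = env.get("MODE"), env.get("TRANSPORT")
    if mode not in MODES:
        raise ValueError(f"MODE must be one of {MODES}, got {mode!r}")
    if transport not in TRANSPORTS:
        raise ValueError(f"TRANSPORT must be one of {TRANSPORTS}, got {transport!r}")
    return mode, transport


def run_dialer(env, hole_punch, out=None, err=None):
    """Return the exit code for a dialer run; `hole_punch` is a hypothetical callback."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    try:
        mode, transport = parse_config(env)
        if mode != "dial":
            raise ValueError("this sketch only covers MODE=dial")
        print(f"dialing over {transport}", file=err)  # logs go to stderr
        rtt_millis = hole_punch(transport)
    except Exception as error:
        print(f"hole punch failed: {error}", file=err)
        return 1  # exit early with a non-zero code
    # Only the RTT JSON goes to stdout.
    print(json.dumps({"rtt_to_holepunched_peer_millis": rtt_millis}), file=out)
    return 0


if __name__ == "__main__":
    out, err = io.StringIO(), io.StringIO()
    code = run_dialer({"MODE": "dial", "TRANSPORT": "tcp"}, lambda transport: 12, out, err)
    print(code, out.getvalue().strip(), file=sys.stderr)
```

A listener would share `parse_config` but, per the requirements, keep running after its reservation until the test runner kills it.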

## Design notes

The design of this test runner is heavily influenced by [multidim-interop](../multidim-interop) but differs in several ways.

All files related to test runs will be written to the [./runs](./runs) directory.
This includes the `docker-compose.yml` files of each individual run as well as logs and `tcpdump` captures for the dialer and listener.

The docker-compose file uses 6 containers in total:

- 1 redis container for orchestrating the test
- 1 [relay](./rust-relay)
- 1 hole-punch client in `MODE=dial`
- 1 hole-punch client in `MODE=listen`
- 2 [routers](./router): 1 per client

The networks are allocated by docker-compose.
We dynamically fetch the IPs and subnets as part of a startup script to set the correct IP routes.

In total, we have three networks:

1. `lan_dialer`
2. `lan_listener`
3. `internet`

The two LANs host a router and a client each whereas the relay is connected (without a router) to the `internet` network.
On startup of the clients, we add an `ip route` that redirects all traffic to the corresponding `router` container.
The router container masquerades all traffic upon forwarding, see the [README](./router/README.md) for details.
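
To make the routing step more concrete, here is a small sketch of how a client-side route could be derived once the subnets and the router's LAN address are known. It is an assumption-laden illustration: the function name `build_route_command`, the choice to route only the `internet` subnet via the router, and the `eth0` device default are not taken from the startup script in this PR. The sketch only builds the argument list for `ip route add` and does not execute it.

```python
import ipaddress


def build_route_command(lan_subnet, internet_subnet, router_lan_ip, device="eth0"):
    """Build an `ip route add` argument list sending internet-bound traffic via the router."""
    lan = ipaddress.ip_network(lan_subnet)
    internet = ipaddress.ip_network(internet_subnet)
    gateway = ipaddress.ip_address(router_lan_ip)
    # The client can only reach a gateway that sits on its own LAN.
    if gateway not in lan:
        raise ValueError(f"router {gateway} is not inside LAN {lan}")
    # Overlapping subnets would make the route ambiguous.
    if lan.overlaps(internet):
        raise ValueError(f"LAN {lan} overlaps internet subnet {internet}")
    return ["ip", "route", "add", str(internet), "via", str(gateway), "dev", device]


if __name__ == "__main__":
    # Example values only; docker-compose allocates the real subnets.
    print(" ".join(build_route_command("192.168.1.0/24", "10.0.0.0/24", "192.168.1.2")))
```

Because docker-compose picks the subnets, values like these have to be looked up at container start-up rather than hard-coded.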

## Running a single test

1. Build all containers using `make`
1. Generate all test definitions using `npm run test -- --no-run`
1. Pick the desired test from the [runs](./runs) directory
1. Execute it using `docker compose up`
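
Picking a test from the `runs` directory can also be scripted. The sketch below assumes, without confirmation from this PR, that each run lives in its own subdirectory of `runs` containing a `docker-compose.yml`; `find_compose_files` and `compose_command` are hypothetical helpers. It only assembles the `docker compose` command line and leaves execution to the caller.

```python
import tempfile
from pathlib import Path


def find_compose_files(runs_dir, name_filter=""):
    """Return compose files whose run directory name contains `name_filter`."""
    return sorted(
        path
        for path in Path(runs_dir).rglob("docker-compose.yml")
        if name_filter in path.parent.name
    )


def compose_command(compose_file):
    """Build the command for step 4; `-f` points docker compose at one run's file."""
    return ["docker", "compose", "-f", str(compose_file), "up"]


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real ./runs folder.
    with tempfile.TemporaryDirectory() as runs:
        for name in ("example-run-tcp", "example-run-quic"):
            run_dir = Path(runs) / name
            run_dir.mkdir()
            (run_dir / "docker-compose.yml").write_text("services: {}\n")
        for compose_file in find_compose_files(runs, "quic"):
            print(" ".join(compose_command(compose_file)))
```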