Skip to content

Commit

Permalink
Merge pull request filecoin-project#4727 from filecoin-project/nonsen…
Browse files Browse the repository at this point in the history
…se/integrate-testplans-to-lotus

move testground/lotus-soup testplan from oni to lotus
  • Loading branch information
magik6k authored and bibibong committed Jan 7, 2021
1 parent d112e46 commit fac31e3
Show file tree
Hide file tree
Showing 90 changed files with 17,233 additions and 11 deletions.
36 changes: 29 additions & 7 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -264,22 +264,44 @@ jobs:
path: /tmp/test-artifacts/conformance-coverage.html
build-epik-soup:
description: |
Compile `epik-soup` Testground test plan using the current version of Epik.
Compile `epik-soup` Testground test plan
parameters:
<<: *test-params
executor: << parameters.executor >>
steps:
- install-deps
- prepare
- run: cd extern/oni && git submodule sync
- run: cd extern/oni && git submodule update --init
- run: cd extern/filecoin-ffi && make
- run:
name: "replace epik, filecoin-ffi, blst and fil-blst deps"
command: cd extern/oni/epik-soup && go mod edit -replace github.com/EpiK-Protocol/go-epik=../../../ && go mod edit -replace github.com/filecoin-project/filecoin-ffi=../../filecoin-ffi && go mod edit -replace github.com/supranational/blst=../../blst
- run:
name: "build epik-soup testplan"
command: pushd extern/oni/epik-soup && go build -tags=testground .
command: pushd testplans/epik-soup && go build -tags=testground .
trigger-testplans:
description: |
Trigger `epik-soup` test cases on TaaS
parameters:
<<: *test-params
executor: << parameters.executor >>
steps:
- install-deps
- prepare
- run:
name: "download testground"
command: wget https://gist.github.com/nonsense/5fbf3167cac79945f658771aed32fc44/raw/2e17eb0debf7ec6bdf027c1bdafc2c92dd97273b/testground-d3e9603 -O ~/testground-cli && chmod +x ~/testground-cli
- run:
name: "prepare .env.toml"
command: pushd testplans/epik-soup && mkdir -p $HOME/testground && cp env-ci.toml $HOME/testground/.env.toml && echo 'endpoint="https://ci.testground.ipfs.team"' >> $HOME/testground/.env.toml && echo 'user="circleci"' >> $HOME/testground/.env.toml
- run:
name: "prepare testground home dir"
command: mkdir -p $HOME/testground/plans && mv testplans/epik-soup testplans/graphsync $HOME/testground/plans/
- run:
name: "trigger deals baseline testplan on taas"
command: ~/testground-cli run composition -f $HOME/testground/plans/epik-soup/_compositions/baseline-k8s-3-1.toml --metadata-commit=$CIRCLE_SHA1 --metadata-repo=filecoin-project/epik --metadata-branch=$CIRCLE_BRANCH
- run:
name: "trigger payment channel stress testplan on taas"
command: ~/testground-cli run composition -f $HOME/testground/plans/epik-soup/_compositions/paych-stress-k8s.toml --metadata-commit=$CIRCLE_SHA1 --metadata-repo=filecoin-project/epik --metadata-branch=$CIRCLE_BRANCH
- run:
name: "trigger graphsync testplan on taas"
command: ~/testground-cli run composition -f $HOME/testground/plans/graphsync/_compositions/stress-k8s.toml --metadata-commit=$CIRCLE_SHA1 --metadata-repo=filecoin-project/epik --metadata-branch=$CIRCLE_BRANCH


build-macos:
Expand Down
3 changes: 0 additions & 3 deletions .gitmodules
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,6 @@
[submodule "extern/test-vectors"]
path = extern/test-vectors
url = https://github.com/filecoin-project/test-vectors.git
[submodule "extern/oni"]
path = extern/oni
url = https://github.com/filecoin-project/oni.git
[submodule "extern/blst"]
path = extern/blst
url = https://github.com/supranational/blst.git
1 change: 0 additions & 1 deletion extern/oni
Submodule oni deleted from 10ed9e
193 changes: 193 additions & 0 deletions testplans/DELVING.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
# Delving into the unknown

This write-up summarises how to debug what appears to be a mischievous Lotus
instance during our Testground tests. It also goes enumerates which assets are
useful to report suspicious behaviours upstream, in a way that they are
actionable.

## Querying the Lotus RPC API

The `local:docker` and `cluster:k8s` map ports that you specify in the
composition.toml, so you can access them externally.

All our compositions should carry this fragment:

```toml
[global.run_config]
exposed_ports = { pprof = "6060", node_rpc = "1234", miner_rpc = "2345" }
```

This tells Testground to expose the following ports:

* `6060` => Go pprof.
* `1234` => Lotus full node RPC.
* `2345` => Lotus storage miner RPC.

### `local:docker`

1. Install the `lotus` binary on your host.
2. Find the container that you want to connect to in `docker ps`.
* Note that our _container names_ are slightly long, and they're the last
field on every line, so if your terminal is wrapping text, the port
numbers will end up ABOVE the friendly/recognizable container name (e.g. `tg-lotus-soup-deals-e2e-acfc60bc1727-miners-1`).
* The testground output displays the _container ID_ inside coloured angle
brackets, so if you spot something spurious in a particular node, you can
hone in on that one, e.g. `<< 54dd5ad916b2 >>`.

```
⟩ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
54dd5ad916b2 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32788->1234/tcp, 0.0.0.0:32783->2345/tcp, 0.0.0.0:32773->6060/tcp, 0.0.0.0:32777->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-2
53757489ce71 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32792->1234/tcp, 0.0.0.0:32790->2345/tcp, 0.0.0.0:32781->6060/tcp, 0.0.0.0:32786->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-1
9d3e83b71087 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32791->1234/tcp, 0.0.0.0:32789->2345/tcp, 0.0.0.0:32779->6060/tcp, 0.0.0.0:32784->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-0
7bd60e75ed0e be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32787->1234/tcp, 0.0.0.0:32782->2345/tcp, 0.0.0.0:32772->6060/tcp, 0.0.0.0:32776->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-miners-1
dff229d7b342 be3c18d7f0d4 "/testplan" 10 seconds ago Up 9 seconds 0.0.0.0:32778->1234/tcp, 0.0.0.0:32774->2345/tcp, 0.0.0.0:32769->6060/tcp, 0.0.0.0:32770->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-miners-0
4cd67690e3b8 be3c18d7f0d4 "/testplan" 11 seconds ago Up 8 seconds 0.0.0.0:32785->1234/tcp, 0.0.0.0:32780->2345/tcp, 0.0.0.0:32771->6060/tcp, 0.0.0.0:32775->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-bootstrapper-0
aeb334adf88d iptestground/sidecar:edge "testground sidecar …" 43 hours ago Up About an hour 0.0.0.0:32768->6060/tcp testground-sidecar
c1157500282b influxdb:1.8 "/entrypoint.sh infl…" 43 hours ago Up 25 seconds 0.0.0.0:8086->8086/tcp testground-influxdb
99ca4c07fecc redis "docker-entrypoint.s…" 43 hours ago Up About an hour 0.0.0.0:6379->6379/tcp testground-redis
bf25c87488a5 bitnami/grafana "/run.sh" 43 hours ago Up 26 seconds 0.0.0.0:3000->3000/tcp testground-grafana
cd1d6383eff7 goproxy/goproxy "/goproxy" 45 hours ago Up About a minute 8081/tcp testground-goproxy
```

3. Take note of the port mapping. Imagine in the output above, we want to query
`54dd5ad916b2`. We'd use `localhost:32788`, as it forwards to the container's
1234 port (Lotus Full Node RPC).
4. Run your Lotus CLI command setting the `FULLNODE_API_INFO` env variable,
which is a multiaddr:

```sh
$ FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/$port/http" lotus chain list
[...]
```

---

Alternatively, you could download gawk and setup a script in you .bashrc or .zshrc similar to:

```
lprt() {
NAME=$1
PORT=$2

docker ps --format "table {{.Names}}" | grep $NAME | xargs -I {} docker port {} $PORT | gawk --field-separator=":" '{print $2}'
}

envs() {
NAME=$1

local REMOTE_PORT_1234=$(lprt $NAME 1234)
local REMOTE_PORT_2345=$(lprt $NAME 2345)

export FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/$REMOTE_PORT_1234/http"
export STORAGE_API_INFO=":/ip4/127.0.0.1/tcp/$REMOTE_PORT_2345/http"

echo "Setting \$FULLNODE_API_INFO to $FULLNODE_API_INFO"
echo "Setting \$STORAGE_API_INFO to $STORAGE_API_INFO"
}
```
Then call commands like:
```
envs miners-0
lotus chain list
```
### `cluster:k8s`
Similar to `local:docker`, you pick a pod that you want to connect to and port-forward 1234 and 2345 to that specific pod, such as:
```
export PODNAME="tg-lotus-soup-ae620dfb2e19-miners-0"
kubectl port-forward pods/$PODNAME 1234:1234 2345:2345

export FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/1234/http"
export STORAGE_API_INFO=":/ip4/127.0.0.1/tcp/2345/http"
lotus-storage-miner storage-deals list
lotus-storage-miner storage-deals get-ask
```
### Useful commands / checks
* **Making sure miners are on the same chain:** compare outputs of `lotus chain list`.
* **Checking deals:** `lotus client list-deals`.
* **Sector queries:** `lotus-storage-miner info` , `lotus-storage-miner proving info`
* **Sector sealing errors:**
* `STORAGE_API_INFO=":/ip4/127.0.0.1/tcp/53624/http" FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/53623/http" lotus-storage-miner sector info`
* `STORAGE_API_INFO=":/ip4/127.0.0.1/tcp/53624/http" FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/53623/http" lotus-storage-miner sector status <sector_no>`
* `STORAGE_API_INFO=":/ip4/127.0.0.1/tcp/53624/http" FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/53623/http" lotus-storage-miner sector status --log <sector_no>`
## Viewing logs of a particular container `local:docker`
This works for both started and stopped containers. Just get the container ID
(in double angle brackets in Testground output, on every log line), and do a:
```shell script
$ docker logs $container_id
```

## Accessing the golang instrumentation

Testground exposes a pprof endpoint under local port 6060, which both
`local:docker` and `cluster:k8s` map.

For `local:docker`, see above to figure out which host port maps to the
container's 6060 port.

## Acquiring a goroutine dump

When things appear to be stuck, get a goroutine dump.

```shell script
$ wget -o goroutine.out http://localhost:${pprof_port}/debug/pprof/goroutine?debug=2
```

You can use whyrusleeping/stackparse to extract a summary:

```shell script
$ go get https://github.com/whyrusleeping/stackparse
$ stackparse --summary goroutine.out
```

## Acquiring a CPU profile

When the CPU appears to be spiking/rallying, grab a CPU profile.

```shell script
$ wget -o profile.out http://localhost:${pprof_port}/debug/pprof/profile
```

Analyse it using `go tool pprof`. Usually, generating a `png` graph is useful:

```shell script
$ go tool pprof profile.out
File: testground
Type: cpu
Time: Jul 3, 2020 at 12:00am (WEST)
Duration: 30.07s, Total samples = 2.81s ( 9.34%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) png
Generating report in profile003.png
```

## Submitting actionable reports / findings

This is useful both internally (within the Oni team, so that peers can help) and
externally (when submitting a finding upstream).

We don't need to play the full bug-hunting game on Lotus, but it's tremendously
useful to provide the necessary data so that any reports are actionable.

These include:

* test outputs (use `testground collect`).
* stack traces that appear in logs (whether panics or not).
* output of relevant Lotus CLI commands.
* if this is some kind of blockage / deadlock, goroutine dumps.
* if this is a CPU hotspot, a CPU profile would be useful.
* if this is a memory issue, a heap dump would be useful.

**When submitting bugs upstream (Lotus), make sure to indicate:**

* Lotus commit.
* FFI commit.
20 changes: 20 additions & 0 deletions testplans/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
SHELL = /bin/bash

.DEFAULT_GOAL := download-proofs

download-proofs:
go run github.com/filecoin-project/go-paramfetch/paramfetch 2048 ./docker-images/proof-parameters.json

build-images:
docker build -t "iptestground/oni-buildbase:v11-lotus" -f "docker-images/Dockerfile.oni-buildbase" "docker-images"
docker build -t "iptestground/oni-runtime:v5" -f "docker-images/Dockerfile.oni-runtime" "docker-images"

push-images:
docker push iptestground/oni-buildbase:v11-lotus
docker push iptestground/oni-runtime:v5

pull-images:
docker pull iptestground/oni-buildbase:v11-lotus
docker pull iptestground/oni-runtime:v5

.PHONY: download-proofs build-images push-images pull-images
Loading

0 comments on commit fac31e3

Please sign in to comment.