Set up the observability stack in cargo-make #674

fridrik01 · 2024-02-07T13:35:03Z

Closes: ENG-660

This PR implements the following:

Updates the fendermint app to use tracing-appender for saving logs to disk as rotating log files.
Removes the existing trace4rs crate, since we will be separating the traces from stdout/stderr.
Updates our infra cargo-make scripts to deploy a promtail docker container.
The promtail container's sole role here is to collect all stdout logs from the other running containers and, separately, to collect the trace log files from the fendermint container. This is configured in infra/promtail/promtail-config.yaml, where we scrape logs from docker containers by listening on the host docker socket, which we share with the promtail container. Also, in order for the promtail container to have access to the trace logs of the fendermint container, I added tasks to create/destroy a docker volume on the host machine which is shared between these two containers. I could have used docker's --volumes-from option, but decided against it since it copies the files (instead of sharing them like with volumes).
Promtail is auto-configured to include labels in the traces for easier filtering (hostname, user, runid).
The CI pipeline uses the PROMTAIL_CLIENT_URL secret to specify where promtail should upload its logs (currently set to fridrik.grafana.net).

Note: When I created the grafana cloud organization it seems to have used my name (fridrik.grafana.net), and for some reason I can't easily change it. We should just register a new one, but that can be done separately from this PR.

Note: You need to set the PROMTAIL_CLIENT_URL env var if you want to run this locally. We plan to set up a local grafana/loki stack in ENG-728.

Test plan

Make sure the fendermint docker container is up to date:

cd fendermint/
make docker-build

Run smoke test:

cd fendermint/testing/smoke-test
cargo make
..
[cargo-make][1] INFO - Running Task: cometbft-wait
[cargo-make][1] INFO - Running Task: promtail-start
Starting promtail with promtailrunid: 1707313838
...
fendermint-logs
[cargo-make] INFO - Build Done in 47.09 seconds.

Tip: When running cargo make you should see output which displays the promtail runid (1707313838 in the run above). This is the unix time at which the promtail instance was started, and it can be used to filter logs down to the test run you want to debug.

After running cargo make, the promtail container should have collected all the logs and traces from the other containers and sent them to our grafana loki instance, as defined by the client -> url setting in infra/promtail/promtail-config.yaml.

Go to our grafana cloud instance (you may need to be added as a user) and you should be able to view logs based on this cargo run instance:

You can also filter logs by an individual container's stdout. The following screenshot shows the stdout of the ethapi container of the same cargo run instance (notice that I filter by promtailrunid to only show logs generated by this instance of running cargo make):

You can also do a lot more filtering; see the Loki documentation.
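As a rough illustration of the filtering described above, the sketch below builds a LogQL stream selector from label pairs, using only the Rust standard library. The `promtailrunid` label comes from this PR; the `container` label name and the `node-ethapi` value are assumptions made for the example, so check the actual labels in infra/promtail/promtail-config.yaml before relying on them.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Run id in the style printed by `promtail-start`: seconds since the Unix epoch.
fn run_id_at(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0)
}

/// Builds a LogQL stream selector such as `{promtailrunid="1707313838"}`.
/// Backslashes and double quotes in label values are escaped.
fn logql_selector(labels: &[(&str, &str)]) -> String {
    let parts: Vec<String> = labels
        .iter()
        .map(|(k, v)| {
            let escaped = v.replace('\\', "\\\\").replace('"', "\\\"");
            format!("{k}=\"{escaped}\"")
        })
        .collect();
    format!("{{{}}}", parts.join(", "))
}

fn main() {
    let run_id = run_id_at(SystemTime::now()).to_string();
    // `container` and `node-ethapi` are hypothetical; only `promtailrunid`
    // is taken from the PR description.
    let query = logql_selector(&[
        ("promtailrunid", run_id.as_str()),
        ("container", "node-ethapi"),
    ]);
    println!("{query}");
}
```

The resulting string can be pasted into the Grafana Explore query box; narrowing by run id first keeps the result set limited to a single `cargo make` invocation.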

linear · 2024-02-07T13:35:06Z

ENG-660 Set up the observability stack in cargo-make

fendermint/app/src/cmd/run.rs

aakoshh · 2024-02-09T10:18:21Z

Amazing work!

Does it have to send logs from local tests to the cloud? Can it not run as a local docker container? I used to run grafana and prometheus locally, as part of the testnet.

fendermint/app/src/main.rs

aakoshh · 2024-02-09T10:23:00Z

fendermint/app/src/main.rs

+                .init();
+        }
+        None => {
+            tracing_subscriber::fmt()


Can you add some comments on what the expectations are? If there is a log directory, is it not going to log anything to the console? What will it show in docker logs for example if we use the volumes?

Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?

Currently, when the log directory is set, all traces go to the log file. The only things printed to STDOUT are direct writes such as println!, or output from a crate that uses the normal logger.

Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?

Something like that should be possible, but what would be the use case for it? I would think that when viewing the debug/trace logs we would want to see the info+ logs alongside them as well. Or do you mean logging INFO to both stdout and the log files?

Djeesus, I am having doubts about the tracing crate ecosystem: it's not possible to have rotating logs with a maximum file size (though there is an almost year-old PR for it which hasn't been merged).
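For reference, size-based rotation is not much code if it has to be done by hand. The following is a minimal std-only sketch, not what this PR implements: `SizeRotatingWriter` is a hypothetical type that keeps a single `.1` backup, and wiring it into a tracing subscriber as the writer is left out.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::PathBuf;

/// Appends to `path`. When a write would push the file past `max_bytes`,
/// the file is renamed to `<path>.1` (replacing any older backup) and a
/// fresh file is started.
struct SizeRotatingWriter {
    path: PathBuf,
    max_bytes: u64,
    written: u64,
    file: File,
}

impl SizeRotatingWriter {
    fn new(path: impl Into<PathBuf>, max_bytes: u64) -> io::Result<Self> {
        let path = path.into();
        let file = OpenOptions::new().create(true).append(true).open(&path)?;
        let written = file.metadata()?.len();
        Ok(Self { path, max_bytes, written, file })
    }

    fn rotate(&mut self) -> io::Result<()> {
        self.file.flush()?;
        let mut backup = self.path.clone().into_os_string();
        backup.push(".1");
        fs::rename(&self.path, PathBuf::from(backup))?;
        self.file = OpenOptions::new().create(true).append(true).open(&self.path)?;
        self.written = 0;
        Ok(())
    }
}

impl Write for SizeRotatingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Rotate before a write that would cross the cap, unless the file is empty.
        if self.written > 0 && self.written + buf.len() as u64 > self.max_bytes {
            self.rotate()?;
        }
        let n = self.file.write(buf)?;
        self.written += n as u64;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.file.flush()
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fendermint-rotation-demo.log");
    let mut w = SizeRotatingWriter::new(&path, 64)?;
    for i in 0..10 {
        // Format the whole line first so one line is one write and is
        // never split across two files.
        w.write_all(format!("event {i}: some trace line\n").as_bytes())?;
    }
    w.flush()
}
```

This assumes a Unix-like filesystem where renaming an open file is allowed, and it ignores concurrent writers; a real implementation would also want more than one backup generation.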

Ideally I'm looking for this kind of setup:

in production we log to the console at INFO level and to the files at DEBUG level, each configurable individually: https://github.com/aakoshh/metronome/blob/develop/metronome/examples/resources/logback.xml

in testing we log to the console: https://github.com/aakoshh/metronome/blob/develop/metronome/checkpointing/interpreter/specs/resources/logback-test.xml

Logging to the console has the benefit that it's easily available: you start the thing and don't have to dig around files to see something happening, but you can go to the files and the promtail Web UI for more details; you can also use docker logs to see some high level things.
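To make the requested semantics concrete, here is a small std-only model of per-output minimum levels. It is deliberately not the tracing API: `Level`, `Sink` and `route` are hypothetical names for illustration. In the real app this would presumably map onto one tracing-subscriber layer per output, each with its own level filter.

```rust
/// Severity levels, ordered from most to least verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// One output with its own, individually configurable minimum level.
struct Sink {
    name: &'static str,
    min_level: Level,
}

/// Returns the names of the sinks that would receive an event at `level`.
fn route(sinks: &[Sink], level: Level) -> Vec<&'static str> {
    sinks
        .iter()
        .filter(|s| level >= s.min_level)
        .map(|s| s.name)
        .collect()
}

fn main() {
    // Production-style setup from the comment above:
    // console at INFO, log files at DEBUG.
    let production = [
        Sink { name: "console", min_level: Level::Info },
        Sink { name: "file", min_level: Level::Debug },
    ];
    for level in [Level::Trace, Level::Debug, Level::Info, Level::Warn, Level::Error] {
        println!("{level:?} -> {:?}", route(&production, level));
    }
}
```

Under this model an INFO event reaches both the console and the file, while a DEBUG event reaches only the file, so the files always contain a superset of what `docker logs` shows.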

infra/promtail/promtail-config.yaml

infra/fendermint/scripts/promtail.toml

fendermint/testing/scripts/common.env

aakoshh · 2024-02-09T10:37:25Z

infra/fendermint/Makefile.toml

@@ -84,11 +85,14 @@ COMETBFT_SUBDIR = "cometbft"
 CMT_CONTAINER_NAME = "${NODE_NAME}-cometbft"
 FM_CONTAINER_NAME = "${NODE_NAME}-fendermint"
 ETHAPI_CONTAINER_NAME = "${NODE_NAME}-ethapi"
+PROMTAIL_CONTAINER_NAME="${NODE_NAME}-promtail"


And in the testnet for example there will be a different promtail for each container started on the local machine. Should there be only one for the testnet?

infra/promtail/promtail-config.yaml

aakoshh

This looks great but I'm slightly confused about how it works with multiple containers, log rotation, offline debugging, why we need the cloud, how we can protect the keys and still allow people to log locally, whether anything can appear both in the docker logs and the grafana stuff.

fridrik01 · 2024-02-11T19:48:07Z

~~Moving to draft while I add in local grafana/loki stack.~~ We will do that in a separate PR

This commit implements the following:
- Updates the fendermint app to use tracing-appender for saving tracing logs to file.
- Removing the existing trace4rs crate since we will be separating the traces from stdout/stderr
- Updates our infra cargo-make scripts to deploy promtail docker container
- The promtail container is setup to collect all stdout logs from other running containers and also to collect trace log files from the fendermint container separately
- Promtail is auto-configured to include labels in the traces for easier filtering (hostname, user, runid)

Closes: ENG-660

fridrik01 marked this pull request as ready for review February 7, 2024 14:15

fridrik01 changed the title ~~[WIP] Set up the observability stack in cargo-make~~ Set up the observability stack in cargo-make Feb 7, 2024

fridrik01 requested a review from a team February 7, 2024 15:17

raulk requested review from aakoshh, cryptoAtwill and mb1896 February 7, 2024 19:30

maciejwitowski reviewed Feb 7, 2024
