Set up the observability stack in cargo-make #674

Merged
merged 4 commits into from
Feb 16, 2024

fridrik01
Contributor

@fridrik01 fridrik01 commented Feb 7, 2024

Closes: ENG-660

This PR implements the following:

  • Updates the fendermint app to use tracing-appender to write traces to disk as rotating log files.
  • Removes the existing trace4rs crate, since traces will now be kept separate from stdout/stderr.
  • Updates our infra cargo-make scripts to deploy a promtail docker container.
  • The promtail container's sole role is to collect the stdout logs of the other running containers and, separately, the trace log files from the fendermint container. This is configured in infra/promtail/promtail-config.yaml, where we scrape docker container logs by listening on the host docker socket, which is shared with the promtail container. To give the promtail container access to the trace logs of the fendermint container, I added tasks that create and destroy a docker volume shared between the two containers on the host machine. I could have used docker's --volumes-from option, but decided against it since it copies the files (instead of sharing them, as volumes do).
  • Promtail is auto-configured to attach labels to the traces for easier filtering (hostname, user, runid).
  • The CI pipeline uses the PROMTAIL_CLIENT_URL secret to specify where promtail should upload its logs (currently set to fridrik.grafana.net).
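
To illustrate the labelling scheme from the list above, here is a minimal Rust sketch of how the three labels could be assembled for one run. It is not the code in this PR (the PR does this in the cargo-make scripts); the type `RunLabels`, the functions `new_run_id` and `as_pairs`, and the exact label keys `hostname` and `user` are assumptions made for illustration. Only the `promtailrunid` name and its meaning (Unix time at promtail start) come from the test plan below.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Labels attached to every log line shipped by one promtail instance.
/// (Hypothetical type; the PR configures these via cargo-make and promtail.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RunLabels {
    pub hostname: String,
    pub user: String,
    /// Unix time in seconds at which the promtail instance was started.
    pub promtailrunid: u64,
}

/// Current Unix time in seconds, used as the run ID.
pub fn new_run_id() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch")
        .as_secs()
}

impl RunLabels {
    pub fn new(hostname: &str, user: &str, promtailrunid: u64) -> Self {
        RunLabels {
            hostname: hostname.to_string(),
            user: user.to_string(),
            promtailrunid,
        }
    }

    /// Render the labels as (key, value) pairs, e.g. to export them as
    /// environment variables for the promtail container.
    pub fn as_pairs(&self) -> Vec<(&'static str, String)> {
        vec![
            ("hostname", self.hostname.clone()),
            ("user", self.user.clone()),
            ("promtailrunid", self.promtailrunid.to_string()),
        ]
    }
}
```

Calling `RunLabels::new("ci-host", "runner", new_run_id())` once per `cargo make` invocation gives every log line of that run the same `promtailrunid`, which is what makes per-run filtering possible.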

Note: When I created the Grafana Cloud organization, it seems to have used my name (fridrik.grafana.net), and for some reason I can't easily change it. We should register a new one, but that can be done separately from this PR.

Note: You need to set the PROMTAIL_CLIENT_URL env var if you want to run this locally. We plan to set up a local grafana/loki stack in ENG-728.

Test plan

Make sure the fendermint docker container is up to date:

cd fendermint/
make docker-build

Run smoke test:

cd fendermint/testing/smoke-test
cargo make
..
[cargo-make][1] INFO - Running Task: cometbft-wait
[cargo-make][1] INFO - Running Task: promtail-start
Starting promtail with promtailrunid: 1707313838
...
fendermint-logs
[cargo-make] INFO - Build Done in 47.09 seconds.

Tip: When running cargo make, the output displays the promtail runid (1707313838 in the example above). This is the Unix time at which the promtail instance was started, and it can be used to filter logs down to the test run you want to debug.

After cargo make finishes, the promtail container should have collected all the logs and traces from the other containers and sent them to our Grafana Loki instance, as defined by the client -> url setting in infra/promtail/promtail-config.yaml.

Go to our Grafana Cloud instance (you may need to be added as a user), and you should be able to view the logs for this cargo make run:

image

You can also filter logs by the stdout of an individual container. The following screenshot shows the stdout of the ethapi container for the same cargo make run (notice that I filter by promtailrunid to show only logs generated by this run of cargo make):

image

You can also do a lot more filtering; see the Loki documentation.
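
The filters in the screenshots are LogQL stream selectors. As a rough sketch of what such a query looks like, the Rust helper below builds a selector from label pairs and optionally appends a "line contains" filter. The function names `stream_selector` and `query` are hypothetical, and the `container` label used in the usage note is an assumption about how promtail-config.yaml relabels docker containers; only `promtailrunid` is confirmed by the test plan above.

```rust
/// Escape backslashes and double quotes so a value can sit inside a
/// double-quoted LogQL string.
fn escape(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build a LogQL stream selector such as `{a="1", b="2"}` from label pairs.
pub fn stream_selector(labels: &[(&str, &str)]) -> String {
    let matchers: Vec<String> = labels
        .iter()
        .map(|(key, value)| format!("{}=\"{}\"", key, escape(value)))
        .collect();
    format!("{{{}}}", matchers.join(", "))
}

/// Build a full query: the stream selector, optionally followed by a
/// `|=` line filter that keeps only lines containing `contains`.
pub fn query(labels: &[(&str, &str)], contains: Option<&str>) -> String {
    let selector = stream_selector(labels);
    match contains {
        Some(text) => format!("{} |= \"{}\"", selector, escape(text)),
        None => selector,
    }
}
```

For example, `query(&[("promtailrunid", "1707313838"), ("container", "ethapi")], Some("error"))` produces a query that keeps only lines containing "error" from the assumed `ethapi` container of that run.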


linear bot commented Feb 7, 2024

@fridrik01 fridrik01 marked this pull request as ready for review February 7, 2024 14:15
@fridrik01 fridrik01 changed the title [WIP] Set up the observability stack in cargo-make Set up the observability stack in cargo-make Feb 7, 2024
@fridrik01 fridrik01 requested a review from a team February 7, 2024 15:17
@aakoshh
Contributor

aakoshh commented Feb 9, 2024

Amazing work!

Does it have to send logs from local tests to the cloud? Can it not run as a local docker container? I used to run grafana and prometheus locally, as part of the testnet.

fendermint/app/src/main.rs (outdated)
.init();
}
None => {
tracing_subscriber::fmt()
Contributor

Can you add some comments on what the expectations are? If there is a log directory, is it not going to log anything to the console? What will it show in docker logs for example if we use the volumes?

Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?

Contributor Author

@fridrik01 fridrik01 Feb 9, 2024

Currently, when a log directory is set, all traces go to the log file. What gets printed to STDOUT is anything written with println!, or output from a crate that uses a normal logger.

Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?

Something like that should be possible, but what would be the use case? I would think that when viewing the debug/trace logs we would want to see the info+ logs alongside them as well. Or do you mean logging INFO to both stdout and the log files?

Contributor Author

@fridrik01 fridrik01 Feb 9, 2024

Djeesus, I am having doubts about the tracing crate ecosystem: it's not possible to have rotating logs with a maximum file size (there is a PR for it, almost a year old, that hasn't been merged).

Contributor

Ideally I'm looking for this kind of setup:

Logging to the console has the benefit that it's easily available: you start the thing and don't have to dig around files to see something happening, but you can go to the files and the promtail Web UI for more details; you can also use docker logs to see some high level things.
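
The "INFO to stdout, DEBUG or TRACE to the log files" idea asked about earlier in this thread can be sketched without any logging crate. The block below is a stdlib-only illustration of the routing policy, not the fendermint implementation: `Level` and `DualSink` are hypothetical names, and in the real app this would presumably be done with tracing-subscriber layers carrying per-layer level filters, which is not shown here.

```rust
use std::io::{self, Write};

/// Severity levels, ordered from most verbose to least verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Sends each event to a console sink and a file sink, each with its own
/// minimum level, so the console stays quiet while the file keeps detail.
pub struct DualSink<C: Write, F: Write> {
    console: C,
    console_min: Level,
    file: F,
    file_min: Level,
}

impl<C: Write, F: Write> DualSink<C, F> {
    pub fn new(console: C, console_min: Level, file: F, file_min: Level) -> Self {
        DualSink { console, console_min, file, file_min }
    }

    /// Write the event to every sink whose minimum level it meets.
    pub fn log(&mut self, level: Level, msg: &str) -> io::Result<()> {
        if level >= self.console_min {
            writeln!(self.console, "{:?} {}", level, msg)?;
        }
        if level >= self.file_min {
            writeln!(self.file, "{:?} {}", level, msg)?;
        }
        Ok(())
    }

    /// Give the sinks back, e.g. to flush or inspect them.
    pub fn into_sinks(self) -> (C, F) {
        (self.console, self.file)
    }
}
```

With `DualSink::new(std::io::stdout(), Level::Info, file, Level::Debug)`, where `file` is any `Write` such as an opened `std::fs::File`, docker logs would show only INFO and above, while the file that promtail scrapes would also carry DEBUG events.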

@@ -84,11 +85,14 @@ COMETBFT_SUBDIR = "cometbft"
CMT_CONTAINER_NAME = "${NODE_NAME}-cometbft"
FM_CONTAINER_NAME = "${NODE_NAME}-fendermint"
ETHAPI_CONTAINER_NAME = "${NODE_NAME}-ethapi"
PROMTAIL_CONTAINER_NAME="${NODE_NAME}-promtail"
Contributor

And in the testnet for example there will be a different promtail for each container started on the local machine. Should there be only one for the testnet?

Contributor

@aakoshh aakoshh left a comment

This looks great but I'm slightly confused about how it works with multiple containers, log rotation, offline debugging, why we need the cloud, how we can protect the keys and still allow people to log locally, whether anything can appear both in the docker logs and the grafana stuff.

@fridrik01 fridrik01 force-pushed the tracing branch 11 times, most recently from c1ef5aa to c24b41d Compare February 11, 2024 18:46
@fridrik01 fridrik01 marked this pull request as draft February 11, 2024 19:47
@fridrik01
Contributor Author

fridrik01 commented Feb 11, 2024

Moving to draft while I add in a local grafana/loki stack. We will do that in a separate PR.

@fridrik01 fridrik01 marked this pull request as ready for review February 11, 2024 20:11
@fridrik01 fridrik01 marked this pull request as draft February 11, 2024 22:39
@fridrik01 fridrik01 marked this pull request as ready for review February 12, 2024 18:32
@fridrik01 fridrik01 force-pushed the tracing branch 2 times, most recently from 8ff87e8 to d248007 Compare February 14, 2024 15:58
This commit implements the following:
- Updates the fendermint app to use tracing-appender for saving
tracing logs to file.
- Removes the existing trace4rs crate since we will be separating
the traces from stdout/stderr
- Updates our infra cargo-make scripts to deploy a promtail docker
container
- The promtail container is set up to collect all stdout logs from
other running containers and also to collect trace log files from
the fendermint container separately
- Promtail is auto-configured to include labels in the traces
for easier filtering (hostname, user, runid)

Closes: ENG-660
@fridrik01 fridrik01 merged commit d5eec43 into main Feb 16, 2024
19 of 20 checks passed
@fridrik01 fridrik01 deleted the tracing branch February 16, 2024 08:44
@fridrik01 fridrik01 mentioned this pull request Feb 16, 2024