Set up the observability stack in cargo-make #674
Conversation
Amazing work! Does it have to send logs from local tests to the cloud? Can it not run as a local docker container? I used to run grafana and prometheus locally, as part of the testnet.
fendermint/app/src/main.rs (Outdated)

```rust
        .init();
    }
    None => {
        tracing_subscriber::fmt()
```
Can you add some comments on what the expectations are? If there is a log directory, is it not going to log anything to the console? What will it show in `docker logs`, for example if we use the volumes? Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?
Currently, when a log directory is set, all traces go to the log file. What gets printed to STDOUT will be anything like `println!`, or output from crates that use the normal logger.

> Can it be set up to log with INFO to stdout and with DEBUG or TRACE to the log files?

Something like that should be possible, but what would the use case be? I would think that when viewing the debug/trace logs we would want to see the info+ logs alongside them as well. Or do you mean logging INFO to both stdout and the log files?
Djeesus, I'm having doubts about the `tracing` crate ecosystem: it's not possible to have rotating logs with a maximum file size (there is an almost year-old PR for it which hasn't been merged).
Ideally I'm looking for this kind of setup:
- in production we log to the console at the INFO level and to the files at the DEBUG level, configurable individually: https://github.com/aakoshh/metronome/blob/develop/metronome/examples/resources/logback.xml
- in testing we log to the console only: https://github.com/aakoshh/metronome/blob/develop/metronome/checkpointing/interpreter/specs/resources/logback-test.xml

Logging to the console has the benefit that it's easily available: you start the thing and don't have to dig around in files to see something happening, but you can go to the files and the promtail Web UI for more details; you can also use `docker logs` to see some high-level things.
```
@@ -84,11 +85,14 @@ COMETBFT_SUBDIR = "cometbft"
CMT_CONTAINER_NAME = "${NODE_NAME}-cometbft"
FM_CONTAINER_NAME = "${NODE_NAME}-fendermint"
ETHAPI_CONTAINER_NAME = "${NODE_NAME}-ethapi"
PROMTAIL_CONTAINER_NAME="${NODE_NAME}-promtail"
```
And in the testnet, for example, there will be a different promtail for each container started on the local machine. Should there be only one for the testnet?
This looks great, but I'm slightly confused about how it works with multiple containers, log rotation, and offline debugging; why we need the cloud; how we can protect the keys and still allow people to log locally; and whether anything can appear both in the docker logs and the grafana stuff.
Force-pushed c1ef5aa to c24b41d
Force-pushed 8ff87e8 to d248007
This commit implements the following:
- Updates the fendermint app to use tracing-appender for saving tracing logs to file.
- Removes the existing trace4rs crate since we will be separating the traces from stdout/stderr.
- Updates our infra cargo-make scripts to deploy a promtail docker container.
- The promtail container is set up to collect all stdout logs from the other running containers, and also to collect trace log files from the fendermint container separately.
- Promtail is auto-configured to include labels in the traces for easier filtering (hostname, user, runid).

Closes: ENG-660
This PR implements the following:

- Uses `tracing-appender` for saving logs to disk using rotational logs.
- Removes the existing `trace4rs` crate since we will be separating the traces from stdout/stderr.
- Adds `infra/promtail/promtail-config.yaml`, where we scrape logs from docker containers by listening to the host docker socket, which we share with the promtail container. Also, in order for the promtail container to have access to trace logs on the fendermint container, I added tasks to create/destroy a docker volume which is shared between these two containers on the host machine. I could have used docker's `--volumes-from` option, but decided against it since it copies the files (instead of sharing them like volumes do).

Note: When I created the grafana cloud organization it seems to have used my name (fridrik.grafana.net) and for some reason I can't easily change it. We should just register a new one, but that can be done separately from this PR.
Note: You need to set the `PROMTAIL_CLIENT_URL` env var if you want to run this locally. We plan to set up a local grafana/loki stack in ENG-728.
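For reference, a scrape setup like the one described could look roughly like this. This is a hedged sketch, not the actual contents of `infra/promtail/promtail-config.yaml`; the label names and file paths are assumptions:

```yaml
clients:
  # Pushed to the Loki instance; expanded from the environment when
  # promtail is started with -config.expand-env=true.
  - url: ${PROMTAIL_CLIENT_URL}

scrape_configs:
  # Container stdout, discovered via the host docker socket
  # that is mounted into the promtail container.
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: container

  # Fendermint trace files, read from the docker volume shared
  # between the fendermint and promtail containers.
  - job_name: fendermint-traces
    static_configs:
      - targets: [localhost]
        labels:
          job: fendermint-traces
          __path__: /var/log/fendermint/*.log
```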
Test plan
Make sure the fendermint docker container is up-to-date:
Run smoke test:
Tip: When running `cargo make` you should see output which displays the promtail runid (in the example above it's 1707313838). This is the unix time when the promtail instance was started, and it can be used to filter logs down to the test run you want to debug.
After running `cargo make`, the promtail container should have collected all the logs and traces from the other containers and sent them to our grafana loki instance, as defined by the `clients` `url` setting in `infra/promtail/promtail-config.yaml`.

Go to our grafana cloud instance (you may need to be added as a user) and you should be able to view logs based on this cargo run instance:
You can also filter logs by an individual container's stdout. The following screenshot shows the stdout of the ethapi container for the same cargo run instance (notice that I filter by promtailrunid to only show logs generated by this run of cargo make):
You can also do a lot more filtering; see the Loki documentation.
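For example, the filtering described above could be expressed in Loki's query language (LogQL) roughly as follows. The runid value comes from the example above; the exact label names depend on the promtail configuration and are assumptions here:

```
{promtailrunid="1707313838"}                            # all logs from this cargo make run
{promtailrunid="1707313838", container=~".*ethapi.*"}   # just the ethapi container's stdout
{promtailrunid="1707313838"} |= "ERROR"                 # line-filter for errors within the run
```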