feat: audit journal of commands and trace of actor messages #3325
@@ -181,6 +181,7 @@ tokio-util = { version = "0.7", features = ["codec"] }
toml = "0.8"
tower = "0.4"
tracing = { version = "0.1", features = ["attributes", "log"] }
tracing-appender = "0.2"
Used for the rolling-file feature.
};

// Audit journal
let audit_appender = tracing_appender::rolling::hourly("/tmp", "tedge.audit.log");
This will have to be configurable ;-)
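One way the hard-coded `/tmp` could become configurable is sketched below. This is only an illustration: the `journal_dir` function and the `/var/log/tedge` default are assumptions, not part of this PR or of the thin-edge configuration.

```rust
use std::path::PathBuf;

/// Hypothetical default directory; the PR itself hard-codes "/tmp".
const DEFAULT_LOG_DIR: &str = "/var/log/tedge";

/// Resolve the directory for the journal files: a non-empty configured
/// value wins, otherwise fall back to the default.
pub fn journal_dir(configured: Option<&str>) -> PathBuf {
    match configured {
        Some(dir) if !dir.trim().is_empty() => PathBuf::from(dir.trim()),
        _ => PathBuf::from(DEFAULT_LOG_DIR),
    }
}
```

The resulting path could then replace the `"/tmp"` literal, since `tracing_appender::rolling::hourly` accepts any `impl AsRef<Path>` as its directory argument.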
The idea is to log whatever changes the device state and thin-edge behavior.
2025-01-10T17:14:17.945362368Z INFO Audit: tedge-agent started
2025-01-10T17:14:18.011907868Z INFO Audit: Execute software_list command, log = /var/log/tedge/agent/workflow-software_list-c8y-mapper-2025-01-10T17:44:04.806507397Z.log
2025-01-10T17:14:18.038213949Z INFO Audit: Executed software_list command
2025-01-10T17:20:36.250039963Z INFO Audit: tedge config unset aws.root_cert_path
- `tedge config set` commands
- `tedge connect/disconnect` commands
- `tedge cert create/request/renew` commands
- Workflow command (init / success / failure)
- Tedge config file updates
- Tedge daemon start/stop
- What else?
- What is the correct level of detail?
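To make the question of scope and level of detail more concrete, the candidate events listed above could be modelled as a single type. The sketch below is hypothetical: the `AuditEvent` enum and its variants are not part of this PR, and the failure wording is invented. It assumes the `Audit:` prefix seen in the sample journal comes from the event target, so only the message part is formatted here.

```rust
use std::fmt;

/// Hypothetical audit events mirroring the candidate list above.
pub enum AuditEvent {
    DaemonStarted { service: String },
    DaemonStopped { service: String },
    /// A `tedge` CLI invocation, logged as typed by the user.
    CliCommand { command_line: String },
    WorkflowInit { operation: String, log_path: String },
    WorkflowSuccess { operation: String },
    WorkflowFailure { operation: String, reason: String },
}

impl fmt::Display for AuditEvent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AuditEvent::DaemonStarted { service } => write!(f, "{service} started"),
            AuditEvent::DaemonStopped { service } => write!(f, "{service} stopped"),
            AuditEvent::CliCommand { command_line } => write!(f, "{command_line}"),
            AuditEvent::WorkflowInit { operation, log_path } => {
                write!(f, "Execute {operation} command, log = {log_path}")
            }
            AuditEvent::WorkflowSuccess { operation } => {
                write!(f, "Executed {operation} command")
            }
            // The failure format is an assumption; the sample journal shows none.
            AuditEvent::WorkflowFailure { operation, reason } => {
                write!(f, "Failed {operation} command, reason = {reason}")
            }
        }
    }
}
```

A closed enum like this would force every new kind of audited change to be declared explicitly, which is one possible answer to the "what else?" question.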
How about changes to c8y supported operations? The journal would show when supported commands are added or deleted and when the set is reloaded. Recently, when live-debugging with a user, we were confused why `c8y_Command` was not present, so it would be ideal to be able to go into the audit log to see why it was removed!
.with_filter(filter_fn(|metadata| metadata.target() == "Audit"));

// Actor traces
let trace_appender = tracing_appender::rolling::hourly("/tmp", "tedge.actors.log");
This will have to be configurable ;-)
The idea is to observe and debug actors, and ideally to test them.
- Distinguish source messages (MQTT / inotify / service start / ^C) from reactions
- Add trace identifiers to follow the consequences of an input message
- Do we have to interleave actions on the system here (HTTP requests? File Updates? System commands?)
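The first two points above can be illustrated with a small envelope type. This is a sketch under assumptions, not the PR's implementation: `Origin`, `Traced` and `TraceIds` are hypothetical names, and a simple counter stands in for whatever identifier scheme would actually be chosen.

```rust
/// Where a traced message comes from (hypothetical classification).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Origin {
    /// External input: MQTT message, inotify event, service start, ^C.
    Source(String),
    /// Message emitted by an actor while handling another message.
    Reaction { actor: String },
}

/// A message wrapped with the identifier of the input that caused it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Traced<M> {
    pub trace_id: u64,
    pub origin: Origin,
    pub payload: M,
}

/// Allocates a fresh trace identifier for each source message.
pub struct TraceIds {
    next: u64,
}

impl TraceIds {
    pub fn new() -> Self {
        TraceIds { next: 1 }
    }

    /// Wrap an external input, giving it a new trace identifier.
    pub fn source<M>(&mut self, kind: &str, payload: M) -> Traced<M> {
        let trace_id = self.next;
        self.next += 1;
        Traced { trace_id, origin: Origin::Source(kind.to_string()), payload }
    }
}

impl<M> Traced<M> {
    /// Wrap a reaction; it inherits the trace identifier of its cause.
    pub fn react<N>(&self, actor: &str, payload: N) -> Traced<N> {
        Traced {
            trace_id: self.trace_id,
            origin: Origin::Reaction { actor: actor.to_string() },
            payload,
        }
    }
}
```

With such an envelope, filtering the actor trace on one `trace_id` would list an input followed by all of its consequences, while `Origin` tells inputs and reactions apart.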
> Distinguish source messages (MQTT / inotify / service start / ^C) from reactions

I think ideally we should distinguish what are strictly inputs/outputs from internal actor messages. To that effect, we already have the `MQTT_sub`, `MQTT_pub` and `MQTT_recv` targets.
All the system tests are failing for the same reason:
We definitely need to improve and formalise our logging a little more, so I like the idea of separate journals for audited operations and actor messages. Another possible category would be IO (network, disk, or both).
For tracking the causes of events, we could definitely make better use of spans. I have a branch where I had to instrument a bit of code to track down a bug, so I will also create a PR with that once I have cleaned it up a bit.
I reckon there will be more changes, so will review again when the PR is ready.
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Proposed changes
Experiment with the idea of