Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: audit journal of commands and trace of actor messages #3325

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

didier-wenzek
Copy link
Contributor

Proposed changes

Experiment with the idea of

  • an audit journal logging user actions and commands
  • and a log tracing all the actor messages.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@@ -181,6 +181,7 @@ tokio-util = { version = "0.7", features = ["codec"] }
toml = "0.8"
tower = "0.4"
tracing = { version = "0.1", features = ["attributes", "log"] }
tracing-appender = "0.2"
Copy link
Contributor Author

@didier-wenzek didier-wenzek Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use for the rolling-file feature.

};

// Audit journal
let audit_appender = tracing_appender::rolling::hourly("/tmp", "tedge.audit.log");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have to be configurable ;-)

Copy link
Contributor Author

@didier-wenzek didier-wenzek Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to log what change the device state and thin-edge behavior.

2025-01-10T17:14:17.945362368Z  INFO Audit: tedge-agent started
2025-01-10T17:14:18.011907868Z  INFO Audit: Execute software_list command, log = /var/log/tedge/agent/workflow-software_list-c8y-mapper-2025-01-10T17:44:04.806507397Z.log 
2025-01-10T17:14:18.038213949Z  INFO Audit: Executed software_list command
2025-01-10T17:20:36.250039963Z  INFO Audit: tedge config unset aws.root_cert_path
  • tedge config set commands
  • tedge connect/disconnect commands
  • tedge cert create/request/renew commands
  • Workflow command (init / success / failure)
  • Tedge config file updates
  • Tedge daemon start/stop
  • What else?
  • What is the correct level of details?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changes to c8y supported operations? Would show when supported commands are added/deleted and when the set is reloaded. Recently when live-debugging with a user we were confused why c8y_Command was not present, so it would be ideal to be able to go into the log audit to see why it was removed!

.with_filter(filter_fn(|metadata| metadata.target() == "Audit"));

// Actor traces
let trace_appender = tracing_appender::rolling::hourly("/tmp", "tedge.actors.log");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have to be configurable ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is observe & debug actors and ideally to test actors.

  • Distinguish source messages (MQTT / inotify / service start / ^C) from reactions
  • Add trace identifiers to follow the consequence of an input messages
  • Do we have to interleave actions on the system here (HTTP requests? File Updates? System commands?)

Copy link
Contributor

@Bravo555 Bravo555 Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distinguish source messages (MQTT / inotify / service start / ^C) from reactions

I think ideally we should distinguish between what are strictly inputs/outputs from internal actor messages. To that effect, we do already have MQTT_sub, MQTT_pub and MQTT_recv targets.

Copy link
Contributor

github-actions bot commented Jan 10, 2025

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
550 2 2 552 99.64 1h28m54.359426999s

Failed Tests

Name Message ⏱️ Duration Suite
Check existence of init directories ! test -d '/etc/tedge/device-c... returned an unexpected exit code stdout: stderr: 32.036 s Tedge Init
Check absence of OpenSSL Error messages #3024 'active Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: Received PUBACK from Cumulocity (Mid: 29, RC:0) Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: Received PUBACK from Cumulocity (Mid: 30, RC:0) Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: New connection from 127.0.0.1:58394 on port 8883. Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: New client connected from 127.0.0.1:58394 as auto-F11046AF-601D-AFE1-A8A5-38DC05304CCE (p2, c1, k60). Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: No will message specified. Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: Sending CONNACK to auto-F11046AF-601D-AFE1-A8A5-38DC05304CCE (0, 0) Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: Received SUBSCRIBE from auto-F11046AF-601D-AFE1-A8A5-38DC05304CCE Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: te/device/main/service/tedge-mapper-c8y/status/health (QoS 0) Jan 14 09:41:38 7146cfa0f1e2 mosquitto[499]: 1736847698: auto-F11046AF-601D-AFE1-A8A5-38DC05304CCE 0 te/device/main/service/tedge-mapper-c8y/status/healt... [ Message content over the limit has been removed. ] ...7 7146cfa0f1e2 mosquitto[1314]: 1736847717: Client zvukvnmmfk disconnected. Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Received PUBLISH from tedge-mapper-c8y (d0, q1, r1, m38, 'te/device/main///cmd/software_list/c8y-mapper-2025-01-14T09:41:57.64247516Z', ... (0 bytes)) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Sending PUBLISH to tedge-agent#te/device/main// (d0, q1, r0, m25, 'te/device/main///cmd/software_list/c8y-mapper-2025-01-14T09:41:57.64247516Z', ... (0 bytes)) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Sending PUBLISH to tedge-mapper-c8y (d0, q1, r0, m181, 'te/device/main///cmd/software_list/c8y-mapper-2025-01-14T09:41:57.64247516Z', ... (0 bytes)) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Sending PUBLISH to auto-DF877659-5BDE-6B86-C26D-F802079DDA42 (d0, q0, r0, m0, 'te/device/main///cmd/software_list/c8y-mapper-2025-01-14T09:41:57.64247516Z', ... (0 bytes)) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Sending PUBACK to tedge-mapper-c8y (m38, rc0) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Received PUBACK from tedge-mapper-c8y (Mid: 181, RC:0) Jan 14 09:41:58 7146cfa0f1e2 mosquitto[1314]: 1736847718: Received PUBACK from tedge-agent#te/device/main// (Mid: 25, RC:0) ' contains 'OpenSSL Error' 0.324 s Tedge Connect Test

@didier-wenzek
Copy link
Contributor Author

All the system tests are failing for the same reason: initializing rolling file appender failed: InitError { context: "failed to create initial log file", source: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }.

@Bravo555 Bravo555 self-assigned this Jan 13, 2025
Copy link
Contributor

@Bravo555 Bravo555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need to improve and formalise our logging a little more, so I like the idea of separate journals for audited operations and actor messages. Another possible category would be also IO (network/disk or both).
For tracking the causes of events, we could definitely make better use of spans. I have one branch where I had to instrument a bit of code to track a bug, so I will also create a PR with that once I cleaned it up a bit.
I reckon there will be more changes, so will review again when the PR is ready.

Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants