Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit logging messages and map them to Debug/Info/Error #3595

Closed
scrye opened this issue Jan 9, 2020 · 8 comments · Fixed by #3789
Closed

Audit logging messages and map them to Debug/Info/Error #3595

scrye opened this issue Jan 9, 2020 · 8 comments · Fixed by #3789
Assignees
Labels
feature New feature or request i/breaking change PR that breaks forwards or backwards compatibility
Milestone

Comments

@scrye
Copy link
Contributor

scrye commented Jan 9, 2020

It would be useful to remove the tracing logging level, because:

  • the formatting of trace log lines is not consistent with the formatting of other levels;
  • it requires complex logic/hacks in the go/lib/log package, because the underlying logging library does not support custom levels
  • it provides no tangible benefits; five logging levels should be enough, a sixth does not add much

As a first step, we should reevaluate all logging calls in the code base, and decide what level they should be. As a general guideline, the following could be used:

  • Debug: very low-level details about what the application is doing; target audience is developers familiar with the code base; for everyone else, it would be challenging to understand them. Everything Trace log entry in current code should be bumped up to Debug.
  • Info: high-level details about what the application is doing; target audience is developers and operators with knowledge of SCION and application APIs. Some Debug log entries in current code should be bumped up to Info.
  • Warn: details about events that do not impact the application's functionality, but are unexpected during normal operation
  • Error: details about events that might impact the application's functionality, but from which the application was able to recover
  • Critical: details about events that severely impact the application's functionality, including events that cause it to crash
@scrye scrye added feature New feature or request i/proposal A new idea requiring additional input and discussion labels Jan 9, 2020
@scrye scrye self-assigned this May 4, 2020
@scrye
Copy link
Contributor Author

scrye commented May 4, 2020

@worxli, @mkowalski, @karampok: Does the proposal sound reasonable to you?

@scrye scrye removed the i/proposal A new idea requiring additional input and discussion label May 5, 2020
@scrye
Copy link
Contributor Author

scrye commented May 5, 2020

Switching this from a proposal to a planned change.

@oncilla
Copy link
Contributor

oncilla commented Jun 5, 2020

I would like to have a guideline written in our docs that covers the following:

  • When to use which log level (I think the description in this issue covers this pretty well)
  • Interaction between logs and metrics: More precisely: When to use logs vs when to use metrics

@scrye
Copy link
Contributor Author

scrye commented Jun 5, 2020

I would like to have a guideline written in our docs that covers the following:

* When to use which log level (I think the description in this issue covers this pretty well)

* Interaction between logs and metrics: More precisely: When to use logs vs when to use metrics

Extracted to Anapaya/scion#3072 (sorry, internal link).

@scrye
Copy link
Contributor Author

scrye commented Jun 8, 2020

Let's go even further with the reduction in logging levels. As discussed in Slack, we can use three levels: Debug, Info, Error.

From Slack (sorry, internal link):

  • Debug: developer only, very low-level details
  • Info: what the application is doing, quite verbose, includes “errors” that are part of normal operation. A lot of Errors from today would move here, because they’re confusing to third-parties who might be genuinely concerned by some of these, even though they’re part of normal operation.
  • Error: application encountered a major problem (e.g., DB unreachable), and it might even need to terminate.

@scrye
Copy link
Contributor Author

scrye commented Jun 8, 2020

To have this actionable:

  • All Trace logging calls become Debug
  • All Debug calls are re-evaluated, and some become Info
  • All Info calls stay Info
  • All Warning calls become Info
  • All Error calls are re-evaluated, and some become Info
  • All Crit calls become Error

Library go/lib/log is also refactored to only expose Debug, Info, and Error APIs. Trace is removed from internal logic.

@scrye scrye added the i/breaking change PR that breaks forwards or backwards compatibility label Jun 8, 2020
@scrye
Copy link
Contributor Author

scrye commented Jun 9, 2020

None are of the above recommendations are set in stone, so we'll need to take a look at each of them and decide what makes most sense, maybe even editing the message and arguments.

@scrye
Copy link
Contributor Author

scrye commented Jun 9, 2020

Let's tackle this for go/lib first, and see how much work that is. If it's too much, we'll break it down further.

@scrye scrye added this to the S01 E12 milestone Jun 9, 2020
@scrye scrye removed their assignment Jun 11, 2020
@scrye scrye changed the title Only use Debug/Info/Warn/Error/Crit levels for logging Audit logging messages and map them to Debug/Info/Error Jun 12, 2020
@lukedirtwalker lukedirtwalker self-assigned this Jun 12, 2020
lukedirtwalker added a commit to lukedirtwalker/scion that referenced this issue Jun 16, 2020
This is a BREAKING CHANGE, because logging infrastructure might need to
re-evaluate scripts/alerting/logging levels in configs, etc. also this commit
removes the logdog tool.

All Trace logs are converted to Debug.
All Crit logs are converted to Error.
All Warn logs are converted to Info.

Having more than those remaining 3 levels is not really needed:
- debug: Contains low level details for developers.
- info: Contains general information about the process, failures that
  happen as part of normal operation are also logged at this level.
- error: Contains unexpected errors that lead to mal-functioning of the
  system in severe cases the application may choose to exit after an
  error log.

This PR also removes logdog since that was a tool that was seldomly used
and is strictly tied to the log format we have now, the format might
change in the future.

The new recommended logging level for production is info.

Further audits are needed of what we need to log at info level instead
of debug/error.

Fixes scionproto#3595
lukedirtwalker added a commit that referenced this issue Jun 17, 2020
This is a BREAKING CHANGE, because logging infrastructure might need to
re-evaluate scripts/alerting/logging levels in configs, etc. also this commit
removes the logdog tool.

All Trace logs are converted to Debug.
All Crit logs are converted to Error.
All Warn logs are converted to Info.

Having more than those remaining 3 levels is not really needed:
- debug: Contains low level details for developers.
- info: Contains general information about the process, failures that
  happen as part of normal operation are also logged at this level.
- error: Contains unexpected errors that lead to mal-functioning of the
  system in severe cases the application may choose to exit after an
  error log.

This PR also removes logdog since that was a tool that was seldomly used
and is strictly tied to the log format we have now, the format might
change in the future.

The new recommended logging level for production is info.

Further audits are needed of what we need to log at info level instead
of debug/error.

Fixes #3595
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request i/breaking change PR that breaks forwards or backwards compatibility
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants