Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging guidelines draft #512

Merged
merged 13 commits into from
Sep 15, 2020
48 changes: 26 additions & 22 deletions docs/development-plan.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,11 @@ It will run WASM code and measure CPU time spent on computations.

### Project Artifacts

| Platform | Artifacts |
|--|--|
| Linux | Debian/Ubuntu Package (.deb) |
| macOS | Apple Disk Image (.dmg) |
| Windows | Windows Installer (.msi) |
| Platform | Artifacts |
| -------- | ---------------------------- |
| Linux | Debian/Ubuntu Package (.deb) |
| macOS | Apple Disk Image (.dmg) |
| Windows | Windows Installer (.msi) |

## Development Process

Expand All @@ -30,24 +30,24 @@ Milestones should be used to pin issues to specific releases.

Labels should be used to filter issues by other criteria:

| Label | Description |
|--|--|
| enhancement | A new feature that can be implemented. |
| bug | A bug report. |
| in progress | Work on this issue is in progress. |
| duplicate | A very similar issue already exists. |
| deferred | Will be done later, after other issues are closed. |
| Label | Description |
| ----------- | -------------------------------------------------- |
| enhancement | A new feature that can be implemented. |
| bug | A bug report. |
| in progress | Work on this issue is in progress. |
| duplicate | A very similar issue already exists. |
| deferred | Will be done later, after other issues are closed. |

### Working on Git Branches

Software development should be done on Git branches.

| Branch Name or Prefix | Meaning |
|--|--|
| master | Main branch. |
| feature/ | New feature, e.g. feature/connection-manager. |
| bugfix/ | Bug fix, e.g. bugfix/division-by-zero. |
| release/ | Branch for a special release, e.g. release/3.0. |
| Branch Name or Prefix | Meaning |
| --------------------- | ----------------------------------------------- |
| master | Main branch. |
| feature/ | New feature, e.g. feature/connection-manager. |
| bugfix/ | Bug fix, e.g. bugfix/division-by-zero. |
| release/ | Branch for a special release, e.g. release/3.0. |

### Pull Requests

Expand All @@ -57,10 +57,10 @@ After work on a feature is finished, a pull request based on the branch where th

Every branch is automatically compiled and tested in Jenkins.

| Test Name | Requirement |
|--|--|
| Compilation | All code must compile without errors. |
| Unit Tests | All tests (prefixed with `#[cfg(test)]`) should pass. |
| Test Name | Requirement |
| --------------- | ------------------------------------------------------ |
| Compilation | All code must compile without errors. |
| Unit Tests | All tests (prefixed with `#[cfg(test)]`) should pass. |
| Code Formatting | All code must be formatted with rustfmt (`cargo fmt`). |

### Code Review and Merging
Expand Down Expand Up @@ -106,6 +106,10 @@ https://doc.rust-lang.org/1.0.0/style/README.html
To enforce formatting, code should be formatted using rustfmt tool (https://github.com/rust-lang/rustfmt).
To install it, run `rustup component add rustfmt` command. To format files in the working dictory, please run `cargo fmt` command.

### Logging Guidelines

The logging guidelines for Yagna component development can be found here: logging-guidelines.md.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not render as link


### Code Repositories

Most Rust crates used in the project should be located in one repository.
Expand Down
166 changes: 166 additions & 0 deletions docs/logging-guidelines.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
# Lightweight Golem Logging Guidelines

## Objective

The purpose of this article is to provide a set of prescriptive guidelines for developers to follow in order to achieve consistent content of execution logs.
Execution logs serve two main purposes, depending on "audience":
- **Core system developers** - Efficient troubleshooting and debugging during development of Golem modules.
- **Integrator developers** - Troubleshooting and debugging of applications which use Golem as platform.
- **Users** (including **Node owners/administrators**) - Diagnostics of issues appearing on owned Golem nodes (eg. setup, infrastructural or maintenance-related issues).

shadeofblue marked this conversation as resolved.
Show resolved Hide resolved
Ideal logs contain the right amount of information for the situation and audience. This implies that "too much logs" may be as useless as "no logs" - therefore focus is put on recording appropriate information at each log level.

The article includes also **generic requirements** referring to the **logging framework** chosen by developers for specific platform they are working on - this takes into account that Golem is a multi-platform software ecosystem, and however many programming languages can (in theory) be used to develop Golem components, their developers should maintain consistent approach to logs & audit trails, to provide uniform level of "user & developer experience".

## Scope

These guidelines adhere to software published by Golem Factory as parts of Lightweight Golem/Yagna ecosystem. This includes among others, the following repos:
- [yagna](https://github.com/golemfactory/yagna)
- [ya-client](https://github.com/golemfactory/ya-client)
- [yapapi](https://github.com/golemfactory/yapapi)
- [yagna-integration](https://github.com/golemfactory/yagna-integration)

## Logging library requirements

Must have:
- Redirect log stream to **console** (STDOUT/STDERR) and/or **files**
- Filter log entries by log level
- Filter log entries by entry subsets (like namespace subtrees)

Nice to have:
- ...

## Log entry content

The log entries should record following aspects/attributes:

**Must have:**
- **Timestamp** (with ms granularity, in UTC or with TZ information) - must be possible to correlate entries from nodes in different timezones
- Log **level**
- Log **area/topic** (eg. namespace or module of code which records the log entry)
- **Grouping or correlating** attribute (attribute which allows to filter events related to a single command, activity, API request, etc.)
- Thread id
- Bespoke correlation id
- Log entry **description** (human readable, preferably fit in one line, or properly laid-out if a multiline log entry is essential, eg. for message content)

**Nice to have/where applicable:**

- Log entry **context information**, variables, parameters
- Low level technical details (eg. network traffic content, API message bodies)
- **Error code**
- **Error resolution hints** - very important for high level errors / warns that actually can be resolved by some action. In some cases this is true also for low level errors / warns
- **Code location** of the log statement

### Generic guidelines
- Use descriptive messages and proper casing/punctuation, ie. instead of:
```
[2020-08-27T07:56:22Z DEBUG yagna] init gnt drv
```
do this:
```
[2020-08-27T07:56:22Z DEBUG yagna] Initializing GNT payment driver
```

### Data confidentiality

Care must be taken when confidential or personal data need to be recorded in logs:
- When sensitive data must be logged for dignostic purposes, obfuscate, eg. output only leading and/or trailing characters of a sensitive string instead of the full value.

## Log level guidelines

### CRITICAL/FATAL
**Purpose:**
- Indicate the app is unable to continue, it should exit after this message.

**Audience:**
- Users

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also very important for node owners/admins.

**Examples:**
- Uncaught exceptions/unhandled errors
- Resources exhausted (eg. out of memory, out of storage space)

### ERROR
**Purpose:**
- Indicate that app is unable to perform the requested **action** and stops trying.

**Audience:**
- Users
- Integrator developers
- Core system developers

**Examples:**
- REST API and CLI command processing errors
- Payment-related (eg. blockchain interaction) errors

### WARN
**Purpose:**
- Indicate that app is gracefully handling an erratic situation and is able to continue with the requested action.

**Audience:**
- Users
- Integrator developers
- Core system developers

**Examples:**
- Errors in actions which will be retried
- Invalid parameters, attributes, addresses which will be ignored/superseded by eg. default values, etc.

### INFO
**Purpose:**
- Inform about successful initialization and shutdown of app module/feature.
- Indicate that app is performing a requested action/command/request.

**Audience:**
- Users
- Node owners/admins
- Integrator developers
- Core system developers

**Examples:**
- Startup event of a significant module - including fundamental parameters of execution, eg.
- URLs/port numbers for services listening or depending on network connectivity,
- Working directories, data directories
- Shutdown event of a significant module
- Command requested from the module
- Indication of a CLI command sent to Yagna daemon
- Indication of REST API request received

### DEBUG
**Purpose:**
- Provide additional context of performed actions, additional steps or any info which may be useful for troubleshooting.

**Audience:**
- Integrator developers
- Core system developers
- Node owners/admins

**Examples:**
- Low level processing steps, especially those dependent on files, resources, connectivity, with dependency info (URLs, addresses, parameter values)
-

### TRACE
**Purpose:**
- Record low-level/technical details of performed actions, like net traffic content (API requests, responses, Golem Net messages, etc.)

**Audience:**
- Core system developers

**Examples:**
- Environment variable/config parameter value snapshots
- API request/response body content
- Golem net message routing info and content

## Error handling & logging guidelines

As low-level, 3rd party library errors are encountered during execution, their error messages are usually useless without context information. It is vital to wrap the low-level error messages with additional info to indicate details of performed activity that would aid in troubleshooting. Eg. instead of:

```
[2020-08-27T07:56:22Z ERROR yagna] File IO error: Path not found
```

log this:

```
[2020-08-27T07:56:22Z ERROR yagna] E00342 - WASM ExeUnit DEPLOY: Error downloading remote file to temp folder '/tmp/yagna_data': File IO error: Path not found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience of debugging yagna: the file name and line number where the error occurred is very important as it's ofter easier to just read the source code to find the problem. (I know you mentioned it in "Log entry content" section.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, albeit the error code indicated in this example serves this purpose: with a specific error code it should be fairly instantaneous to identify the location responsible for a particular issue (I'm conscious that providing file/line number may not be feasible for many dev platforms).


```