Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging: Log the cause behind agent startup failure #11353

Merged
merged 3 commits into from
Oct 27, 2021
Merged

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Oct 20, 2021

Log the failure error when the agent fails to start. Previously, the
agent startup failure error would be emitted to the command UI but not
logged. So it doesn't get emitted to syslog or log_file if they are
set, and it makes debugging much harder. Also, logging the error again
before exit makes the error more visible: previously, the operator
needed to scroll to the top to find the error.

On a sample failure, the output will look like:

==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from sample-configs/config-bad
==> Starting Nomad agent...
==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system
    2021-10-20T14:38:51.179-0400 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system"

This change adds the final ERROR message. It's easy to miss the ==> Error starting agent above.

This commit addresses two usability issues with our logging.

First, log the failure error when the agent fails to start. Previously, the
agent startup failure error would be emitted to the command UI but not
logged. So it doesn't get emitted to syslog or `log_file` if they are
set, and it makes debugging much harder. Also, logging the error again
before exit makes the error more visible: previously, the operator
needed to scroll to the top to find the error.

On a sample failure, the output will look like:
```
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from sample-configs/config-bad
==> Starting Nomad agent...
==> Error starting agent: setting up server node ID failed: mkdir /path-without-permission: read-only file system
    2021-10-20T14:38:51.179-0400 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/path-without-permission/plugins
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-10-20T14:38:51.181-0400 [ERROR] agent: error starting agent: error="setting up server node ID failed: mkdir /path-without-permission: read-only file system"
```

This change adds the final `ERROR` message. It's easy to miss the `==>
Error starting agent` above.

Also, ensure that all logs emitted by the golang standard logger (e.g. `log.Printf`), are routed through hclog. Some dependencies still use `log.Printf` and we'd like their output to be timestampped and jsonified as appropriate.
Copy link
Contributor

@DerekStrickland DerekStrickland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice enhancement!

@vercel vercel bot temporarily deployed to Preview – nomad October 20, 2021 18:55 Inactive
@notnoop notnoop changed the title logging: Pipe all logs through hclog logging: Log the cause behind agent startup failure Oct 20, 2021
@notnoop
Copy link
Contributor Author

notnoop commented Oct 20, 2021

Looks like @lgfa29 already beat me to standard logger through hclog issue in #11291. Backed out the duplicate snippet and updated PR description.

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget the changelog.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants