Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON agent logs buffered on startup #13015

Closed
wjordan opened this issue May 13, 2022 · 6 comments · Fixed by #13076
Closed

JSON agent logs buffered on startup #13015

wjordan opened this issue May 13, 2022 · 6 comments · Fixed by #13076
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/config theme/logging type/enhancement

Comments

@wjordan
Copy link
Contributor

wjordan commented May 13, 2022

Issue

I recently discovered that the Nomad agent buffers all of its logs on startup until it reaches a certain point in the initialization process:

// Output the header that the server has started
c.Ui.Output("Nomad agent started! Log data will stream in below:\n")
// Enable log streaming
logGate.Flush()

This behavior made a recent attempt to debug issues with a slow-loading custom task driver plugin confusing and difficult. I wasn't able to view the logs until after the initialization was completed, and log lines were tagged with incorrect timestamps (reflecting when the buffer was flushed instead of when the log lines were originally recorded).

I couldn't find any mention of this behavior in the documentation, and only figured out what was happening by reading through the source code. (In hindsight, it seems that the Log data will stream in below message hints at the buffer being flushed, but this meaning not clear enough from the text alone.)

The original commit introducing this logGate / GatedWriter (2165576) was very early in the project, and there is no commit message or discussion around this behavior, so I still don't understand why this log buffering was implemented in the first place or what purpose it continues to serve. I can't think of any scenario where this kind of buffering behavior would actually be helpful.

Could this buffering behavior be simply removed, or at least made configurable, so that agent logs (including logs generated by task-driver plugins) can be flushed immediately? If there is a reason why this buffering can't or shouldn't be removed, could some additional documentation / comments detailing the reason for this behavior be added to the project?

@tgross
Copy link
Member

tgross commented May 13, 2022

Hi @wjordan! The reasoning behind the buffering is that Nomad's setup involves a bunch of concurrent goroutines all spinning up various bits of work, but the log output is intended to start with the "banner message" once the configuration has been validated. If we didn't buffer, that banner would be interleaved with the rest of the logs.

But obviously that's not a desirable behavior when you're trying to debug the startup itself. It looks like it'd be possible to make this feature configurable in the logging config, as that's read early enough that we could probably set everything up the same way and then just flush the gate right away. I'll mark this for roadmapping, but if you're up for opening a PR for it we'd also be happy to review it I'm sure.

@tgross tgross added stage/accepted Confirmed, and intend to work on. No timeline committment though. type/enhancement theme/config and removed type/bug labels May 13, 2022
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 13, 2022
@tgross tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage May 13, 2022
@wjordan
Copy link
Contributor Author

wjordan commented May 16, 2022

If we didn't buffer, that banner would be interleaved with the rest of the logs.

Writes to logger implementations are typically protected by mutexes to prevent any such interleaving from occurring. However, one exception is Nomad's use of cli.BasicUi, without the ConcurrentUi wrapper that uses a mutex to make the library concurrency-safe:

nomad/main.go

Lines 94 to 98 in be7ec8d

agentUi := &cli.BasicUi{
Reader: os.Stdin,
Writer: os.Stdout,
ErrorWriter: os.Stderr,
}

Is this the underlying issue you're referring to? In this case, a better solution would be to use the ConcurrentUi wrapper to solve the underlying issue of interleaved logs without the undesirable side-effect of having logs buffered on agent startup. I can prepare a PR along these lines if that sounds good.

@tgross
Copy link
Member

tgross commented May 17, 2022

Writes to logger implementations are typically protected by mutexes to prevent any such interleaving from occurring. However, one exception is Nomad's use of cli.BasicUi, without the ConcurrentUi wrapper that uses a mutex to make the library concurrency-safe:
...
Is this the underlying issue you're referring to? In this case, a better solution would be to use the ConcurrentUi wrapper

Yes, and the "obvious" fix here if we'd used ConcurrentUi would be to send out everything in command.go#L763-L776 in a single c.Ui.Output() call to avoid interleaving. But unfortunately the ConcurrentUi wrapper only locks the Ui object and not access to stdout/stderr. Which implicitly means that mitchell/cli expects that it owns stdout/stderr. That's only the case if we're not writing the logs there. (While that's an option in Nomad, it's not typical on your typical contemporary Linux running journald.)

So that's why I think we want this to be part of the logging configuration.

@wjordan
Copy link
Contributor Author

wjordan commented May 17, 2022

Ohh, you're not referring to character-level interleaving on an underlying log stream, you're just referring to the ordering of log-lines in the agent startup, which is a very different story.

I was completely confused by what you were saying about the buffering preventing the banner from being interleaved, until I finally discovered through further experimentation that the log_json = true setting I have enabled behaves differently from the non-json logging output. Here is what the json log output produces (all of the plugin-setup lines appear before Nomad agent started:

[...output paused here until agent finishes booting...]

{"@level":"info","@message":"Loaded configuration from [...].hcl","@module":"agent","@timestamp":"2022-05-17T10:04:42.499989-07:00"}
{"@level":"info","@message":"Starting Nomad agent...","@module":"agent","@timestamp":"2022-05-17T10:04:42.500136-07:00"}
{"@level":"info","@message":"detected plugin","@module":"agent","@timestamp":"2022-05-17T10:04:42.502818-07:00","name":"raw_exec","plugin_version":"0.1.0","type":"driver"}
[...]
{"@level":"info","@message":"Nomad agent configuration:\n","@module":"agent","@timestamp":"2022-05-17T10:04:42.566731-07:00"}
[...]
{"@level":"info","@message":"Nomad agent started! Log data will stream in below:\n","@module":"agent","@timestamp":"2022-05-17T10:04:42.566769-07:00"}

Here's what the non-json log outputs:

==> Loaded configuration from [...].hcl
==> Starting Nomad agent...
[...output paused here until agent finishes booting...]
==> Nomad agent configuration:
[...]
==> Nomad agent started! Log data will stream in below:

    2022-05-17T10:21:03.803-0700 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    [...]

The json case buffers the logs until the agent finishes starting up and also still interleaves the written log lines, so the buffering is doing nothing other than delaying the output.

So one narrow way to fix this issue for json logging would be to just add a logGate.Flush() in the same clause that swaps out the UI implementation (which is what causes the logs to be interleaved anyway and makes the buffering useless):

// Swap out UI implementation if json logging is enabled
if config.LogJson {
c.Ui = &logging.HcLogUI{Log: logger}
}

Does this approach sound reasonable?

@tgross
Copy link
Member

tgross commented May 18, 2022

So one narrow way to fix this issue for json logging would be to just add a logGate.Flush() in the same clause that swaps out the UI implementation (which is what causes the logs to be interleaved anyway and makes the buffering useless):
...
Does this approach sound reasonable?

Yes, sounds good to me! Tag me for review on the PR and I'll be happy to get that merged. Thanks!

@wjordan wjordan changed the title Agent logs buffered on startup JSON agent logs buffered on startup May 19, 2022
Nomad - Community Issues Triage automation moved this from Needs Roadmapping to Done May 19, 2022
@github-actions
Copy link

github-actions bot commented Oct 7, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/config theme/logging type/enhancement
Projects
Development

Successfully merging a pull request may close this issue.

2 participants