Feature Idea: Better Support for 3rd party logging solutions. #9211

Open
the-maldridge opened this issue Oct 28, 2020 · 13 comments

Comments

@the-maldridge

Logs are something that should just be automagic in a clustered environment. Teams shouldn't need to think about them; they should be managed by the infrastructure. Supporting these systems also shouldn't break Nomad's built-in log management, so changing things like the global Docker logging configuration isn't really a solution.

I propose a patch to logmon that allows it to duplicate the log streams to a second, well-known location. The files there should be named something like {{.NodeName}}-{{.Job}}-{{.Group}}-{{.Task}}-{{.AllocID}}. Bonus points for making this a templatable string!
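To make the templating idea concrete, here is a minimal Go sketch that renders the proposed file name with the standard `text/template` package, which uses the same `{{.Field}}` syntax as the issue. `TaskLogInfo`, `RenderLogName` and `defaultNameTemplate` are hypothetical names for illustration; they are not part of logmon or any Nomad API.

```go
package main

import (
	"strings"
	"text/template"
)

// TaskLogInfo carries the fields proposed for the duplicated log file name.
// Hypothetical type: logmon has no such struct.
type TaskLogInfo struct {
	NodeName string
	Job      string
	Group    string
	Task     string
	AllocID  string
}

// defaultNameTemplate mirrors the naming scheme suggested in the issue.
const defaultNameTemplate = "{{.NodeName}}-{{.Job}}-{{.Group}}-{{.Task}}-{{.AllocID}}"

// RenderLogName parses a user-supplied template and executes it against
// the task's metadata. Referencing a field that TaskLogInfo lacks makes
// Execute return an error instead of producing an empty name segment.
func RenderLogName(tmpl string, info TaskLogInfo) (string, error) {
	t, err := template.New("logname").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, info); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

Failing loudly on an unknown field seems preferable for file names, since a silently empty segment could make two tasks collide on the same file.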

I don't think this is a particularly complex feature to support, but it would make log management solutions like Loki one-click options within Nomad. More importantly, it would allow other log indexing tools to work without breaking the built-in logs commands.

@apollo13
Contributor

This would indeed be a very welcome addition. I do not care whether it is a simple duplication via files or a plugin (?) that gets a copy of all the logs and can do whatever it wants with them.

If it were an fd duplication, there are probably a few more things to consider:

  • What about retention of those files?
  • Maybe even support "/" in the template and automagically create the directories.
  • Since the filename really must be unique, the namespace and whatever else is needed to make it unique should be part of it by default.

A plugin would also be nice, but the question is what the interface would look like and how Nomad would start it: similar to CSI plugins, or via configuration in the config files (i.e. just specify a socket endpoint)?

@the-maldridge
Author

I was trying to keep this as simple as possible, so I assumed the retention would be the same as for the other copies of the log files. If a task needs to slurp them, it needs to use inotify or some other means to notice them.

@apollo13
Contributor

apollo13 commented Oct 28, 2020

To be honest, I do not know the current retention times. But if the files survive alloc shutdown plus x minutes, that would be fine for most systems.

@towe75
Contributor

towe75 commented Jun 6, 2021

I stumbled upon this while working on podman driver logging options. What would you think about an external service that streams the logs from Nomad to, e.g., a Loki server or an ELK stack? Nowadays we have the Nomad event stream, so we can easily learn about started/stopped allocations and simply stream and transform the logs for each alloc.

IMHO an external service is a better choice here, because Nomad's API allows for good integration and the log streamer can evolve independently. Job metadata could be used to enrich the logs with custom fields/categories and also to select or filter the allocs eligible for log forwarding.
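A rough Go sketch of the bookkeeping such an external service would need: decode each newline-delimited JSON frame from the event stream, start tailing allocations that reach `running` and whose job opts in through a metadata key, and stop when they reach a terminal client status. The frame layout (`Index`, `Events`, `Topic`, `Payload.Allocation`), the status strings, and whether the embedded allocation carries its job's `Meta` all reflect our reading of Nomad's event stream API and should be verified; `Tracker`, `Apply` and the `log_forward` key are hypothetical.

```go
package main

import "encoding/json"

// Assumed subset of the event stream payload; verify field names.
type Job struct {
	Meta map[string]string
}

type Allocation struct {
	ID           string
	NodeID       string
	ClientStatus string
	Job          *Job
}

type Event struct {
	Topic   string
	Payload struct {
		Allocation *Allocation
	}
}

type Frame struct {
	Index  uint64
	Events []Event
}

// Tracker remembers which allocations are currently being forwarded.
type Tracker struct {
	OptInKey string          // hypothetical job meta key; "" forwards every job
	active   map[string]bool // alloc ID -> currently tailing
}

func NewTracker(optInKey string) *Tracker {
	return &Tracker{OptInKey: optInKey, active: make(map[string]bool)}
}

// eligible reports whether the allocation's job opted in to forwarding.
func (t *Tracker) eligible(a *Allocation) bool {
	if t.OptInKey == "" {
		return true
	}
	return a.Job != nil && a.Job.Meta[t.OptInKey] == "true"
}

// Apply decodes one frame and returns the alloc IDs to start and stop
// tailing. Heartbeat frames ("{}") produce no changes.
func (t *Tracker) Apply(line []byte) (start, stop []string, err error) {
	var f Frame
	if err := json.Unmarshal(line, &f); err != nil {
		return nil, nil, err
	}
	for _, ev := range f.Events {
		a := ev.Payload.Allocation
		if ev.Topic != "Allocation" || a == nil {
			continue
		}
		switch a.ClientStatus {
		case "running":
			if !t.active[a.ID] && t.eligible(a) {
				t.active[a.ID] = true
				start = append(start, a.ID)
			}
		case "complete", "failed", "lost":
			if t.active[a.ID] {
				delete(t.active, a.ID)
				stop = append(stop, a.ID)
			}
		}
	}
	return start, stop, nil
}
```

Keeping the state in the service rather than in Nomad is what lets the streamer "grow independently": enrichment or extra filters only touch `eligible` and the downstream sink.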

Any opinions? Do you think it would be useful?

@sofixa
Contributor

sofixa commented Jun 6, 2021

@towe75 a few already exist, like this one https://github.com/sas1024/nomad_follower

@the-maldridge
Author

@towe75 I think your idea of using the event stream is a pretty good one. I'd want to be able to run it as a system-level task across all hosts, though, since the goal here was to make cluster-wide logging automatic.

@towe75
Contributor

towe75 commented Jun 7, 2021

@sofixa: thank you, I will have a look at it.
@the-maldridge: yes, sure. But conceptually it does not really matter; it can be an event filter by node ID, for example. Performance-wise, however, it's surely a good idea to split log shipping across several sinks. I will see if I can come up with a POC if time allows.

@apollo13
Contributor

apollo13 commented Jun 7, 2021

I wonder whether that would overwhelm the event stream. Logs are probably even noisier than what the event stream usually transports.

@towe75
Contributor

towe75 commented Jun 7, 2021

@apollo13 To clarify: the eventstream itself does not provide the logs. It only helps to track allocation startup/teardown. The regular allocfs api can then stream the logs.
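To illustrate that split: the event stream supplies allocation IDs, and the client filesystem logs endpoint streams the output. A minimal Go sketch, assuming the `GET /v1/client/fs/logs/:alloc_id` endpoint with its `task`, `type`, `follow`, `origin` and `plain` query parameters as we understand them from Nomad's HTTP API docs; `LogsURL` and `Follow` are hypothetical helpers. ACL-enabled clusters would also need an `X-Nomad-Token` header, omitted here.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// LogsURL builds a follow-mode URL for one task's stdout or stderr.
// Parameter names follow our reading of the /v1/client/fs/logs docs.
func LogsURL(addr, allocID, task, stream string) (string, error) {
	if stream != "stdout" && stream != "stderr" {
		return "", fmt.Errorf("stream must be stdout or stderr, got %q", stream)
	}
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/client/fs/logs/" + allocID
	q := url.Values{}
	q.Set("task", task)
	q.Set("type", stream)
	q.Set("follow", "true")  // keep the connection open for new output
	q.Set("origin", "start") // begin at the start of the retained logs
	q.Set("plain", "true")   // raw bytes instead of JSON frames
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// Follow copies the streamed log bytes into w until the server closes the
// connection or ctx is cancelled.
func Follow(ctx context.Context, client *http.Client, logsURL string, w io.Writer) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, logsURL, nil)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	_, err = io.Copy(w, resp.Body)
	return err
}
```

Because each followed task is a separate long-lived HTTP request, the event stream itself only carries small lifecycle events, which addresses the concern about overwhelming it.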

It might, however, also be interesting to treat parts of the event stream as a structured log of its own. Loki and other log aggregators cope pretty well with any structured data, not just logs. So the log viewer could show internal messages from the event stream (e.g. "alloc xyz started"), followed by the actual logs and finally an alloc teardown event. This would be especially nice for batch jobs.

@apollo13
Contributor

apollo13 commented Jun 7, 2021

> @apollo13 To clarify: the eventstream itself does not provide the logs. It only helps to track allocation startup/teardown. The regular allocfs api can then stream the logs.

Thanks, I indeed missed that part.

@tgross
Member

tgross commented May 31, 2023

Some more context and feature ideas: #17366

@apollo13
Copy link
Contributor

@tgross Did you ever make some progress here aside from the WIP branch (https://github.com/hashicorp/nomad/blob/4667539b31710b0243f30719db3b88e7a7e83b98/plugins/logging/README.md) in 2022?

@tgross
Member

tgross commented Sep 20, 2023

That experiment got turned into an internal design document (RFC) that's spurred some good discussion but it hasn't quite made it "over the line" when planning releases yet.

Projects
None yet
Development

No branches or pull requests

5 participants