Replace github.com/coreos/go-systemd/v22/sdjournal by journalctl #40061
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
This pull request is now in conflicts. Could you fix it? 🙏
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
So, more of a general comment, since I don't want to spam the same review comments: we're missing a lot of godocs, particularly on exported functions. Also, it would be good to add an extended explanation to the journald implementation as to why we're exec'ing out to
Thanks @fearful-symmetry ! I'll start working on it.
@fearful-symmetry I added the reasoning for using
@belimawr did you want to add a description to the code itself? My concern is that someone is going to look at the code
I was gonna add to the commit message, but I can add to the code itself if you prefer. Where in the code do you believe is the best place to add it? in a
@belimawr in
In TestInputSeek, set the since option to a time between the first and second entries to avoid issues with different hosts.
The since tests were broken because newer versions of journalctl accept timestamps in RFC3339 format, but older versions do not. The Reader now formats the since parameter in a format accepted by older versions of systemd. The since tests ensure the timestamp is in the local timezone to avoid conflicts with journalctl.
This commit fixes tests that were failing due to an incompatible journal version. The new journal file was generated with journald 249 (249.11-0ubuntu3.12).
This commit adds the missing syscalls to our seccomp policy so Filebeat can start the journalctl process. The syscalls were acquired by running Filebeat with the seccomp default action set to "log", and running Auditbeat to collect those logs and convert the syscall number into a name accepted by our seccomp policy.
Add the correct syscalls to the changelog and move the entry under breaking changes.
…eats into journalctl-for-journald-input
Fix merge conflicts in the changelog and improve wording of one entry.
Re-add a syscall that was mistakenly removed from the default seccomp policy.
Fixes a log entry that was mixing keys with data.
Proposed commit message
`github.com/coreos/go-systemd/v22/sdjournal` is removed and Filebeat now calls `journalctl` directly to read journald entries.

`sdjournal` relies on libsystemd to read journal files and the active system journal. However, due to a bug in systemd (systemd/systemd#29456), it crashes during journal rotation. Filebeat is affected if the host has an affected libsystemd: during a journal rotation (usually only under high load), Filebeat will crash with a SIGBUS. There is no way to prevent or recover from this crash. It happens outside of our codebase, and the Go runtime turns the SIGBUS into a panic that we cannot recover from.

The bug has been fixed in systemd v255, which is not widely used yet, so on most systems Filebeat might crash when reading journal logs.

Because there is no way for Filebeat to avoid the crash, we decided to replace `github.com/coreos/go-systemd/v22/sdjournal` by calling `journalctl` directly and reading its stdout.

On hosts where Filebeat crashes when reading from journald, `journalctl` can successfully read all journal files. The OpenTelemetry Collector also calls `journalctl` and has no issues reading the journal during rotation.

Because the reading backend has changed, some configuration options have been removed and behaviours adapted to match `journalctl`.

Breaking changes:
Changes that will prevent the journald input from starting:

- `include_matches.match` does not accept the `and` and `or` keys any more.

Changes in the journald input behaviour:

- `backoff`, `max_backoff`, `cursor_seek_fallback` have been removed
- `seek` now has only 3 modes: `since`, `head` and `tail`.
- `seek` option will be ignored.

The following syscalls are added to the default seccomp policy: `dup3`, `faccessat2`, `prctl` and `setrlimit`. This affects all Beats because the policy is global and shared by all Beats.

Checklist
- `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.

Disruptive User Impact
Even though the journald input is not GA yet, which makes breaking changes acceptable, this PR introduces breaking changes that will make certain configurations not work as expected or not work at all.
Changes that will prevent the journald input from starting:

- `include_matches.match` does not accept the `and` and `or` keys any more.

Changes in the journald input behaviour:

- `backoff`, `max_backoff`, `cursor_seek_fallback` have been removed
- `seek` now has only 3 modes: `since`, `head` and `tail`.
- `seek` option will be ignored.

Author's Checklist
How to test this PR locally
Using the following input configuration:
Start Filebeat and assert the journald messages are sent to the configured output.
To manually see the journald messages and compare with what you see in Filebeat's output, you can use:
This will print out all fields Filebeat can read.
Identifying the missing syscalls
This PR adds some syscalls to our seccomp policy. To find the syscall names required by the seccomp policy, I ran Auditbeat configured to collect seccomp violations and convert numbers to names, then I started Filebeat with the default behaviour of our seccomp policy set to `log` and the default list of syscalls allowed.
Here are the configuration files used:
filebeat.yml
auditbeat.yml
First start Auditbeat as root, then run Filebeat. Ensure Filebeat is correctly collecting the journal logs. To find the syscall names, you can use Discover on Kibana or run the following query:
Related issues
- `system.auth` and `system.syslog` are not available for AL2023 under Data streams tab. elastic-agent#4250. This will only be fully fixed when the system integration starts using the journald input to read system logs on hosts that have deprecated the traditional log files.