Filebeat stops shipping journald logs when it encounters a "failed to read message field: bad message" error #32782
Comments
Could be related to the issues discussed in #23627
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
I'm having the same problem with Filebeat. Here is the
From the Filebeat log:

After this error message the journald input completely stops; no new journald events are transmitted. This is very irritating behavior. Would it be possible to implement a fix so that such a 'bad message' is skipped and the input continues parsing the other messages?
Hi, any updates on this?
Sorry, we haven't been able to prioritize this issue yet.
Any chance this will be prioritized soon, or a timeline on promoting journald out of tech preview? Thanks!
Seems like the underlying issue is with
Any chance such records (or entire journald files) can be skipped by Filebeat, which would then just continue with the rest?
I'm also desperately hoping for a fix for this issue. I second georgivalentinov's statement: "Any chance such records (or entire journald files) can be skipped by Filebeat and just continue with the rest?"
We have made taking journald out of tech preview a priority (see #37086), and this needs to be fixed as part of that work.
I've been investigating this crash; it is reproducible like #34077, but it also happens with the following version of journald from Ubuntu 24.04 LTS:

It is coming from
Even after merging #40061, I can still reproduce this "cannot allocate memory" error. Here is how the new error log looks:

{
  "log.level": "error",
  "@timestamp": "2024-08-08T20:33:26.947Z",
  "log.logger": "input.journald",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/journald/pkg/journalctl.(*Reader).Close",
    "file.name": "journalctl/reader.go",
    "file.line": 256
  },
  "message": "Journalctl wrote to stderr: Failed to iterate through journal: Bad message\n",
  "service.name": "filebeat",
  "id": "PR-testig",
  "input_source": "LOCAL_SYSTEM_JOURNAL",
  "path": "LOCAL_SYSTEM_JOURNAL",
  "ecs.version": "1.6.0"
}

It seems that even journalctl is struggling to read some messages. What makes this hard to debug is that so far I've only managed to reproduce it when the system and the journal are under high load, and sometimes I get a memory error first (see #39352). The solution here seems to be the same: make the journald input restart journalctl. However, we need to understand what happens to the message/cursor. Is the message lost? Will we get stuck on this message? Can we skip it in code and log an error/warning for the user?
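The restart-and-resume idea above can be sketched as follows. This is an illustrative sketch, not Filebeat's actual code: the entry type and consume function are hypothetical, and a real implementation would re-exec journalctl with --after-cursor=<lastCursor> rather than slicing an in-memory list.

```go
package main

import (
	"errors"
	"fmt"
)

// entry models one journald record; only the cursor matters for resumption.
// (Hypothetical type, for illustration only.)
type entry struct {
	cursor, message string
}

var errBadMessage = errors.New("failed to read message field: bad message")

// consume processes entries until one fails, returning the cursor of the
// last successfully handled entry. failAt simulates the position of the
// corrupted record; pass -1 for a clean run.
func consume(entries []entry, failAt int) (lastCursor string, err error) {
	for i, e := range entries {
		if i == failAt {
			// A real reader would log this and restart journalctl with
			// --after-cursor=lastCursor, stepping past the bad record
			// to avoid a crash loop.
			return lastCursor, errBadMessage
		}
		lastCursor = e.cursor
	}
	return lastCursor, nil
}

func main() {
	journal := []entry{{"c1", "ok"}, {"c2", "ok"}, {"c3", "corrupt"}, {"c4", "ok"}}

	// First pass crashes on the corrupt entry; we keep the cursor "c2".
	cursor, err := consume(journal, 2)
	fmt.Println(cursor, err != nil) // c2 true

	// Restart: skip everything up to the saved cursor plus the corrupt
	// entry itself, then continue with the rest.
	cursor, err = consume(journal[3:], -1)
	fmt.Println(cursor, err == nil) // c4 true
}
```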
@pierrehilbert I changed the status to 'need technical definition' because we need to decide how to handle this error. At the moment the best option seems to be to make the journald input resilient to journalctl crashes and then validate whether this is still an issue that needs to be addressed directly.
Even if this bug didn't exist, we should be doing this and automatically recovering from as many problem situations as we can.
I agree, but we need to be careful about how we implement this to avoid getting stuck on a "bad message"; so far I have only managed to reproduce this when putting the system under stress to test/investigate the journal-rotation-related crashes. We already have an issue to track this improvement: #39355.
This is not fixed by #40558.
I've just managed to reproduce this issue by calling journalctl directly:
Both Filebeat and journalctl were able to continue reading the journal using the last known cursor, and I did not see any indication that we'd get stuck in a "Bad message" crash loop, so I'm closing this issue as solved by #40558.
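For reference, the resume behavior can be checked from the command line with standard journalctl flags (--show-cursor appends the final cursor to the output; --after-cursor resumes strictly after it). A minimal sketch, assuming a readable system journal; first-pass.json is a hypothetical scratch file:

```shell
# First pass: read the journal as JSON, saving the cursor that journalctl
# prints after the last entry.
journalctl --output=json --show-cursor > first-pass.json

# --show-cursor appends a final line like "-- cursor: s=...;i=...;...".
CURSOR=$(sed -n 's/^-- cursor: //p' first-pass.json)

# Resume strictly after the saved cursor: entries up to and including it
# (e.g. the one that triggered "Bad message") are skipped, and reading
# continues with the rest of the journal.
journalctl --output=json --after-cursor="$CURSOR"
```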
Filebeat seems to stop sending journald logs when it encounters a "failed to read message field: bad message" error.

journald input:

Logs:

Filebeat 8.3.3
A similar bug is also reported in Loki; not sure if the fix is similar or not:
Bug Report: grafana/loki#2812
PR: grafana/loki#2928