[8.x](backport #40558) [Journald] Restart journalctl if it exits unexpectedly #40772

Merged
belimawr merged 1 commit into 8.x from mergify/bp/8.x/pr-40558
Sep 13, 2024

Conversation

mergify[bot]

@mergify mergify bot commented Sep 11, 2024

Proposed commit message

If journalctl exits unexpectedly, the journald input restarts it and sets the cursor to the last known position. Any error or non-zero return code is logged at the error level.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

How to test this PR locally

  1. Build Filebeat from this PR and start it with the following configuration file (filebeat.yml):

filebeat.inputs:
  - type: journald
    id: my-unique-input-id
    enabled: true
    syslog_identifiers:
      - "potato"

output:
  console:
    pretty: true

  2. Publish some events with the syslog identifier potato:
echo "$(date '+%Y-%m-%d %T') - foo" | systemd-cat -t potato
  3. Wait to see the output in the console.
  4. Kill journalctl:
killall journalctl
  5. Wait for journalctl to restart, and look for a log entry like this:
{
  "log.level": "info",
  "@timestamp": "2024-08-20T12:41:29.682-0400",
  "log.logger": "input.journald.reader.journalctl-runner",
  "log.origin": {
    "function": "github.com/elastic/beats/v7/filebeat/input/journald/pkg/journalctl.Factory",
    "file.name": "journalctl/journalctl.go",
    "file.line": 120
  },
  "message": "journalctl started with PID 3119456",
  "service.name": "filebeat",
  "id": "filestream-input-id",
  "input_source": "LOCAL_SYSTEM_JOURNAL",
  "path": "LOCAL_SYSTEM_JOURNAL",
  "ecs.version": "1.6.0"
}
  6. Publish a few more messages and ensure they appear in the console:
echo "$(date '+%Y-%m-%d %T') - foo" | systemd-cat -t potato
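
When repeating the kill step several times, it can help to count the restarts instead of reading the logs by eye. The following is a minimal sketch, not part of this PR, that assumes Filebeat's logs are available as newline-delimited JSON (one entry per line, unlike the pretty-printed entry above). The helper name `startedPIDs` and the second PID in the sample are made up for illustration; only the message text `journalctl started with PID ...` comes from the log entry shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// logEntry holds the only field this sketch needs from a Filebeat log line.
type logEntry struct {
	Message string `json:"message"`
}

// startedPrefix is the message text from the log entry shown in the PR.
const startedPrefix = "journalctl started with PID "

// startedPIDs (hypothetical helper) scans newline-delimited JSON logs and
// returns the PID from every "journalctl started with PID N" message,
// in order of appearance. Lines that are not valid JSON are skipped.
func startedPIDs(logs string) []int {
	var pids []int
	for _, line := range strings.Split(logs, "\n") {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue // not a JSON log line
		}
		if !strings.HasPrefix(e.Message, startedPrefix) {
			continue
		}
		pid, err := strconv.Atoi(strings.TrimPrefix(e.Message, startedPrefix))
		if err != nil {
			continue // prefix matched but the remainder is not a number
		}
		pids = append(pids, pid)
	}
	return pids
}

func main() {
	// Sample input: the first PID is taken from the PR, the second is made up.
	logs := `{"log.level":"info","message":"journalctl started with PID 3119456"}
{"log.level":"info","message":"some unrelated entry"}
{"log.level":"info","message":"journalctl started with PID 3119501"}`

	pids := startedPIDs(logs)
	fmt.Printf("journalctl was started %d times, PIDs: %v\n", len(pids), pids)
}
```

Each `killall journalctl` should add one more PID to the list; if the count stops growing, the restart did not happen.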

Related issues

Tests

I ran a test overnight using a mock to simulate constant failures and restarts of journalctl, and monitored the host and the Filebeat process to ensure there were no goroutine or resource leaks. The screenshot below shows Filebeat's CPU and memory usage, as well as counters for the log entries stating that journalctl crashed and was restarted.

[Screenshot: 2024-09-10_09-14]


This is an automatic backport of pull request #40558 done by [Mergify](https://mergify.com).

If journalctl exits unexpectedly, the journald input restarts it and sets the cursor to the last known position. Any error or non-zero return code is logged at the error level. Restarts use an exponential backoff that caps at one restart every 2s.
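
The restart behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the names `supervise`, `runOnce`, and `nextBackoff`, the 100ms initial delay, and the doubling factor are all assumptions. Only the 2s cap, the error-level logging, and resuming from the last known cursor come from the commit message. Whether the real implementation resets the delay after a healthy run is not stated, so it is not modelled here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

// Assumed values: the commit message only states the 2s cap.
const (
	initialBackoff = 100 * time.Millisecond
	maxBackoff     = 2 * time.Second
)

// nextBackoff doubles the delay and clamps it to maxBackoff.
func nextBackoff(cur time.Duration) time.Duration {
	next := cur * 2
	if next > maxBackoff {
		return maxBackoff
	}
	return next
}

// runOnce is a hypothetical stand-in for starting journalctl at the given
// cursor and blocking until it exits. It returns the last cursor it read.
type runOnce func(ctx context.Context, cursor string) (lastCursor string, err error)

// supervise restarts run until ctx is cancelled, resuming from the last
// known cursor and waiting with exponential backoff between restarts.
func supervise(ctx context.Context, run runOnce, wait func(time.Duration)) string {
	cursor := ""
	delay := initialBackoff
	for ctx.Err() == nil {
		last, err := run(ctx, cursor)
		if last != "" {
			cursor = last // resume from the last known position
		}
		if ctx.Err() != nil {
			break // shutdown was requested, do not restart
		}
		if err != nil {
			log.Printf("ERROR journalctl exited unexpectedly: %v", err)
		}
		wait(delay)
		delay = nextBackoff(delay)
	}
	return cursor
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var delays []time.Duration
	runs := 0
	// Fake journalctl: fails six times, then triggers shutdown on the 7th run.
	run := func(_ context.Context, cursor string) (string, error) {
		runs++
		if runs == 7 {
			cancel()
			return cursor, nil
		}
		return fmt.Sprintf("cursor-%d", runs), errors.New("exit status 1")
	}
	// Record the delays instead of sleeping so the example finishes instantly.
	wait := func(d time.Duration) { delays = append(delays, d) }

	last := supervise(ctx, run, wait)
	fmt.Println("last cursor:", last)
	fmt.Println("delays between restarts:", delays)
}
```

Passing the wait function in as a parameter keeps the loop testable: a real caller would block on a timer that also observes `ctx`, while a test can record the delays and confirm they stop growing at the cap.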

(cherry picked from commit a9fb9fa)
@mergify mergify bot requested a review from a team as a code owner September 11, 2024 20:45
@mergify mergify bot added the backport label Sep 11, 2024
@mergify mergify bot requested review from faec and leehinman and removed request for a team September 11, 2024 20:45
@botelastic botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Sep 11, 2024
@belimawr belimawr enabled auto-merge (squash) September 13, 2024 18:59
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane label (Label for the Agent Data Plane team) Sep 13, 2024
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Sep 13, 2024
@belimawr belimawr merged commit 8e1fc87 into 8.x Sep 13, 2024
122 of 124 checks passed
@belimawr belimawr deleted the mergify/bp/8.x/pr-40558 branch September 13, 2024 19:00