Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading from UNIX datagram sockets #22699

Merged
merged 15 commits into from
Nov 30, 2020

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Nov 20, 2020

What does this PR do?

This PR adds support for reading from UNIX datagram sockets both from the unix input and the syslog input. A new option is added to select the type of the socket named socket_type. Available options are: stream and datagram.

Why is it important?

A few applications which send logs over Unix sockets, use datagrams not streams. From now on, Filebeat can accept input from these applications as well.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Closes #18632

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 20, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 20, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: kvch commented: /test filebeat

  • Start Time: 2020-11-30T12:35:18.296+0000

  • Duration: 46 min 28 sec

Test stats 🧪

Test Results
Failed 0
Passed 4523
Skipped 565
Total 5088

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 4523
Skipped 565
Total 5088

@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 21, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 21, 2020
@kvch kvch changed the title Feature filebeat unix datagram socket Add support for reading from UNIX datagram sockets Nov 23, 2020
@kvch kvch marked this pull request as ready for review November 23, 2020 14:26
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

filebeat/inputsource/unix/config.go Show resolved Hide resolved
filebeat/inputsource/unix/config.go Outdated Show resolved Hide resolved
filebeat/inputsource/unix/server.go Show resolved Hide resolved
filebeat/inputsource/unix/server.go Outdated Show resolved Hide resolved
filebeat/input/unix/input.go Outdated Show resolved Hide resolved
filebeat/inputsource/common/dgram/server.go Outdated Show resolved Hide resolved
filebeat/inputsource/common/dgram/server.go Outdated Show resolved Hide resolved
return strings.Contains(err.Error(), windowErrBuffer)
}

func (l *Listener) Run(ctx context.Context) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Run returns on error the input stops hard. Are there some errors we can handle gracefully? e.g.

  • unix file can not be opened by current user => retry with backoff (user can adjust permissions)
  • another filebeat instance is listening on unix socket => retry with backoff (other FB instance might go down => HA setup?)
  • UDP port is currently take. Similar to unix socket, if it is taken by another filebeat instance we might want to retry.
  • Do want to allow the handler to take down the server? e.g. distinguish between critical or non-critical errors? Or do we mandate that the handler MUST ONLY return an error if it is not Recoverable from within the handler (IO error). On IO error, can we recover by trying to reopen the connection and listen?

All in all, the knowledge if an error is 'fatal' for the input or not lives in the actual inputsource implementation, not here. It looks like you just moved the logic from the current UDP server (refactoring). Maybe let us create an issue instead of fixing/improving error handling here. I would presume that we can improve error handling in the TCP/unix-streaming inputs as well.

}

l.tg.Go(func(ctx unison.Canceler) error {
connCtx, connCancel := ctxtool.WithFunc(ctxtool.FromCanceller(ctx), func() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to run. Do we want 'Start' to call Run or have a common func (l *Listener) readLoop(...)?

What about error handling in Start. If listener fails would it make sense to retry similar to Run and only fail initially if the error is really non-recoverable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about error handling in Start. If listener fails would it make sense to retry similar to Run and only fail initially if the error is really non-recoverable?

To me Start sounds like a function you would expect it try to start something and return if it cannot. Run is more like constantly retrying to do something.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why "non-recoverable". If the error can be recovered from I would expect the internal runLoop to continue trying to connect and collect data if possible. Otherwise I will be forced to monitor the health of each input (not really supported yet) and restart filebeat until it eventually went healthy.

If some component implements Start and Run, I would not expect them to behave differently depending on which method I use (beside Run should block and Start return immediately).

filebeat/inputsource/tcp/server_test.go Show resolved Hide resolved
filebeat/tests/system/test_unix.py Outdated Show resolved Hide resolved
@kvch kvch force-pushed the feature-filebeat-unix-datagram-socket branch 2 times, most recently from 7904cc7 to 29f0f1f Compare November 24, 2020 17:22
@kvch kvch force-pushed the feature-filebeat-unix-datagram-socket branch from 29f0f1f to 8a9f0a8 Compare November 24, 2020 17:24
sock.send(m.encode("utf-8"))
sock = send_over_socket(path,
"<13>Oct 11 22:14:15 wopr.mymachine.co postfix/smtpd[2000]:"
" 'su root' failed for lonvick on /dev/pts/8 {}\n")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to send more than one event in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every test sends three events with the same message ending with the index of that message. You mean that we should send a different message every time we send an event?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But also consider to send very big messages that might not fit the read buffer. Test the edge cases. Plus if something fails it should be more obvious which exact message failed or was missing.

@urso
Copy link

urso commented Nov 24, 2020

The test failure looks like the Listener blocks in the waitgroup. A full stack trace for all go-routines is missing, but the streaming listener loop seems to not unblock which blocks 'Stop' during shutdown.

@kvch kvch force-pushed the feature-filebeat-unix-datagram-socket branch from 0a3f972 to 413cd27 Compare November 26, 2020 13:47
@kvch
Copy link
Contributor Author

kvch commented Nov 26, 2020

jenkins run tests

@kvch
Copy link
Contributor Author

kvch commented Nov 26, 2020

/test filebeat

2 similar comments
@kvch
Copy link
Contributor Author

kvch commented Nov 30, 2020

/test filebeat

@kvch
Copy link
Contributor Author

kvch commented Nov 30, 2020

/test filebeat

@kvch kvch merged commit 3a1d1ae into elastic:master Nov 30, 2020
@kvch kvch added the needs_backport PR is waiting to be backported to other branches. label Nov 30, 2020
@kvch kvch added v7.11.0 and removed needs_backport PR is waiting to be backported to other branches. labels Nov 30, 2020
kvch added a commit to kvch/beats that referenced this pull request Nov 30, 2020
## What does this PR do?

This PR adds support for reading from UNIX datagram sockets both from the `unix` input and the `syslog` input. A new option is added to select the type of the socket named `socket_type`. Available options are: `stream` and `datagram`.

## Why is it important?

A few applications which send logs over Unix sockets, use datagrams not streams. From now on, Filebeat can accept input from these applications as well.

Closes elastic#18632

(cherry picked from commit 3a1d1ae)
kvch added a commit that referenced this pull request Nov 30, 2020
## What does this PR do?

This PR adds support for reading from UNIX datagram sockets both from the `unix` input and the `syslog` input. A new option is added to select the type of the socket named `socket_type`. Available options are: `stream` and `datagram`.

## Why is it important?

A few applications which send logs over Unix sockets, use datagrams not streams. From now on, Filebeat can accept input from these applications as well.

Closes #18632

(cherry picked from commit 3a1d1ae)
v1v added a commit to v1v/beats that referenced this pull request Dec 2, 2020
…-issues

* upstream/master: (41 commits)
  Fix version parser regex for packaging (elastic#22581)
  Fix local_dynamic documentation and add providers inline doc. (elastic#22657)
  fix: use proper param name for e2e tests (elastic#22836)
  [Heartbeat] Fix exit on disabled monitor (elastic#22829)
  Update Golang to 1.14.12 (elastic#22790)
  docs: fix setup.template.overwrite typos (elastic#22804)
  Add docs section for ECS EC2 monitoring (elastic#22784)
  Fixing logic to keep list of unique cluster UUIDs (elastic#22808)
  Skip somewhat flaky UDP system test on Windows (elastic#22810)
  Fix polling node when it is not ready and monitor by hostname (elastic#22666)
  Skip Filebeat test_shutdown on windows 7 (elastic#22797)
  Make monitoring Namespace thread-safe (elastic#22640)
  Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721)
  Add support for reading from UNIX datagram sockets (elastic#22699)
  Fix export dashboard command from Elastic Cloud (elastic#22746)
  Skip flaky winlogbeat test on Windows-7 (elastic#22754)
  Missing `>` (elastic#22763) (elastic#22766)
  Fix k8s watcher issue when node access to list nodes and ns (elastic#22714)
  [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732)
  Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634)
  ...
v1v added a commit to v1v/beats that referenced this pull request Dec 2, 2020
…dows-7

* upstream/master: (41 commits)
  Fix version parser regex for packaging (elastic#22581)
  Fix local_dynamic documentation and add providers inline doc. (elastic#22657)
  fix: use proper param name for e2e tests (elastic#22836)
  [Heartbeat] Fix exit on disabled monitor (elastic#22829)
  Update Golang to 1.14.12 (elastic#22790)
  docs: fix setup.template.overwrite typos (elastic#22804)
  Add docs section for ECS EC2 monitoring (elastic#22784)
  Fixing logic to keep list of unique cluster UUIDs (elastic#22808)
  Skip somewhat flaky UDP system test on Windows (elastic#22810)
  Fix polling node when it is not ready and monitor by hostname (elastic#22666)
  Skip Filebeat test_shutdown on windows 7 (elastic#22797)
  Make monitoring Namespace thread-safe (elastic#22640)
  Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721)
  Add support for reading from UNIX datagram sockets (elastic#22699)
  Fix export dashboard command from Elastic Cloud (elastic#22746)
  Skip flaky winlogbeat test on Windows-7 (elastic#22754)
  Missing `>` (elastic#22763) (elastic#22766)
  Fix k8s watcher issue when node access to list nodes and ns (elastic#22714)
  [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732)
  Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for ~UDP~ Datagram unix socket input
4 participants