Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Fix reference leak in TCP and Unix socket inputs #19459

Merged
merged 4 commits into from
Jun 29, 2020

Conversation

andrewkroh
Copy link
Member

What does this PR do?

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error delete was being called on the wrong map.

Why is it important?

This fixes a memory leak with the tcp and unix inputs.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@andrewkroh andrewkroh added bug Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. Team:Services (Deprecated) Label for the former Integrations-Services team labels Jun 26, 2020
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 26, 2020
@andrewkroh andrewkroh force-pushed the bugfix/fb/tcp-unix-input-leak branch from 86c05b5 to 11d959c Compare June 26, 2020 14:17
@andrewkroh andrewkroh marked this pull request as ready for review June 26, 2020 14:17
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@urso
Copy link

urso commented Jun 26, 2020

Good find. Please backport to 7.8 as well before FF.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 26, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #19459 updated]

  • Start Time: 2020-06-29T21:15:16.314+0000

  • Duration: 25 min 5 sec

Test stats 🧪

Test Results
Failed 0
Passed 543
Skipped 127
Total 670

@andrewkroh andrewkroh requested a review from urso June 26, 2020 20:35
@andrewkroh
Copy link
Member Author

@urso Can you please take another look. My fix triggered race conditions so I took a different approach and dropped in a cancellable context in place of the common.Closer. d77ab50

Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I consider context and WaitGroup quite low level primitives. They require you to 'track' usage and even free resources (cancel functon MUST be called). We've got some helpers here: https://pkg.go.dev/github.com/elastic/go-concert@v0.0.3/ctxtool?tab=doc, and some kind of with cancelation here: https://pkg.go.dev/github.com/elastic/go-concert?tab=doc#OnceSignaler

The context in Listener could be replaced with OnceSignaler and the stop could be replaced handled via 'WithFunc'.

For higher level primitives managing ownership of groups of go-routines check out the Group interface here: https://pkg.go.dev/github.com/elastic/go-concert@v0.0.3/unison?tab=doc#Group
We've got a sample implementation with TaskGroup. If you keep StopOnError false and do not return an error the implementation perfectly fits this use-case. starting the handler routine would look like this:

l.handlerGroup.Go(func(canceler unison.Canceler) error {
  ctx, cancel := ctxtool.WithFunc(ctxtool.FromCanceller(canceler), func() {
    conn.Close()
  })
  defer cancel()

  defer logp.Recover(...)

  ...

  return nil
})

The group combines shutdown signaling and the WaitGroup when calling 'Stop'. Once 'Stop' has been initiated any attempt to start another handler via Go will return an error, because we are about to shutdown.

go func() {
<-ctx.Done()
conn.Close()
}()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handler go-routine is started always. Let's move cancellation setup local to the that go-routine. Consider to use ctxtool.WithFunc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I updated it to use ctxtool.WithFunc.

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.
The inputsource/common.Closer was managing bidirectional links between parents and children.
Anytime you closed an instance it would close all of its children and also remove itself from its
parents list of children (this is where the bug was). Every instance has its own mutex. While recursively
closing children it was easy to run into a deadlock because the parent holds a lock while closing
its children and then the child must edit the parent to remove itself so it also tries to acquire
the parent lock.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.
@andrewkroh andrewkroh force-pushed the bugfix/fb/tcp-unix-input-leak branch from 96a4155 to b524da0 Compare June 29, 2020 20:42
@andrewkroh andrewkroh added v7.9.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jun 29, 2020
@ph
Copy link
Contributor

ph commented Jun 29, 2020

fyi @vjsamuel this might affect you?

@vjsamuel
Copy link
Contributor

Thanks. I will need to redo our internal input during the next rebase. Moving to context is the right thing to do however.

v1v added a commit to v1v/beats that referenced this pull request Jul 2, 2020
…ne-beats

* upstream/master: (105 commits)
  ci: enable packaging job (elastic#19536)
  ci: disable upstream trigger on PRs for the packaging job (elastic#19490)
  Implement memlog on-disk handling (elastic#19408)
  fix go.mod for PR elastic#19423 (elastic#19521)
  [MetricBeat] add param `aws_partition` to support aws-cn, aws-us-gov regions (elastic#19423)
  Input v2 stateless manager (elastic#19406)
  Input v2 compatibility layer (elastic#19401)
  [Elastic Agent] Fix artifact downloading to allow endpoint-security to be downloaded (elastic#19503)
  fix: ignore target changes on scans (elastic#19510)
  Add more helpers to pipeline/testing package (elastic#19405)
  Report dependencies in CSV format (elastic#19506)
  [Filebeat] Fix reference leak in TCP and Unix socket inputs (elastic#19459)
  Cursor input skeleton (elastic#19378)
  Add changelog. (elastic#19495)
  [DOC] Typo in Kerberos (elastic#19265)
  Remove accidentally commited unused NOTICE template (elastic#19485)
  [Elastic Agent] Support the install, control, and uninstall of Endpoint (elastic#19248)
  [Filebeat][httpjson] Add split_events_by config setting (elastic#19246)
  ci: disabling packaging job until we fix it (elastic#19481)
  Fix golang.org/x/tools to release1.13 (elastic#19478)
  ...
andrewkroh added a commit to andrewkroh/beats that referenced this pull request Jul 13, 2020
…19459)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.

(cherry picked from commit 61f4846)
andrewkroh added a commit to andrewkroh/beats that referenced this pull request Jul 13, 2020
…19459)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.

(cherry picked from commit 61f4846)
andrewkroh added a commit that referenced this pull request Jul 13, 2020
…19501)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.

(cherry picked from commit 61f4846)
andrewkroh added a commit that referenced this pull request Jul 16, 2020
…nix socket inputs (#19502)

* [Filebeat] Fix reference leak in TCP and Unix socket inputs (#19459)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.

(cherry picked from commit 61f4846)

* Update vendor and notice
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
…19459)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…P and Unix socket inputs (elastic#19502)

* [Filebeat] Fix reference leak in TCP and Unix socket inputs (elastic#19459)

The tcp and unix input sources were leaking references causing a memory leak.
When an accepted connection ended inputsource/common.Closer was
supposed to delete the pointer that it held to the connection, but due to a code
error `delete` was being called on the wrong map.

Instead of modifying the common.Closer I replaced it with a cancellable context.Context which
is designed to propagate signals from parent to children and requires less code.

(cherry picked from commit 8eb8ed7)

* Update vendor and notice
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Filebeat Filebeat review Team:Services (Deprecated) Label for the former Integrations-Services team v7.8.1 v7.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants