NETOBSERV-1805: threads are leaking with continous adding and deleting pods [BP 1.7] #427

Merged: 1 commit into netobserv:release-1.7 on Oct 3, 2024

msherif1234
Contributor

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
(cherry picked from commit 6111b98)

Description

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 1, 2024

@msherif1234: This pull request references NETOBSERV-1805 which is a valid jira issue.

@msherif1234 changed the title from "NETOBSERV-1805: threads are leaking with continous adding and deleting pods" to "NETOBSERV-1805: threads are leaking with continous adding and deleting pods [BP 1.7]" on Oct 1, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 1, 2024

github-actions bot commented Oct 1, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:f9c17c7

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f9c17c7 make set-agent-image

@jotak
Member

jotak commented Oct 2, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Oct 2, 2024

codecov bot commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 15.00000% with 17 lines in your changes missing coverage. Please review.

Please upload report for BASE (release-1.7@be16107). Learn more about missing BASE report.
Report is 1 commits behind head on release-1.7.

Files with missing lines | Patch % | Lines
pkg/ifaces/watcher.go | 15.00% | 16 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff               @@
##             release-1.7     #427   +/-   ##
==============================================
  Coverage               ?   30.00%           
==============================================
  Files                  ?       50           
  Lines                  ?     4090           
  Branches               ?        0           
==============================================
  Hits                   ?     1227           
  Misses                 ?     2756           
  Partials               ?      107           
Flag | Coverage Δ
unittests | 30.00% <15.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines | Coverage Δ
pkg/ifaces/watcher.go | 54.16% <15.00%> (ø)

@openshift-ci openshift-ci bot removed the lgtm label Oct 3, 2024
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024

github-actions bot commented Oct 3, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:0b40ac4

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=0b40ac4 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024

github-actions bot commented Oct 3, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:d71f5c2

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=d71f5c2 make set-agent-image

@@ -31,6 +31,7 @@ type Watcher struct {
 	linkSubscriberAt func(ns netns.NsHandle, ch chan<- netlink.LinkUpdate, done <-chan struct{}) error
 	mutex            *sync.Mutex
 	netnsWatcher     *fsnotify.Watcher
+	nsDone           map[string]chan struct{}
Member

Those new channels never seem to be closed; isn't that another source of leak?

Contributor Author

I added a defer in sendUpdates to close them.
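
For readers following this thread, the cleanup being described can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the PR: the `watcher` type, its `start`, `stop`, and `active` methods, and the `wg` field are hypothetical names, and the real `sendUpdates` also forwards link updates, which is omitted here. The point is only that a deferred function in the per-namespace goroutine can both drop the map entry and close the channel once the done signal arrives.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical, simplified stand-in for the Watcher type in
// pkg/ifaces/watcher.go. Only the nsDone bookkeeping is modeled.
type watcher struct {
	mutex  sync.Mutex
	nsDone map[string]chan struct{}
	wg     sync.WaitGroup
}

func newWatcher() *watcher {
	return &watcher{nsDone: make(map[string]chan struct{})}
}

// start registers a done channel for ns and launches one goroutine for it.
func (w *watcher) start(ns string) {
	done := make(chan struct{})
	w.mutex.Lock()
	w.nsDone[ns] = done
	w.mutex.Unlock()

	w.wg.Add(1)
	go w.sendUpdates(ns, done)
}

// sendUpdates blocks until done is signaled. The deferred cleanup removes
// the map entry and closes the channel, so neither outlives the goroutine.
func (w *watcher) sendUpdates(ns string, done chan struct{}) {
	defer w.wg.Done()
	defer func() {
		w.mutex.Lock()
		delete(w.nsDone, ns)
		w.mutex.Unlock()
		close(done)
	}()
	<-done // assumption: real code would also handle link updates here
}

// stop signals the goroutine for ns, if one is registered.
func (w *watcher) stop(ns string) {
	w.mutex.Lock()
	done, ok := w.nsDone[ns]
	w.mutex.Unlock()
	if ok {
		done <- struct{}{}
	}
}

// active returns the number of namespaces that still have a done channel.
func (w *watcher) active() int {
	w.mutex.Lock()
	defer w.mutex.Unlock()
	return len(w.nsDone)
}

func main() {
	w := newWatcher()
	w.start("ns-a")
	fmt.Println("active after start:", w.active())
	w.stop("ns-a")
	w.wg.Wait()
	fmt.Println("active after stop:", w.active())
}
```

Note that this sketch is only safe when a single caller signals each namespace once; a second send after the deferred `close` would panic.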

@@ -191,9 +195,15 @@ func (w *Watcher) netnsNotify(ctx context.Context, out chan Event) {
 			}
 			if event.Op&fsnotify.Create == fsnotify.Create {
 				ns := filepath.Base(event.Name)
-				log.WithField("netns", ns).Debug("netns notification")
+				log.WithField("netns", ns).Debug("netns create notification")
+				w.nsDone[ns] = make(chan struct{})
Member

When creating a new channel for map key ns, is there any risk of a collision or leak if a channel was already stored in w.nsDone[ns]? It might be safer to first check whether an entry exists and, if so, close the old channel.

Contributor Author

I think that will be an issue, as we are supposed to learn about a netns creation or deletion only once.
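
The reviewer's suggestion above could look something like the following sketch. It is an assumption-laden illustration rather than what the PR does: the `registry` type and the `register` and `isClosed` helpers are hypothetical, and it uses `close` as the stop signal instead of sending a value, which only works if the waiting goroutine does not close the channel itself (closing a channel twice panics).

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the nsDone map discussed above.
type registry struct {
	nsDone map[string]chan struct{}
}

// register stores a fresh done channel for ns. If an older channel is still
// present, it is closed first so that any goroutine waiting on it is
// released. It reports whether an older entry was replaced.
func (r *registry) register(ns string) (done chan struct{}, replaced bool) {
	if old, ok := r.nsDone[ns]; ok {
		close(old)
		replaced = true
	}
	done = make(chan struct{})
	r.nsDone[ns] = done
	return done, replaced
}

// isClosed reports whether ch has been closed. It relies on nothing ever
// sending a value on ch, which holds in this sketch.
func isClosed(ch chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	r := &registry{nsDone: make(map[string]chan struct{})}

	first, _ := r.register("ns-a")
	second, replaced := r.register("ns-a") // duplicate create notification

	fmt.Println("replaced:", replaced)
	fmt.Println("first closed:", isClosed(first))
	fmt.Println("second closed:", isClosed(second))
}
```

Whether a duplicate create notification can actually occur is the open question in this thread; the sketch only shows that handling it is cheap if it can.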

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234 msherif1234 requested a review from jotak October 3, 2024 11:09
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024

github-actions bot commented Oct 3, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:21e1544

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=21e1544 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024

github-actions bot commented Oct 3, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:74d90ea

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=74d90ea make set-agent-image

			if event.Op&fsnotify.Remove == fsnotify.Remove {
				ns := filepath.Base(event.Name)
				log.WithField("netns", ns).Debug("netns delete notification")
				w.nsDone[ns] <- struct{}{}
Member

Is there a risk that this code is called after the channel was closed and deleted? I.e., w.sendUpdates finishes, and then we get an fsnotify.Remove? (Just asking.)

Contributor Author

The code in sendUpdates is blocking; it won't finish until done is signaled, which is triggered via fsnotify.

Contributor Author

I added a check around the remove to be safe.
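
The exact check added in the PR is not shown in this conversation, so the following is only a sketch of one plausible guard. The `signalRemove` helper is a hypothetical name. It relies on a well-defined Go behavior: indexing a map with a missing key yields the zero value, which for a channel type is nil, and a send on a nil channel blocks forever. An unguarded `w.nsDone[ns] <- struct{}{}` for an unknown namespace would therefore hang the notification loop.

```go
package main

import "fmt"

// signalRemove looks up the done channel for ns and signals it only if an
// entry exists. It reports whether a signal was sent.
func signalRemove(nsDone map[string]chan struct{}, ns string) bool {
	done, ok := nsDone[ns]
	if !ok {
		// No goroutine is registered for this netns. Sending here would
		// use a nil channel and block forever, so skip it.
		return false
	}
	done <- struct{}{}
	return true
}

func main() {
	nsDone := map[string]chan struct{}{
		// Buffered so that this sketch needs no receiving goroutine.
		"ns-a": make(chan struct{}, 1),
	}

	fmt.Println("known netns signaled:", signalRemove(nsDone, "ns-a"))
	fmt.Println("unknown netns signaled:", signalRemove(nsDone, "ns-b"))
	fmt.Println("missing key yields nil channel:", nsDone["ns-b"] == nil)
}
```

The comma-ok lookup covers the case raised in the review, where a remove notification arrives for a namespace whose entry was already deleted.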

…g pods

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
(cherry picked from commit 49d3add)
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Oct 3, 2024
@msherif1234 msherif1234 requested a review from jotak October 3, 2024 12:49

github-actions bot commented Oct 3, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:fd27950

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=fd27950 make set-agent-image

@jotak
Member

jotak commented Oct 3, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Oct 3, 2024
@msherif1234
Contributor Author

/approve


openshift-ci bot commented Oct 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Oct 3, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 66df1f8 into netobserv:release-1.7 Oct 3, 2024
9 checks passed
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
3 participants