NETOBSERV-1732: add logic to lkup all previous tc filters and remove them #360

Merged
merged 1 commit into from
Jul 1, 2024

Conversation

msherif1234
Contributor

Description

Today, if the eBPF agent pod is killed with SIGKILL, the cleanup code never gets a chance to remove the TC filters and qdiscs it installed.
This PR looks up all previously installed TC filters and removes them.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Jun 27, 2024

@msherif1234: This pull request references NETOBSERV-1732 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 27, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:29b9188

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=29b9188 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 27, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 27, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:846d29a

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=846d29a make set-agent-image


codecov bot commented Jun 27, 2024

Codecov Report

Attention: Patch coverage is 0% with 52 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@484bc41). Learn more about missing BASE report.
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #360   +/-   ##
=======================================
  Coverage        ?   32.67%           
=======================================
  Files           ?       48           
  Lines           ?     3593           
  Branches        ?        0           
=======================================
  Hits            ?     1174           
  Misses          ?     2317           
  Partials        ?      102           
Flag Coverage Δ
unittests 32.67% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/ebpf/tracer.go 0.00% <0.00%> (ø)

@jotak
Member

jotak commented Jun 28, 2024

@msherif1234 I thought it was already managed e.g. here: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L308 , what's the difference exactly with the existing deletions?

Is it because previously it would only delete for the current interfaces, but not if interfaces have changed in the meantime?

@msherif1234
Contributor Author

> @msherif1234 I thought it was already managed e.g. here: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L308 , what's the difference exactly with the existing deletions?
>
> Is it because previously it would only delete for the current interfaces, but not if interfaces have changed in the meantime?

The current logic doesn't cover this case. When the pod restarts, the FD value is not the same as the old program's, so the existing deletion code never kicks in: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/ebpf/tracer.go#L304
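
To make the FD argument concrete, here is a minimal sketch of the two lookup strategies, not the code from this PR. It assumes a simplified model in which each leftover filter carries a name and the descriptor of the process that attached it. The `bpfFilter` type, the prefix constants, and the descriptor values are all hypothetical; the real code works on netlink filters and the constants in pkg/ebpf/tracer.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical name prefixes, used only for this illustration.
const (
	egressPrefix  = "tc_egress_example"
	ingressPrefix = "tc_ingress_example"
)

// bpfFilter is a simplified stand-in for an installed TC BPF filter.
type bpfFilter struct {
	Name string // name given to the filter when it was attached
	Fd   int    // descriptor of the program in the process that attached it
}

// matchByFd keeps the filters whose Fd equals the given descriptor.
func matchByFd(filters []bpfFilter, fd int) []bpfFilter {
	var out []bpfFilter
	for _, f := range filters {
		if f.Fd == fd {
			out = append(out, f)
		}
	}
	return out
}

// matchByName keeps the filters whose name starts with one of the prefixes.
func matchByName(filters []bpfFilter, prefixes ...string) []bpfFilter {
	var out []bpfFilter
	for _, f := range filters {
		for _, p := range prefixes {
			if strings.HasPrefix(f.Name, p) {
				out = append(out, f)
				break
			}
		}
	}
	return out
}

func main() {
	// Filters left behind by a killed process that used descriptor 7.
	leftover := []bpfFilter{
		{Name: egressPrefix, Fd: 7},
		{Name: ingressPrefix, Fd: 7},
		{Name: "unrelated_filter", Fd: 3},
	}
	// The restarted process loads the program again and gets another descriptor.
	newFd := 12

	fmt.Println("matched by fd:  ", len(matchByFd(leftover, newFd)))
	fmt.Println("matched by name:", len(matchByName(leftover, egressPrefix, ingressPrefix)))
}
```

In this model the descriptor comparison finds nothing after a restart, while the name prefix still identifies the filters the previous process installed and leaves unrelated filters alone.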

}

for _, f := range filters {
	if err := netlink.FilterDel(f); err != nil {
Member

@jotak jotak Jun 28, 2024

Not sure how likely it is to get an error here, but to be more resilient you could accumulate all errors in a slice and use errors.Join. That would avoid returning after the first error while there are still filters left to delete in the list.

Contributor Author

I can also use an aggregate error, which does the same thing as errors.Join.
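
For reference, the errors.Join approach suggested above could look roughly like the following sketch. It is not the code merged in this PR: the `filter` type, the `deleteAll` helper, the `errFake` sentinel, and the filter names are hypothetical stand-ins so that the example runs with the standard library only, whereas the PR itself calls netlink.FilterDel. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errFake is a hypothetical sentinel error used by the fake delete function.
var errFake = errors.New("fake delete failure")

// filter is a hypothetical stand-in for a TC filter handle.
type filter struct {
	name string
}

// deleteAll tries to delete every filter and keeps going after a failure.
// Failures are collected and combined with errors.Join, which returns nil
// when no error was collected.
func deleteAll(filters []filter, del func(filter) error) error {
	var errs []error
	for _, f := range filters {
		if err := del(f); err != nil {
			errs = append(errs, fmt.Errorf("deleting filter %q: %w", f.name, err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	deleted := 0
	// Fake delete function: the second filter fails, the others succeed.
	del := func(f filter) error {
		if f.name == "egress_b" {
			return errFake
		}
		deleted++
		return nil
	}

	filters := []filter{{name: "egress_a"}, {name: "egress_b"}, {name: "ingress_a"}}
	err := deleteAll(filters, del)

	fmt.Println("deleted:", deleted)
	fmt.Println("error:", err)
	fmt.Println("wraps errFake:", errors.Is(err, errFake))
}
```

The point of the design is that one failing deletion no longer prevents the remaining filters from being removed, while the caller still sees every failure in the returned error.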

Comment on lines 299 to 309
	if strings.HasPrefix(bpfFilter.Name, tcEgressFilterName) {
		{
			egressDevs = append(egressDevs, l)
		}
	}
Member

Nit: unnecessary {}

Suggested change
-	if strings.HasPrefix(bpfFilter.Name, tcEgressFilterName) {
-		{
-			egressDevs = append(egressDevs, l)
-		}
-	}
+	if strings.HasPrefix(bpfFilter.Name, tcEgressFilterName) {
+		egressDevs = append(egressDevs, l)
+	}

Member

@jotak jotak left a comment

some minor comments, lgtm otherwise

…them

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 28, 2024
@msherif1234 msherif1234 requested a review from jotak June 28, 2024 15:35
Member

@jotak jotak left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 1, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 1, 2024

github-actions bot commented Jul 1, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:a705627

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=a705627 make set-agent-image

@memodi
Contributor

memodi commented Jul 1, 2024

/label no-qe

@openshift-ci openshift-ci bot added the no-qe This PR doesn't necessitate QE approval label Jul 1, 2024
@msherif1234
Contributor Author

/approve


openshift-ci bot commented Jul 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234


@openshift-ci openshift-ci bot added the approved label Jul 1, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 285eb25 into netobserv:main Jul 1, 2024
13 checks passed
jotak pushed a commit to jotak/netobserv-agent that referenced this pull request Jul 15, 2024
…them (netobserv#360)

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
jotak pushed a commit that referenced this pull request Jul 16, 2024
…them (#360)

Signed-off-by: Mohamed Mahmoud <mmahmoud@redhat.com>
Labels
approved jira/valid-reference lgtm no-qe This PR doesn't necessitate QE approval ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
4 participants