Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-1091: remove CO-RE file and extensions as that causes douple allocations #133

Merged
merged 1 commit into from
Jun 14, 2023

Conversation

msherif1234
Copy link
Contributor

@msherif1234 msherif1234 commented Jun 9, 2023

it was noticed cpu and memory are higher than 1.2 release
that change was triggered by #110
it seems using co-re results in a significant memory and cpu increase cilium/ebpf#1063 is opened against cilium/ebpf to track this, issue has been seen even with latest version of the lib, we don't use CO-RE apis at all so for now we will remove it and once we have fix we will re-enable

diag on left side with this PR changes the diag on right is with 1.3 baseline
image
image

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@msherif1234
Copy link
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 9, 2023
@github-actions
Copy link

github-actions bot commented Jun 9, 2023

New image: quay.io/netobserv/netobserv-ebpf-agent:5fee00e. It will expire after two weeks.

@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Merging #133 (42e6212) into main (cd4974f) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #133   +/-   ##
=======================================
  Coverage   40.60%   40.60%           
=======================================
  Files          31       31           
  Lines        2054     2054           
=======================================
  Hits          834      834           
  Misses       1181     1181           
  Partials       39       39           
Flag Coverage Δ
unittests 40.60% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@msherif1234
Copy link
Contributor Author

libbpf/libbpf#697

@msherif1234 msherif1234 changed the title remove CO-RE file and extensions as that causes douple allocations NETOBSERV-1091: remove CO-RE file and extensions as that causes douple allocations Jun 9, 2023
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jun 9, 2023

@msherif1234: This pull request references NETOBSERV-1091 which is a valid jira issue.

In response to this:

it was noticed cpu and memory are higher than 1.2 release
that change was triggered by #110
it seems using co-re results in a significant memory and cpu increase we will open an issue against libbpf as this issue been seen even with latest version of the lib, we don't use CO-RE apis at all so for now we will remove it and its use to mitigate from this issue

data on left side for this PR, the data on right is 1.3 baseline
image
image

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jun 10, 2023

@msherif1234: This pull request references NETOBSERV-1091 which is a valid jira issue.

In response to this:

it was noticed cpu and memory are higher than 1.2 release
that change was triggered by #110
it seems using co-re results in a significant memory and cpu increase libbpf/libbpf#697 is opened against libbpf to track this, issue has been seen even with latest version of the lib, we don't use CO-RE apis at all so for now we will remove it and once we have libbpf fix we will re-enable

data on left side for this PR, the data on right is 1.3 baseline
image
image

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jun 10, 2023

@msherif1234: This pull request references NETOBSERV-1091 which is a valid jira issue.

In response to this:

it was noticed cpu and memory are higher than 1.2 release
that change was triggered by #110
it seems using co-re results in a significant memory and cpu increase libbpf/libbpf#697 is opened against libbpf to track this, issue has been seen even with latest version of the lib, we don't use CO-RE apis at all so for now we will remove it and once we have libbpf fix we will re-enable

diag on left side with this PR changes the diag on right is with 1.3 baseline
image
image

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -13,11 +13,11 @@
until an entry is available.
4) When hash collision is detected, we send the new entry to userpace via ringbuffer.
*/
#define BPF_NO_PRESERVE_ACCESS_INDEX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment explaining briefly the reason (or just linking to the JIRA and/or libpbf issue) ?

@jotak
Copy link
Member

jotak commented Jun 13, 2023

I'm fine to merge it and also backport on 1.3
@msherif1234 can you open the backport PR as well ?

@msherif1234
Copy link
Contributor Author

I'm fine to merge it and also backport on 1.3 @msherif1234 can you open the backport PR as well ?

I linked the issue as well as the PR that trigger this in the PR description shouldn't that be enough ?

@nathan-weinberg
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Jun 13, 2023
@jotak
Copy link
Member

jotak commented Jun 14, 2023

/lgtm

@jotak
Copy link
Member

jotak commented Jun 14, 2023

/approve
muchas gracias @msherif1234 !!!

@openshift-ci
Copy link

openshift-ci bot commented Jun 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit db373ec into netobserv:main Jun 14, 2023
openshift-merge-robot pushed a commit that referenced this pull request Jun 23, 2023
* Optimize ebpf agent map memory usage
- switch to use pointer to metric instead of metric
- manuall trigger GC after flow eviction complete

Signed-off-by: msherif1234 <mmahmoud@redhat.com>

* Fix memory and cpu scale issue work around in #133

following up on cilium/ebpf#1063
it seems we have a way to fix resources issues

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
(cherry picked from commit b9c9a03)

---------

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants