Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-1129: Add kernel version chk to disable hooks not supported on older version #144

Merged
merged 1 commit into from
Jul 6, 2023
Merged

Conversation

msherif1234
Copy link
Contributor

@msherif1234 msherif1234 commented Jul 5, 2023

TCPDrop hook doesn't exists on older kernelwe check for the following version and ignore tcpdrop for any older versions

sh-4.4# uname -r
5.14.0-284.20.1.el9_2.aarch64
  • Testing with OCP 4.13 cluster
    • enable tcpDrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-06-20-224158   True        False         False      31m     

$ oc exec -i -n default dbgtools-deployment-85dff5ff6-92242 -- bpftool perf show
pid 46120  fd 8: prog_id 4  tracepoint  kfree_skb
  • Testing with OCP 4.12 cluster
    • enable tcpdrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2023-07-03-233301   True        False         False      2m51s   
baremetal                                  4.12.0-0.nightly-2023-07-03-233301   True        False         False      18m     


$ oc logs -n netobserv-privileged netobserv-ebpf-agent-5cbmw 
time="2023-07-05T20:23:05Z" level=info msg="starting NetObserv eBPF Agent"
time="2023-07-05T20:23:05Z" level=info msg="initializing Flows agent" component=agent.Flows
time="2023-07-05T20:23:05Z" level=info msg="older kernel version not all hooks will be supported" component=utils
time="2023-07-05T20:23:06Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
time="2023-07-05T20:23:06Z" level=info msg="starting Flows agent" component=agent.Flows
time="2023-07-05T20:23:06Z" level=info msg="Flows agent successfully started" component=agent.Flows

$ oc exec -i -n default dbgtools-deployment-574c69cfb8-nnnfr -- bpftool perf show
$ 
  • Added unit-test for the new api
  • manual testing on OCP cluster
    • OCP 4.10
    • OCP 4.12
    • OCP 4.13
    • OCP 4.14

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jul 5, 2023

@msherif1234: This pull request references NETOBSERV-1129 which is a valid jira issue.

In response to this:

TCPDrop hook doesn't exists on older kernel < 5.17 so when deploy on those older release we need to set SUPPORT_OLD_KERNEL to true

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 msherif1234 changed the title NETOBSERV-1129: Add configuration option to disable hooks not supported ob older kernel NETOBSERV-1129: Add configuration option to disable hooks not supported on older kernel Jul 5, 2023
@codecov
Copy link

codecov bot commented Jul 5, 2023

Codecov Report

Merging #144 (8b87e72) into main (27baf29) will decrease coverage by 1.11%.
The diff coverage is 16.66%.

@@            Coverage Diff             @@
##             main     #144      +/-   ##
==========================================
- Coverage   40.11%   39.00%   -1.11%     
==========================================
  Files          31       31              
  Lines        2124     2238     +114     
==========================================
+ Hits          852      873      +21     
- Misses       1227     1314      +87     
- Partials       45       51       +6     
Flag Coverage Δ
unittests 39.00% <16.66%> (-1.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/agent.go 37.84% <0.00%> (-1.37%) ⬇️
pkg/ebpf/tracer.go 0.00% <0.00%> (ø)
pkg/flow/record.go 64.10% <28.57%> (-7.33%) ⬇️
pkg/utils/utils.go 46.42% <38.77%> (-53.58%) ⬇️
pkg/exporter/proto.go 82.97% <100.00%> (+0.24%) ⬆️

TCPDrops: cfg.EnableTCPDrops,
DNSTracker: cfg.EnableDNSTracking,
EnableRTT: cfg.EnableRTT,
SupportOldKernel: cfg.SupportOldKernel,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the agent could determine that on its own, without requiring a configuration?

Copy link
Contributor Author

@msherif1234 msherif1234 Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kernel versions upstream and downstream don't align that will make it hard for me to check on special kernel for example ocp latest showing

5.14.0-284.20.1.el9_2.aarch64

while skb_free is actually in 5.17 upstream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhere in comments you wrote: For older kernel (< 5.17) kfree_sbk drop hook doesn't exists so couldn't we just pick the kernel version and compare with 5.17 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@msherif1234 msherif1234 Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood but rhel9 has 5.14 and its supported there, my point is rhel 5.14 isn't the same as fc 5.14 I assume it has contents from newer kernel too so I used 5.14 as base for my check

@jotak
Copy link
Member

jotak commented Jul 5, 2023

Code LGTM, maybe it can be improved by having the agent find the kernel version and determine itself if it must run without some hooks; this way we won't even have to worry about that on the operator side.

It must be tested with several ocp versions (ideally, all supported) - I can try some tomorrow...

@dushyantbehl
Copy link
Contributor

Code LGTM, maybe it can be improved by having the agent find the kernel version and determine itself if it must run without some hooks; this way we won't even have to worry about that on the operator side.

It must be tested with several ocp versions (ideally, all supported) - I can try some tomorrow...

The userspace side of code can potentially run some command to find out kernel version if needed but would we wanna disable it at runtime or fail to run if older kernel is detected but config is set to enable all hooks. I am asking this because certain nodes of a cluster can have older kernel and in that case agent disabling some feature without informing might be a problem for some users.

@msherif1234 msherif1234 changed the title NETOBSERV-1129: Add configuration option to disable hooks not supported on older kernel NETOBSERV-1129: Add kerenl version chk to disable hooks not supported on older kernel Jul 5, 2023
@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jul 5, 2023

@msherif1234: This pull request references NETOBSERV-1129 which is a valid jira issue.

In response to this:

TCPDrop hook doesn't exists on older kernelwe check for the following version and ignore tcpdrop for any older versions

sh-4.4# uname -r
5.14.0-284.20.1.el9_2.aarch64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234
Copy link
Contributor Author

Code LGTM, maybe it can be improved by having the agent find the kernel version and determine itself if it must run without some hooks; this way we won't even have to worry about that on the operator side.
It must be tested with several ocp versions (ideally, all supported) - I can try some tomorrow...

The userspace side of code can potentially run some command to find out kernel version if needed but would we wanna disable it at runtime or fail to run if older kernel is detected but config is set to enable all hooks. I am asking this because certain nodes of a cluster can have older kernel and in that case agent disabling some feature without informing might be a problem for some users.

w/o doing anything the code as it stands will fail on older releases < 4.13 so the ask was to avoid failing on older release and I displayed warning plus in our doc we state clearly the kernel dependency so users will be aware

@msherif1234
Copy link
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 5, 2023
@github-actions
Copy link

github-actions bot commented Jul 5, 2023

New image: quay.io/netobserv/netobserv-ebpf-agent:fd08b88. It will expire after two weeks.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 5, 2023
@msherif1234
Copy link
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 5, 2023
@github-actions
Copy link

github-actions bot commented Jul 5, 2023

New image: quay.io/netobserv/netobserv-ebpf-agent:ffe1789. It will expire after two weeks.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jul 5, 2023

@msherif1234: This pull request references NETOBSERV-1129 which is a valid jira issue.

In response to this:

TCPDrop hook doesn't exists on older kernelwe check for the following version and ignore tcpdrop for any older versions

sh-4.4# uname -r
5.14.0-284.20.1.el9_2.aarch64
  • Testing with OCP 4.13 cluster
  • enable tcpDrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-06-20-224158   True        False         False      31m     

$ oc exec -i -n default dbgtools-deployment-85dff5ff6-92242 -- bpftool perf show
pid 46120  fd 8: prog_id 4  tracepoint  kfree_skb
  • Testing with OCP 4.12 cluster
    • enable tcpdrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2023-07-03-233301   True        False         False      2m51s   
baremetal                                  4.12.0-0.nightly-2023-07-03-233301   True        False         False      18m     


$ oc logs -n netobserv-privileged netobserv-ebpf-agent-5cbmw 
time="2023-07-05T20:23:05Z" level=info msg="starting NetObserv eBPF Agent"
time="2023-07-05T20:23:05Z" level=info msg="initializing Flows agent" component=agent.Flows
time="2023-07-05T20:23:05Z" level=info msg="older kernel version not all hooks will be supported" component=utils
time="2023-07-05T20:23:06Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
time="2023-07-05T20:23:06Z" level=info msg="starting Flows agent" component=agent.Flows
time="2023-07-05T20:23:06Z" level=info msg="Flows agent successfully started" component=agent.Flows

$ oc exec -i -n default dbgtools-deployment-574c69cfb8-nnnfr -- bpftool perf show
$ 

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jotak
Copy link
Member

jotak commented Jul 6, 2023

Tested on 4.10 and works like a charm!

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 6, 2023
@msherif1234 msherif1234 requested a review from jotak July 6, 2023 12:24
@msherif1234
Copy link
Contributor Author

Tested on OCP 4.14

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.nightly-2023-07-05-191022   True        False         False      4m10s   


$ oc exec -i -n default dbgtools-deployment-5c8c449466-p2kgf -- bpftool perf show
pid 17628  fd 7: prog_id 4  tracepoint  kfree_skb

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Jul 6, 2023

@msherif1234: This pull request references NETOBSERV-1129 which is a valid jira issue.

In response to this:

TCPDrop hook doesn't exists on older kernelwe check for the following version and ignore tcpdrop for any older versions

sh-4.4# uname -r
5.14.0-284.20.1.el9_2.aarch64
  • Testing with OCP 4.13 cluster
  • enable tcpDrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-06-20-224158   True        False         False      31m     

$ oc exec -i -n default dbgtools-deployment-85dff5ff6-92242 -- bpftool perf show
pid 46120  fd 8: prog_id 4  tracepoint  kfree_skb
  • Testing with OCP 4.12 cluster
    • enable tcpdrop
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2023-07-03-233301   True        False         False      2m51s   
baremetal                                  4.12.0-0.nightly-2023-07-03-233301   True        False         False      18m     


$ oc logs -n netobserv-privileged netobserv-ebpf-agent-5cbmw 
time="2023-07-05T20:23:05Z" level=info msg="starting NetObserv eBPF Agent"
time="2023-07-05T20:23:05Z" level=info msg="initializing Flows agent" component=agent.Flows
time="2023-07-05T20:23:05Z" level=info msg="older kernel version not all hooks will be supported" component=utils
time="2023-07-05T20:23:06Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
time="2023-07-05T20:23:06Z" level=info msg="starting Flows agent" component=agent.Flows
time="2023-07-05T20:23:06Z" level=info msg="Flows agent successfully started" component=agent.Flows

$ oc exec -i -n default dbgtools-deployment-574c69cfb8-nnnfr -- bpftool perf show
$ 
  • Added unit-test for the new api
  • manual testing on OCP cluster
  • OCP 4.10
  • OCP 4.12
  • OCP 4.13
  • OCP 4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@msherif1234 msherif1234 changed the title NETOBSERV-1129: Add kerenl version chk to disable hooks not supported on older kernel NETOBSERV-1129: Add kernel version chk to disable hooks not supported on older kernel Jul 6, 2023
@msherif1234 msherif1234 changed the title NETOBSERV-1129: Add kernel version chk to disable hooks not supported on older kernel NETOBSERV-1129: Add kernel version chk to disable hooks not supported on older version Jul 6, 2023
@jotak
Copy link
Member

jotak commented Jul 6, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 6, 2023
@msherif1234
Copy link
Contributor Author

/approve

@openshift-ci
Copy link

openshift-ci bot commented Jul 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants