NETOBSERV-1198: drop_cause enum in vmlinux are based off old kernel #161

Merged
merged 2 commits into from
Jul 24, 2023

Conversation

msherif1234
Contributor

@msherif1234 msherif1234 commented Jul 20, 2023

The hook was checking for the SKB_DROP_REASON_NOT_SPECIFIED value, which is supposed to be 2
according to https://elixir.bootlin.com/linux/v6.3/source/include/net/dropreason.h#L96,
but the value in the current vmlinux.h is 1, so we were getting unclassified drops.
I updated drop_reason_enum based on a kernel that supports this hook (RHEL 9).
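
A minimal sketch of the mismatch described above. It assumes the hook filters drops with a greater-than comparison against SKB_DROP_REASON_NOT_SPECIFIED, which this description does not spell out; `should_report` and the two macros are hypothetical names used only for illustration. The values 1 and 2 are the ones quoted in the description.

```c
#include <stdbool.h>

/* Value of SKB_DROP_REASON_NOT_SPECIFIED in the stale vmlinux.h (per the PR text). */
#define STALE_NOT_SPECIFIED 1
/* Value on the running kernel (per the PR text and the v6.3 header). */
#define KERNEL_NOT_SPECIFIED 2

/* Hypothetical filter: only report drops whose reason is above the threshold.
 * The real hook's comparison is an assumption here. */
bool should_report(int reason, int not_specified_threshold)
{
    return reason > not_specified_threshold;
}
```

With the stale threshold, `should_report(KERNEL_NOT_SPECIFIED, STALE_NOT_SPECIFIED)` is true, so a drop the kernel marks as "not specified" would slip through as an unclassified drop. With the updated threshold the same call is false.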

@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 20, 2023
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Jul 20, 2023

@msherif1234: This pull request references NETOBSERV-1198 which is a valid jira issue.

In response to this:

hook was checking for SKB_DROP_REASON_NOT_SPECIFIED value which supposed to be 2
based on https://elixir.bootlin.com/linux/v6.3/source/include/net/dropreason.h#L96
but the value in the current vmlinux is 1 so we were getting unclassified drops

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-actions

New image:
quay.io/netobserv/netobserv-ebpf-agent:f8a29d6

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f8a29d6 make set-agent-image

@codecov

codecov bot commented Jul 20, 2023

Codecov Report

Merging #161 (c831a86) into main (d3f035d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #161   +/-   ##
=======================================
  Coverage   39.25%   39.25%           
=======================================
  Files          31       31           
  Lines        2214     2214           
=======================================
  Hits          869      869           
  Misses       1296     1296           
  Partials       49       49           
Flag Coverage Δ
unittests 39.25% <ø> (ø)

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 20, 2023
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 20, 2023
@github-actions

New image:
quay.io/netobserv/netobserv-ebpf-agent:21472c2

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=21472c2 make set-agent-image

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Jul 20, 2023

@msherif1234: This pull request references NETOBSERV-1198 which is a valid jira issue.

In response to this:

I generated vmlinux.h from a RHEL 9 OCP node:

oc exec -i -n default dbgtools-deployment-84b5cd8957-2s9tl -- bpftool btf dump file /sys/kernel/btf/vmlinux format c > ./vmlinux.h

@Amoghrd

Amoghrd commented Jul 20, 2023

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Jul 20, 2023
@jotak
Member

jotak commented Jul 21, 2023

I generated vmlinux off rhel9 OCP node

@msherif1234 is that going to be incompatible with rhel8-based ocp? @Amoghrd perhaps it should be tested on ocp4.12 as well?

@msherif1234
Contributor Author

tcpdrop is disabled on older kernels.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
@github-actions

New image:
quay.io/netobserv/netobserv-ebpf-agent:7d49017

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=7d49017 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
@github-actions

New image:
quay.io/netobserv/netobserv-ebpf-agent:14b4af0

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=14b4af0 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
Sometimes the socket struct, which we use to populate
the TCP state, is not valid. In that case we keep the
state as 0 and proceed with generating the drop
flow.

Signed-off-by: msherif1234 <mmahmoud@redhat.com>
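
The fallback in this commit message can be sketched in plain C as follows. This is an illustration under stated assumptions, not the agent's eBPF code: `sock_stub`, `drop_flow`, and `make_drop_flow` are hypothetical stand-ins for the kernel socket struct and the agent's flow record.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel socket struct. */
struct sock_stub {
    uint8_t state; /* TCP state as stored on the socket */
};

/* Hypothetical, simplified drop flow record. */
struct drop_flow {
    uint8_t tcp_state;   /* 0 when no valid socket was available */
    uint32_t drop_cause; /* drop reason reported by the hook */
};

/* Build a drop flow even when the socket pointer is not valid:
 * the TCP state stays 0 and the flow is still produced. */
struct drop_flow make_drop_flow(const struct sock_stub *sk, uint32_t cause)
{
    struct drop_flow flow = { .tcp_state = 0, .drop_cause = cause };
    if (sk != NULL) {
        flow.tcp_state = sk->state;
    }
    return flow;
}
```

Calling `make_drop_flow(NULL, cause)` still returns a flow carrying the drop cause, only with `tcp_state` left at 0.
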
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 21, 2023
@github-actions

New image:
quay.io/netobserv/netobserv-ebpf-agent:e0ef45c

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=e0ef45c make set-agent-image

@Amoghrd

Amoghrd commented Jul 21, 2023

@jotak Yes, I can confirm that the eBPF pod comes up fine with older versions of OCP (tested with 4.12), and I could see the log line level=info msg="older kernel version not all hooks will be supported" component=utils.
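
For readers wondering how a hook can be gated on kernel version, here is a rough C sketch. The agent's actual check and cutoff are not shown in this thread, so the threshold below is purely an assumption (5.14 is used only because RHEL 9 ships a 5.14-based kernel), and `parse_kver` and `drop_hook_supported` are hypothetical helpers. `KVER` uses the same packing as the kernel's KERNEL_VERSION macro.

```c
#include <stdio.h>

/* Same packing as the kernel's KERNEL_VERSION(a, b, c) macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical cutoff; the agent's real threshold is not given in this thread. */
#define MIN_DROP_HOOK_KVER KVER(5, 14, 0)

/* Parse a release string such as "5.14.0-284.el9.x86_64".
 * Returns the packed version, or -1 if no major.minor prefix is found. */
long parse_kver(const char *release)
{
    int major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%d.%d.%d", &major, &minor, &patch) < 2) {
        return -1;
    }
    return KVER(major, minor, patch);
}

/* Returns 1 when the (assumed) minimum version is met, 0 otherwise. */
int drop_hook_supported(const char *release)
{
    long version = parse_kver(release);
    return version >= 0 && version >= MIN_DROP_HOOK_KVER;
}
```

Under this assumed cutoff, a RHEL 8 style release string such as "4.18.0-372.el8.x86_64" would leave the hook disabled, which matches the behaviour reported for OCP 4.12 above.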

Comment on lines 35487 to 35555
SKB_CONSUMED = 1,
SKB_DROP_REASON_NOT_SPECIFIED = 2,
SKB_DROP_REASON_NO_SOCKET = 3,
SKB_DROP_REASON_PKT_TOO_SMALL = 4,
SKB_DROP_REASON_TCP_CSUM = 5,
SKB_DROP_REASON_SOCKET_FILTER = 6,
SKB_DROP_REASON_UDP_CSUM = 7,
SKB_DROP_REASON_NETFILTER_DROP = 8,
SKB_DROP_REASON_OTHERHOST = 9,
SKB_DROP_REASON_IP_CSUM = 10,
SKB_DROP_REASON_IP_INHDR = 11,
SKB_DROP_REASON_IP_RPFILTER = 12,
SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST = 13,
SKB_DROP_REASON_XFRM_POLICY = 14,
SKB_DROP_REASON_IP_NOPROTO = 15,
SKB_DROP_REASON_SOCKET_RCVBUFF = 16,
SKB_DROP_REASON_PROTO_MEM = 17,
SKB_DROP_REASON_TCP_MD5NOTFOUND = 18,
SKB_DROP_REASON_TCP_MD5UNEXPECTED = 19,
SKB_DROP_REASON_TCP_MD5FAILURE = 20,
SKB_DROP_REASON_SOCKET_BACKLOG = 21,
SKB_DROP_REASON_TCP_FLAGS = 22,
SKB_DROP_REASON_TCP_ZEROWINDOW = 23,
SKB_DROP_REASON_TCP_OLD_DATA = 24,
SKB_DROP_REASON_TCP_OVERWINDOW = 25,
SKB_DROP_REASON_TCP_OFOMERGE = 26,
SKB_DROP_REASON_TCP_RFC7323_PAWS = 27,
SKB_DROP_REASON_TCP_INVALID_SEQUENCE = 28,
SKB_DROP_REASON_TCP_RESET = 29,
SKB_DROP_REASON_TCP_INVALID_SYN = 30,
SKB_DROP_REASON_TCP_CLOSE = 31,
SKB_DROP_REASON_TCP_FASTOPEN = 32,
SKB_DROP_REASON_TCP_OLD_ACK = 33,
SKB_DROP_REASON_TCP_TOO_OLD_ACK = 34,
SKB_DROP_REASON_TCP_ACK_UNSENT_DATA = 35,
SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE = 36,
SKB_DROP_REASON_TCP_OFO_DROP = 37,
SKB_DROP_REASON_IP_OUTNOROUTES = 38,
SKB_DROP_REASON_BPF_CGROUP_EGRESS = 39,
SKB_DROP_REASON_IPV6DISABLED = 40,
SKB_DROP_REASON_NEIGH_CREATEFAIL = 41,
SKB_DROP_REASON_NEIGH_FAILED = 42,
SKB_DROP_REASON_NEIGH_QUEUEFULL = 43,
SKB_DROP_REASON_NEIGH_DEAD = 44,
SKB_DROP_REASON_TC_EGRESS = 45,
SKB_DROP_REASON_QDISC_DROP = 46,
SKB_DROP_REASON_CPU_BACKLOG = 47,
SKB_DROP_REASON_XDP = 48,
SKB_DROP_REASON_TC_INGRESS = 49,
SKB_DROP_REASON_UNHANDLED_PROTO = 50,
SKB_DROP_REASON_SKB_CSUM = 51,
SKB_DROP_REASON_SKB_GSO_SEG = 52,
SKB_DROP_REASON_SKB_UCOPY_FAULT = 53,
SKB_DROP_REASON_DEV_HDR = 54,
SKB_DROP_REASON_DEV_READY = 55,
SKB_DROP_REASON_FULL_RING = 56,
SKB_DROP_REASON_NOMEM = 57,
SKB_DROP_REASON_HDR_TRUNC = 58,
SKB_DROP_REASON_TAP_FILTER = 59,
SKB_DROP_REASON_TAP_TXFILTER = 60,
SKB_DROP_REASON_ICMP_CSUM = 61,
SKB_DROP_REASON_INVALID_PROTO = 62,
SKB_DROP_REASON_IP_INADDRERRORS = 63,
SKB_DROP_REASON_IP_INNOROUTES = 64,
SKB_DROP_REASON_PKT_TOO_BIG = 65,
SKB_DROP_REASON_DUP_FRAG = 66,
SKB_DROP_REASON_FRAG_REASM_TIMEOUT = 67,
SKB_DROP_REASON_FRAG_TOO_FAR = 68,
SKB_DROP_REASON_MAX = 69,
Contributor

I'm not comfortable at all with that... How can we ensure we are up to date on this enum?

Contributor

Are we forced to specify a value for each enumerator?

Contributor Author

@msherif1234 msherif1234 Jul 24, 2023

This is the kernel definition. I checked with RHEL and they normally append new causes to the end, except for this commit, which caused the disconnect: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e84afe8ebfbb9eade3f4f6de4720887bf908e26. In general, I will be watching this vmlinux.h and making sure updates are compatible with older releases, even if it's at the cost of maintaining a custom version of it. This is the expected overhead of relying on kernel headers.
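
On the two review questions in this thread, a small C sketch may help. C itself does not require a value on every enumerator: one without an explicit value takes the previous value plus one, so a single explicit starting value yields the same numbering as the dumped listing. The `_Static_assert` lines show one possible guard against silent drift when the header is regenerated. That guard is an illustration, not something this PR adds, and the `IMPLICIT_` names are hypothetical.

```c
/* Subset of the listing under review, with explicit values as they appear there. */
enum skb_drop_reason_explicit {
    SKB_CONSUMED = 1,
    SKB_DROP_REASON_NOT_SPECIFIED = 2,
    SKB_DROP_REASON_NO_SOCKET = 3,
};

/* Same numbering with only the first value written out (hypothetical names). */
enum skb_drop_reason_implicit {
    IMPLICIT_CONSUMED = 1,
    IMPLICIT_NOT_SPECIFIED, /* previous + 1 = 2 */
    IMPLICIT_NO_SOCKET,     /* previous + 1 = 3 */
};

/* Compile-time guard: the build fails if a regenerated header changes
 * a value the hook logic depends on. */
_Static_assert(SKB_DROP_REASON_NOT_SPECIFIED == 2,
               "SKB_DROP_REASON_NOT_SPECIFIED moved; re-check the drop hook");
_Static_assert(IMPLICIT_NOT_SPECIFIED == SKB_DROP_REASON_NOT_SPECIFIED,
               "implicit and explicit numbering diverged");
```

The trade-off is that implicit numbering shifts every later value when an enumerator is inserted in the middle, which is the kind of change described in the comment above. Explicit values make that shift visible in a diff.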

bpf/headers/vmlinux.h (outdated)
Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 24, 2023
Contributor

@jpinsonneau jpinsonneau left a comment

LGTM, thanks @msherif1234

@msherif1234
Contributor Author

/approve

@openshift-ci

openshift-ci bot commented Jul 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

@openshift-merge-robot openshift-merge-robot merged commit 3955ce8 into netobserv:main Jul 24, 2023