Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: New approach for BPF MTU handling and enforcement #179

Closed
wants to merge 7 commits into from

Conversation

kernel-patches-bot
Copy link

Pull request for series with
subject: bpf: New approach for BPF MTU handling and enforcement
version: 1
url: https://patchwork.kernel.org/project/bpf/list/?series=360359

@kernel-patches-bot
Copy link
Author

Master branch: dca4121
series: https://patchwork.kernel.org/project/bpf/list/?series=360359
version: 1

@kernel-patches-bot
Copy link
Author

Master branch: fd08f94
series: https://patchwork.kernel.org/project/bpf/list/?series=360359
version: 1

kernel-patches-bot and others added 7 commits October 6, 2020 11:49
Multiple BPF-helpers that can manipulate/increase the size of the SKB uses
__bpf_skb_max_len() as the max-length. This function limit size against
the current net_device MTU (skb->dev->mtu).

When a BPF-prog grow the packet size, then it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Further more, current MTU check in
__bpf_skb_max_len uses the MTU from ingress/current net_device, which in
case of redirects uses the wrong net_device.

Keep a sanity max limit of IP_MAX_MTU which is 64KiB.

In later patches we will enforce the MTU limitation when transmitting
packets.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
(imported from commit 37f8552786cf46588af52b77829b730dd14524d3)
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED.  The BPF-prog
don't know the MTU value that caused this rejection.

If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it
need to know this MTU value for the ICMP packet.

Patch change lookup and result struct bpf_fib_lookup, to contain this MTU
value as output via a union with 'tot_len' as this is the value used for
the MTU lookup.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
FIXME: add description.

FIXME: IMHO we can create a better BPF-helper named bpf_mtu_check()
instead of bpf_mtu_lookup(), because a flag can be used for requesting
GRO segment size checking.  The ret value of bpf_mtu_check() says
if MTU was violoated, but also return MTU via pointer arg to allow
BPF-progs to do own logic.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
This change makes it possible to identify SKBs that have been redirected
by TC-BPF (cls_act). This is needed for a number of cases.

(1) For collaborating with driver ifb net_devices.
(2) For avoiding starting generic-XDP prog on TC ingress redirect.
(3) Next MTU check patches need ability to identify redirected SKBs.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The MTU should only apply to transmitted packets.

When TC-ingress redirect packet to egress on another netdev, then the
normal netstack MTU checks are skipped (and driver level will not catch
any MTU violation, checked ixgbe).

This patch choose not to add MTU check in the egress code path of
skb_do_redirect() prior to calling dev_queue_xmit(), because it is still
possible to run another BPF egress program that will shrink/consume
headers, which will make packet comply with netdev MTU. This use-case
might already be in production use (if ingress MTU is larger than egress).

Instead do the MTU check after sch_handle_egress() step, for the cases
that require this.

The cases need a bit explaining. Ingress to egress redirected packets
could be detected via skb->tc_at_ingress bit, but it is not reliable,
because sch_handle_egress() could steal the packet and redirect this
(again) to another egress netdev, which will then have the
skb->tc_at_ingress cleared. There is also the case of TC-egress prog
increase packet size and then redirect it egress. Thus, it is more
reliable to do the MTU check for any redirected packet (both ingress and
egress), which is available via skb_is_redirected() in earlier patch.
Also handle case where egress BPF-prog increased size.

One advantage of this approach is that it ingress-to-egress BPF-prog can
send information via packet data. With the MTU checks removed in the
helpers, and also not done in skb_do_redirect() call, this allows for an
ingress BPF-prog to communicate with an egress BPF-prog via packet data,
as long as egress BPF-prog remove this prior to transmitting packet.

Troubleshooting: MTU violations are recorded in TX dropped counter, and
kprobe on dev_queue_xmit() have retval -EMSGSIZE.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
The use-case for dropping the MTU check when TC-BPF ingress redirecting a
packet, is described by Eyal Birger in email[0]. The summary is the
ability to increase packet size (e.g. with IPv6 headers for NAT64) and
ingress redirect packet and let normal netstack fragment packet as needed.

[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
@kernel-patches-bot
Copy link
Author

Master branch: 67ed375
series: https://patchwork.kernel.org/project/bpf/list/?series=360359
version: 1

@kernel-patches-bot kernel-patches-bot deleted the series/360359=>bpf-next branch October 7, 2020 01:45
kernel-patches-bot pushed a commit that referenced this pull request Oct 4, 2022
Add a big batch of selftest to extend test_progs with various tc link,
attach ops and old-style tc BPF attachments via libbpf APIs. Also test
multi-program attachments including mixing the various attach options:

  # ./test_progs -t tc_link
  #179     tc_link_base:OK
  #180     tc_link_detach:OK
  #181     tc_link_mix:OK
  #182     tc_link_opts:OK
  #183     tc_link_run_base:OK
  #184     tc_link_run_chain:OK
  Summary: 6/0 PASSED, 0 SKIPPED, 0 FAILED

All new and existing test cases pass.

Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants