-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: New approach for BPF MTU handling #270
Commits on Oct 27, 2020
-
Configuration menu - View commit details
-
Copy full SHA for de20c63 - Browse repository at this point
Copy the full SHA de20c63View commit details -
bpf: Remove MTU check in __bpf_skb_max_len
Multiple BPF-helpers that can manipulate/increase the size of the SKB uses __bpf_skb_max_len() as the max-length. This function limit size against the current net_device MTU (skb->dev->mtu). When a BPF-prog grow the packet size, then it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Further more, current MTU check in __bpf_skb_max_len uses the MTU from ingress/current net_device, which in case of redirects uses the wrong net_device. Patch V4 keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing[1] showed it was not possible to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is in-effect due to this being called from softirq context see code __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level as memory layer can differ based on kernel config. [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests V3: replace __bpf_skb_max_len() with define and use IPv6 max MTU size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Configuration menu - View commit details
-
Copy full SHA for fa114a5 - Browse repository at this point
Copy the full SHA fa114a5View commit details -
bpf: bpf_fib_lookup return MTU value as output when looked up
The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup) can perform MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog don't know the MTU value that caused this rejection. If the BPF-prog wants to implement PMTU (Path MTU Discovery) (rfc1191) it need to know this MTU value for the ICMP packet. Patch change lookup and result struct bpf_fib_lookup, to contain this MTU value as output via a union with 'tot_len' as this is the value used for the MTU lookup. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Configuration menu - View commit details
-
Copy full SHA for db9e414 - Browse repository at this point
Copy the full SHA db9e414View commit details -
bpf: add BPF-helper for MTU checking
This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs. The API is designed to help the BPF-programmer, that want to do packet context size changes, which involves other helpers. These other helpers usually does a delta size adjustment. This helper also support a delta size (len_diff), which allow BPF-programmer to reuse arguments needed by these other helpers, and perform the MTU check prior to doing any actual size adjustment of the packet context. It is on purpose, that we allow the len adjustment to become a negative result, that will pass the MTU check. This might seem weird, but it's not this helpers responsibility to "catch" wrong len_diff adjustments. Other helpers will take care of these checks, if BPF-programmer chooses to do actual size adjustment. V4: Lot of changes - ifindex 0 now use current netdev for MTU lookup - rename helper from bpf_mtu_check to bpf_check_mtu - fix bug for GSO pkt length (as skb->len is total len) - remove __bpf_len_adj_positive, simply allow negative len adj V3: Take L2/ETH_HLEN header size into account and document it. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Configuration menu - View commit details
-
Copy full SHA for 97b82a2 - Browse repository at this point
Copy the full SHA 97b82a2View commit details -
bpf: drop MTU check when doing TC-BPF redirect to ingress
The use-case for dropping the MTU check when TC-BPF does redirect to ingress, is described by Eyal Birger in email[0]. The summary is the ability to increase packet size (e.g. with IPv6 headers for NAT64) and ingress redirect packet and let normal netstack fragment packet as needed. [0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/ V4: - Keep net_device "up" (IFF_UP) check. - Adjustment to handle bpf_redirect_peer() helper Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Configuration menu - View commit details
-
Copy full SHA for 22fe606 - Browse repository at this point
Copy the full SHA 22fe606View commit details -
bpf: make it possible to identify BPF redirected SKBs
This change makes it possible to identify SKBs that have been redirected by TC-BPF (cls_act). This is needed for a number of cases. (1) For collaborating with driver ifb net_devices. (2) For avoiding starting generic-XDP prog on TC ingress redirect. It is most important to fix XDP case(2), because this can break userspace when a driver gets support for native-XDP. Imagine userspace loads XDP prog on eth0, which fallback to generic-XDP, and it process TC-redirected packets. When kernel is updated with native-XDP support for eth0, then the program no-longer see the TC-redirected packets. Therefore it is important to keep the order intact; that XDP runs before TC-BPF. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Configuration menu - View commit details
-
Copy full SHA for dc79787 - Browse repository at this point
Copy the full SHA dc79787View commit details