bpf: New approach for BPF MTU handling #270

Closed
wants to merge 6 commits into from

Commits on Oct 27, 2020

  1. adding ci files

    kernel-patches-bot committed Oct 27, 2020
    de20c63
  2. bpf: Remove MTU check in __bpf_skb_max_len

    Multiple BPF helpers that can manipulate or increase the size of the SKB
    use __bpf_skb_max_len() as the maximum length. This function limits the
    size against the current net_device MTU (skb->dev->mtu).
    
    When a BPF program grows the packet size, it should not be limited to the
    MTU. The MTU is a transmit limitation, and software receiving this packet
    should be allowed to increase the size. Furthermore, the current MTU check
    in __bpf_skb_max_len() uses the MTU from the ingress/current net_device,
    which in the case of redirects is the wrong net_device.
    
    Patch V4 keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit
    is elsewhere in the system. Jesper's testing[1] showed it was not possible
    to exceed 8KiB when expanding the SKB size via a BPF helper. The limiting
    factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the
    SLUB allocator (CONFIG_SLUB) when PAGE_SIZE is 4096. This define is in
    effect because the allocation is done from softirq context; see
    __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
    that frames above 16KiB can cause NICs to reset (but not crash). Keep the
    sanity limit at this level, as the memory layer can differ based on kernel
    config.
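
    The difference between the old and new limit can be sketched in plain
    user-space C. This is only an illustrative model, not the kernel code:
    the function and macro names are hypothetical, and the old check is
    simplified to a bare MTU comparison (L2 header length is ignored).

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Sanity limit quoted in the commit message (SKB_MAX_ALLOC, 16KiB).
     * The macro name is illustrative, not the kernel's. */
    #define MODEL_SANITY_MAX_LEN (16 * 1024)

    /* Old behaviour (simplified): the grown packet may not exceed the MTU
     * of the net_device the SKB currently belongs to. */
    int model_old_limit_allows(size_t new_len, size_t dev_mtu)
    {
        return new_len <= dev_mtu;
    }

    /* New behaviour: only a fixed sanity limit applies; the MTU is no
     * longer consulted when a helper grows the packet. */
    int model_new_limit_allows(size_t new_len)
    {
        return new_len <= MODEL_SANITY_MAX_LEN;
    }
    ```

    In this model a 3000-byte packet on a 1500-MTU device is rejected by the
    old check and accepted by the new one. Per the text above, the allocator
    may still impose a lower practical limit.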
    
    [1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests
    
    V3: replace __bpf_skb_max_len() with define and use IPv6 max MTU size.
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    netoptimizer authored and kernel-patches-bot committed Oct 27, 2020
    fa114a5
  3. bpf: bpf_fib_lookup return MTU value as output when looked up

    The BPF helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup)
    can perform an MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF
    program does not know the MTU value that caused this rejection.
    
    If the BPF program wants to implement PMTU discovery (Path MTU Discovery,
    RFC 1191), it needs to know this MTU value for the ICMP packet.
    
    This patch changes the lookup and result struct bpf_fib_lookup to contain
    this MTU value as output via a union with 'tot_len', as this is the value
    used for the MTU lookup.
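
    A minimal user-space sketch of the in/out union idea follows. It is a
    model under assumptions, not the kernel struct: the type names, the
    output field name mtu_result, and the choice to write the MTU on every
    lookup are all assumptions made for illustration. MODEL_FIB_FRAG_NEEDED
    stands in for BPF_FIB_LKUP_RET_FRAG_NEEDED.

    ```c
    #include <assert.h>
    #include <stdint.h>

    enum model_fib_ret {
        MODEL_FIB_SUCCESS = 0,
        MODEL_FIB_FRAG_NEEDED, /* stand-in for BPF_FIB_LKUP_RET_FRAG_NEEDED */
    };

    struct model_fib_params {
        union {
            uint16_t tot_len;    /* input: length checked against the MTU */
            uint16_t mtu_result; /* output: MTU found by the lookup */
        };
    };

    /* route_mtu plays the role of the MTU the real FIB lookup would find. */
    enum model_fib_ret model_fib_lookup(struct model_fib_params *p,
                                        uint16_t route_mtu)
    {
        uint16_t len = p->tot_len; /* read the input before overwriting it */

        p->mtu_result = route_mtu; /* same storage now carries the output */
        return len > route_mtu ? MODEL_FIB_FRAG_NEEDED : MODEL_FIB_SUCCESS;
    }
    ```

    A caller that gets the frag-needed result can then read the MTU from
    the same struct when building the ICMP reply. Because input and output
    share storage, the caller must set tot_len again before each lookup.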
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    netoptimizer authored and kernel-patches-bot committed Oct 27, 2020
    db9e414
  4. bpf: add BPF-helper for MTU checking

    This BPF helper, bpf_check_mtu(), works for both XDP and TC-BPF programs.
    
    The API is designed to help the BPF programmer who wants to change the
    packet context size, which involves other helpers. These other helpers
    usually do a delta size adjustment. This helper also supports a delta
    size (len_diff), which allows the BPF programmer to reuse the arguments
    needed by these other helpers and perform the MTU check prior to doing
    any actual size adjustment of the packet context.
    
    It is on purpose that we allow the len adjustment to become a negative
    result that will pass the MTU check. This might seem weird, but it is not
    this helper's responsibility to "catch" wrong len_diff adjustments. Other
    helpers will take care of these checks if the BPF programmer chooses to
    do an actual size adjustment.
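
    The len_diff semantics can be modelled in user-space C as below. This is
    a sketch, not the helper's implementation: the function name, return
    codes, and parameter list are hypothetical, and the L2 header allowance
    follows the V3 note about ETH_HLEN.

    ```c
    #include <assert.h>
    #include <stdint.h>

    enum model_mtu_ret {
        MODEL_MTU_OK = 0,
        MODEL_MTU_FRAG_NEEDED = 1,
    };

    /* pkt_len: current frame length including the L2 header.
     * len_diff: signed delta the caller intends to apply later.
     * mtu: L3 MTU of the target device; l2_hdr_len: e.g. 14 for Ethernet. */
    enum model_mtu_ret model_check_mtu(uint32_t pkt_len, int32_t len_diff,
                                       uint32_t mtu, uint32_t l2_hdr_len)
    {
        /* Signed 64-bit math so a large negative len_diff cannot wrap. */
        int64_t new_len = (int64_t)pkt_len + len_diff;
        int64_t max_len = (int64_t)mtu + l2_hdr_len;

        /* A negative new_len is deliberately not rejected here; it simply
         * compares as smaller than max_len and passes. */
        return new_len > max_len ? MODEL_MTU_FRAG_NEEDED : MODEL_MTU_OK;
    }
    ```

    With a 1514-byte frame and a 1500-byte MTU, a len_diff of 0 passes and a
    len_diff of 40 fails, while an oversized negative len_diff still passes,
    matching the intent described above.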
    
    V4: Many changes
     - ifindex 0 now uses the current netdev for the MTU lookup
     - rename helper from bpf_mtu_check to bpf_check_mtu
     - fix bug for GSO pkt length (as skb->len is total len)
     - remove __bpf_len_adj_positive, simply allow negative len adj
    
    V3: Take L2/ETH_HLEN header size into account and document it.
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    netoptimizer authored and kernel-patches-bot committed Oct 27, 2020
    97b82a2
  5. bpf: drop MTU check when doing TC-BPF redirect to ingress

    The use case for dropping the MTU check when TC-BPF redirects to ingress
    is described by Eyal Birger in email[0]. In summary, it is the ability to
    increase the packet size (e.g. with IPv6 headers for NAT64), redirect the
    packet to ingress, and let the normal netstack fragment it as needed.
    
    [0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/
    
    V4:
     - Keep net_device "up" (IFF_UP) check.
     - Adjustment to handle bpf_redirect_peer() helper
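
    The resulting decision logic might look roughly like the user-space
    model below. All names are hypothetical and the length check is reduced
    to a single comparison; the sketch only shows that the "up" check stays
    for both directions while the length check is skipped for ingress.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    enum model_dir { MODEL_EGRESS, MODEL_INGRESS };

    /* Returns true if the redirected packet would be accepted. */
    bool model_redirect_allowed(bool dev_up, enum model_dir dir,
                                uint32_t pkt_len, uint32_t max_len)
    {
        if (!dev_up)
            return false;       /* IFF_UP check is kept in all cases */
        if (dir == MODEL_INGRESS)
            return true;        /* no length check; the stack may fragment */
        return pkt_len <= max_len; /* egress still enforces the limit */
    }
    ```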
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    netoptimizer authored and kernel-patches-bot committed Oct 27, 2020
    22fe606
  6. bpf: make it possible to identify BPF redirected SKBs

    This change makes it possible to identify SKBs that have been redirected
    by TC-BPF (cls_act). This is needed for a number of cases.
    
    (1) For collaborating with driver ifb net_devices.
    (2) For avoiding starting a generic-XDP prog on TC ingress redirect.
    
    It is most important to fix XDP case (2), because this can break userspace
    when a driver gains support for native XDP. Imagine userspace loads an XDP
    prog on eth0, which falls back to generic XDP, and it processes
    TC-redirected packets. When the kernel is updated with native-XDP support
    for eth0, the program no longer sees the TC-redirected packets. Therefore
    it is important to keep the order intact: XDP runs before TC-BPF.
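
    Case (2) can be pictured with a tiny user-space model. The struct, field,
    and function names are hypothetical; the sketch only assumes that a
    per-packet marker is set on TC-BPF redirect and consulted before
    generic XDP runs.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    struct model_skb {
        bool redirected; /* set when TC-BPF has redirected this packet */
    };

    /* Models the TC-BPF redirect path marking the packet. */
    void model_tc_redirect(struct model_skb *skb)
    {
        skb->redirected = true;
    }

    /* Generic XDP is skipped for packets that were already redirected by
     * TC-BPF, so its view matches what native XDP would have seen. */
    bool model_should_run_generic_xdp(const struct model_skb *skb)
    {
        return !skb->redirected;
    }
    ```

    A packet arriving from the wire starts unmarked and is processed by
    generic XDP; once redirected to ingress by TC-BPF it is marked and the
    generic-XDP program no longer sees it, as would be the case with
    native XDP.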
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    netoptimizer authored and kernel-patches-bot committed Oct 27, 2020
    dc79787