Skip to content

Latest commit

 

History

History
230 lines (150 loc) · 8.22 KB

brainstorm.org

File metadata and controls

230 lines (150 loc) · 8.22 KB

Brainstorm notes

This doc is for brainstorming. Watch out for likely duplicates, clean them up progressively.

Turn these notes into real issue or bug descriptions and move them into other docs like xdp_issues.org

ip-util choose between generic/native-XDP

The ip utility (from iproute2) does not support specifying, that an XDP prog need to sue generic-XDP.

I need this when testing veth, which now got native-XDP.

General: Make Generic-XDP complient with Native-XDP features

XDP on ARM-boards

See areas/arm64/README.org

Crazy-idea: XDP-redirect re-handle about-to-be-dropped packets

Ref slides: http://people.netfilter.org/hawk/presentations/NetConf2017_Seoul/XDP_devel_update_NetConf2017_Seoul.pdf

Idea slide 31: On XDP-redirect packets can be dropped, after the XDP prog have run, and today we have a ‘drop-tracepoint’ to handle/debug this. It would be more interesting to call a new XDP-type program that get the about-to-be-dropped packet, and allow it to take a new decision. Wondering what this can be used for, click the youtube link in the slide, and blow-your-mind! (https://youtu.be/BO0QhaxBRr0)

Tracking and troubleshooting eBPF

Want to track when BPF progs gets loaded/unloaded

  • Both for system audit and troubleshooting purposes

BPF code do have LSM/SElinux hooks

  • LSM maintainers request, use Audit subsystem instead
  • Jiri Olsa currently working on this

Usability: XDP redirect not impl by driver gives WARN_ON

E.g. AF_XDP via xdpsock uses redirect, and there is not feedback when running prog on driver without redirect-support. Need to look in dmesg to see big splast.

Usability: HW offload on non-supported driver/firmware return EINVAL

E.g. Loading XDP/ebpf prog with HW offload on Netronome NIC, but wrong firmware, just sats EINVAL. No help or hints that firmware needs to be upgraded.

Same EINVAL when loading on a NIC driver other than Netronome.

xdp_buff need data_hard_end

ixgbe and i40e choose to violate one-page-per-packet

Two reasons: #1 - extend frame tail, for DNS/other reply use-cases

  • NDIV needs this feature to convert to XDP

#2 - simplify XDP convert to SKB (else truesize is a lie)

DMA mapping xdp_frame

Keeping DMA mapping in XDP return API.

Add samples/bpf/ using raw-tracepoints

Convert xdp_monitor to use raw-tracepoints

Convert samples/bpf/ to use libbpf as elf-loader

sample/bpf output XDP-mode (as QA forget to report)

sample/bpf output Driver (as QA forget to report)

GROUP: CPU-map redirect things

CPUMAP: Impl. OoO mechanism

Ref slides: http://people.netfilter.org/hawk/presentations/NetConf2017_Seoul/XDP_devel_update_NetConf2017_Seoul.pdf

Idea on slide 29 + 30: Implement in BPF-code an Out-of-Order safe way to use cpumap-redirect to dynamically load-balance IP-flows.

BUG: CPUMAP not working for generic-xdp

Implement CPUMAP redirect with connection hashing over CPUs

c4c202175424 (“Merge branch ‘bpf-sample-cpumap-lb’”) 1bca4e6b1863 (“samples/bpf: xdp_redirect_cpu load balance like Suricata”) 11395686586b (“samples/bpf: add Paul Hsieh’s (LGPL 2.1) hash function SuperFastHash”)

Feature: CPU map could use netif_receive_skb_list()

cpumap: Implement GRO handling

xdp_redirect_cpu sample output interface-name and ifindex

QA is giving my output from xdp_redirect_cpu, but forget to say what interface this is loaded on. Solve by simply outputting the interface in the output.

samples/bpf redirect example include net_device TX stats

samples/bpf upstream napi_monitor

Missing an ifindex to match on in tracepoint

Generic-xdp how-to assure NAPI protect?

Make bulk work for generic xdp with devmap

Rx+tx ifconfig count update for xdp

Meta data implement missing for many drivers

Should we standardize ethtool stats for xdp?

Adding and removing ethtool channels runtime semantics?

(Jakub question this)

Frame to skb, missing csum hw info

(Plus other info)

Frame to skb, use full headroom

BUG: Generic-XDP does not work for TCP (in certain cases)

Potential-bug: Mlx5 fix dma unmap call (after xdp return frame)

Retpoline performance issue for xdp

Retpoline: mlx5 too many indirect calls

Old list: watch out for dublicates

Old list of stuff I need to work-on/fix for XDP/bpf project:

  • XDP return frame API (needed by AF_XDP ZC)
  • Bulking API for return frame API
  • Bulking API for ndo_xdp_xmit
  • Address massive XDP regression due to CONFIG_RETPOLINE
  • Introduce bulking for generic-XDP (PoC test show +30% perf!!!)
  • Fix CPUMAP to work with generic-XDP
  • Implement ndo_xdp_xmit for macvlan (fast guest delivery)
  • Improve BPF doc
  • Improve XDP doc
  • Work on XDP article with Toke+Alexei+Daniel
  • Better integration of XDP in Suricata (multiple small thing)
  • Find XDP feature/capability API (use in Suricata)
  • Help integrate AF_XDP in Suricata
  • Ship bpftool in distros (start with static linked libbpf)
  • Make libbpf a shared lib in distros (fix lib versioning)
  • Help (Ahern) get XDP route/FIB lookup helper integrated
  • Work on bridge FIB table lookup helper
  • XDP get more info transferred to CPUMAP skb creation time
  • cpumap: GRO support
  • cpumap: RX hash support
  • cpumap: HW csum offload/info
  • Generalize CPUMAP skb creation, move SKB alloc out of driver code
  • Improve XDP cpumap redirect example: flow hashing (fix NIC HW hash)
  • Help get AF_XDP API and performance aligned
  • Help get AF_XDP zero-copy integrated via XDP return API
  • xdp_monitor: improve to show error codes (to easier troubleshoot)
  • convert tracepoint to use ifindex instead of names (strcpy overhead)
  • XDP_REDIRECT: Detect buggy-drivers forgetting to clear per-CPU map
  • Streamline eBPF map-create return codes on errors
  • Upstream xdp_bench01 sample to be standard way to measure XDP perf
  • Fix that TCP traffic with XDP generic on virtual net_devices are broken
  • xdp: avoid leaking info stored in xdp_frame data on page reuse
  • XDP_REDIRECT implemement in every driver
  • XDP data-meta implemement in every driver
  • Improve samples/bpf: XDP progs should take ifconfig/net_device names
  • Improve samples/bpf: Avoid including ./arch/x86/include/asm/cpufeature.h

XDP metadata: dynamic descriptor offloads via BTF

Napatech have descriptors in-front of packet payload

Proprietary commercial companies like NapaTech, which also maps packets into userspace, deliver dynamic descriptor info in-front of the packet, like our data_meta area.

As far as I can see, via their public docs[4], they have 4 different dynamic descriptor formats. With BTF and metadata we should have more flexibility than them :-)

It is a bit interesting to look at what they expose. I recommend looking/clicking at the header-file[5][6][7][8] as it shows they use a lot of C-bit-fields to compress the size. Do BTF support C-bit fields?

[4] https://docs.napatech.com/search/all?query=Dynamic+Packet+Descriptor [5] pktdescr_dyn1.h https://docs.napatech.com/reader/Gtwjm73bddn7nrHz1NxHZw/leAUnFb_t2il~h4y1tNPpw [6] pktdescr_dyn2.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/7cYsE5yb3DLpomSTEeL_bQ [7] pktdescr_dyn3.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/hcQobdatqtY2j2nmrZ577A [8] pktdescr_dyn4.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/2LaoD2p2mvxpkNOVvBPBqg

I find it interesting to see that in (dyn1+2), default decode offsets into the packet of L3 and L4 (and L4 payload), but allows them to be programmable. Also notice the “color” members, which are programmable, and sometimes are use as a 64-bit unique correlation key[8] (e.g. identifying flows).

Maybe I should have looked at their standard format before the dynamic(?) Their pktdescr_std0.h [9] is placed in front of all packets being received by the adapter when the adapter is operating in STANDARD or EXTENDED mode. The Extended descriptors are placed after pktdescr_std0.h [9], which contains a lot of info on the types in different layers, see[10][11][12].

[9] pktdescr_std0.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/mRPP74KNQXIJtSyWBBHMKA [10] pktdescr_ext7.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/GXhjyPfAPJ6Rr7k8KR4KsQ [11] pktdescr_ext8.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/NWaAIAROOdyXvy~OV4pl6A [12] pktdescr_ext9.h https://docs.napatech.com/reader/GHSQQPQbWLPdJUmxIkO91Q/LVIq8m_b0_44QIIAqcdLew