Skip to content

Commit

Permalink
bpf: add meta pointer for direct access
Browse files Browse the repository at this point in the history
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from it's
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
borkmann authored and davem330 committed Sep 26, 2017
1 parent 6aaae2b commit de8f3a8
Show file tree
Hide file tree
Showing 19 changed files with 297 additions and 42 deletions.
1 change: 1 addition & 0 deletions drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,

xdp.data_hard_start = *data_ptr - offset;
xdp.data = *data_ptr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = *data_ptr + *len;
orig_data = xdp.data;
mapping = rx_buf->mapping - bp->rx_dma_offset;
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/cavium/thunder/nicvf_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -523,6 +523,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,

xdp.data_hard_start = page_address(page);
xdp.data = (void *)cpu_addr;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
orig_data = xdp.data;

Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/intel/i40e/i40e_txrx.c
Original file line number Diff line number Diff line change
Expand Up @@ -2107,6 +2107,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
if (!skb) {
xdp.data = page_address(rx_buffer->page) +
rx_buffer->page_offset;
xdp_set_data_meta_invalid(&xdp);
xdp.data_hard_start = xdp.data -
i40e_rx_offset(rx_ring);
xdp.data_end = xdp.data + size;
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -2326,6 +2326,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
if (!skb) {
xdp.data = page_address(rx_buffer->page) +
rx_buffer->page_offset;
xdp_set_data_meta_invalid(&xdp);
xdp.data_hard_start = xdp.data -
ixgbe_rx_offset(rx_ring);
xdp.data_end = xdp.data + size;
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/mellanox/mlx4/en_rx.c
Original file line number Diff line number Diff line change
Expand Up @@ -762,6 +762,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud

xdp.data_hard_start = va - frags[0].page_offset;
xdp.data = va;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + length;
orig_data = xdp.data;

Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
Original file line number Diff line number Diff line change
Expand Up @@ -794,6 +794,7 @@ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
return false;

xdp.data = va + *rx_headroom;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + *len;
xdp.data_hard_start = va;

Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/netronome/nfp/nfp_net_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -1583,6 +1583,7 @@ static int nfp_net_run_xdp(struct bpf_prog *prog, void *data, void *hard_start,

xdp.data_hard_start = hard_start;
xdp.data = data + *off;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = data + *off + *len;

orig_data = xdp.data;
Expand Down
1 change: 1 addition & 0 deletions drivers/net/ethernet/qlogic/qede/qede_fp.c
Original file line number Diff line number Diff line change
Expand Up @@ -1004,6 +1004,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,

xdp.data_hard_start = page_address(bd->data);
xdp.data = xdp.data_hard_start + *data_offset;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + *len;

/* Queues always have a full reset currently, so for the time
Expand Down
1 change: 1 addition & 0 deletions drivers/net/tun.c
Original file line number Diff line number Diff line change
Expand Up @@ -1468,6 +1468,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,

xdp.data_hard_start = buf;
xdp.data = buf + pad;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
orig_data = xdp.data;
act = bpf_prog_run_xdp(xdp_prog, &xdp);
Expand Down
2 changes: 2 additions & 0 deletions drivers/net/virtio_net.c
Original file line number Diff line number Diff line change
Expand Up @@ -554,6 +554,7 @@ static struct sk_buff *receive_small(struct net_device *dev,

xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
xdp.data = xdp.data_hard_start + xdp_headroom;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
orig_data = xdp.data;
act = bpf_prog_run_xdp(xdp_prog, &xdp);
Expand Down Expand Up @@ -686,6 +687,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
data = page_address(xdp_page) + offset;
xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
xdp.data = data + vi->hdr_len;
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + (len - vi->hdr_len);
act = bpf_prog_run_xdp(xdp_prog, &xdp);

Expand Down
1 change: 1 addition & 0 deletions include/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,7 @@ enum bpf_reg_type {
PTR_TO_MAP_VALUE, /* reg points to map element value */
PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
PTR_TO_STACK, /* reg == frame_pointer + offset */
PTR_TO_PACKET_META, /* skb->data - meta_len */
PTR_TO_PACKET, /* reg points to skb->data */
PTR_TO_PACKET_END, /* skb->data + headlen */
};
Expand Down
21 changes: 19 additions & 2 deletions include/linux/filter.h
Original file line number Diff line number Diff line change
Expand Up @@ -487,12 +487,14 @@ struct sk_filter {

struct bpf_skb_data_end {
struct qdisc_skb_cb qdisc_cb;
void *data_meta;
void *data_end;
};

struct xdp_buff {
void *data;
void *data_end;
void *data_meta;
void *data_hard_start;
};

Expand All @@ -507,7 +509,8 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;

BUILD_BUG_ON(sizeof(*cb) > FIELD_SIZEOF(struct sk_buff, cb));
cb->data_end = skb->data + skb_headlen(skb);
cb->data_meta = skb->data - skb_metadata_len(skb);
cb->data_end = skb->data + skb_headlen(skb);
}

static inline u8 *bpf_skb_cb(struct sk_buff *skb)
Expand Down Expand Up @@ -728,8 +731,22 @@ int xdp_do_redirect(struct net_device *dev,
struct bpf_prog *prog);
void xdp_do_flush_map(void);

/* Drivers not supporting XDP metadata can use this helper, which
* rejects any room expansion for metadata as a result.
*/
static __always_inline void
xdp_set_data_meta_invalid(struct xdp_buff *xdp)
{
xdp->data_meta = xdp->data + 1;
}

static __always_inline bool
xdp_data_meta_unsupported(const struct xdp_buff *xdp)
{
return unlikely(xdp->data_meta > xdp->data);
}

void bpf_warn_invalid_xdp_action(u32 act);
void bpf_warn_invalid_xdp_redirect(u32 ifindex);

struct sock *do_sk_redirect_map(void);

Expand Down
68 changes: 66 additions & 2 deletions include/linux/skbuff.h
Original file line number Diff line number Diff line change
Expand Up @@ -489,8 +489,9 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
* the end of the header data, ie. at skb->end.
*/
struct skb_shared_info {
unsigned short _unused;
unsigned char nr_frags;
__u8 __unused;
__u8 meta_len;
__u8 nr_frags;
__u8 tx_flags;
unsigned short gso_size;
/* Warning: this field is not always filled in (UFO)! */
Expand Down Expand Up @@ -3400,6 +3401,69 @@ static inline ktime_t net_invalid_timestamp(void)
return 0;
}

static inline u8 skb_metadata_len(const struct sk_buff *skb)
{
return skb_shinfo(skb)->meta_len;
}

static inline void *skb_metadata_end(const struct sk_buff *skb)
{
return skb_mac_header(skb);
}

static inline bool __skb_metadata_differs(const struct sk_buff *skb_a,
const struct sk_buff *skb_b,
u8 meta_len)
{
const void *a = skb_metadata_end(skb_a);
const void *b = skb_metadata_end(skb_b);
/* Using more efficient varaiant than plain call to memcmp(). */
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64
u64 diffs = 0;

switch (meta_len) {
#define __it(x, op) (x -= sizeof(u##op))
#define __it_diff(a, b, op) (*(u##op *)__it(a, op)) ^ (*(u##op *)__it(b, op))
case 32: diffs |= __it_diff(a, b, 64);
case 24: diffs |= __it_diff(a, b, 64);
case 16: diffs |= __it_diff(a, b, 64);
case 8: diffs |= __it_diff(a, b, 64);
break;
case 28: diffs |= __it_diff(a, b, 64);
case 20: diffs |= __it_diff(a, b, 64);
case 12: diffs |= __it_diff(a, b, 64);
case 4: diffs |= __it_diff(a, b, 32);
break;
}
return diffs;
#else
return memcmp(a - meta_len, b - meta_len, meta_len);
#endif
}

static inline bool skb_metadata_differs(const struct sk_buff *skb_a,
const struct sk_buff *skb_b)
{
u8 len_a = skb_metadata_len(skb_a);
u8 len_b = skb_metadata_len(skb_b);

if (!(len_a | len_b))
return false;

return len_a != len_b ?
true : __skb_metadata_differs(skb_a, skb_b, len_a);
}

static inline void skb_metadata_set(struct sk_buff *skb, u8 meta_len)
{
skb_shinfo(skb)->meta_len = meta_len;
}

static inline void skb_metadata_clear(struct sk_buff *skb)
{
skb_metadata_set(skb, 0);
}

struct sk_buff *skb_clone_sk(struct sk_buff *skb);

#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING
Expand Down
13 changes: 12 additions & 1 deletion include/uapi/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -582,6 +582,12 @@ union bpf_attr {
* @map: pointer to sockmap to update
* @key: key to insert/update sock in map
* @flags: same flags as map update elem
*
* int bpf_xdp_adjust_meta(xdp_md, delta)
* Adjust the xdp_md.data_meta by delta
* @xdp_md: pointer to xdp_md
* @delta: An positive/negative integer to be added to xdp_md.data_meta
* Return: 0 on success or negative on error
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
Expand Down Expand Up @@ -638,6 +644,7 @@ union bpf_attr {
FN(redirect_map), \
FN(sk_redirect_map), \
FN(sock_map_update), \
FN(xdp_adjust_meta),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
Expand Down Expand Up @@ -715,14 +722,17 @@ struct __sk_buff {
__u32 data_end;
__u32 napi_id;

/* accessed by BPF_PROG_TYPE_sk_skb types */
/* Accessed by BPF_PROG_TYPE_sk_skb types from here to ... */
__u32 family;
__u32 remote_ip4; /* Stored in network byte order */
__u32 local_ip4; /* Stored in network byte order */
__u32 remote_ip6[4]; /* Stored in network byte order */
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port; /* Stored in network byte order */
__u32 local_port; /* stored in host byte order */
/* ... here. */

__u32 data_meta;
};

struct bpf_tunnel_key {
Expand Down Expand Up @@ -783,6 +793,7 @@ enum xdp_action {
struct xdp_md {
__u32 data;
__u32 data_end;
__u32 data_meta;
};

enum sk_action {
Expand Down
Loading

0 comments on commit de8f3a8

Please sign in to comment.