Skip to content

Commit

Permalink
ipvlan: Initial check-in of the IPVLAN driver.
Browse files Browse the repository at this point in the history
This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.

This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.

The device operates in two different modes and the difference
in these two modes in primarily in the TX side.

(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.

RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.

(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.

RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.

The devices can be added using the "ip" command from the iproute2
package -

	ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
  • Loading branch information
Mahesh Bandewar authored and davem330 committed Nov 24, 2014
1 parent 2bbea0a commit 2ad7bf3
Show file tree
Hide file tree
Showing 9 changed files with 1,678 additions and 0 deletions.
107 changes: 107 additions & 0 deletions Documentation/networking/ipvlan.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@

IPVLAN Driver HOWTO

Initial Release:
Mahesh Bandewar <maheshb AT google.com>

1. Introduction:
This is conceptually very similar to the macvlan driver with one major
exception of using L3 for mux-ing /demux-ing among slaves. This property makes
the master device share the L2 with it's slave devices. I have developed this
driver in conjuntion with network namespaces and not sure if there is use case
outside of it.


2. Building and Installation:
In order to build the driver, please select the config item CONFIG_IPVLAN.
The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
(CONFIG_IPVLAN=m).


3. Configuration:
There are no module parameters for this driver and it can be configured
using IProute2/ip utility.

ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }

e.g. ip link add link ipvl0 eth0 type ipvlan mode l2


4. Operating modes:
IPvlan has two modes of operation - L2 and L3. For a given master device,
you can select one of these two modes and all slaves on that master will
operate in the same (selected) mode. The RX mode is almost identical except
that in L3 mode the slaves wont receive any multicast / broadcast traffic.
L3 mode is more restrictive since routing is controlled from the other (mostly)
default namespace.

4.1 L2 mode:
In this mode TX processing happens on the stack instance attached to the
slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.

4.2 L3 mode:
In this mode TX processing upto L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
master device for the L2 processing and routing from that instance will be
used before packets are queued on the outbound device. In this mode the slaves
will not receive nor can send multicast / broadcast traffic.


5. What to choose (macvlan vs. ipvlan)?
These two devices are very similar in many regards and the specific use
case could very well define which device to choose. if one of the following
situations defines your use case then you can choose to use ipvlan -
(a) The Linux host that is connected to the external switch / router has
policy configured that allows only one mac per port.
(b) No of virtual devices created on a master exceed the mac capacity and
puts the NIC in promiscous mode and degraded performance is a concern.
(c) If the slave device is to be put into the hostile / untrusted network
namespace where L2 on the slave could be changed / misused.


6. Example configuration:

+=============================================================+
| Host: host1 |
| |
| +----------------------+ +----------------------+ |
| | NS:ns0 | | NS:ns1 | |
| | | | | |
| | | | | |
| | ipvl0 | | ipvl1 | |
| +----------#-----------+ +-----------#----------+ |
| # # |
| ################################ |
| # eth0 |
+==============================#==============================+


(a) Create two network namespaces - ns0, ns1
ip netns add ns0
ip netns add ns1

(b) Create two ipvlan slaves on eth0 (master device)
ip link add link eth0 ipvl0 type ipvlan mode l2
ip link add link eth0 ipvl1 type ipvlan mode l2

(c) Assign slaves to the respective network namespaces
ip link set dev ipvl0 netns ns0
ip link set dev ipvl1 netns ns1

(d) Now switch to the namespace (ns0 or ns1) to configure the slave devices
- For ns0
(1) ip netns exec ns0 bash
(2) ip link set dev ipvl0 up
(3) ip link set dev lo up
(4) ip -4 addr add 127.0.0.1 dev lo
(5) ip -4 addr add $IPADDR dev ipvl0
(6) ip -4 route add default via $ROUTER dev ipvl0
- For ns1
(1) ip netns exec ns1 bash
(2) ip link set dev ipvl1 up
(3) ip link set dev lo up
(4) ip -4 addr add 127.0.0.1 dev lo
(5) ip -4 addr add $IPADDR dev ipvl1
(6) ip -4 route add default via $ROUTER dev ipvl1
18 changes: 18 additions & 0 deletions drivers/net/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,24 @@ config MACVTAP
To compile this driver as a module, choose M here: the module
will be called macvtap.


config IPVLAN
tristate "IP-VLAN support"
---help---
This allows one to create virtual devices off of a main interface
and packets will be delivered based on the dest L3 (IPv6/IPv4 addr)
on packets. All interfaces (including the main interface) share L2
making it transparent to the connected L2 switch.

Ipvlan devices can be added using the "ip" command from the
iproute2 package starting with the iproute2-X.Y.ZZ release:

"ip link add link <main-dev> [ NAME ] type ipvlan"

To compile this driver as a module, choose M here: the module
will be called ipvlan.


config VXLAN
tristate "Virtual eXtensible Local Area Network (VXLAN)"
depends on INET
Expand Down
1 change: 1 addition & 0 deletions drivers/net/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
# Networking Core Drivers
#
obj-$(CONFIG_BONDING) += bonding/
obj-$(CONFIG_IPVLAN) += ipvlan/
obj-$(CONFIG_DUMMY) += dummy.o
obj-$(CONFIG_EQUALIZER) += eql.o
obj-$(CONFIG_IFB) += ifb.o
Expand Down
7 changes: 7 additions & 0 deletions drivers/net/ipvlan/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#
# Makefile for the Ethernet Ipvlan driver
#

obj-$(CONFIG_IPVLAN) += ipvlan.o

ipvlan-objs := ipvlan_core.o ipvlan_main.o
130 changes: 130 additions & 0 deletions drivers/net/ipvlan/ipvlan.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
/*
* Copyright (c) 2014 Mahesh Bandewar <maheshb@google.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of
* the License, or (at your option) any later version.
*
*/
#ifndef __IPVLAN_H
#define __IPVLAN_H

#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/rculist.h>
#include <linux/notifier.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/if_arp.h>
#include <linux/if_link.h>
#include <linux/if_vlan.h>
#include <linux/ip.h>
#include <linux/inetdevice.h>
#include <net/rtnetlink.h>
#include <net/gre.h>
#include <net/route.h>
#include <net/addrconf.h>

#define IPVLAN_DRV "ipvlan"
#define IPV_DRV_VER "0.1"

#define IPVLAN_HASH_SIZE (1 << BITS_PER_BYTE)
#define IPVLAN_HASH_MASK (IPVLAN_HASH_SIZE - 1)

#define IPVLAN_MAC_FILTER_BITS 8
#define IPVLAN_MAC_FILTER_SIZE (1 << IPVLAN_MAC_FILTER_BITS)
#define IPVLAN_MAC_FILTER_MASK (IPVLAN_MAC_FILTER_SIZE - 1)

typedef enum {
IPVL_IPV6 = 0,
IPVL_ICMPV6,
IPVL_IPV4,
IPVL_ARP,
} ipvl_hdr_type;

struct ipvl_pcpu_stats {
u64 rx_pkts;
u64 rx_bytes;
u64 rx_mcast;
u64 tx_pkts;
u64 tx_bytes;
struct u64_stats_sync syncp;
u32 rx_errs;
u32 tx_drps;
};

struct ipvl_port;

struct ipvl_dev {
struct net_device *dev;
struct list_head pnode;
struct ipvl_port *port;
struct net_device *phy_dev;
struct list_head addrs;
int ipv4cnt;
int ipv6cnt;
struct ipvl_pcpu_stats *pcpu_stats;
DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
netdev_features_t sfeatures;
u32 msg_enable;
u16 mtu_adj;
};

struct ipvl_addr {
struct ipvl_dev *master; /* Back pointer to master */
union {
struct in6_addr ip6; /* IPv6 address on logical interface */
struct in_addr ip4; /* IPv4 address on logical interface */
} ipu;
#define ip6addr ipu.ip6
#define ip4addr ipu.ip4
struct hlist_node hlnode; /* Hash-table linkage */
struct list_head anode; /* logical-interface linkage */
struct rcu_head rcu;
ipvl_hdr_type atype;
};

struct ipvl_port {
struct net_device *dev;
struct hlist_head hlhead[IPVLAN_HASH_SIZE];
struct list_head ipvlans;
struct rcu_head rcu;
int count;
u16 mode;
};

static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
{
return rcu_dereference(d->rx_handler_data);
}

static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d)
{
return rtnl_dereference(d->rx_handler_data);
}

static inline bool ipvlan_dev_master(struct net_device *d)
{
return d->priv_flags & IFF_IPVLAN_MASTER;
}

static inline bool ipvlan_dev_slave(struct net_device *d)
{
return d->priv_flags & IFF_IPVLAN_SLAVE;
}

void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev);
void ipvlan_set_port_mode(struct ipvl_port *port, u32 nval);
void ipvlan_init_secret(void);
unsigned int ipvlan_mac_hash(const unsigned char *addr);
rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev);
void ipvlan_ht_addr_add(struct ipvl_dev *ipvlan, struct ipvl_addr *addr);
bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
const void *iaddr, bool is_v6);
void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
#endif /* __IPVLAN_H */
Loading

0 comments on commit 2ad7bf3

Please sign in to comment.