| authors | state |
|---|---|
| Cody Mello <cody.mello@joyent.com> | draft |
This proposal lays out the work that needs to be done to add support for IPv6 to SmartOS and Triton (formerly SmartDataCenter or SDC). As the Internet grows and more people come online around the globe, ISPs and online businesses are looking towards a future where they will need to support IPv6. Major ISPs like AT&T, Verizon and Comcast have already started giving customers IPv6 service and IPv6-enabled modems and routers.
Joyent customers and members of the SmartOS community are interested in using IPv6 with their VMs, so IPv6 support will be both a useful feature and a selling point. (Note that it has been possible for some time to use IPv6 with vmadm(1M), but doing so required extra effort.) Parts of the stack have made room for future IPv6 support, but almost everything that touches networking will require modifications and testing.
This proposal also suggests a change to the relationship between Network Interface Cards (NICs) and IP addresses in Triton. Instead of always having one address per NIC, it should be possible for multiple IP addresses to be placed on a single NIC. We call this interface-centric provisioning. The idea is that when a machine is being created, it should be possible to request multiple IP addresses for each NIC (where addresses can be picked by the requester or left up to the API based on what's available), as long as they are on the same VLAN or overlay.
This document is roughly ordered by when each step will need to be implemented. Basic tooling will come first, and then the plumbing through Triton. Once that is finished and tested, IPv6 support can be exposed through user interfaces to end users and operators.
vmadm(1M) is the SmartOS tool for creating and managing virtual machines. It makes use of illumos' ZFS and zone management tools to provision instances of the images managed with imgadm(1M). vmadm(1M) will need to be modified to appropriately call out to zonecfg(1M) and set up the zone to use IPv6.
vmadm(1M) creates and updates objects based on an input JSON object. The following fields will need to be modified to accommodate IPv6 and to allow specifying multiple IP addresses:
- `nic.*.ips` will accept an array of the following items:
  - An IPv4 or IPv6 address in CIDR notation, where the prefix length indicates the routing prefix
  - One of the strings `dhcp` or `addrconf`. Neither `dhcp` nor `addrconf` may appear more than once in the list, and `ips` and `allowed_ips` may not contain more than 32 items combined, since the kernel does not allow the `allowed-ips` property of a NIC to contain more entries than this
- `nic.*.gateways` will accept an array of IPv4 and IPv6 addresses
- `nic.*.network_uuids` will be a map of IP addresses to the networks they're allocated from. See RFD 32 for more.
- `routes` will accept both IPv4 and IPv6 networks and addresses as keys and values (the address family must match within each key-value pair)
- `resolvers` will accept both IPv4 and IPv6 addresses
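The constraints on `nic.*.ips` can be checked before anything is handed to zonecfg(1M). What follows is a minimal Python sketch and not vmadm's actual validation, which is written in Node.js. The function name `validate_nic_ips` and its error strings are hypothetical. The sketch assumes the 32-entry limit applies to `ips` and `allowed_ips` combined, as described above.

```python
import ipaddress

# Keywords the proposal allows in place of a static address.
KEYWORDS = ("dhcp", "addrconf")
# Kernel limit on entries in a NIC's allowed-ips link property.
MAX_ALLOWED_IPS = 32


def validate_nic_ips(ips, allowed_ips=()):
    """Return a list of problems with a proposed nic.*.ips value.

    Hypothetical helper: an empty list means the value looks acceptable.
    """
    problems = []
    seen_keywords = set()
    for item in ips:
        if item in KEYWORDS:
            if item in seen_keywords:
                problems.append(f"{item!r} appears more than once")
            seen_keywords.add(item)
            continue
        if "/" not in item:
            problems.append(f"{item!r} is missing a prefix length")
            continue
        try:
            # ip_interface accepts host bits, e.g. 10.0.0.5/24 or 2001:db8::5/64.
            ipaddress.ip_interface(item)
        except ValueError:
            problems.append(f"{item!r} is not a valid IPv4/IPv6 CIDR address")
    total = len(ips) + len(allowed_ips)
    if total > MAX_ALLOWED_IPS:
        problems.append(f"{total} entries exceed the limit of {MAX_ALLOWED_IPS}")
    return problems
```

For example, `["10.0.0.5/24", "2001:db8::5/64", "addrconf"]` yields no problems. A bare `"10.0.0.5"` is rejected because the prefix length is required.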
The fields `nic.*.ip`, `nic.*.netmask` and `nic.*.gateway` will continue to exist as is and will be returned by `vmadm get`, but they will be considered deprecated. Note that the new fields allow IPv4 and IPv6 to be mixed. This avoids splitting the configuration by address family and creating an IPv6 counterpart for everything, as other systems have done with `ping` vs. `ping6` or `traceroute` vs. `traceroute6`.
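One way to keep the deprecated fields populated is to derive them from the new arrays. The sketch below uses a hypothetical policy: it reports the first static IPv4 address and the first IPv4 gateway, or `dhcp` if that keyword is present. The proposal does not say how vmadm will choose among multiple addresses.

```python
import ipaddress


def legacy_nic_fields(ips, gateways=()):
    """Derive deprecated nic.*.ip, nic.*.netmask and nic.*.gateway values.

    Hypothetical policy: use the first static IPv4 address in ips, or "dhcp"
    if there is none and "dhcp" is listed; use the first IPv4 gateway.
    """
    fields = {}
    for item in ips:
        if "/" in item:
            iface = ipaddress.ip_interface(item)
            if iface.version == 4:
                fields["ip"] = str(iface.ip)
                fields["netmask"] = str(iface.netmask)
                break
    else:
        # No static IPv4 address was found.
        if "dhcp" in ips:
            fields["ip"] = "dhcp"
    for gw in gateways:
        if ipaddress.ip_address(gw).version == 4:
            fields["gateway"] = gw
            break
    return fields
```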
Much of this work has been finished in OS-2994.
fwadm(1M) is the SmartOS tool for creating and managing firewall rules. Under the hood, it makes use of ipfilter(5), and converts rules written in the fwrule(5) domain-specific language to the language defined in ipf(4). Like vmadm(1M), fwadm(1M) takes a JSON object as input when creating or modifying rules. Since it passes off the firewall rules field to the fwrule(5) parser, IPv6 will only need to be accounted for in the ips field of remote VMs, and the ip field of NIC objects.
The parser for the firewall rules is generated by the Jison parser generator. The grammar will need to be adjusted to accept and validate IPv6 addresses, and its consumers will need to make sure that the resulting IPv6 rules are fed appropriately to ipf(1M).
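Once the grammar accepts IPv6, consumers need to keep the two address families apart when generating ipf rules. This Python sketch makes two hypothetical assumptions: rule targets arrive as `(kind, value)` pairs, and IPv4 and IPv6 rules are emitted into separate rule sets. It is not the fwrule(5) parser, which is generated by Jison.

```python
import ipaddress


def split_targets_by_family(targets):
    """Partition fwrule-style 'ip'/'subnet' targets into IPv4 and IPv6.

    Hypothetical helper: each target is a (kind, value) tuple where kind is
    "ip" or "subnet". Invalid values raise ValueError, mirroring a grammar
    that rejects malformed addresses instead of passing them to ipf.
    """
    by_family = {4: [], 6: []}
    for kind, value in targets:
        if kind == "ip":
            parsed = ipaddress.ip_address(value)
        elif kind == "subnet":
            # strict=True rejects subnets with host bits set, e.g. fd00::1/64.
            parsed = ipaddress.ip_network(value, strict=True)
        else:
            raise ValueError(f"unknown target kind {kind!r}")
        # str() yields the canonical (lowercase, compressed) form.
        by_family[parsed.version].append(str(parsed))
    return by_family
```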
This work was finished in FWAPI-225.
The startup scripts and programs for zones and KVM images will need to be updated to make use of their IPv6 networking stacks. For SmartOS and LX-branded zones, much of this work has been done in OS-4582 and OS-4741. KVM instances that are assigned static IP addresses currently make use of the DHCPv4 server embedded in QEMU, which is less than ideal: to respond appropriately to the guest, QEMU needs to inspect each packet sent over the network. It would be better to have the images check on boot (or, more likely, when the system's networking service starts up) which IP addresses they should be using, and on which NICs. This data can be fetched via mdata-get(1M) and used to configure the system.
The logic in the brand scripts that prepare the NICs for zones will also need to be updated, so that properties like allowed-ips and dhcp-nospoof will get set correctly.
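Here is a sketch of the guest side. It assumes the NIC list has already been fetched with mdata-get(1M), for example from the `sdc:nics` key, and that each entry carries a `mac` and the proposed `ips` array. Both the key's exact contents and the function below are assumptions. The guest would then match MAC addresses against its own interfaces and apply static addresses, run a DHCP client, or accept router advertisements accordingly.

```python
import ipaddress
import json


def plan_guest_interfaces(nics_json):
    """Turn NIC metadata into a per-MAC configuration plan.

    Hypothetical helper: assumes each entry carries "mac" and the proposed
    "ips" list (CIDR addresses plus optional "dhcp"/"addrconf").
    """
    plan = {}
    for nic in json.loads(nics_json):
        entry = {"static": [], "dhcp": False, "addrconf": False}
        for item in nic.get("ips", []):
            if item in ("dhcp", "addrconf"):
                entry[item] = True
            else:
                # Keep the prefix so the guest can configure the on-link route.
                entry["static"].append(str(ipaddress.ip_interface(item)))
        plan[nic["mac"].lower()] = entry
    return plan
```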
The Networking API (NAPI) is responsible for managing data about networks within Triton. VMAPI and CNAPI make requests to it when adding, removing and updating NICs.
NAPI will need to be modified to support the following:
- Accept and validate new IPv6 networks (see NAPI-308); a new field, subnet_type, will need to be stored in Moray so people will be able to search for specific network types (see NAPI-414)
- Manage and search IPv6 addresses
- Ensure that all networks placed on a NIC share the same NIC tag and VLAN ID
- Manage IPv6 network pools, and ensure that they are validated (a network pool is either IPv4 or IPv6); see NAPI-395
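The two NAPI consistency rules above can be expressed compactly. In this sketch the `Network` class is a hypothetical, trimmed-down stand-in for a NAPI network object. The address family is derived from the subnet rather than read from the proposed `subnet_type` field.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    # Hypothetical, trimmed-down view of a NAPI network object.
    uuid: str
    subnet: str
    nic_tag: str
    vlan_id: int

    @property
    def family(self):
        return ipaddress.ip_network(self.subnet).version


def check_nic_networks(networks):
    """All networks placed on one NIC must share NIC tag and VLAN ID."""
    pairs = {(n.nic_tag, n.vlan_id) for n in networks}
    if len(pairs) > 1:
        raise ValueError(f"networks span multiple NIC tag/VLAN pairs: {sorted(pairs)}")


def check_pool(networks):
    """A network pool must be all IPv4 or all IPv6; return its family."""
    families = {n.family for n in networks}
    if len(families) > 1:
        raise ValueError("network pool mixes IPv4 and IPv6 networks")
    return families.pop() if families else None
```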
Further details on some of this work can be found in RFD 32.
The Firewall API (FWAPI) is responsible for creating and managing firewall rules. Since rule parsing is handed off to the same library used by fwadm(1M), little beyond searching should need updating. The following changes will need to be made to the ListRules endpoint:
- ip will need to accept IPv6 addresses
- subnet will need to accept IPv6 subnets
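The ListRules changes mostly come down to matching addresses in a way that respects address family. The sketch below assumes hypothetical semantics: a rule matches an `ip` query if it names that address or contains it in one of its subnets. It also assumes a simplified rule shape, and FWAPI's actual filtering may differ. Python's `ipaddress` module never treats an address as belonging to a network of the other family, which is the behavior the search needs.

```python
import ipaddress


def rules_matching_ip(rules, ip):
    """Return UUIDs of rules that name ip or contain it in a subnet.

    Hypothetical rule shape: dicts with "uuid", "ips" and "subnets".
    Mixed families never match: an IPv6 address is never in an IPv4 subnet.
    """
    addr = ipaddress.ip_address(ip)
    matches = []
    for rule in rules:
        ips = {ipaddress.ip_address(x) for x in rule.get("ips", [])}
        subnets = [ipaddress.ip_network(s) for s in rule.get("subnets", [])]
        if addr in ips or any(addr in net for net in subnets):
            matches.append(rule["uuid"])
    return matches
```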
Part of this work should also involve moving firewall rules from UFDS to Moray, where the `ip` and `subnet` types can be used.
This work was finished in FWAPI-212 and FWAPI-225.
The Triton headnode communicates with each of the compute nodes through agents running in their Global Zone. These agents perform a variety of tasks locally, ranging from updating the headnode to making sure that the state on the compute node matches what the headnode has stored.
The only agent that should need to be updated is the Compute Node Agent (cn-agent), which manipulates NICs and associated information. Currently, it assumes that IP addresses are IPv4 and converts them into 32-bit numbers. After reviewing the source code for the Firewall Agent (firewaller) and for the Networking Agent (net-agent), it looks like neither one should need to be updated, since they don't manipulate IP addresses themselves, but instead pass them off to other services like NAPI.
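The problem with packing addresses into 32-bit numbers is easy to demonstrate. This Python sketch contrasts an IPv4-only conversion of the kind described above with an alternative that works for both families by keeping a canonical string. Both function names are hypothetical and are not taken from cn-agent.

```python
import ipaddress


def ipv4_to_u32(addr):
    """Hypothetical IPv4-only conversion: address -> 32-bit integer."""
    parsed = ipaddress.ip_address(addr)
    if parsed.version != 4:
        raise ValueError(f"{addr} is not IPv4 and cannot be stored as 32 bits")
    return int(parsed)


def normalize_addr(addr):
    """Hypothetical alternative for either family: keep the canonical string."""
    return str(ipaddress.ip_address(addr))
```

An IPv6 address such as `2001:db8::1` needs 128 bits. Any code path that stores addresses as 32-bit integers therefore has to be changed, not just extended.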
CNS, described in RFD 1, was built with IPv6 in mind and already supports AAAA records. It shouldn't require any further development to support IPv6 once the rest of Triton is ready.
Work on adding IPv6 support to fabrics will occur during a second phase, once standard zones and networks are working. Once support is added, we will assign users /64 subnets located within the fd00::/8 unique local address range. RFC 4193 specifies that the 40-bit Global ID within this range be generated pseudo-randomly. We should probably give customers the option of either picking the prefix themselves or having it randomized.
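The RFC 4193 layout is the 8-bit `fd` prefix, then a 40-bit Global ID, a 16-bit Subnet ID and a 64-bit interface ID. The sketch below generates the Global ID with Python's `secrets` module instead of the SHA-1-based procedure suggested in RFC 4193. The function names are hypothetical.

```python
import ipaddress
import secrets


def random_ula_prefix():
    """Hypothetical helper: a random /48 inside fd00::/8 (RFC 4193 layout)."""
    global_id = secrets.randbits(40)
    # 0xfd occupies the top 8 bits; the 40-bit Global ID follows, giving a /48.
    value = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((value, 48))


def customer_subnet(prefix, subnet_id):
    """Hypothetical helper: the /64 selected by a 16-bit Subnet ID in a /48."""
    if not 0 <= subnet_id < 2**16:
        raise ValueError("subnet ID must fit in 16 bits")
    base = int(prefix.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))
```

Letting a customer pick the prefix instead would simply mean accepting a caller-supplied /48 and passing it to `customer_subnet`.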
Since customers will most likely want private IP addresses that can still reach the rest of the Internet, we may need to explore implementing IPv6 support in ipnat(1M), and possibly 6to4 options. These will require further evaluation to determine whether they're worth implementing or should be left up to the network operator. Protocols like NAT64 require significant configuration as well as a cooperating DNS64 server, which may not be worth the investment.
The Operations Portal is the web interface for managing Triton and provisioning new compute nodes and virtual machines. There are several things that will need to be updated here:
- When managing NICs on a VM, the interface will need to allow for assigning multiple IP addresses to the NIC
- Tests for validating input IP addresses and query parameters will need to accommodate IPv6
- The interface for creating new networks should make it clear that IPv6 can be used by giving example input
- When creating new network pools, once the address family is decided, only networks of the same type should be suggested
CloudAPI will need to be extended to allow provisioning with IPv6 addresses, and to also accept multiple addresses per NIC. Currently, it accepts the fields ipv4_uuid and ipv4_count, but they cannot be used to assign multiple IP addresses. We will want it to support the following fields and allow them to be used for assigning multiple IP addresses:
- ipv4_uuid is the UUID of the IPv4 network to use (VLAN/vxlan ID must match IPv6 network)
- ipv4_count specifies how many IPv4 addresses should be selected from the pool and assigned to this NIC; it is currently restricted to 1
- ipv4_ips is an array of IPv4 addresses that should be assigned to this NIC
- ipv6_uuid is the UUID of the IPv6 network to use (VLAN/vxlan ID must match IPv4 network)
- ipv6_count specifies how many IPv6 addresses should be selected from the pool and assigned to this NIC
- ipv6_ips is an array of IPv6 addresses that should be assigned to this NIC
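A CloudAPI request using these fields might be checked along the following lines. This is only a sketch. `networks` is a hypothetical lookup that maps a network UUID to its subnet and VLAN/VXLAN ID, standing in for NAPI. The sketch does not enforce the current one-address limit on `ipv4_count`.

```python
import ipaddress


def validate_nic_request(req, networks):
    """Check a proposed CloudAPI NIC request; return a list of errors.

    Hypothetical: 'networks' maps a network UUID to (subnet, vlan_id).
    """
    errors = []
    vlans = set()
    for family in (4, 6):
        uuid = req.get(f"ipv{family}_uuid")
        ips = req.get(f"ipv{family}_ips", [])
        count = req.get(f"ipv{family}_count", 0)
        if uuid is None:
            if ips or count:
                errors.append(f"ipv{family}_ips/count given without ipv{family}_uuid")
            continue
        subnet, vlan_id = networks[uuid]
        net = ipaddress.ip_network(subnet)
        if net.version != family:
            errors.append(f"ipv{family}_uuid refers to an IPv{net.version} network")
        vlans.add(vlan_id)
        if count < 0:
            errors.append(f"ipv{family}_count must not be negative")
        for ip in ips:
            # An address of the wrong family is never "in" the network.
            if ipaddress.ip_address(ip) not in net:
                errors.append(f"{ip} is not in {net}")
    if len(vlans) > 1:
        errors.append("IPv4 and IPv6 networks must share a VLAN/VXLAN ID")
    return errors
```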
We began moving towards this schema and mindset in ZAPI-598. With the work laid out in this proposal, we will finish it up.
CloudAPI will also need to be extended to allow for managing firewall rules that apply to IPv6 networks and addresses.
Docker and our APIs for supporting it will need to gain the appropriate support and plumbing. The Docker Inspect API call only allows a single IP address to be returned. Either we will need to pick a single representative IP address, or the API will need to be improved to allow returning multiple IPv4 or IPv6 addresses.
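If we report a single representative address, the choice needs a deterministic rule. The policy below is a hypothetical example, not a decided behavior. It prefers the first static IPv4 address, then falls back to the first IPv6 address that is not link-local.

```python
import ipaddress


def representative_address(ips):
    """Pick one (address, prefix length) pair from a nic.*.ips-style list.

    Hypothetical policy: first static IPv4 address, else the first
    non-link-local static IPv6 address, else None.
    """
    static = [ipaddress.ip_interface(i) for i in ips if "/" in i]
    for want_v4 in (True, False):
        for iface in static:
            if (iface.version == 4) == want_v4 and not iface.ip.is_link_local:
                return str(iface.ip), iface.network.prefixlen
    return None
```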
Making it possible for Triton to have an IPv6 admin network would be a nice feature to offer, but it is not essential. Since the admin network is usually a non-routable private network, there will probably never be a real need for it to support IPv6. As a result, some of these features may be put off for a while, but they are enumerated here as a point of reference. Note that as of OS-4802, SmartOS hosts can use IPv6 from the global zone, but this cannot be used within Triton.
The simplest path to assigning IPv6 addresses to nodes on the admin network would be to run in.ndpd(1M) alongside Booter, and send out Router Advertisements with the autonomous bit set, so that everyone else performs SLAAC. If additional information is to be sent though, or if more control over assigning addresses is needed, then it may be better to use DHCPv6.
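With SLAAC, a node that forms its interface identifier from its MAC address ends up with a predictable address. This scheme is called modified EUI-64 and is described in RFC 4291 Appendix A. A predictable address can help when cross-referencing nodes on the admin network. This sketch assumes EUI-64 interface IDs. Nodes using temporary (RFC 4941) or opaque (RFC 7217) identifiers would not match.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Modified EUI-64 SLAAC address for a MAC within a /64.

    Assumes the node derives its interface ID from its MAC (RFC 4291 App. A).
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 interface IDs require a /64 prefix")
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    # Insert ff:fe between the OUI and the NIC-specific half, then flip
    # the universal/local bit in the first octet.
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    eui64[0] ^= 0x02
    iid = int.from_bytes(bytes(eui64), "big")
    return net.network_address + iid
```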
Binder is the DNS server used within Triton for locating admin services and compute nodes. Currently, it only serves up IPv4 A records. Before Triton can be run on an IPv6 admin network, Binder will need to gain support for serving IPv6 AAAA records, so that various programs can continue to look up services on the admin network via DNS.
Booter is the DHCP and TFTP server used in Triton for assigning compute nodes IP addresses and PXE booting them. In order for IPv6 to be used on the admin network, Booter will need to gain support for DHCPv6 so that compute nodes can get an IPv6 address and be sent the appropriate options and information to know how to properly boot.