Equinix Metal - Add ability to deploy servers with Layer 2/Hybrid networking #613

Open
lgn87 opened this issue Jul 27, 2023 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@lgn87

lgn87 commented Jul 27, 2023

User Story

As a user of Cluster API, it would be great to be able to launch instances on Equinix Metal with Layer 2/Hybrid networking and to specify a custom set of private IPs for L2 interfaces.

Detailed Description

By default, all servers created on Equinix Metal via Cluster API use Layer 3 networking. There is no option when provisioning an Equinix Metal cluster to specify the networking type for instances, or to create additional L2 interfaces with specific local IP addresses and VLANs.

It would be great to implement the ability to specify the following when provisioning an Equinix Metal cluster:

- Network type (L2/L3/Hybrid)
- Creating network interfaces with specific VLAN
- IP address range for L2 interfaces
@cprivitere
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 3, 2023
@Lirt

Lirt commented Aug 6, 2023

Hi,

I'd like to know what the next step would be in the scenario of an L2 interface with one or multiple VLANs.

Consider the scenario where an instance is created and its networking is changed to L2 as proposed. Then you need to do two things:

  1. Configure a sub-interface for the VLAN in the operating system.
  2. Assign an IP address to the interface (not necessary in all use cases).

The first can be configured inside the KubeadmControlPlane's or KubeadmConfigTemplate's preKubeadmCommands by creating a file with the network config (Netplan for Ubuntu, for example; the network interface name should be fairly consistent for a given instance type). I can imagine this part working as long as there is not much variable data needed.
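
For illustration only, a minimal sketch of what that could look like in a KubeadmConfigTemplate, assuming Ubuntu with Netplan (the VLAN ID, address, and file name below are placeholders, not part of any agreed design):

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-l2-workers # hypothetical name
spec:
  template:
    spec:
      preKubeadmCommands:
        # Write a Netplan drop-in defining a VLAN sub-interface on bond0
        - |
          cat <<'EOF' > /etc/netplan/60-vlan1001.yaml
          network:
            version: 2
            vlans:
              bond0.1001:
                id: 1001
                link: bond0
                addresses: ["192.168.2.10/24"]
          EOF
        # Apply the configuration before kubeadm runs
        - netplan apply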

The second issue is that, when creating the network config, you need to pick an IP address for the interface (for it to be useful). This is not easy to automate, as the IP address is different for each server and must be populated by some IPAM in the background to be done properly.

The alternative is to have a DHCP server. This is not offered by Equinix services (AFAIK), which makes the only solution spawning a "support" instance that would run DHCP (fairly expensive just for having DHCP). So my question here is: is there any plan to offer DHCP server functionality in Equinix Metal?

Then, I can imagine, we would be able to do something meaningful without having to SSH to the server or use SOS, and have a fully automated experience.

I'd like to hear more opinions or corrections of my view in case I missed anything.

@displague
Member

Added an issue that will be relevant to this:

@displague
Member

displague commented Jan 31, 2024

Linking to a relevant request for Layer2 support in the Autoscaler project:

While CAPP may not be directly related to that, any CAPP changes that depend on CPEM support for Layer2 spec or annotations should be consistent with the Autoscaler experience (a common set of Equinix Metal annotations that mean the same thing across kubernetes-sigs controllers, i.e. all consumers of the spec.ProviderID equinixmetal:// namespace).

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2024
@displague
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2024
@rahulii

rahulii commented Jun 3, 2024

Hi @displague, as part of phase 1, here is what I was thinking before jumping on the design doc.
The PacketMachine CR would expose certain options to the user to configure Layer 2 VLAN options:

network_type: L3 / L2 hybrid bonded mode / hybrid unbonded mode
ports:
  bond0:
    layer2: true
    vlan: [1001]   # vlan_id that this machine should attach to
    ip_address:    # ip_address to assign to the vlan sub-interface
      cidr_notation: "192.168.2.1/24" # example
      type: "private_ipv4"

The CAPP operator, while provisioning the metal node, would run the below script as part of user_data:

# Load the 802.1Q VLAN module and make it persistent across reboots
modprobe 8021q
echo "8021q" >> /etc/modules

# Create the VLAN sub-interface on bond0 and assign the templated IP address
ip link add link bond0 name bond0.<vlan_id> type vlan id <vlan_id>
ip addr add {{ .IpAddress }} dev bond0.<vlan_id>

ip link set dev bond0.<vlan_id> up

(To make the networking configuration permanent and survive server reboots, we'll add the new sub-interface to the /etc/network/interfaces file.)
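
As a rough illustration only (assuming Debian/Ubuntu ifupdown with the vlan package installed; the address is a placeholder):

# Append a stanza so the sub-interface is brought up on boot
cat <<'EOF' >> /etc/network/interfaces

auto bond0.<vlan_id>
iface bond0.<vlan_id> inet static
    address 192.168.2.10
    netmask 255.255.255.0
    vlan-raw-device bond0
EOF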

This would be for the host OS; for containers/Kubernetes I think preKubeadmCommands (not yet sure) could be useful to configure networking.
"CAPP-managed clusters running without internet connections (complete Layer 2)" would be the end goal; to start with, we would use hybrid bonded mode with VLANs.
In the first phase the user would define the IP addresses to be assigned for each machine (they could assign them from the VRF ip_ranges); later an IPAM provider would be introduced.

I'd like to hear more opinions or corrections of my view in case I missed anything.

@displague
Member

displague commented Jun 6, 2024

@rahulii network_type can be inferred from the port definitions. It may be redundant to offer it as a parameter.
port definitions would need to include the network ports (everything that is not a bond port).

Hybrid unbonded mode (https://deploy.equinix.com/developers/docs/metal/layer2-networking/hybrid-unbonded-mode/), for example, would have bond0 with eth0 being part of that bond, and eth1 outside of the bond.
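
Purely as a hypothetical sketch, reusing the port keys from the proposal above, hybrid unbonded mode might then be expressed as something like:

ports:
  bond0:
    bonded: true   # eth0 stays in the bond, keeping layer 3 connectivity
  eth1:
    bonded: false  # eth1 is taken out of the bond
    layer2: true   # and converted to layer 2
    vlans: [1001]  # VLAN(s) the unbonded port attaches to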

While I don't think we need all of these modes to be fleshed out in CAPP's initial Layer2/Hybrid capabilities, it's worth considering how the configuration patterns in the initial release can stand up to known future intentions.

@rahulii

rahulii commented Jun 11, 2024

@displague

#226 (comment)
This has:

ports: # network_ports in the API naming
    bond0: # always comprised of ports eth0+eth1 on 2 port devices, or eth0+eth2 on 4 port devices
      bonded: true # default
      layer2: false # this is default on new devices. changes result in /ports/id/convert/layer-[2|3] API calls
      vlans: [1234, 5678, 1234] # shortened this from the API "virtual_networks"
      native_vlan: 5678 # optional, must be one of the above if set
      ip_addresses:
      - reservation_id: uuid # reserved addresses by uuid, may include global ips
      - cidr_notation: "1.2.3.4/24" # reserved addresses by address, may include global ips
      - type: private_ipv4 # dynamically assigned addresses, available via metadata.platformequinix.net
        cidr: 30 

What does ip_addresses mean here? I am assuming these are either (A) the IP address ranges that need to be assigned to the individual machine on the Layer 2 VLAN-tagged sub-interface, or (B) the Elastic Public IPs?

@displague
Member

displague commented Jun 11, 2024

@rahulii this is a good observation. You are right that (A) for each VLAN defined there should be some IP definitions. The initial intent was to capture (B) IP reservations, which may be either elastic public addresses or addresses from a VRF block. VRF definitions are an Equinix Metal networking feature that is newer than many of the recommendations in that comment thread.

VRF addresses are VLAN scoped.

Elastic IP addresses are Machine scoped and must be bound to eth0 or bond0 in layer3 or hybrid modes.

In the absence of VRF, a Layer2 port may use any addresses. VRF adds forwarding and a carve-out awareness to the routing infrastructure, which is not present without VRF.


I think, for that example snippet, this leads to ip_addresses having a different shape when conditionally applied to a VLAN rather than a port. This may also lead to VLANs being a list of objects rather than an integer list, unless each IP address is made to optionally define a VLAN it applies to.

      - type: vrf # statically assigned addresses, carved out of a VRF
        # vrf_id: optional. cidr_notation will be created within designated VRF. Conflicts (or must match) with cidr, vxlan, vlan_id since these are inferred/known given a vrf_id
        cidr_notation: "1.2.3.4/24" # the range to be used by the cluster
        cidr: 22 # the size of the VRF to create (from which cidr_notation will be reserved)
        vxlan: 1234 # VLAN to create (or perhaps lookup and join?)
        # vlan_id: optional alternative uuid of existing vlan. Conflicts or target VLAN at id must match vxlan.

@rahulii

rahulii commented Jun 11, 2024

@displague got it.
Assuming the above configuration will go inside the PacketClusterTemplate, for example:

      - type: vrf # statically assigned addresses, carved out of a VRF
        # vrf_id: optional. cidr_notation will be created within designated VRF. Conflicts (or must match) with cidr, vxlan, vlan_id since these are inferred/known given a vrf_id
        cidr_notation: "192.168.2.0/24" # the range to be used by the cluster
        cidr: 16 # the size of the VRF to create (from which cidr_notation will be reserved)
        vxlan: 1234 # VLAN to create (or perhaps lookup and join?)
        # vlan_id: optional alternative uuid of existing vlan

Now, CAPP needs to assign the port to VLAN 1234, pick one IP address from 192.168.2.0/24, and assign it at the OS level.
Let's assume an IPAM provider is not yet in place; should CAPP internally manage the assignment of these addresses?

Correct me if I am wrong.

@displague
Member

@rahulii correct. There is no state for IP assignments on the Equinix Metal API side. Management of which IP addresses are assigned to which nodes will need to be handled in CAPP or an IPAM Provider. I may be approaching this naively, but a ConfigMap may be a way to persist and maintain those assignments. I don't know if that would create a conflict in terms of the owner of the configmap between CAPP and an IPAM provider. Alternatively, the spec and status of the Cluster resource could capture these details. IPAM providers may provide a better way to solve this that doesn't rely on ConfigMaps or modifying the Cluster resource.
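
Purely as an illustration of the ConfigMap idea (the names and keys below are made up, not a proposed schema):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-cluster-vlan-1234-l2-ipam # hypothetical name
  namespace: default
data:
  # address -> machine that currently holds it
  "192.168.2.2": my-cluster-control-plane-x7k2p
  "192.168.2.3": my-cluster-md-0-worker-q9r4s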

In terms of how IP addresses are assigned to either ports or vlans, the Network Config format in Netplan serves as inspiration: https://netplan.readthedocs.io/en/latest/netplan-yaml/

@rahulii

rahulii commented Jun 12, 2024

@displague should we also give options to reference an existing Metal Gateway or create a new one, since VRF IPs are used with Metal Gateways?
