enhancement proposal for Equinix Metal IPI #436

Merged: 1 commit merged into openshift:master on Sep 28, 2020

Conversation

displague
Contributor

@displague commented Aug 17, 2020

This initial enhancement proposal recommends the creation of a Packet provider in the IPI installer.

As with any IPI installer, this requires several pull requests throughout the OpenShift ecosystem, documentation, testing, QA, and other resources.

Unique to the Packet IPI, this enhancement points out some of the features and limitations that will need to be expressed in the solutions presented in subsequent openshift/enhancements and openshift/installer project pull requests.

Work has begun for this IPI installer in openshift/installer#3914 (very much a work in progress).

@displague
Contributor Author

/assign @mfojtik

@russellb
Member

/cc @crawford

Member

@russellb left a comment


Thank you for writing up an enhancement!

I've left a number of comments. Feel free to take some of them and reflect them in an "open questions" section. We don't need to block merging the provisional enhancement on answering every question.

- <https://www.packet.com/developers/changelog/custom-image-handling/>
- Packet does not offer a managed DNS solution. Route53, CloudFlare, or RFC 2136
(DNS Update), and NS record based zone forwarding are considerations. Existing
IPI drivers like BareMetal have similar limitations.
Member


For baremetal IPI, we do some automation of required DNS records for use within the cluster itself. I think that should be reusable here if multicast is allowed on Packet's network. Has this been considered?

Contributor Author


Multicast is not fully supported, but it may work well enough for our needs. Are you suggesting mDNS?

Member


mDNS is used internally for IPI, both for node name resolution and for pointing to the api-int entry.
It has nothing to do with external consumption.

In assisted-installer (where we also rely on the BM IPI ignitions) we provide optional Route53-based DNS to support external api + *.apps resolution.

Contributor Author


Interesting. I've been looking for something like https://github.com/openshift/mdns-publisher

(for kubernetes-sigs/external-dns#1604)


https://github.com/openshift/machine-config-operator/blob/master/templates/common/baremetal/files/baremetal-mdns-publisher.yaml

This is the relevant repo that provides networking-related manifests as part of the IPI BareMetal platform; you'll find the mDNS, keepalived, and CoreDNS manifests there.

IPI drivers like BareMetal have similar limitations.
- LoadBalancing can be achieved with the Packet CCM, which creates and maintains
a MetalLB deployment within the cluster.
- Both Public and Private IPv4 and IPv6 addresses made available for each node.
Member


We should discuss how network configuration is to be handled. IIRC, the IPv4 address is handed out via DHCP, so that should work fine without any further changes. There is no DHCPv6 for IPv6, so OpenShift configured for IPv6 will not work without some additional changes. I believe we need the hosts to pull their networking config from a Packet API so they can self-configure the IPv6 address?

Contributor Author


Packet devices do not use DHCP. Device addresses are configured on the filesystem early in the boot phase.
We can look at how this is implemented to some degree with the open source Tinkerbell project:

All of this is to say that the Packet approach to providing images and preparing those images on boot-up could be explored.

believe we need the hosts to pull their networking config from a packet API so they can self-configure the IPv6 address?

There is also a metadata service available. https://www.packet.com/developers/docs/servers/key-features/metadata/
OSIE configures cloud-init to use EC2 data sources with the metadata.packet.net/metadata URL. Would it benefit this effort to seek Packet addition to https://github.com/canonical/cloud-init/tree/master/cloudinit/sources, or is there a different project where this would need to be added (Ignition)?

It may also be possible to boot a raw unmodified RHCOS image, but I'm not sure about the process. If this direction were pursued, I assume we would need a Packet presence in the version of cloud-init/Ignition shipped by RHCOS.
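
For context, the metadata service returns a JSON document describing the device. A rough sketch of its shape (rendered as YAML with illustrative values; the field names are recalled from the Packet metadata docs rather than verified, so treat them as assumptions):

```yaml
# Approximate shape of the https://metadata.packet.net/metadata response.
# Values are made up; field names are assumptions, check the Packet docs.
hostname: node-0
plan: c3.small.x86
facility: ewr1
network:
  bonding:
    mode: 4                      # LACP bond across the physical NICs
  interfaces:
    - name: eth0
      mac: "00:00:00:00:00:00"
    - name: eth1
      mac: "00:00:00:00:00:01"
  addresses:
    - address_family: 4
      public: true
      address: 147.75.0.2
      cidr: 31
      gateway: 147.75.0.1
    - address_family: 6
      public: true
      address: "2604:1380:0:1::1"
      cidr: 127
      gateway: "2604:1380:0:1::"
    - address_family: 4
      public: false
      address: 10.0.0.2
      cidr: 31
      gateway: 10.0.0.1
```

Whatever consumes this (cloud-init, Afterburn, or something run at Ignition time) has to translate it into per-interface network configuration.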

Member


Thanks for the clarifications!

cloud-init is not used here; Ignition is used instead.

I do think we should aim for booting an unmodified RHCOS image and allowing it to get initialized at boot time. By default we'll be assuming DHCP is available. If not, it does sound like we'll need to provide some additional ignition configuration to configure networking on first boot.

You may find this enhancement interesting, which describes the levers we have available for configuring networking on the host: #394

At some point I'd like to see a section in this document that describes how we'll handle host network configuration for this platform. While the enhancement is provisional, it can be captured as an open question while we work through the details.
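
To make that open question concrete, one possible shape for the extra first-boot configuration is an Ignition-written NetworkManager keyfile. A minimal FCC sketch (spec 3 via FCCT; the interface name and addresses are hypothetical, not taken from a real Packet deployment):

```yaml
# Minimal FCC sketch; transpile with FCCT to produce the Ignition config.
# Interface name and addresses below are hypothetical.
variant: fcos
version: 1.1.0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/eno1.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=eno1
          type=ethernet
          interface-name=eno1

          [ipv4]
          method=auto

          [ipv6]
          method=manual
          address1=2604:1380:0:1::1/127,2604:1380:0:1::
```

This only helps once Ignition can run at all, so it assumes the initial config fetch succeeds over the DHCP-provided public IPv4 address.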

Member


Actually, how does host network configuration work for the UPI installation method already implemented?

Contributor Author


A bastion node is created to serve the RHCOS image and act as a DHCP server for other nodes that will be added.

The control plane nodes added after the bastion get their DHCP/iPXE instructions from the bastion node; the DHCP boot info includes iPXE instructions to boot from the bastion's hosted copy of the RHCOS image.

That RHCOS image is pulled at provision time from a static address:


Certain kargs are automatically whitelisted. console isn't one of them, so you'd need to use --append-karg.

Contributor Author

@displague Aug 26, 2020


@bgilbert is append-karg available in the kernel command line?

In this openshift-installer context, I think it is fair to have a very opinionated (generated) Ignition file.

For my local testing, and for the general use cases of launching RHCOS (and related OSes, say via Terraform or packet-cli), it would be very helpful (a requirement?) to not need to merge values into the data stashed at the Ignition URL. I suspect the format of that content could be unpredictable (JSON vs. YAML, other?). I'm also thinking of cloud-init, where the value could be a shell script, a cloud-config, a gzipped multipart MIME document, or other.


is append-karg available in the kernel command line?

Nope.

On 4.6, with Ignition spec 3, you might take a look at FCCT, particularly storage.trees. You can create an FCC that points to a local directory tree, then run FCCT against it to produce an Ignition config which embeds 1) the contents of that tree, and 2) the Ignition config generated by openshift-installer.
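
A minimal FCC sketch of that arrangement (spec 1.1.0; the file and directory names are illustrative):

```yaml
# Sketch of an FCC that embeds a local directory tree and the installer-generated
# Ignition config into a single output config. Paths and names are illustrative.
variant: fcos
version: 1.1.0
ignition:
  config:
    merge:
      - local: bootstrap.ign      # config produced by openshift-installer, resolved via --files-dir
storage:
  trees:
    - local: site-files           # local directory whose contents get embedded
      path: /etc/site-files
```

Running something like `fcct --files-dir . -o combined.ign config.fcc` would then yield one Ignition config carrying both.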


@bgilbert is FCCT the preferred (or only) method to append kargs? MachineConfigs looked like a possible alternative, but I wasn't certain if the kargs whitelist applies here.



@liveaverage FCCT can't append kargs right now; that paragraph was replying to the last paragraph of #436 (comment). The mechanism you linked is separate from the coreos-installer one, and the whitelist doesn't apply there. AIUI the MachineConfig karg mechanism doesn't apply to the very first boot.
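
For later boots, the MachineConfig route looks roughly like this; it is the day-2 mechanism linked above and, as noted, does not affect the very first boot (the object name and karg value are illustrative):

```yaml
# Sketch of a day-2 MachineConfig that adds a kernel argument via the MCO.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-console-karg
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - console=ttyS1,115200n8
```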


### Risks and Mitigations

The official Packet ClusterAPI, CCM, and CSI drivers are all relatively new and
Member


Links to each of these components would be helpful.

It would also be nice to expand on the integration plans for each of these items.

ClusterAPI -- This is alluded to in the task list earlier in the doc, but I expect we'd have to fork the CAPI repo into OpenShift and then adapt it to the OpenShift machine API. Changes will be needed to the openshift/machine-api-operator to understand how to run this new machine controller.

CCM -- This is going to hit a limitation we have in OpenShift. We don't yet have integration for running the external cloud controller managers. This enhancement is related: #423. That enhancement discusses using the out-of-tree OpenStack CCM. As a prerequisite, we need a new operator to manage it. That operator can be re-used for other out-of-tree CCMs, including this one.

CSI -- I don't know how CSI drivers are integrated. Has this been investigated yet? Are there notes we can include on the integration process and any expected challenges?

Contributor Author


Links to each of these components would be helpful.

It would also be nice to expand on the integration plans for each of these items.

Will do. I'm not sure how far out we'll need to plan for these integrations. I suspect, based on your earlier comments, that I may need another enhancement request for each of the components unless we can take advantage of existing features (especially for the CSI and CCM).

I expect we'd have to fork the CAPI repo into OpenShift and then adapt it to the OpenShift machine API.

That sounds reasonable. We could try to host it at the existing repository, but a v1beta1 tag will conflict with an eventual kubernetes-sigs spec and the likely capi-packet release tag.

This is the v1alpha1 based MachineProvider spec:
https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/v0.1.0/pkg/apis/packetprovider/v1alpha1/packetmachineproviderspec_types.go

I imagine the OpenShift v1beta1 capi-packet will build on this, the existing v1alpha3 enhancements, and changes necessary to adopt O/S capi v1beta1.
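
For illustration, a Machine referencing such a provider spec might look roughly like this. The outer shape is the existing machine.openshift.io v1beta1 Machine API; the inner packet fields (group/version, kind, and field names) are hypothetical, not the linked v1alpha1 types:

```yaml
# Hypothetical sketch only; the provider config schema has not been defined yet.
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: example-worker-0
  namespace: openshift-machine-api
spec:
  providerSpec:
    value:
      apiVersion: packetprovider.openshift.io/v1beta1   # hypothetical group/version
      kind: PacketMachineProviderConfig                 # hypothetical kind
      plan: c3.small.x86
      facility: ewr1
      projectID: REPLACE_ME
      billingCycle: hourly
```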

Changes will be needed to the openshift/machine-api-operator to understand how to run this new machine controller.

I started a branch for this: https://github.com/openshift/machine-api-operator/compare/master...displague:packet?expand=1. We'll need some coordination to get those Quay image tags into existence.

We don't yet have integration for running the external cloud controller managers.

The value of the Packet CCM is in tracking the Packet device IDs so the state of k8s nodes can reflect the state of Packet devices (a deleted device becomes a removed node, for example). The other value is in configuring the MetalLB-based load balancer. Is there an alternative (in-tree) baremetal CCM available that could provide LoadBalancer services with the limitations that come from not being Packet API aware?

I don't know how CSI drivers are integrated. Has this been investigated yet?

Is that to say only flex volumes are supported?
What OpenShift bare-metal storage solutions are available that don't rely on cloud-managed storage?

Member


There is no other bare-metal CCM.

I'm OK if this enhancement doesn't answer all of the integration questions around these components, but I'd like them to be captured as open questions either to be addressed by follow-up PRs to this doc, or future separate enhancements if the topic is deep enough to deserve its own doc.

IPI drivers like BareMetal have similar limitations.
- LoadBalancing can be achieved with the Packet CCM, which creates and maintains
a MetalLB deployment within the cluster.
- Both Public and Private IPv4 and IPv6 addresses made available for each node.


In the old CoreOS Container Linux images for Packet, Afterburn (then called coreos-metadata) ran in the initramfs, queried the Packet metadata service for IPv6 and private IPv4 addresses and NIC bonding configuration, and wrote out configuration files for networkd. That code still exists in RHCOS but would need to be updated for NetworkManager: coreos/fedora-coreos-tracker#111

### Implementation Details/Notes/Constraints

- RHCOS must be made available and maintained.
Images available for provisioning may be custom or official.


Packet's official image and custom image mechanisms are both based on filesystem tarballs rather than disk images, and RHCOS is delivered as a disk image.

If we want to get an official RHCOS image into Packet, we should work with the Packet folks to set up a custom deployment flow using coreos-installer. CoreOS Container Linux used a similar arrangement.

Otherwise, we can use Packet's custom iPXE support to implement the same thing ourselves. We'd just boot the RHCOS live PXE image and perform an ordinary bare-metal install, telling the installer to override the platform ID to packet.

Contributor Author


I've captured most of this in the latest push. Thanks for the input and context, @bgilbert!

Custom Images can be hosted on git, as described here:
- <https://www.packet.com/developers/docs/servers/operating-systems/custom-images/>
- <https://www.packet.com/developers/changelog/custom-image-handling/>
- It may be possible to boot using the [Packet Custom iPXE


may be possible

kola, the CoreOS integration testing tool, already has working code to iPXE boot the live image on Packet, perform an install to disk, and boot into it.


Contributor Author


It seems ore also has Packet awareness. I don't know if that is at all relevant.
https://github.com/coreos/coreos-assembler/tree/master/mantle#ore

([mirror](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.5/?extIdCarryOver=true&sc_cid=701f2000001OH7iAAG)).

A generic RHCOS Ignition URL could instruct nodes booted on Packet to
configure their network using the EC2-like [metadata


Ignition wouldn't be involved in that process. Afterburn would automatically run in the initramfs (ConditionKernelCommandLine=ignition.platform.id=packet) and configure the network.

Contributor Author


@liveaverage points out that I should include a note about the size constraints of userdata over Packet metadata. I'll dig up the limit.

Can Ignition work with base64-encoded, gzipped, multipart MIME (like cloud-init can)?


No, the userdata has to be an Ignition config without any special encoding. However, Ignition configs have a mechanism (ignition.config.merge) for merging an external Ignition config into the one fetched from metadata. If necessary, the one in userdata can bootstrap the real config.
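
So the userdata itself could be nothing more than a small pointer config. An FCC along these lines (the URL is hypothetical) transpiles to the tiny Ignition JSON that would actually be stored in metadata:

```yaml
# Sketch of a "pointer" config: the transpiled output stays well under any
# userdata size limit and merges in the real config at boot. URL is hypothetical.
variant: fcos
version: 1.1.0
ignition:
  config:
    merge:
      - source: https://config.example.com/worker.ign
```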

- Can the sig-cluster-lifecycle ClusterAPI and OpenShift MachineAPI converge to
prevent maintaining two similar but different ClusterAPI implementations?
- How will the RHCOS image be served?
- How will Ignition be started and configure nodes appropriately? (restating - DHCP is not available but a metadata service is)


There's no bootstrapping problem here. The system comes up, DHCPs for its public IPv4 address, Afterburn configures other addresses by querying the metadata service, then Ignition runs.

- Device plan (required, defaulting to minimum required)
- Facility (required)
- Virtual Network (optional, defaulted to a new virtual network)
- Control plane variables rely on bootstrap variables and add:
Member


Does "rely on bootstrap variables" mean that it will inherit the same settings, enforcing common settings for the whole cluster? Or is this a shorthand way of saying "all of that stuff plus some more" ?

Contributor Author


I was thinking inheritance, but I left my wording open in case that is not an option or not the best choice.
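
One hypothetical way that inheritance could read in install-config.yaml (the outer structure is the existing installer schema; the packet platform stanza and its field names are purely illustrative, since no such platform ships today):

```yaml
# Hypothetical install-config.yaml sketch; the "packet" keys are not a real schema.
apiVersion: v1
baseDomain: example.com
metadata:
  name: packet-cluster
platform:
  packet:                      # hypothetical cluster-wide defaults
    facility: ewr1
    projectID: REPLACE_ME
controlPlane:
  name: master
  replicas: 3
  platform:
    packet:
      plan: c3.medium.x86      # inherits the defaults above and adds a plan
compute:
  - name: worker
    replicas: 3
    platform:
      packet:
        plan: c3.small.x86
```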

Member

@russellb left a comment


Thanks again for working on this documentation. I left a few more comments. Once we feel comfortable that we have at least documented all of our known open questions, I am OK merging this and iterating on it with follow-up PRs.

- Virtual Network (optional, defaulted to a new virtual network)
- Control plane variables rely on bootstrap variables and add:
- Device plan (defaulting to provide a good experience)
- Additional clusters created through the ClusterAPI will require fields
Member


OpenShift does not support creating clusters through ClusterAPI. It only uses a subset (Machine, MachineSet) to abstract machine provisioning within a cluster, typically to scale out and add worker nodes.


- Is the Packet CCM necessary?
- If so, who is responsible for supporting the MetalLB installation the Packet
CCM provides and requires for LoadBalancer definitions?
Member


I think it's fair to say that the use of MetalLB with OpenShift on Packet would follow whatever more general MetalLB deployment management we come up with. See #356 or any follow-ups to that same enhancement.

What may still make sense is Packet-specific code to automatically configure it with the BGP settings that are appropriate for Packet. Automating that would be a cool feature in Packet IPI support, but it is blocked on having MetalLB supported in the first place.
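
As a rough sketch of what that Packet-specific automation might render (a MetalLB v0.9-style ConfigMap; the peer address and ASNs are assumptions based on Packet's usual local-BGP defaults, and the address pool is illustrative):

```yaml
# Sketch of a MetalLB BGP config for Packet; the pool is illustrative and the
# peer/ASN values are assumptions based on Packet's local BGP defaults.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 169.254.255.1
      peer-asn: 65530
      my-asn: 65000
    address-pools:
    - name: packet-elastic-ips
      protocol: bgp
      addresses:
      - 147.75.40.0/30
```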

@russellb
Member

At this point I'm OK merging this and just continuing to follow up with additional PRs focused on different open questions.

I'll hold off since @JoelSpeed just commented, to let those questions get resolved first.

Exploring initial aspirations and decision points for a Packet (Equinix)
bare metal IPI provider.

Signed-off-by: Marques Johansson <marques@packet.com>
@JoelSpeed
Contributor

Happy to merge now if you are, @russellb

@russellb
Member

/approve
/lgtm

Let's please follow up on this with additional PRs as appropriate. Thanks again for starting this document!

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged) on Sep 28, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: displague, russellb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Sep 28, 2020
@openshift-merge-robot merged commit d95032a into openshift:master on Sep 28, 2020
@displague changed the title from "enhancement proposal for Packet IPI" to "enhancement proposal for Equinix Metal IPI" on Apr 23, 2021
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.