
✨ Add support for kube-vip #65

Open · davidspek wants to merge 29 commits into base: capi-v1 from kube-vip

Conversation

davidspek

What this PR does / why we need it:
This replaces the binding of an elastic IP to one of the control plane nodes with deploying kube-vip and having it load balance the Kubernetes API between the various control plane nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@davidspek
Author

@cprivite I just pushed a commit that should fix the linting errors and I've validated the changes with a fresh cluster deployment.

- |
  if [ -f "/run/kubeadm/kubeadm.yaml" ]; then
    export KUBECONFIG=/etc/kubernetes/admin.conf
    export CPEM_YAML=https://raw.githubusercontent.com/detiber/packet-ccm/test/deploy/template/deployment.yaml
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "eipTag": "cluster-api-provider-packet:cluster-id:${CLUSTER_NAME}"}'''
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "loadbalancer": "kube-vip://", "facility": "${FACILITY}"}'''
Author

@davidspek Mar 9, 2022

@cprivite I've noticed that for the CCM to function correctly and be able to assign IPs to services of type LoadBalancer, either the facility or the metro needs to be added to the config. Since the facility is still being used by the Cluster API provider, I've added the facility to the CCM config so it will work correctly for services as well.

The eipTag is removed since we don't want the CCM to allocate the EIP to one of the control plane nodes as kube-vip is already load balancing between the control plane nodes.
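For reference, once the template variables are substituted, the kube-vip variant of the secret data above expands to a cloud-sa.json along these lines (values here are placeholders, not real credentials):

```json
{
  "apiKey": "<METAL_API_KEY>",
  "projectID": "<PROJECT_ID>",
  "loadbalancer": "kube-vip://",
  "facility": "<FACILITY>"
}
```

The eipTag key is what the previous template carried and is dropped here, as described above.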

@davidspek
Author

Would it make sense to bump the default number of control plane nodes to 3 as part of this PR? That is likely what most people should be using anyway.

@cprivitere added the "enhancement" (New feature or request) label Mar 9, 2022
@davidspek
Author

@detiber Do you think it would make sense to add this for the CAPI v1beta1 support?

Owner

@detiber left a comment

Overall, I think this is great work.

I'm wondering if it would make sense to not swap out the default cluster template, but rather to have an alternative template (cluster-template-kube-vip.yaml) that uses kube-vip vs CPEM for EIP management. This could even be extended to having a kube-vip template configured for bgp and a kube-vip template for EIP based configuration.

The main reason I bring this up is because I don't necessarily want to drop a major change to the default template on existing users and have it be a complete surprise for them. Especially if we don't have a plan for migrating existing clusters from CPEM EIP management to kube-vip bgp based management.

templates/cluster-template.yaml (outdated review thread, resolved)
templates/cluster-template.yaml (outdated review thread, resolved)
Comment on lines 140 to 144
if err := r.PacketClient.EnableProjectBGP(packetCluster.Spec.ProjectID); err != nil {
log.Error(err, "error enabling bgp for project")
return ctrl.Result{}, err
}

Owner

Is this needed because CPEM is not yet running when bootstrapping kube-vip?

Author

CPEM actually isn’t needed for kube-vip to work. If that was the case, the bootstrap would fail. I just added this for general error handling.

Comment on lines 356 to 359
if err := r.PacketClient.EnsureNodeBGPEnabled(dev.ID); err != nil {
// Do not treat an error enabling bgp on machine as fatal
return ctrl.Result{RequeueAfter: time.Second * 20}, fmt.Errorf("failed to enable bgp on machine %s: %w", machineScope.Name(), err)
}
Owner

Is this needed because CPEM is not yet running when bootstrapping kube-vip?

Author

CPEM actually isn’t needed for kube-vip to work. If that was the case, the bootstrap would fail. I just added this for general error handling.

Comment on lines +42 to +44
envVarLocalASN = "METAL_LOCAL_ASN"
envVarBGPPass = "METAL_BGP_PASS" //nolint:gosec
DefaultLocalASN = 65000
Owner

I'm wondering if it would make sense to add support to kube-vip for enabling BGP on the project and on the host, since it would be possible to introspect these values when running on a host.

I think requiring users to configure these values would be a bit of a pain, since the information is not necessarily the same for all users, or even for different facilities within Equinix Metal.

Author

Kube-vip currently relies on CPEM to enable BGP on the project and nodes. The method I used here to determine/configure these values is derived from CPEM. So I'm not sure introspection would work here, since then CPEM should also be able to introspect these values.

Owner

So, I fully agree that with the current scenario we have a bootstrapping problem between CPEM and kube-vip. What I am suggesting is that we could work with the kube-vip project (or contribute the changes needed ourselves) to support enabling BGP configuration where needed, rather than doing it here.

Author

I’d be happy to contribute the change to kube-vip. However, what I was trying to get at is that it seems these values would need to be configured for kube-vip as well, as is also the case for CPEM.

If I understand correctly, these values are later used by kube-vip to set up the BGP peering. So whatever enables BGP on the project also needs to set the values, and there wouldn’t be a way to introspect them if BGP isn’t enabled on the project.

If what I’m saying doesn’t make sense I’m eager to understand why.

Collaborator

I think the idea is that if you have kube-vip enable BGP on the project and the device, then details like the ASN can be pulled out of the metadata server via curl.

https://metal.equinix.com/developers/docs/bgp/bgp-on-equinix-metal/

Author

@cprivite That's true, but then wouldn't the same process that I'm doing here just move to kube-vip? I guess the main question is if that fits within the scope of what kube-vip does or is expected to do.

@davidspek
Author

Thanks for the review and all your comments. I agree that it is a large change with real consequences for existing users. However, with the current changes to the code, CPEM EIP management wouldn’t work anymore. Part of me also believes that kube-vip is the superior solution, since it load balances requests to the API and there should be minimal interruption compared to having 2 standby control plane nodes. For that reason, I think it would be important for new users to use kube-vip for control plane HA, and thus for it to be the default template.

If I can confirm the upgrade would be non-disruptive for current users and it is clearly stated in the upgrade/release docs, do you think that would be acceptable?

If not, I could try to make CPEM/kube-vip configurable and possibly make it so that a CPEM template is used for existing users that upgrade. Otherwise, a separate template for kube-vip would still be an option.

@detiber
Owner

detiber commented Mar 10, 2022

Thanks for the review and all your comments. I agree that it is a large change with real consequences for existing users. However, with the current changes to the code, CPEM EIP management wouldn’t work anymore. Part of me also believes that kube-vip is the superior solution, since it load balances requests to the API and there should be minimal interruption compared to having 2 standby control plane nodes. For that reason, I think it would be important for new users to use kube-vip for control plane HA, and thus for it to be the default template.

Long term, I agree. However, considering existing users I don't want to have the default behavior change for them in a way that they wouldn't expect when spinning up a new cluster after the upgrade.

As a result, I think it would probably be better to keep the current default template, add a new template flavor (or multiple) for kube-vip and encourage users through documentation to start using the kube-vip template.

After we've been able to successfully migrate existing users over to kube-vip, then it would be easier to go ahead and change the default template without potentially causing users undue problems.

If I can confirm the upgrade would be non-disruptive for current users and it is clearly stated in the upgrade/release docs, do you think that would be acceptable?

I'm not sure there is a non-disruptive way to migrate users, since it would require changing out the KubeadmConfig and likely require a manual change to remove the EIP from the last host before kube-vip could properly configure BGP.

If not, I could try to make CPEM/kube-vip configurable and possibly make it so that a CPEM template is used for existing users that upgrade. Otherwise, a separate template for kube-vip would still be an option.

@davidspek
Author

I’ll test tomorrow and try and make the EIP handling configurable then.

In terms of upgrading existing users, as far as I know the controller will add a new control plane node, then remove an old one, and repeat this until all nodes have been upgraded. That’s the benefit of using reconciliation for managing the cluster lifecycle. This shouldn’t cause interruptions to existing clusters, and it also removes the problem of unassigning the EIP from the node, since the node will be removed.

@davidspek
Author

@detiber Do you think configuring the EIP handling through an environment variable would be a sufficient solution?

@cprivitere
Collaborator

Was testing this afternoon and it looks like for some reason kubevip isn't adding the bgp peer routes. I'm not sure why yet.

@davidspek
Author

@cprivite That's strange. I have noticed it takes a while for the routes to show up in the Equinix Metal console, but the EIP for the control plane should come up pretty quickly.

@cprivitere self-assigned this Mar 15, 2022
@cprivitere
Collaborator

pluralsh#1
I made a pull request to your branch to fix up a few things. Then I think it's ready for Jason to tell us how he'd like to have us package it up as a flavor template or whatever.

@davidspek
Author

@cprivite Sorry for the delay here, had some other urgent things I needed to get done. I'll get back to this later today or tomorrow so it can move forward.

@davidspek
Author

@detiber @cprivite

I think there is one thing left for this to be ready for merging, based on what was discussed above.

  • Make kube-vip optional

For that I would need to know how you would like to make it configurable. Do you think configuring the EIP handling through an environment variable would be a sufficient solution? There needs to be some way to enable or disable certain functions in the controller code.

@cprivitere
Collaborator

Yeah I'll be working on that now. Plan is to make it a second template that you'd choose by passing --flavor=kube-vip or something like that to clusterctl.
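For reference, selecting such an alternative template follows clusterctl's convention that --flavor <name> resolves to cluster-template-<name>.yaml in the provider's release assets. An invocation might look roughly like this (cluster name and flavor name are illustrative, pending the final packaging):

```sh
# Generate a cluster manifest from the kube-vip flavor template
# (cluster-template-kube-vip.yaml in the provider release assets).
clusterctl generate cluster my-cluster \
  --infrastructure packet \
  --flavor kube-vip \
  > my-cluster.yaml
```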

@davidspek
Author

The extra template shouldn't be too difficult; it's mainly a question of how best to pass that choice to the controller. I had already started on an implementation using an environment variable that the user would need to set.

@detiber
Owner

detiber commented Mar 22, 2022

Apologies for causing the merge conflict on this PR, I did a first pass at rebasing/squashing my branch to clean things up for the upstream PR.

@davidspek
Author

@detiber No worries. I'll do a rebase ASAP to fix up the merge conflicts.

@davidspek force-pushed the kube-vip branch 2 times, most recently from 9b4505c to 8da6c98, March 22, 2022 20:40
cprivitere and others added 13 commits April 11, 2022 17:43
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…sters < version 1.23, not just kube-vip ones.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
… provider to released upstream version.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…vip, remove systemctl restart networking to avoid networking service error.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
davidspek and others added 6 commits April 11, 2022 20:23
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
…uster Type (#5)

* Convert to having the EIP_MANAGEMENT variable as part of the packetcluster type

* Make vipmanager field immutable.

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* Rename field to VIPManager

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* rename to vipmanager, fix defaults, rm services

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* Fix typo and make VIPManager an enum

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>
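Based on the commit messages above, the immutable VIPManager enum field might look roughly like this on the PacketCluster spec. The kubebuilder markers, surrounding fields, and exact enum spellings are assumptions sketched from the thread (CPEM vs. kube-vip), not the merged code:

```go
package main

import "fmt"

// PacketClusterSpec is a sketch of the spec after this PR: a VIPManager
// field selects whether CPEM (elastic IP) or kube-vip (BGP) manages the
// control-plane VIP. In a real CRD the field would be validated as an
// enum and defaulted, e.g. via kubebuilder markers such as:
//   +kubebuilder:validation:Enum=CPEM;KUBE_VIP
// and made immutable by webhook or CEL validation.
type PacketClusterSpec struct {
	ProjectID  string `json:"projectID"`
	VIPManager string `json:"vipManager"`
}

func main() {
	spec := PacketClusterSpec{ProjectID: "example-project", VIPManager: "KUBE_VIP"}
	fmt.Println("VIP managed by:", spec.VIPManager)
}
```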
Labels
enhancement New feature or request
4 participants