
✨ Add support for kube-vip #65

Open · davidspek wants to merge 29 commits into base: capi-v1 from kube-vip

Conversation

davidspek

What this PR does / why we need it:
This replaces the binding of an elastic IP to one of the control plane nodes with deploying kube-vip and having it load balance the Kubernetes API between the various control plane nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@davidspek
Author

@cprivite I just pushed a commit that should fix the linting errors and I've validated the changes with a fresh cluster deployment.

- |
  if [ -f "/run/kubeadm/kubeadm.yaml" ]; then
    export KUBECONFIG=/etc/kubernetes/admin.conf
    export CPEM_YAML=https://raw.githubusercontent.com/detiber/packet-ccm/test/deploy/template/deployment.yaml
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "eipTag": "cluster-api-provider-packet:cluster-id:${CLUSTER_NAME}"}'''
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "loadbalancer": "kube-vip://", "facility": "${FACILITY}"}'''
Author

@davidspek Mar 9, 2022

@cprivite I've noticed that for the CCM to function correctly and be able to assign IPs to services of type LoadBalancer, either the facility or the metro needs to be added to the config. Since the facility is still being used by the Cluster API provider, I've added the facility to the CCM config so it will work correctly for services as well.

The eipTag is removed since we don't want the CCM to allocate the EIP to one of the control plane nodes as kube-vip is already load balancing between the control plane nodes.
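For reference, once the template variables are substituted, the kube-vip variant of the secret data above expands to a cloud-sa.json along these lines (values here are placeholders, not real credentials):

```json
{
  "apiKey": "<METAL_API_KEY>",
  "projectID": "<PROJECT_ID>",
  "loadbalancer": "kube-vip://",
  "facility": "<FACILITY>"
}
```

The eipTag key is what the previous template carried and is dropped here, as described above.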

@davidspek
Author

Would it make sense to bump the default number of control plane nodes to 3 as part of this PR? That is likely what most people should be using anyway.

@cprivitere added the "enhancement" (New feature or request) label Mar 9, 2022
@davidspek
Author

@detiber Do you think it would make sense to add this for the CAPI v1beta1 support?

Owner

@detiber left a comment

Overall, I think this is great work.

I'm wondering if it would make sense to not swap out the default cluster template, but rather to have an alternative template (cluster-template-kube-vip.yaml) that uses kube-vip vs CPEM for EIP management. This could even be extended to having a kube-vip template configured for bgp and a kube-vip template for EIP based configuration.

The main reason I bring this up is because I don't necessarily want to drop a major change to the default template on existing users and have it be a complete surprise for them. Especially if we don't have a plan for migrating existing clusters from CPEM EIP management to kube-vip bgp based management.

templates/cluster-template.yaml (outdated review thread, resolved)
templates/cluster-template.yaml (outdated review thread, resolved)
Comment on lines 140 to 144
if err := r.PacketClient.EnableProjectBGP(packetCluster.Spec.ProjectID); err != nil {
log.Error(err, "error enabling bgp for project")
return ctrl.Result{}, err
}

Owner

Is this needed because CPEM is not yet running when bootstrapping kube-vip?

Author

CPEM actually isn’t needed for kube-vip to work. If that was the case, the bootstrap would fail. I just added this for general error handling.

Comment on lines 356 to 359
if err := r.PacketClient.EnsureNodeBGPEnabled(dev.ID); err != nil {
// Do not treat an error enabling bgp on machine as fatal
return ctrl.Result{RequeueAfter: time.Second * 20}, fmt.Errorf("failed to enable bgp on machine %s: %w", machineScope.Name(), err)
}
Owner

Is this needed because CPEM is not yet running when bootstrapping kube-vip?

Author

CPEM actually isn’t needed for kube-vip to work. If that was the case, the bootstrap would fail. I just added this for general error handling.

Comment on lines +42 to +44
envVarLocalASN = "METAL_LOCAL_ASN"
envVarBGPPass = "METAL_BGP_PASS" //nolint:gosec
DefaultLocalASN = 65000
Owner

I'm wondering if it would make sense to add support to kube-vip for enabling BGP on the project and on the host, since it would be possible to introspect these values when running on a host.

I think requiring users to configure these values would be a bit of a pain, since the information is not necessarily the same for all users, or even for different facilities within Equinix Metal.

Author

Kube-vip currently relies on CPEM to enable BGP on the project and nodes. The method I used here to determine/configure these values is derived from CPEM. So I'm not sure introspection would work here, since then CPEM should also be able to introspect these values.

Owner

So, I fully agree that with the current scenario we have a bootstrapping problem between CPEM and kube-vip. What I am suggesting is that we could work with the kube-vip project (or contribute the changes needed ourselves) to support enabling BGP configuration where needed, rather than doing it here.

Author

I’d be happy to contribute the change to kube-vip. However, what I was trying to get at is that it seems these values would need to be configured for kube-vip as well, as is also the case for CPEM.

If I understand correctly, these values are later used by kube-vip to set up the BGP peering. So whatever enables BGP on the project also needs to set the values, and there wouldn’t be a way to introspect them if BGP isn’t enabled on the project.

If what I’m saying doesn’t make sense I’m eager to understand why.

Collaborator

I think the idea is that if you have kube-vip enable BGP on the project and the device, then details like the ASN can be pulled out of the metadata server via curl.

https://metal.equinix.com/developers/docs/bgp/bgp-on-equinix-metal/

Author

@cprivite That's true, but then wouldn't the same process that I'm doing here just move to kube-vip? I guess the main question is if that fits within the scope of what kube-vip does or is expected to do.

@davidspek
Author

Thanks for the review and all your comments. I agree that it is a large change with real consequences for existing users. However, with the current changes to the code, CPEM EIP management wouldn’t work anymore. Part of me also believes that kube-vip is the superior solution, since it load balances requests to the API and there should be minimal interruption compared to having 2 standby control plane nodes. For that reason, I think it would be important for new users to use kube-vip for control plane HA, and thus for it to be the default template.

If I can confirm the upgrade would be non-disruptive for current users and it is clearly stated in the upgrade/release docs, do you think that would be acceptable?

If not, I could try to make CPEM/kube-vip configurable and possibly make it so that a CPEM template is used for existing users that upgrade. Otherwise, a separate template for kube-vip would still be an option.

@detiber
Owner

detiber commented Mar 10, 2022

Thanks for the review and all your comments. I agree that it is a large change with real consequences for existing users. However, with the current changes to the code, CPEM EIP management wouldn’t work anymore. Part of me also believes that kube-vip is the superior solution, since it load balances requests to the API and there should be minimal interruption compared to having 2 standby control plane nodes. For that reason, I think it would be important for new users to use kube-vip for control plane HA, and thus for it to be the default template.

Long term, I agree. However, considering existing users I don't want to have the default behavior change for them in a way that they wouldn't expect when spinning up a new cluster after the upgrade.

As a result, I think it would probably be better to keep the current default template, add a new template flavor (or multiple) for kube-vip and encourage users through documentation to start using the kube-vip template.

After we've been able to successfully migrate existing users over to kube-vip, then it would be easier to go ahead and change the default template without potentially causing users undue problems.

If I can confirm the upgrade would be non-disruptive for current users and it is clearly stated in the upgrade/release docs, do you think that would be acceptable?

I'm not sure there is a non-disruptive way to migrate users, since it would require changing out the KubeadmConfig and likely require a manual change to remove the EIP from the last host before kube-vip could properly configure BGP.

If not, I could try to make CPEM/kube-vip configurable and possibly make it so that a CPEM template is used for existing users that upgrade. Otherwise, a separate template for kube-vip would still be an option.

@davidspek
Author

I’ll test tomorrow and try and make the EIP handling configurable then.

In terms of upgrading existing users, as far as I know the controller will add a new control plane node, then remove an old one, and repeat this until all nodes have been upgraded. That’s the benefit of using reconciliation for managing the cluster lifecycle. This shouldn’t cause interruptions to existing clusters, and it also removes the problem of unassigning the EIP from the node, since the node will be removed.

@davidspek
Author

@detiber Do you think configuring the EIP handling through an environment variable would be a sufficient solution?

@cprivitere
Collaborator

Was testing this afternoon and it looks like for some reason kubevip isn't adding the bgp peer routes. I'm not sure why yet.

@davidspek
Author

@cprivite That's strange. I have noticed it takes a while for the routes to show up in the Equinix Metal console, but the EIP for the control plane should come up pretty quickly.

@cprivitere self-assigned this Mar 15, 2022
@cprivitere
Collaborator

pluralsh#1
I made a pull request to your branch to fix up a few things. Then I think it's ready for Jason to tell us how he'd like to have us package it up as a flavor template or whatever.

@davidspek
Author

@cprivite Sorry for the delay here, had some other urgent things I needed to get done. I'll get back to this later today or tomorrow so it can move forward.

@davidspek
Author

@detiber @cprivite

I think there is one thing left for this to be ready for merging, based on what was discussed above.

  • Make kube-vip optional

For that I would need to know how you would like to make it configurable. Do you think configuring the EIP handling through an environment variable would be a sufficient solution? There needs to be some way to enable or disable certain functions in the controller code.

@cprivitere
Collaborator

Yeah I'll be working on that now. Plan is to make it a second template that you'd choose by passing --flavor=kube-vip or something like that to clusterctl.
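For reference, selecting such an alternative template follows clusterctl's convention that --flavor <name> resolves to cluster-template-<name>.yaml in the provider's release assets. An invocation might look roughly like this (cluster name and flavor name are illustrative, pending the final packaging):

```sh
# Generate a cluster manifest from the kube-vip flavor template
# (cluster-template-kube-vip.yaml in the provider release assets).
clusterctl generate cluster my-cluster \
  --infrastructure packet \
  --flavor kube-vip \
  > my-cluster.yaml
```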

@davidspek
Author

The extra template shouldn't be too difficult; it's mainly a question of how best to pass that choice to the controller. I had already started on an implementation using an environment variable that the user would need to set.

@detiber
Owner

detiber commented Mar 22, 2022

Apologies for causing the merge conflict on this PR, I did a first pass at rebasing/squashing my branch to clean things up for the upstream PR.

@davidspek
Author

@detiber No worries. I'll do a rebase ASAP to fix up the merge conflicts.

@davidspek force-pushed the kube-vip branch 2 times, most recently from 9b4505c to 8da6c98, March 22, 2022 20:40
cprivitere and others added 13 commits April 11, 2022 17:43
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…sters < version 1.23, not just kube-vip ones.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
… provider to released upstream version.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…vip, remove systemctl restart networking to avoid networking service error.

Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
davidspek and others added 6 commits April 11, 2022 20:23
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
…uster Type (#5)

* Convert to having the EIP_MANAGEMENT variable as part of the packetcluster type

* Make vipmanager field immutable.

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* Rename field to VIPManager

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* rename to vipmanager, fix defaults, rm services

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>

* Fix typo and make VIPManager an enum

Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>
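Based on the commit messages above, the immutable VIPManager enum field might look roughly like this on the PacketCluster spec. The kubebuilder markers, surrounding fields, and exact enum spellings are assumptions sketched from the thread (CPEM vs. kube-vip), not the merged code:

```go
package main

import "fmt"

// PacketClusterSpec is a sketch of the spec after this PR: a VIPManager
// field selects whether CPEM (elastic IP) or kube-vip (BGP) manages the
// control-plane VIP. In a real CRD the field would be validated as an
// enum and defaulted, e.g. via kubebuilder markers such as:
//   +kubebuilder:validation:Enum=CPEM;KUBE_VIP
// and made immutable by webhook or CEL validation.
type PacketClusterSpec struct {
	ProjectID  string `json:"projectID"`
	VIPManager string `json:"vipManager"`
}

func main() {
	spec := PacketClusterSpec{ProjectID: "example-project", VIPManager: "KUBE_VIP"}
	fmt.Println("VIP managed by:", spec.VIPManager)
}
```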
Labels
enhancement New feature or request
4 participants