✨ Add support for kube-vip #65
base: capi-v1
Conversation
@cprivite I just pushed a commit that should fix the linting errors, and I've validated the changes with a fresh cluster deployment.
templates/cluster-template.yaml
Outdated
- |
  if [ -f "/run/kubeadm/kubeadm.yaml" ]; then
    export KUBECONFIG=/etc/kubernetes/admin.conf
    export CPEM_YAML=https://raw.githubusercontent.com/detiber/packet-ccm/test/deploy/template/deployment.yaml
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "eipTag": "cluster-api-provider-packet:cluster-id:${CLUSTER_NAME}"}'''
    export SECRET_DATA='cloud-sa.json=''{"apiKey": "{{ .apiKey }}","projectID": "${PROJECT_ID}", "loadbalancer": "kube-vip://", "facility": "${FACILITY}"}'''
@cprivite I've noticed that for the CCM to function correctly and be able to assign IPs to Services of type LoadBalancer, either the facility or the metro needs to be added to the config. Since the facility is still being used by the Cluster API provider, I've added the facility to the CCM config so it will work correctly for Services as well.
The eipTag is removed since we don't want the CCM to allocate the EIP to one of the control plane nodes, as kube-vip is already load balancing between the control plane nodes.
Would it make sense to bump the default number of control plane nodes to 3 as part of this PR? That is likely what most people should be using anyway.
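For illustration, the cloud-sa.json payload discussed above can be rendered programmatically. This is only a sketch: the field names are copied from the template snippet, and all argument values are placeholders, not real credentials.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudConfig mirrors the cloud-sa.json payload from the template above.
// "loadbalancer": "kube-vip://" tells CPEM to delegate load balancing to
// kube-vip; "facility" is what the comment argues must be present for
// Services of type LoadBalancer to be assigned IPs.
type cloudConfig struct {
	APIKey       string `json:"apiKey"`
	ProjectID    string `json:"projectID"`
	LoadBalancer string `json:"loadbalancer"`
	Facility     string `json:"facility"`
}

// cloudSAJSON renders the secret payload as a JSON string.
func cloudSAJSON(apiKey, projectID, facility string) (string, error) {
	b, err := json.Marshal(cloudConfig{
		APIKey:       apiKey,
		ProjectID:    projectID,
		LoadBalancer: "kube-vip://",
		Facility:     facility,
	})
	return string(b), err
}

func main() {
	// Placeholder inputs only.
	s, err := cloudSAJSON("example-api-key", "example-project", "am6")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```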
@detiber Do you think it would make sense to add this for the CAPI v1beta1 support?
Overall, I think this is great work.
I'm wondering if it would make sense not to swap out the default cluster template, but rather to have an alternative template (cluster-template-kube-vip.yaml) that uses kube-vip instead of CPEM for EIP management. This could even be extended to having one kube-vip template configured for BGP and another for an EIP-based configuration.
The main reason I bring this up is because I don't necessarily want to drop a major change to the default template on existing users and have it be a complete surprise for them. Especially if we don't have a plan for migrating existing clusters from CPEM EIP management to kube-vip bgp based management.
if err := r.PacketClient.EnableProjectBGP(packetCluster.Spec.ProjectID); err != nil {
	log.Error(err, "error enabling bgp for project")
	return ctrl.Result{}, err
}
Is this needed because CPEM is not yet running when bootstrapping kube-vip?
CPEM actually isn’t needed for kube-vip to work. If it were, the bootstrap would fail. I just added this for general error handling.
if err := r.PacketClient.EnsureNodeBGPEnabled(dev.ID); err != nil {
	// Do not treat an error enabling bgp on machine as fatal
	return ctrl.Result{RequeueAfter: time.Second * 20}, fmt.Errorf("failed to enable bgp on machine %s: %w", machineScope.Name(), err)
}
Is this needed because CPEM is not yet running when bootstrapping kube-vip?
CPEM actually isn’t needed for kube-vip to work. If it were, the bootstrap would fail. I just added this for general error handling.
envVarLocalASN = "METAL_LOCAL_ASN"
envVarBGPPass  = "METAL_BGP_PASS" //nolint:gosec
DefaultLocalASN = 65000
I'm wondering if it would make sense to add support to kube-vip for enabling BGP on the project and on the host, since it would be possible to introspect these values when running on a host.
I think requiring users to configure these values would be a bit of a pain, since it's not necessarily going to be the same information for all users, or even for different facilities within Equinix Metal.
Kube-vip currently relies on CPEM to enable BGP on the project and nodes. The method I used here to determine and configure these values is derived from CPEM. So I’m not sure introspection would work here, since then CPEM should also be able to introspect these values.
So, I fully agree that in the current scenario we have a bootstrapping problem between CPEM and kube-vip. What I am suggesting is that we could work with the kube-vip project (or contribute the changes needed ourselves) to support enabling the BGP configuration where needed, rather than doing it here.
I’d be happy to contribute the change to kube-vip. However, what I was trying to get at is that these values would seem to need to be configured for kube-vip as well, as is the case for CPEM.
If I understand correctly, these values are later used by kube-vip to set up the BGP peering. So whatever enables BGP on the project also needs to set the values, and there wouldn’t be a way to introspect them if BGP isn’t enabled on the project.
If what I’m saying doesn’t make sense, I’m eager to understand why.
I think the idea is that if you have kube-vip enable bgp on the project and the device, then the details like ASN and such can be pulled out via curl from the metadata server.
https://metal.equinix.com/developers/docs/bgp/bgp-on-equinix-metal/
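The introspection idea could look roughly like the following: parse the BGP details out of the device metadata instead of requiring METAL_LOCAL_ASN and friends to be preconfigured. This is a hedged sketch; the field names (customer_as, peer_as, peer_ips) and the sample payload are my assumptions about the metadata schema, not verified output, and a real agent would fetch the document from the metadata service rather than using a literal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bgpNeighbor holds the subset of fields a VIP manager would need. The
// JSON keys are assumed from my reading of the Equinix Metal metadata
// schema -- treat them as illustrative.
type bgpNeighbor struct {
	CustomerAS int      `json:"customer_as"`
	PeerAS     int      `json:"peer_as"`
	PeerIPs    []string `json:"peer_ips"`
}

type metadata struct {
	BGPNeighbors []bgpNeighbor `json:"bgp_neighbors"`
}

// parseBGPNeighbors extracts peering details from a metadata document, the
// way a host-side agent could introspect them instead of requiring users
// to pass the values in by hand.
func parseBGPNeighbors(doc []byte) ([]bgpNeighbor, error) {
	var m metadata
	if err := json.Unmarshal(doc, &m); err != nil {
		return nil, err
	}
	return m.BGPNeighbors, nil
}

func main() {
	// Illustrative payload only; not real metadata service output.
	sample := []byte(`{"bgp_neighbors":[{"customer_as":65000,"peer_as":65530,"peer_ips":["169.254.255.1","169.254.255.2"]}]}`)
	ns, err := parseBGPNeighbors(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("local AS%d peering with AS%d via %v\n", ns[0].CustomerAS, ns[0].PeerAS, ns[0].PeerIPs)
}
```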
@cprivite That's true, but then wouldn't the same process that I'm doing here just move to kube-vip? I guess the main question is whether that fits within the scope of what kube-vip does or is expected to do.
Thanks for the review and all your comments. I agree that this is a large change, and I see the consequences it has for existing users. However, with the current changes to the code, CPEM EIP management wouldn't work anymore. If I can confirm the upgrade would be non-disruptive for current users, and it is clearly stated in the upgrade/release docs, do you think that would be acceptable? If not, I could try to make CPEM/kube-vip configurable and possibly make it so that a CPEM template is used for existing users that upgrade. Otherwise, a separate template for kube-vip could be added.
Long term, I agree. However, considering existing users I don't want to have the default behavior change for them in a way that they wouldn't expect when spinning up a new cluster after the upgrade. As a result, I think it would probably be better to keep the current default template, add a new template flavor (or multiple) for kube-vip and encourage users through documentation to start using the kube-vip template. After we've been able to successfully migrate existing users over to kube-vip, then it would be easier to go ahead and change the default template without potentially causing users undue problems.
I'm not sure there is a non-disruptive way to migrate users, since it would require changing out the KubeadmConfig and likely require a manual change to remove the EIP from the last host before kube-vip could properly configure BGP.
I’ll test tomorrow and try to make the EIP handling configurable then. In terms of upgrading existing users, as far as I know the controller will add a new control plane node, then remove an old one, and repeat this until all have been upgraded. That’s the benefit of using reconciliation for managing the cluster lifecycle. This shouldn’t cause interruptions to existing clusters, and it also removes the problem of unassigning the EIP from the node, since the node will be removed.
@detiber Do you think configuring the EIP handling through an environment variable would be a sufficient solution?
Was testing this afternoon, and it looks like for some reason kube-vip isn't adding the BGP peer routes. I'm not sure why yet.
@cprivite That's strange. I have noticed it takes a while for the routes to show up in the Equinix Metal console, but the EIP for the control plane should come up pretty quickly.
pluralsh#1
@cprivite Sorry for the delay here, I had some other urgent things I needed to get done. I'll get back to this later today or tomorrow so it can move forward.
I think there is one thing left for this to be ready for merging, based on what was discussed above.
For that I would need to know how you would like to make it configurable. Do you think configuring the EIP handling through an environment variable would be a sufficient solution? There needs to be some way to enable or disable certain functions in the controller code.
Yeah, I'll be working on that now. The plan is to make it a second template that you'd choose by passing --flavor=kube-vip or something like that to clusterctl.
The extra template shouldn't be too difficult; it's mainly a question of how best to pass that to the controller. I had already started on an implementation using an environment variable that the user would need to set.
Apologies for causing the merge conflict on this PR. I did a first pass at rebasing/squashing my branch to clean things up for the upstream PR.
@detiber No worries. I'll do a rebase ASAP to fix up the merge conflicts.
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…sters < version 1.23, not just kube-vip ones. Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
… provider to released upstream version. Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
…vip, remove systemctl restart networking to avoid networking service error. Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
Signed-off-by: DavidSpek <vanderspek.david@gmail.com>
…uster Type (#5)
* Convert to having the EIP_MANAGEMENT variable as part of the packetcluster type
* Make vipmanager field immutable
* Rename field to VIPManager
* rename to vipmanager, fix defaults, rm services
* Fix typo and make VIPManager an enum
Signed-off-by: Chris Privitere <23177737+cprivitere@users.noreply.github.com>
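Based on the commit message above, the VIPManager field presumably ends up as an enum on the PacketCluster type. A hedged sketch of what such a field and its defaulting could look like follows; the value names, the default, and the helper function are guesses for illustration, not the actual CRD definition.

```go
package main

import "fmt"

// VIPManager selects how the control-plane VIP is managed. The value names
// ("CPEM", "KUBE_VIP") are illustrative; check the actual CRD for the
// canonical enum. In a real API type this would carry a kubebuilder
// validation marker such as:
// +kubebuilder:validation:Enum=CPEM;KUBE_VIP
type VIPManager string

const (
	VIPManagerCPEM    VIPManager = "CPEM"
	VIPManagerKubeVIP VIPManager = "KUBE_VIP"
)

// defaultVIPManager keeps clusters on CPEM unless kube-vip is explicitly
// requested, matching the "don't surprise existing users" concern from
// the review discussion.
func defaultVIPManager(v VIPManager) VIPManager {
	if v == "" {
		return VIPManagerCPEM
	}
	return v
}

func main() {
	fmt.Println(defaultVIPManager(""))                // CPEM
	fmt.Println(defaultVIPManager(VIPManagerKubeVIP)) // KUBE_VIP
}
```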
What this PR does / why we need it:
This replaces the binding of an elastic IP to one of the control plane nodes with deploying kube-vip and having it load balance the Kubernetes API between the various control plane nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #