Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add support for a secondary control plane load balancer #4733

Merged
merged 1 commit into from
Feb 4, 2024

Conversation

nrb
Copy link
Contributor

@nrb nrb commented Jan 11, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:

Certain topologies of clusters may use an internal load balancer for the API server for a few reasons. A primary reason is to keep traffic within the cloud provider in order to reduce egress costs.

This PR adds support for a secondary control plane load balancer, internal to the Kubernetes cluster for such a use case.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4554

Special notes for your reviewer:

Previously, it was believed that security groups could not be added to a network load balancer, which meant that there were exceptions and exclusions around NLBs in the code. This is only true if the load balancer was created without any security groups - SGs cannot be added after creation. If the network load balancer is created with at least one security group, more can be added or removed.

See the Network Load Balancer docs for more details.

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

Adds support for a secondary, internal API server load balancer within clusters.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jan 11, 2024
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 11, 2024
@nrb
Copy link
Contributor Author

nrb commented Jan 11, 2024

/test pull-cluster-api-provider-aws-test

@nrb
Copy link
Contributor Author

nrb commented Jan 13, 2024

/retest-required

@nrb
Copy link
Contributor Author

nrb commented Jan 18, 2024

I'm now able to create a 2nd load balancer within my AWS clusters.

Screenshot 2024-01-18 at 6 20 06 PM

The target group associated with it is also healthy.

Screenshot 2024-01-18 at 6 21 12 PM

I just have rebasing, writing automated tests, and writing docs left before I can remove the WIP.

@nrb nrb marked this pull request as ready for review January 19, 2024 18:22
@nrb nrb changed the title WIP: ✨ Add support for a secondary control plane load balancer ✨ Add support for a secondary control plane load balancer Jan 19, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 19, 2024
@nrb nrb force-pushed the secondary-lb branch 3 times, most recently from 0e71456 to b08fe0c Compare January 19, 2024 20:17
@nrb
Copy link
Contributor Author

nrb commented Jan 19, 2024

Confirming that nodes within the cluster can reach the load balancer:

✦ ❯ kubectl --kubeconfig $HOME/secondlb.kubeconfig attach curl-nrb -c curl-nrb -it
If you don't see a command prompt, try pressing enter.
[ root@curl-nrb:/ ]$ curl -k https://numbertwo-e4d5b20bb984138d.elb.us-east-2.amazonaws.com:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}[ root@curl-nrb:/ ]$

Comment on lines 139 to 143
// TODO(nrb): The validate-on-update default test was failing when newlb was nil
// Upon investigation, both values were nil, which is unlikely in a real world scenario
if oldlb == nil && newlb == nil {
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newLoadBalancer := &AWSLoadBalancerSpec{}
	existingLoadBalancer := &AWSLoadBalancerSpec{}

	if r.Spec.ControlPlaneLoadBalancer != nil {
		newLoadBalancer = r.Spec.ControlPlaneLoadBalancer.DeepCopy()
	}

	if oldC.Spec.ControlPlaneLoadBalancer != nil {
		existingLoadBalancer = oldC.Spec.ControlPlaneLoadBalancer.DeepCopy()
	}

This was previously handled this way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which to be honest, maybe better to restructure, given that it was a bit confusing

controllers/awsmachine_controller.go Outdated Show resolved Hide resolved
controllers/awsmachine_controller.go Outdated Show resolved Hide resolved
controllers/awsmachine_controller.go Outdated Show resolved Hide resolved
docs/book/src/topics/secondary-load-balancer.md Outdated Show resolved Hide resolved
pkg/cloud/scope/elb.go Show resolved Hide resolved
pkg/cloud/scope/elb.go Outdated Show resolved Hide resolved
@richardcase
Copy link
Member

/milestone v2.4.0

@k8s-ci-robot k8s-ci-robot added this to the v2.4.0 milestone Jan 22, 2024
@richardcase richardcase removed this from the v2.4.0 milestone Jan 23, 2024
@richardcase
Copy link
Member

/milestone v2.4.0

@k8s-ci-robot k8s-ci-robot added this to the v2.4.0 milestone Jan 23, 2024
* A secondary control plane load balancer can be added.
* It will be a network load balancer.
* Its scheme must not match that of the ControlPlaneLoadBalancer.

Previously, NLBs could not have security groups attached. This has now
changed, and an NLB can have a security group attached at creation. If a
security group is _not_ present at creation, then the NLB can never have
security groups added.
@nrb
Copy link
Contributor Author

nrb commented Jan 24, 2024

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-conformance

@vincepri
Copy link
Member

/test ?

@k8s-ci-robot
Copy link
Contributor

@vincepri: The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-aws-build
  • /test pull-cluster-api-provider-aws-test
  • /test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-aws-apidiff-main
  • /test pull-cluster-api-provider-aws-e2e
  • /test pull-cluster-api-provider-aws-e2e-blocking
  • /test pull-cluster-api-provider-aws-e2e-clusterclass
  • /test pull-cluster-api-provider-aws-e2e-conformance
  • /test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
  • /test pull-cluster-api-provider-aws-e2e-eks
  • /test pull-cluster-api-provider-aws-e2e-eks-gc
  • /test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-aws-apidiff-main
  • pull-cluster-api-provider-aws-build
  • pull-cluster-api-provider-aws-test
  • pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vincepri
Copy link
Member

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing

@nrb
Copy link
Contributor Author

nrb commented Jan 24, 2024

Looks like some sort of account lookup issue with Boskos; it fails here with a 404.

@nrb
Copy link
Contributor Author

nrb commented Jan 24, 2024

/test pull-cluster-api-provider-aws-e2e-blocking

@nrb
Copy link
Contributor Author

nrb commented Jan 25, 2024

/retest

2 similar comments
@vincepri
Copy link
Member

/retest

@nrb
Copy link
Contributor Author

nrb commented Jan 26, 2024

/retest

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/assign @Ankitasw @richardcase

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 26, 2024
@vincepri
Copy link
Member

cc @JoelSpeed

Copy link
Member

@richardcase richardcase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @nrb . We should try to get this covered via e2e in the future.

@@ -37,6 +37,9 @@ type NetworkStatus struct {
// APIServerELB is the Kubernetes api server load balancer.
APIServerELB LoadBalancer `json:"apiServerElb,omitempty"`

// SecondaryAPIServerELB is the secondary Kubernetes api server load balancer.
SecondaryAPIServerELB LoadBalancer `json:"secondaryAPIServerELB,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be good to mark it optional

@@ -331,7 +332,8 @@ func (r *AWSMachineReconciler) reconcileDelete(machineScope *scope.MachineScope,
if err := r.reconcileLBAttachment(machineScope, elbScope, instance); err != nil {
// We are tolerating AccessDenied error, so this won't block for users with older version of IAM;
// all the other errors are blocking.
if !elb.IsAccessDenied(err) && !elb.IsNotFound(err) {
// Because we are reconciling all load balancers, attempt to treat the error as a list of errors.
if err = kerrors.FilterOut(err, elb.IsAccessDenied, elb.IsNotFound); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about kerrors.FilterOut. Excellent.

This secondary control plane load balancer is primarily meant to be used for internal cluster traffic, for use cases where traffic between nodes and pods should be kept internal to the VPC network.
This adds a layer of privacy to traffic, as well as potentially saving on egress costs for traffic to the Kubernetes API server.

A dual load balancer topology is not used as a default in order to maintain backward compatibility with existing CAPA clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider adding a template for this. Also, we should think about whether we think having a 2nd internal LB is a good default for the future (i.e. then you opt out).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll make an issue for the template.

One thing I'm looking at upstream is to support a slice of ControlPlaneEndpoints, which I think would tend toward a default set up of two LBs, but I'm not positive. This PR is sort of a trial run, though I think CAPZ already supports internal/external LBs.

@richardcase
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2024
@k8s-ci-robot k8s-ci-robot merged commit ee2c5f4 into kubernetes-sigs:main Feb 4, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to add a secondary load balancer to the control plane
7 participants