Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPLB userspace reverse proxy load balancer #5279

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

juanluisvaladas
Copy link
Contributor

@juanluisvaladas juanluisvaladas commented Nov 20, 2024

Description

Implement userspace CPLB reverse proxy load balancer.

Due to the keepalived virtualservers using IPVS for the load balancing, we simply couldn't make it work on some environments and needed to implement a userspace reverse proxy.

Fixes #4700

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist:

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@juanluisvaladas juanluisvaladas changed the title Cplb usrp Cplb userspace reverse proxy load balancer Nov 20, 2024
@juanluisvaladas juanluisvaladas changed the title Cplb userspace reverse proxy load balancer CPLB userspace reverse proxy load balancer Nov 20, 2024
@juanluisvaladas juanluisvaladas force-pushed the cplb-usrp branch 3 times, most recently from ac4e148 to 0910d28 Compare November 20, 2024 21:03
@juanluisvaladas juanluisvaladas marked this pull request as ready for review November 20, 2024 21:05
@juanluisvaladas juanluisvaladas requested review from a team as code owners November 20, 2024 21:05
@juanluisvaladas juanluisvaladas marked this pull request as draft November 21, 2024 11:01
@juanluisvaladas juanluisvaladas marked this pull request as ready for review November 21, 2024 11:07
@juanluisvaladas juanluisvaladas marked this pull request as draft November 22, 2024 11:53
@juanluisvaladas
Copy link
Contributor Author

Got some feedback on the docs from Daniel, so I'm making it a draft temporarily

Copy link
Contributor

This pull request has merge conflicts that need to be resolved.

Copy link
Contributor

github-actions bot commented Dec 5, 2024

This pull request has merge conflicts that need to be resolved.

tcpproxy is a subset of github.com/inetaf/tcpproxy with some
modifications:
1- Implement the method SetRoutes to allow to set routes in bulk. Also
allows deletion of routes which otherwise would be impossible.
2- Implement round robin load balancing (there was no load balancing at
all).
3- Remove unused code.
4- Append Mirantis copyright for the modifications.

Additionaly this PR had to do some modifications in the copyright
linter script and `.golangci.yaml` because this is a file copied and
modified. This is required to Apache 2.0 retribution right.

Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
@juanluisvaladas juanluisvaladas marked this pull request as ready for review December 11, 2024 17:21
// The order that routes are added in matters; each is matched in the order
// registered.
type Proxy struct {
configs map[string]*config // ip:port => config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: There's the special type netip.AddrPort for IP plus port number in the standard library, so this could be typed accordingly:

Suggested change
configs map[string]*config // ip:port => config
configs map[netip.AddrPort]*config

I guess this would make the public Proxy API clearer as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a private field, how about we leave the map as it is (a string is more convinient than a struct for the map) and we use netip.AddrPort in the public functions?


// ListenFunc optionally specifies an alternate listen
// function. If nil, net.Dial is used.
// The provided net is always "tcp".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If net is always "tcp", wouldn't it make sense to remove that parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the interface for net.Listen, we could remove the field but we need for the tests. I'd rather leave this as it is.

Comment on lines +160 to +163
for _, c := range p.lns {
c.Close()
}
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors are completely ignored here. Let's either remove the error return completely, or collect them instead.

Suggested change
for _, c := range p.lns {
c.Close()
}
return nil
var errs []error
for _, c := range p.lns {
if err := c.Close(); err != nil {
errs = append(errs, err)
}
}
return errors.Join(errs...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to merge this with the other inttest, and call this multiple times, as we're doing it e.g. for the dualstack tests.

if cplbCfg := nodeConfig.Spec.Network.ControlPlaneLoadBalancing; cplbCfg != nil && cplbCfg.Enabled {
if c.SingleNode {
return errors.New("control plane load balancing cannot be used in a single-node cluster")
}

if enableK0sEndpointReconciler {
return errors.New("control plane load balancing requires the component 'endpoint-reconciler' to be disabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's some way of making this error message a bit more comprehensive 🤔

The message is technically correct, but might not be too helpful for folks to understand the connection between externalAddress and k0s's understanding of endpoint reconciliation...

inttest/cplb-userspace/cplbuserspace_test.go Outdated Show resolved Hide resolved
docs/cplb.md Outdated Show resolved Hide resolved
@juanluisvaladas juanluisvaladas force-pushed the cplb-usrp branch 2 times, most recently from aed0b18 to 5155c18 Compare December 19, 2024 16:09
IPVS is problematic for many reasons, implement a userspace load
balancer which should get the job done and should be far less
problematic.

Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
The only reason why we block externalAddres is that we need a disabled
endpoint-reconciler. It doesn't make sense to move all this logic and
these parameters to the config validation, instead do it all in cmd.

Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
1- Included userspace reverse proxy and made it the default option.
2- Explained better the difference between virtual IP and a load
balancer, it was confusing for many users.
3- Added a whole troubleshooting section.

Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
1- Simplify tcpproxy by reomivng useless interfaces:
The original tcpproxy allowed different types of routes and needed to do
a bunch of interfacing for it to work. Since we only implement one kind
of route and one kind of target, we remove all the interfaces and merge
both structs into a unique struct.

2- Remove proxy.AddRoute: We only used it once and setRoutes can cover
that use case

3- Lock tcpproxy.Proxy when modifying routes to make it thread safe.
Prior to this we relied on the proxy being called only from one
goroutine. Now it can be called concurrently, not that we expect to do
that but a lock gives us extra safety.

4- Panic if tcpproxy.SetRoutes gets and empty route list. We now check
this in cplb_unix.go.

5- Remove the route feeding goroutine for round robin, since we added a
lock to make proxy.SetRoutes threadsafe we don't need that anymore and
it can be made much simpler by adding a lock.

Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
@juanluisvaladas
Copy link
Contributor Author

I addressed most of the comments. As for the comments:

1-Add tcpproxy remains unchanged for easier reviews
2- Implement CPLB userspace reverse proxy LB addresses all its comments. That includes adding proxy.Stop() to keepalived.Stop() reverting the changes of errors.As to errors.Is, clearer error messages, fixing the iptables path to use k0sVars.BinDir, and all the comments regarding the test
3- Move the check for CPLB and externalAddress to cmd unifies the tests in one and only changes one error message to make it clearer, everything else remains unchanged (there were no more concerns.
4- Overhaul CPLB docs only changes the code blocks from shell to console
5- Multiple refactors on cplb.tcpproxy addresses most of the concerns regarding the tcpproxy

There are a few things unchanged:
1- Copyright: I know it looks weird but it's literally copied and pasted from an email thread from the legal team. They said this is the right way of doing it 🤷
2- Allowing to disable the userspace proxy: Because this is already far too large we need at least two subsequent PRs, one for more feature (unicast VRRP) and another one to fix a bug (etcd getting the VIP which has a conflict with the current code) how about we get that in one of the next PRs?
3- Having the tcpproxy binary as its own process: I'm not opposed, but I don't think it's a priority and doing it this way is simpler in terms of permissions (we need to query the API). If we think it's worth blocking then I'll take it out.
4-I didn't address this because I don't think it's required, but I'm not too sure: https://github.com/k0sproject/k0s/pull/5279/files#r1888710381

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CPLB hairpining fails on vSphere
2 participants