
Support custom HA proxy config in CAPD #7684

Closed
alexander-demicev opened this issue Dec 5, 2022 · 27 comments · Fixed by #8785
Labels
kind/feature Categorizes issue or PR as related to a new feature.
triage/accepted Indicates an issue or PR is ready to be actively worked on.
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@alexander-demicev
Contributor

We are planning to use CAPD for testing the RKE2 bootstrap provider. RKE2 runs a server alongside the Kubernetes API server; that server listens on port 9345 for registering new nodes. We'd like to be able to extend or provide a custom HAProxy config in CAPD.
For RKE2 it should look something like the template from https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/internal/third_party/forked/loadbalancer/config.go plus:

...
frontend rke2-join
  bind *:9345
  {{ if .IPv6 -}}
  bind :::9345;
  {{- end }}
  default_backend rke2-servers
backend rke2-servers
  option httpchk GET /v1-rke2/readyz
  http-check expect status 403
  {{range $server, $address := .BackendServers}}
  server {{ $server }} {{ $address }} check check-ssl verify none # 9345 port included in address variable
  {{- end}}

A possible option to achieve this is to read the custom template from a ConfigMap, process it using the ConfigData structure from kind, and append the result to the existing configuration.
The downside of this approach is that it only covers our specific use case and won't work if someone wants to provide a completely custom HAProxy config.
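
For illustration, a minimal Go sketch of that read-and-append flow, assuming a ConfigData struct that mirrors the one from kind/CAPD; the fragment below stands in for whatever would be read from the ConfigMap, and all names are illustrative only:

package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// ConfigData mirrors the data handed to the load balancer config template.
type ConfigData struct {
	ControlPlanePort int
	BackendServers   map[string]string
	IPv6             bool
}

// customFragment stands in for the template read from the ConfigMap.
const customFragment = `
frontend rke2-join
  bind *:9345
  default_backend rke2-servers
backend rke2-servers
  option httpchk GET /v1-rke2/readyz
  http-check expect status 403
  {{range $server, $address := .BackendServers}}
  server {{ $server }} {{ $address }} check check-ssl verify none
  {{- end}}
`

func main() {
	data := ConfigData{
		ControlPlanePort: 6443,
		BackendServers:   map[string]string{"controlplane-0": "172.18.0.3:9345"},
	}
	tpl := template.Must(template.New("custom").Parse(customFragment))
	var rendered bytes.Buffer
	if err := tpl.Execute(&rendered, data); err != nil {
		panic(err)
	}
	// The rendered fragment would then be appended to the default haproxy config.
	fmt.Println(rendered.String())
}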

cc @richardcase @belgaied2

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 5, 2022
@fabriziopandini
Member

fabriziopandini commented Dec 5, 2022

If I got it right, what we are discussing is making the CAPD load balancer act as a load balancer for some other service.
I understand this is convenient, but at the same time it seems to me to be out of scope for Cluster API/CAPD.

Have you considered addressing this with ingress/load balancer as documented in the kind book? Those approaches should work with CAPD too (I'm assuming the service we are talking about is a regular K8s application). As an alternative, what about having a runtime extension that hooks into the Cluster lifecycle and creates the additional load balancer?

@fabriziopandini
Member

Adding
/triage needs-information
While we continue the discussion about what is the right approach here

@k8s-ci-robot k8s-ci-robot added the triage/needs-information Indicates an issue needs more information in order to work on it. label Dec 28, 2022
@fabriziopandini
Member

/close
for lack of follow-up.

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
for lack of follow-up.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alexander-demicev
Contributor Author

alexander-demicev commented May 24, 2023

Hey, @fabriziopandini, my apologies for the delayed response. I wanted to clarify that the service we discussed is not a k8s application. RKE2 control plane nodes run a server for registering new nodes, listening on port 9345, while the K8s API server runs on port 6443. This means that both ports 6443 and 9345 need to be accessible for other nodes to connect: https://docs.rke2.io/install/requirements?_highlight=9345#inbound-network-rules.
I believe it would be great if CAPD could handle this requirement, as it's an infrastructure provider aimed at offering supporting infrastructure for clusters. I've already opened a similar PR for CAPA: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4228.
Currently, we're using a modified version of CAPD for testing RKE2, but we would prefer to avoid maintaining that fork and keep using upstream CAPD for testing our provider.

@alexander-demicev
Contributor Author

/reopen

@k8s-ci-robot k8s-ci-robot reopened this May 24, 2023
@k8s-ci-robot
Contributor

@alexander-demicev: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member

I always assumed that load balancers for workloads are the responsibility of cloud providers and not of CAPI (as they implement Services of type LoadBalancer).

Of course CAPD is a special case; CAPA isn't though.

@alexander-demicev
Contributor Author

@sbueringer I guess it depends on how you define the workload: is it a process running on a control plane machine, a containerized application, or both? In our case we are talking about a process that runs on a VM/server.

@alexander-demicev
Contributor Author

alexander-demicev commented May 24, 2023

To add more context, in CAPA we will configure ingress rules so that only worker nodes will be able to access the control plane node on this specific port. This is more of a cluster infrastructure concern than a cloud provider one.

@sbueringer
Member

Ah got it. Yeah "server to register new nodes" does not sound like the kind of workload I meant.

To me it sounds fine to provide some sort of extension point to replace or append to the haproxy config.

@killianmuldoon @chrischdi As you're more familiar with our haproxy setup, wdyt?

@killianmuldoon
Contributor

Yeah - currently we define it in a Go file - here https://github.com/kubernetes-sigs/cluster-api/blob/f78afa6eec25981b2979130c7f88bb8591429ef8/test/infrastructure/docker/internal/loadbalancer/config.go#L42-L41

We could definitely make this a template that is fed into CAPD at start time or read from a ConfigMap that another process could write to.

@sbueringer
Member

sbueringer commented May 25, 2023

I would prefer not having a separate ConfigMap, but I think either some controller arg or a field in the DockerCluster is reasonable.

I guess a field in DockerCluster would be the "normal" way to support it.

@alexander-demicev
Contributor Author

alexander-demicev commented May 25, 2023

Something like this should work:

...
backend kube-apiservers
  option httpchk GET /healthz
  # TODO: we should be verifying (!)
  {{range $server, $address := .BackendServers}}
  server {{ $server }} {{ $address }} check check-ssl verify none resolvers docker resolve-prefer {{ if $.IPv6 -}} ipv6 {{- else -}} ipv4 {{- end }}
  {{- end}}
{{range $i, $config := .AdditionalConfig}}
frontend additional-config-front-{{ $i }}
  bind *:{{ $config.Port }}
  {{ if $.IPv6 -}}
  bind :::{{ $config.Port }};
  {{- end }}
  default_backend additional-config-back-{{ $i }}
backend additional-config-back-{{ $i }}
  {{range $option := .Options}}
  {{ $option }}
  {{- end}}
  {{range $server, $address := $.BackendServers}}
  server {{ $server }} {{ $address }}:{{ $config.Port }} check check-ssl verify none resolvers docker resolve-prefer {{ if $.IPv6 -}} ipv6 {{- else -}} ipv4 {{- end }}
  {{- end}}
{{- end}}
// ConfigData is supplied to the loadbalancer config template.
type ConfigData struct {
	ControlPlanePort int
	BackendServers   map[string]string
	IPv6             bool
	AdditionalConfig []AdditionalConfig
}

type AdditionalConfig struct {
	Port    int
	Options []string
}

I can open a PR if there are no objections

@chrischdi
Member

From my point of view this seems similar to what e.g. CAPO provides with OpenStackCluster.spec.apiServerLoadBalancer.additionalPorts: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/api/v1alpha7/types.go#L296

I did not research what other providers do here or whether they have similar things.

For CAPD I think this could fit well somewhere in DockerCluster.spec.loadBalancer: https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/api/v1beta1/dockercluster_types.go#L53

Something like

type DockerLoadBalancer struct {
	// ImageMeta allows customizing the image used for the cluster load balancer.
	ImageMeta `json:",inline"`
	
	AdditionalPorts []AdditionalPort `json:"additionalPorts"`
}

type AdditionalPort struct {
	Port int `json:"port"`
	Options []string `json:"options"`
}

I think that could then be implemented in a straightforward way in the template and (unless broken options are provided) won't break haproxy itself.

@fabriziopandini
Member

fabriziopandini commented May 27, 2023

If we are going to go down this path, I would prefer to keep things as simple as possible and avoid modeling the HAProxy config file in the CAPD API, because chances are we will be asked to add more knobs in the future, which could create noise for a feature that is not core to CAPD.

Let's simply expose plain text that will be appended to the default config, and make sure to clearly document the contract around it: e.g. no validation, use at your own risk, etc.

@alexander-demicev
Contributor Author

I don't have a strong opinion on this; any approach works for me.

@killianmuldoon
Contributor

Agreed with Fabrizio - I think this could be accomplished with plain text appended to the default config and by reloading that text from a ConfigMap mounted as a volume in CAPD.

@sbueringer
Member

I think I would prefer one additional field in DockerCluster vs. a command-line flag + ConfigMap mount with reload

@killianmuldoon
Contributor

IMO we don't enable this in CAPI itself - i.e. we just introduce the flag and let whoever is using the feature downstream manage the ConfigMap mount.

@sbueringer
Member

sbueringer commented May 30, 2023

We would still have to implement a file reload for when the file changes. To me, adding a field to the DockerCluster CR seems simpler and more consistent with all the other configuration options we have in CAPD.

@killianmuldoon
Contributor

I'm fine either way - but definitely leaning toward allowing reloading of plain-text config so we don't have to revisit this down the line for additional haproxy config.

@alexander-demicev
Contributor Author

What about a ConfigMap reference in the CAPD cluster spec that can only be set during cluster creation? The controller can read its content and append it to the HAProxy config. The LB configuration is updated only once per machine (https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/internal/controllers/dockermachine_controller.go#L270), so there won't be many requests to read the ConfigMap and no reload will be needed.

@richardcase
Member

Does it have to be the append model? What happens if someone wants to change the behaviour of an existing section?

Could we:

  • Use the existing embedded config by default
  • If a reference to a ConfigMap is supplied, expect it to contain a full template. If you need to change the LB config, you know what you are doing (a rough sketch of this follows below).
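
A minimal sketch of how a controller could honor such a reference, assuming a hypothetical ConfigMap reference field and a "value" data key (neither exists in CAPD today; all names are illustrative only):

package loadbalancer

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// loadConfigTemplate returns the embedded default template unless the cluster
// references a ConfigMap, in which case the ConfigMap's "value" key is used as
// a full replacement (no validation - use at your own risk, per the contract).
func loadConfigTemplate(ctx context.Context, c client.Client, ref *types.NamespacedName, defaultTemplate string) (string, error) {
	if ref == nil {
		return defaultTemplate, nil
	}
	cm := &corev1.ConfigMap{}
	if err := c.Get(ctx, *ref, cm); err != nil {
		return "", fmt.Errorf("reading custom load balancer config template: %w", err)
	}
	custom, ok := cm.Data["value"]
	if !ok {
		return "", fmt.Errorf("ConfigMap %s/%s is missing the %q key", ref.Namespace, ref.Name, "value")
	}
	return custom, nil
}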

@fabriziopandini
Member

I'm ok with full replacement as well (it all boils down to being very explicit in documenting the contract)

@richardcase
Member

I'm ok with full replacement as well (it all boils down to being very explicit in documenting the contract)

100% agree with you @fabriziopandini that we need to be explicit in documenting the contract. I can help in this area if any extra help is required (@alexander-demicev feel free to ping me).

We've been working on an issue with the RKE2 control plane provider where we need a full replacement (instead of having to use a fork). The append-only model wouldn't work for our scenario.

To provide the specifics, we generally make these changes:

  • in the health check for the apiserver we need to check for a specific status code, so we have to replace the existing backend kube-apiservers section so that it includes http-check expect status.
  • add new frontend and backend sections for the RKE2 registration port
  • add a new frontend stats section to enable the haproxy stats UI

@fabriziopandini
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 20, 2023