Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨Make control plane port configurable in CAPD #7548

Merged
merged 1 commit into from
Jan 2, 2023
Merged

✨Make control plane port configurable in CAPD #7548

merged 1 commit into from
Jan 2, 2023

Conversation

alexander-demicev
Copy link
Contributor

@alexander-demicev alexander-demicev commented Nov 16, 2022

What this PR does / why we need it:
Currently, CAPD exposes the host and port of the control plane endpoint but doesn't leverage them.
I made 6443 a default port to keep the behavior consistent.

cc @richardcase @belgaied2

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 16, 2022
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 21, 2022
@alexander-demicev
Copy link
Contributor Author

@sbueringer I updated the tests and all worked with "" host value

Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, last nit from my side.

Let's run pull-cluster-api-e2e-full-main and pull-cluster-api-e2e-workload-upgrade-1-24-latest-main afterwards

@alexander-demicev
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main
/test pull-cluster-api-e2e-workload-upgrade-1-24-latest-main

@alexander-demicev
Copy link
Contributor Author

@sbueringer any chance the pull-cluster-api-e2e-full-main failures might be not related to the PR?

@sbueringer
Copy link
Member

In general, yes.

If you run into the same issue 2-3 times it's very unlikely. Let's see

/test pull-cluster-api-e2e-full-main

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 22, 2022
@@ -85,6 +88,8 @@ spec:
# host.docker.internal is required by kubetest when running on MacOS because of the way ports are proxied.
certSANs: [localhost, 127.0.0.1, 0.0.0.0, host.docker.internal]
initConfiguration:
localAPIEndpoint:
bindPort: 7777
Copy link
Member

@fabriziopandini fabriziopandini Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be I'm missing something 😅
I was assuming we want to make the CP port configurable, which in my mind is the port the LB balancer frontend answer to. But this instead is changing the port the CP backends answer to, which is one layer down the stack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. The DockerCluster APIEndpoint port is definitely the external port on the lb.

Not sure if we should also modify the port the apiserver is listening on.

I just remember that the whole situation around the apiserver port is messy and differently implemented across infra providers (if I understood it correctly). xref: #6272

Maybe we should only modify the lb port here and just keep the apiserver port the same to keep it simpler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keep the apiserver port the same to keep it simpler (unless there is a strong requirement to make it configurable)

@alexander-demicev
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main

@alexander-demicev
Copy link
Contributor Author

alexander-demicev commented Nov 24, 2022

@sbueringer @fabriziopandini The tests don't pass if bindPort: 7777 isn't specified but there is another issue. Some tests like KCP adoption require additional configuration to work with a custom port, I was able to make it pass but it seems that it's much easier to have a separate E2E test for the custom port so I added it. Let me know what you think, in my opinion, a separate test is easier than modifying a bunch of existing tests.

@sbueringer
Copy link
Member

sbueringer commented Nov 25, 2022

I'm not sure if we want an additional e2e test for this very-CAPD specific feature that doesn't cover any code that also gives us coverage for things provider are doing (or not significant enough coverage to warrant a separate test). That is essentially the main reason why CAPD exists.

Our full e2e tests are already up to 1,5 hours (we were at 50-55 minutes a few months back afaik) before this PR which is starting to become a significant problem in my opinion (more resources required and when you need the full e2e tests you have to wait forever).

We could instead use the custom port in e.g. the ignition or ipv6 template. They are only used in the quickstart test. This should have relatively low impact. There are a few alternatives to that by adding it to other templates, but I think that's the easiest option.

But let's see what Fabrizio thinks about it.

@fabriziopandini
Copy link
Member

Trying to catch up...
If I got it right from https://github.com/kubernetes-sigs/cluster-api/pull/7548?notification_referrer_id=NT_kwDOAEx-RLI0ODUxMTU5MTUxOjUwMTMwNjA&notifications_query=is%3Aunread+repo%3Akubernetes-sigs%2Fcluster-api#pullrequestreview-1190721569, there is an agreement on scoping down this PR in order to make only the load balancer address configurable, not the API server port.
If this is correct, probably the need for a custom test for the localAPIEndpoint: 7777 naturally goes away.

Taking a step back, I agree with @sbueringer we should avoid adding a dedicated E2E test for a CAPD-specific feature, a unit/integration test should be enough (but at the same time not blocking IMO)

@alexander-demicev
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main

@alexander-demicev
Copy link
Contributor Author

All should be fixed now, can you take a look again @sbueringer @fabriziopandini?

HostPort: port,
ContainerPort: KubeadmContainerPort,
HostPort: hostPort,
ContainerPort: cpPort,
Copy link
Member

@sbueringer sbueringer Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry probably a bit late for this question. What ports are we actually changing with this PR?

It looks to me like:

  • we are changing the internal port of the haproxy (while the hostPort is still random)
  • we are also changing this port mapping here which means the still random hostPort is now mapping to a changed internal port? (but what is listening on this port?)

I"m probably misinterpreting a few things. What's your point of view looking at haproxy (including its config) + control plane nodes + their corresponding port mappings? Which ports through the entire chain are now controlled with the DockerCluster controlPlane endpoint port setting?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RKE2 has to open a specific port for cluster initialization: 9345. This port is used by the first control plane node, called server, to listen to new node registrations.
The way CAPD is implemented right now prevents our bootstrap provider to register new nodes after the first one. CAPD would need to be configurable for the ports that need to be open.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it more, I would prefer to only configure the frontend haproxy port. Seems like changing the port mapping on the control plane node might not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@belgaied2 Actually we also want to be able to append additional haproxy configuration here https://github.com/kubernetes-sigs/cluster-api/blob/main/test/infrastructure/docker/internal/third_party/forked/loadbalancer/config.go#L58
According to the docs, the RKE2 server agent runs simultaneously with the k8s API server. This means we need to expose 2 ports - 9345 and 6443.

After more investigation, this PR makes less sense for RKE2 provider but the spec.ControlPlaneEnpoint field is still not usable in CAPD.

Copy link
Member

@sbueringer sbueringer Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm absolutely open to useful improvements. I just realized that I don't really understand which port or port mapping we now can configure :)

Some sort of illustration would be great to understand what the plan is, e.g.

Components: kubectl (e.g.)  =>    ha proxy (lb)        =>   backends (control plane nodes)
Ports & port mappings:            hostport : ha proxy port  hostport : apiserver port
Example (main version):           39433 (random) : 6443     40343 (random) : 6443
Corresponding API fields: ??

(not sure if it's a good/correct way to illustrate it)

Bonus info, the haproxy config today uses a config like this:

frontend control-plane
  bind *:6443

  default_backend kube-apiservers
backend kube-apiservers
  ...
  server capi-quickstart-control-plane-jnm6j 172.18.0.5:6443 check check-ssl verify none

(which means it only uses "internal" ports and 172.18.0.5 which is the internal Node IP and the control plane container IP)

I think usually controlPlaneEndpoint host/port is what stands for the endpoint reachable by consumers (e.g. kubectl). So this would be the hostport of the haproxy (?).
So it's probably good to keep this consistent. If there are reasons to change other ports we can discuss them and probably find a way to make them configurable as well.

(cc @fabriziopandini)

Copy link
Member

@sbueringer sbueringer Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr

I was thinking about it more, I would prefer to only configure the frontend haproxy port. Seems like changing the port mapping on the control plane node might not be necessary.

I agree (at least it seems like that's what the controlPlaneEndpoint should control)

This means we need to expose 2 ports - 9345 and 6443.

Yup sounds like you want to be able to expose an additional port in addition to 6443 (let's open a separate issue for that?)

Copy link
Member

@fabriziopandini fabriziopandini Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have commented on #7684 about using the CAPD load balancer for other services different than the controlPlane load balancer.

If you want we can keep this PR for making the only the load balancer port configurable or close it, I leave this to you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabriziopandini @sbueringer I pushed some changes here, now only lb port is configurable

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 7, 2022
Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reworking the PR
A couple of nits, otherwise lgtm for me

@alexander-demicev
Copy link
Contributor Author

@fabriziopandini I updated the PR, thanks for the review

@fabriziopandini
Copy link
Member

/lgtm
over to @sbueringer for a final pass

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 30, 2022
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: b974b02a4f413204ed28e206903d398fb7779587

@sbueringer
Copy link
Member

Thank you very much!!

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 2, 2023
@k8s-ci-robot k8s-ci-robot merged commit 06c6ea7 into kubernetes-sigs:main Jan 2, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Jan 2, 2023
@alexander-demicev alexander-demicev deleted the controlplaneport branch January 2, 2023 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants