Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POC: Move cluster-specific code out of the manager #950

Conversation

alvaroaleman
Copy link
Member

This PR is a POC that aims to improve the multi cluster story with controller-runtime. While multi-cluster controllers are already possible today, they are awkward, because they require to either:

  • Construct one manager.Manager per cluster and add all of them except one to the main manager.Manager
  • Add a container type that holds client and cache for each additional cluster and basically copies the initialization logic from the manager for those, then make that container type a Runnable and add it to the manager

Additionally, startup is incorrect as we don't start secondary caches ahead of time.

So the goal here is basically:

  • Don't change anything for the single-cluster controller usecase
  • Do not encourage more than one manager for the multi-cluster usecase
  • Make sure that startup works correctly and caches get started before anything else

The commits are:

  1. Move cluster-specific code from pkg/manager to pkg/clusterconnector (Better name ideas welcome!)
  2. Make sure the manager always starts caches before anything else
  3. Add an example multicluster controller

The PR is huge because there was so much code to move. The most important parts to look at are:

  • pkg/manager.Manager that now embedds the cluster-specifics
  • pkg/clusterconnector.Clusterconnector which contains the cluster-specifics
  • examples/multicluster/reconciler.Add to see how such a multi cluster controller looks like

If we can agree on the idea, I will create separate PRs for each of the commits.

Would love to hear opinions on this!

/assign @estroz @DirectXMan12 @vincepri @alexeldeib

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 15, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: alvaroaleman
To complete the pull request process, please assign vincepri
You can assign the PR to them by writing /assign @vincepri in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 15, 2020
@k8s-ci-robot
Copy link
Contributor

@alvaroaleman: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-controller-runtime-apidiff-master aede84b link /test pull-controller-runtime-apidiff-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kramerul
Copy link
Contributor

We integrated your POC into our controller and found it quite helpful to have support for additional cluster connections without creating additional managers.

  • The required changes were quite small (about 30 lines of code)
  • Your POC would be even better if the API would be designed in a way that would reflect the real world. In your example, the controller has connections to 2 Clusters. Therefore, I would expect to have 2 clusterconnector.ClusterConnector. But there is one ctrl.Manager and one clusterconnector.ClusterConnector. Perhaps it would be possible to introduce some kind of a ctrl.BaseManager without a connection. This ctrl.BaseManager could have functions to open new connections to cluster.

image

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 21, 2020
@k8s-ci-robot
Copy link
Contributor

@alvaroaleman: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vincepri
Copy link
Member

/assign

@mengqiy
Copy link
Member

mengqiy commented Jul 22, 2020

After seeing more and more use cases similar to #745 (comment), IMO this proposal is useful and promising. I haven't really dived into the implementation details, but I'd like to see this get finalized and merged.

@alvaroaleman
Copy link
Member Author

@mengqiy so what do you think is the best way forward for this, should I write a design doc that outlines the changes to make it more digestible?

@vincepri
Copy link
Member

+1 to have a design doc, it'd also be helpful to visualize and document the structure of the project when we merge this in

@mengqiy
Copy link
Member

mengqiy commented Jul 23, 2020

You may want to write a design doc in the designs folder.

@DirectXMan12 @joelanford @estroz Does anyone have concerns or objection?

@estroz
Copy link
Contributor

estroz commented Jul 23, 2020

@mengqiy sgtm, will take time to review the doc once its up. Great stuff so far.

@alvaroaleman
Copy link
Member Author

Created a PR for a design doc in #1075

@alvaroaleman
Copy link
Member Author

Closing as the design doc got merged and describes what we want to do here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants