Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement OpenStack Cloud Controller Manager support #423

Closed

Conversation

Fedosin
Copy link
Contributor

@Fedosin Fedosin commented Aug 6, 2020

No description provided.

@Fedosin
Copy link
Contributor Author

Fedosin commented Aug 6, 2020

/assign @enxebre


**Note:** Example of manual testing: https://asciinema.org/a/303399?speed=2

#### Write Cluster Cloud Controller Manager Operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which team should be owning this long term? Would it make sense for the cloud-team to own this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the cloud team. Because I expect there will be more cloud controller manager for other platforms.
But OpenStack team can maintain some related submodules by placing OWNERS file in appropriate places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a clear understanding of how the code structure will be laid out to support generic CCCMO vs platform specific so that we can properly partition the work and have the correct OWNERS file?

@derekwaynecarr
Copy link
Member

Can you share any thoughts on the intersection of this component with the Infrastructure type?
https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L25

I assume there is no additional configurability needed for OpenStack?


Actions:

- Make sure Cinder CSI driver is supported in OpenShift.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this CSI is related to CCM? seems those are different operator/repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we use in-tree cloud provider, that manages Cinder volumes as well. Cloud controller manager can't do it, since this functionality was moved to Cinder CSI driver. So, to maintain feature parity with the in-tree cloud provider we need to enable Cloud Controller Manager and deploy Cinder CSI driver.


- Manually install CCM's [daemonset](https://github.com/kubernetes/cloud-provider-openstack/blob/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml) on a working OCP cluster deployed on OpenStack.

- Update configuration of `kube-apiserver`, `kube-controller-manager` and `kubelet` by replacing `--cloud-provider openstack` with `--cloud-provider external` and removing `--cloud-config` parameters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this operator installed optionally or mandatory?
I think you need have CCM running in order to make node ready, which meaning this is not installed from OperateHub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator will be installed mandatory. But depending on the platform, it will either deploy Cloud Controller Manager or not.

@Fedosin Fedosin force-pushed the openstack-cloud-controller-manager branch from b048501 to 8f8c198 Compare September 2, 2020 13:18
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fedosin
To complete the pull request process, please assign enxebre
You can assign the PR to them by writing /assign @enxebre in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


- Implement the operator, using [library-go](https://github.com/openshift/library-go) primitives.

#### Build the operator image by OCP automation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know which group in OCP will be responsible for the image building automation? Have we engaged them yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ART team, because it will be a cluster operator. They are not involved yet.


#### Write Cluster Cloud Controller Manager Operator

The operator should be able to create, configure and manage different CCMs (including one for OpenStack) in OpenShift.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these operations the scope of the unit/CI tests? Are there more/others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main goal is to reach the feature parity with the in-tree cloud provider. So, if e2e-openstack tests pass, then we are good.
For sure, unit tests for the operator are required.

Comment on lines 124 to 126
- Add CCM operator support in Cluster Version Operator. CCM operator will be deployed on early stages of installation and then it deploys and configures OpenStack CCM itself.

- Change Kubelet configuration for OpenStack in [Machine Config Operator templates](https://github.com/openshift/machine-config-operator/blob/d044c74ea4b9900c269ee8de8131ed49ba6fedc8/templates/master/01-master-kubelet/openstack/units/kubelet.service.yaml#L30) to adopt external cloud provider.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have someone from CVO and MCO review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 3CMO (cluster cloud controller manager operator) will be managed by CVO so their opinion is really appreciated.

There won't be so much interaction with MCO, I believe. We will need to change just one line of code there:
https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/openstack/units/kubelet.service.yaml#L30
But the more people look, the better :)

@enxebre
Copy link
Member

enxebre commented Sep 17, 2020

A lot of the transition for all out of tree providers is being worked with detail here #463. I'd suggest we conflate any Openstack specific info into 463 and close this PR to avoid confusion and scatter knowledge for the same topic. We can then later in time follow up on 463 as we add more providers. cc @Danil-Grigorev @Fedosin

A: If CCM is not available, new nodes in the cluster will be left unschedulable.

Q: Who talks to CCM?
A: From architectural point of view, only Kube API server is able to communicate with CCM. [Source](https://kubernetes.io/docs/concepts/architecture/cloud-controller/#design). All requests to CCM are sent by `kubelet`. KCM doesn't interract with CCM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-apiserver or kubelet (have not yet read the link) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube-apiserver as written below.

A: From architectural point of view, only Kube API server is able to communicate with CCM. [Source](https://kubernetes.io/docs/concepts/architecture/cloud-controller/#design). All requests to CCM are sent by `kubelet`. KCM doesn't interract with CCM.

Q: Does CCM provide an API? How is it hosted? HTTPS endpoint? Through load balancer?
A: CCM provides a gRPC interface on port [10258](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cloud-provider/ports.go#L22). Like KCM, CCM supports high available setup using leader election (which is on by default). [Source](https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#requirements).
Copy link
Contributor

@sttts sttts Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means that clients of ccm (kubelet? kube-apiserver ?) have to find the leader? I.e. you need a load balancer? Or is it a kubernetes service?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the former case you have to integrate with the cloud API LB, and also integrate with MCO and it gcp/azure-routes.sh mechanism. Those might even be duplicated because API server and ccm life-cycles don't match.

In the latter case, you depend on SDN. What would that mean? No scheduling, no SDN. No SDN, no ccm, no scheduling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you tell kube-apiserver about all the IPs similarly to etcd-operator today. Grpc can handle multiple endpoints and find one that works.

A: CCM provides a gRPC interface on port [10258](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cloud-provider/ports.go#L22). Like KCM, CCM supports high available setup using leader election (which is on by default). [Source](https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#requirements).

Q: What are the thoughts about certificate management?
A: Certificate management should be performed by the CCM operator, like it's done in the KCM operator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumers of ccm have to know the CA. How should that work? Note, that kcm has no API beyond the metrics and health endpoints, i.e. a very different and simpler setup. What you describe sounds more like kube-apiserver.

A: No. Control plane nodes only. [Source](https://kubernetes.io/docs/concepts/overview/components/#cloud-controller-manager)

Q: How does SDN depend on CCM?
A: Most likely they are not related.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the LB requirement above.


2. How to migrate PVs created by the in-tree cloud provider?
[CSIMigration](https://kubernetes.io/blog/2019/12/09/kubernetes-1-17-feature-csi-migration-beta/) looks like the best option, especially if it is GA in 1.19.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a section about bootstrapping. Probably we need a bootstrap instance of ccm.

@Fedosin Fedosin force-pushed the openstack-cloud-controller-manager branch from 994cf2b to 9c7789e Compare September 29, 2020 10:28
@Fedosin
Copy link
Contributor Author

Fedosin commented Sep 29, 2020

I close this PR now in favor of #463

@Fedosin Fedosin closed this Sep 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants