
OCPEDGE-1147: feat!: init kubesan csi driver integration proposal #1620

Merged (12 commits) on Jul 16, 2024

Conversation


@jakobmoellerdev (Contributor) commented May 2, 2024

KubeSAN is a CSI plugin for Kubernetes that enables you to provision Block volumes
backed by a single, cluster-wide, shared block device (e.g., a single big LUN on a SAN).

Logical Volume Manager Storage (LVMS) uses the TopoLVM CSI driver to dynamically provision local storage on the OpenShift Container Platform clusters.

This proposal is about integrating the Subprovisioner CSI driver into the LVMS operator to enable the provisioning of
shared block devices on the OpenShift Container Platform clusters.

This enhancement will significantly increase the scope of LVMS, but it allows LVMS to gain the unique value proposition
of serving as a layered operator that offers LUN synchronization and provisioning capabilities.

openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on May 2, 2024
openshift-ci bot commented May 2, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

[Subprovisioner](https://gitlab.com/subprovisioner/subprovisioner)
is a CSI plugin for Kubernetes that enables you to provision Block volumes
backed by a single, cluster-wide, shared block device (e.g., a single big LUN on a SAN).


Is it worth calling out that subprovisioner actually takes a shared Volume Group, and that it is therefore up to the cluster admin whether to create that VG on top of a single LUN or alternatively to create the VG by concatenating several smaller LUNs each serving as an lvm PV?

For now, subprovisioner does expect that the entire VG is visible from each node; I don't know if lvm supports a degraded-mode shared VG where only portions of the VG are usable from a given node provided that the shared global lockspace is still visible to all nodes, but if we ever need to get into that level of granularity, the complexity multiplies. So sticking with the simpler "big LUN on a SAN as a single VG" is good enough.

(Alas - the confusion between lvm Physical Volume and kubernetes Persistent Volume will make it harder to use the term PV without qualification)

Contributor Author

> Is it worth calling out that subprovisioner actually takes a shared Volume Group, and that it is therefore up to the cluster admin whether to create that VG on top of a single LUN or alternatively to create the VG by concatenating several smaller LUNs each serving as an lvm PV?

This abstraction is something we should take away from the admin. We will likely have a 90% case (e.g. one big LUN, but I don't know) that we should default to, and allow a manual reconfiguration to smaller LUNs. That's how most OpenShift APIs are designed.

This concurs with

> So sticking with the simpler "big LUN on a SAN as a single VG" is good enough.

> (Alas - the confusion between lvm Physical Volume and kubernetes Persistent Volume will make it harder to use the term PV without qualification)

For LVMS users, a PV is always a PersistentVolume (in our docs), whereas lvm2 physical volumes always get written out in full. This enhancement is kind of hacked together, so if you spot any inconsistencies please point them out for correction!

- Create an enhancement to the "LVMCluster" CRD that can differentiate a deviceClass into a new type of shared storage that can be provisioned side by side with, or as an alternative to, the regular LVMS device classes managed by TopoLVM (see the sketch below).
- Create a productization for a LUN-backed CSI driver alternative to TopoLVM that allows for shared VG usage, especially in the context of virtualization.
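
To make the first goal above concrete, here is a minimal sketch of how the deviceClass API could be differentiated; the field and constant names below are hypothetical placeholders, not the final API.

// DeviceAccessPolicy selects which CSI driver backs a deviceClass (placeholder name).
type DeviceAccessPolicy string

const (
    // DeviceAccessPolicyNodeLocal keeps today's TopoLVM-backed behavior.
    DeviceAccessPolicyNodeLocal DeviceAccessPolicy = "NodeLocal"
    // DeviceAccessPolicyShared provisions volumes from a cluster-wide shared VG.
    DeviceAccessPolicyShared DeviceAccessPolicy = "Shared"
)

type DeviceClass struct {
    // Name of the device class.
    Name string `json:"name,omitempty"`
    // DeviceAccessPolicy selects the backing driver for this device class.
    // Defaults to NodeLocal (TopoLVM) when omitted.
    // +optional
    DeviceAccessPolicy DeviceAccessPolicy `json:"deviceAccessPolicy,omitempty"`
}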


Hmm. While subprovisioner itself wants a VG, I could see it being useful at the LVMS layer to want a bare block device visible to all nodes in the cluster, which LVMS will then format into the shared VG that subprovisioner wants. That is, while subprovisioner doesn't want to create the VG, LVMS might be the right place to set it up so that the cluster admin doesn't have to do any work for the VG either because LVMS does it under the hood.
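
If LVMS were to take on that setup work, a rough sketch of what vgmanager could run under the hood might look as follows, assuming it keeps shelling out to lvm2 as it does for node-local VGs; the function and names are illustrative only, and --shared requires lvmlockd to be configured on the nodes.

package vgmanager

import (
    "fmt"
    "os/exec"
)

// CreateSharedVG initializes an lvm2 PV on a block device that is visible from
// every node and creates a shared-access VG on top of it.
func CreateSharedVG(vgName, devicePath string) error {
    if out, err := exec.Command("pvcreate", devicePath).CombinedOutput(); err != nil {
        return fmt.Errorf("pvcreate %s failed: %w: %s", devicePath, err, out)
    }
    if out, err := exec.Command("vgcreate", "--shared", vgName, devicePath).CombinedOutput(); err != nil {
        return fmt.Errorf("vgcreate %s failed: %w: %s", vgName, err, out)
    }
    return nil
}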

Contributor Author

LVMS itself consists of 3 layers

  1. The Operator handling the LVMCluster and deploying high-level resources
  2. The vgmanager, operating on each node and preparing the nodes for the CSI driver where necessary
  3. The CSI driver, which takes a configuration prepared by operator+vgmanager and reacts to CSI calls made

For this setup, subprovisioner would need to be able to take care of formatting calls, because Filesystem Formatting on Mount is handled in the CSI driver and the operator or vgmanager cannot know when formatting / wiping must be run.
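
For illustration only (this is not KubeSAN's or TopoLVM's actual code), a CSI node plugin typically performs format-on-mount during NodeStageVolume roughly like this, which is why the decision to format or wipe can only be made inside the driver:

package driver

import (
    mount "k8s.io/mount-utils"
    utilexec "k8s.io/utils/exec"
)

// stageVolume mounts the device at the staging path, formatting it first only
// if no existing filesystem is detected. Neither the operator nor vgmanager
// sees this call, so only the CSI driver knows when formatting happens.
func stageVolume(devicePath, stagingPath, fsType string) error {
    mounter := &mount.SafeFormatAndMount{
        Interface: mount.New(""),
        Exec:      utilexec.New(),
    }
    return mounter.FormatAndMount(devicePath, stagingPath, fsType, nil)
}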


type ThinPoolConfig struct {
// Name of the thin pool to be created. Will only be used for node-local storage,
// since shared volume groups will create a thin pool with the same name as the volume group.

Subprovisioner creates its own thin pools within the shared volume group, but it is more along the lines of one thin pool per PVC rather than one for the entire volume group. Is this comment trying to give too much internal detail about what subprovisioner will do, in relation to the higher-level notion that a client of LVMS merely needs to choose between local (existing TopoLVM) and shared (new subprovisioner) implementations?

Contributor Author

I think this would eventually also result in #1620 (comment)


// NodeSelector chooses nodes on which to create the deviceclass
// +optional
NodeSelector *corev1.NodeSelector `json:"nodeSelector,omitempty"`

Will node selection work properly with the Subprovisioner CSI plugin? My understanding is that TopoLVM has implemented support to make pod scheduling work when volumes are available only on specific nodes. Subprovisioner has no scheduler awareness and just runs a CSI Node Plugin DaemonSet on all nodes, expecting any node to be able to run pods that access any volume.

I see three possibilities:

  1. NodeSelector already works with Subprovisioner, somehow. There's no work to do.
  2. NodeSelector doesn't work with Subprovisioner and must be nil for now.
  3. NodeSelector doesn't work with Subprovisioner yet but we figure out how to make Subprovisioner support it, so this syntax is allowed with Subprovisioner.

Contributor Author

The way to implement this is to add a node accessibility constraint in the CSI driver, start supporting node topologies via CSI, and at the same time use the node selector to deploy only on the nodes we want.
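
As a sketch of what that could look like on the CSI side, using the standard CSI Go bindings; the topology key, types, and field names here are placeholders, not KubeSAN's actual implementation:

package driver

import (
    "context"

    "github.com/container-storage-interface/spec/lib/go/csi"
)

// topologyKey is a placeholder domain; a real driver would choose its own.
const topologyKey = "topology.kubesan.example.com/node"

type nodeServer struct {
    csi.UnimplementedNodeServer
    nodeName string
}

// NodeGetInfo advertises the topology segment of this node so that the
// external-provisioner can restrict volume accessibility to the nodes
// selected by the deviceClass NodeSelector.
func (d *nodeServer) NodeGetInfo(ctx context.Context, req *csi.NodeGetInfoRequest) (*csi.NodeGetInfoResponse, error) {
    return &csi.NodeGetInfoResponse{
        NodeId: d.nodeName,
        AccessibleTopology: &csi.Topology{
            Segments: map[string]string{topologyKey: d.nodeName},
        },
    }, nil
}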


It's possible that users would want the ability to use a subset of nodes, so I guess Subprovisioner should implement this.

@stefanha

Looks good to me.

@jakobmoellerdev changed the title from "[WIP] feat!: init subprovisioner csi driver integration proposal" to "[WIP] OCPEDGE-1107: feat!: init subprovisioner csi driver integration proposal" on Jun 3, 2024
@openshift-ci-robot commented Jun 3, 2024

@jakobmoellerdev: This pull request references OCPEDGE-1107 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.17.0" version, but no target version was set.

In response to this:

Subprovisioner is a CSI plugin for Kubernetes that enables you to provision Block volumes
backed by a single, cluster-wide, shared block device (e.g., a single big LUN on a SAN).

Logical Volume Manager Storage (LVMS) uses the TopoLVM CSI driver to dynamically provision local storage on the OpenShift Container Platform clusters.

This proposal is about integrating the Subprovisioner CSI driver into the LVMS operator to enable the provisioning of
shared block devices on the OpenShift Container Platform clusters.

This enhancement will significantly increase scope of LVMS, but allows LVMS to gain the unique value proposition
of serving as a valid layered operator that offers LUN synchronization and provisioning capabilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on Jun 3, 2024
@jakobmoellerdev changed the title from "[WIP] OCPEDGE-1107: feat!: init subprovisioner csi driver integration proposal" to "[WIP] OCPEDGE-1147: feat!: init subprovisioner csi driver integration proposal" on Jun 18, 2024
@openshift-ci-robot commented Jun 18, 2024

@jakobmoellerdev: This pull request references OCPEDGE-1147 which is a valid jira issue.

In response to this:

Subprovisioner is a CSI plugin for Kubernetes that enables you to provision Block volumes
backed by a single, cluster-wide, shared block device (e.g., a single big LUN on a SAN).

Logical Volume Manager Storage (LVMS) uses the TopoLVM CSI driver to dynamically provision local storage on the OpenShift Container Platform clusters.

This proposal is about integrating the Subprovisioner CSI driver into the LVMS operator to enable the provisioning of
shared block devices on the OpenShift Container Platform clusters.

This enhancement will significantly increase scope of LVMS, but allows LVMS to gain the unique value proposition
of serving as a valid layered operator that offers LUN synchronization and provisioning capabilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jakobmoellerdev changed the title from "[WIP] OCPEDGE-1147: feat!: init subprovisioner csi driver integration proposal" to "OCPEDGE-1147: feat!: init subprovisioner csi driver integration proposal" on Jun 19, 2024
@jakobmoellerdev marked this pull request as ready for review on June 19, 2024 07:34
openshift-ci bot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Jun 19, 2024
@jakobmoellerdev changed the title from "OCPEDGE-1147: feat!: init subprovisioner csi driver integration proposal" to "OCPEDGE-1147: feat!: init kubesan csi driver integration proposal" on Jun 20, 2024
@openshift-ci-robot commented Jun 20, 2024

@jakobmoellerdev: This pull request references OCPEDGE-1147 which is a valid jira issue.

In response to this:

KubeSAN is a CSI plugin for Kubernetes that enables you to provision Block volumes
backed by a single, cluster-wide, shared block device (e.g., a single big LUN on a SAN).

Logical Volume Manager Storage (LVMS) uses the TopoLVM CSI driver to dynamically provision local storage on the OpenShift Container Platform clusters.

This proposal is about integrating the Subprovisioner CSI driver into the LVMS operator to enable the provisioning of
shared block devices on the OpenShift Container Platform clusters.

This enhancement will significantly increase scope of LVMS, but allows LVMS to gain the unique value proposition
of serving as a valid layered operator that offers LUN synchronization and provisioning capabilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@suleymanakbas91 (Contributor)

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jun 25, 2024

- **Downgrade**:
- Allow safe downgrades by maintaining backward compatibility. Downgrading from a kubesan-enabled version to a purely topolvm-enabled version should be a no-break operation for the topolvm part. For the kubesan part, the operator should ensure that the shared VGs can be cleaned up manually.
- Provide rollback mechanisms and detailed instructions to revert to previous versions. Ensure that downgrades do not result in data loss or service interruptions.
Member

It would be nice if we had a more detailed description of how data loss during rollbacks will be avoided (this is just stating that we will do so). I don't think we necessarily need to have a complete design in this EP, but I think that having a complete story for rollback and DR should be part of the graduation criteria for GA.

Contributor Author

I added a rollback routine to the design details section, covering both the expected automated and manual rollback procedures and their impact on data.
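
As one example of the manual cleanup path, the expectation would be along these lines; this is an illustrative Go wrapper around lvm2 commands, the names are placeholders, and the shared VG's lvmlockd lockspace must still be active on the node performing the cleanup.

package vgmanager

import (
    "fmt"
    "os/exec"
)

// RemoveSharedVG deletes all remaining logical volumes in the shared VG and
// then removes the VG itself, so a downgraded cluster is not left with
// orphaned shared storage. Data on those volumes is lost, which is why the
// rollback procedure requires volumes to be migrated or backed up first.
func RemoveSharedVG(vgName string) error {
    if out, err := exec.Command("lvremove", "-y", vgName).CombinedOutput(); err != nil {
        return fmt.Errorf("lvremove %s failed: %w: %s", vgName, err, out)
    }
    if out, err := exec.Command("vgremove", vgName).CombinedOutput(); err != nil {
        return fmt.Errorf("vgremove %s failed: %w: %s", vgName, err, out)
    }
    return nil
}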


#### Hypershift / Hosted Control Planes

N/A
Member

I think that this technology might be useful for Hypershift. Aside from the additional complexity involved, is there any reason that it's excluded?

Contributor Author

I don't think I follow. Hypershift as of now uses LVMS for its etcd; however, the only way a shared VG would be useful is if Hypershift were deployed on a cluster where etcd is stored on a LUN, which is kind of counterintuitive to having a distributed database in the first place.

Could you elaborate on what you think would be useful for Hypershift?

Contributor Author

If you're talking about having local storage access on all virtualized nodes, this is true for any ReadWriteMany CSI driver; however, it doesn't really impact the design of Hypershift in any way.

Member

> where etcd would be stored on a LUN

Off the top of my head, a SAN might have performance, availability, or cost characteristics which make it more attractive than local storage. Local storage is probably going to win on performance if cost is no object, but if a customer already has a high-performance SAN, I can see that being an option they might be interested in considering if we made it available.

Contributor Author

At that point in time, Hypershift becomes yet another cluster for us, so I don't think this will change the design. Of course we can notify them that this is coming, but I don't think they need special consideration.

Contributor Author

I guess to narrow this down: What would you expect me to add to this section? Possible benefits for usage in Hypershift?

Member

I asked the hypershift team for their opinion on slack, but I haven't received a response. I don't think this enhancement should be blocked on that though.

openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Jul 11, 2024
@jakobmoellerdev (Contributor Author)

/test markdownlint

@suleymanakbas91 (Contributor)

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jul 16, 2024
@jerpeter1 (Member)

/approve


openshift-ci bot commented Jul 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jerpeter1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jul 16, 2024

openshift-ci bot commented Jul 16, 2024

@jakobmoellerdev: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-merge-bot merged commit 2d2b025 into openshift:master on Jul 16, 2024
2 checks passed