Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for growing FlexVolume size. #1700

Merged
merged 1 commit into from
Jun 27, 2018

Conversation

xingzhou
Copy link
Contributor

Proposal for growing FlexVolume size, the feature ticket is at:
kubernetes/enhancements#304

The original google doc for this proposal is at:
https://docs.google.com/document/d/1dwom9xQ3Fg5F_jJrybr0slp-QsO3_CKiisxzNRoMhec/edit?usp=sharing

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 29, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 29, 2018
@k8s-github-robot k8s-github-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Jan 29, 2018
@cblecker
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 29, 2018
@gnufied
Copy link
Member

gnufied commented Feb 1, 2018

/assign

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Feb 6, 2018
@saad-ali
Copy link
Member

/assign @chakri-nelluri @verult

@xingzhou
Copy link
Contributor Author

@gnufied , @chakri-nelluri and @verult , any updates or comments on this proposal?

@gnufied
Copy link
Member

gnufied commented Mar 2, 2018

I have already reviewed this when it was in Google doc. Looks good from resizing API perspective.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 2, 2018
@gnufied
Copy link
Member

gnufied commented Mar 2, 2018

cc @chakri-nelluri @verult to review flex API.

@verult
Copy link
Contributor

verult commented Mar 7, 2018

Adding a separate Expand capability might not be necessary, and because of that the new plugin objects might not be necessary, either.

The reason why we added Attach is because we don't want to require users to install their plugin in master if it doesn't perform attach. Without the capability field, all plugins were assumed to be attachable, so kubelet waits for attach to complete before mounting, but attach fails because the driver doesn't exist in master.

On the other hand, I believe it's OK if we assume all drivers support resize. If the driver without resize support is absent in master, the Expand Controller's call to find expandable plugin fails, and the controller throws an error that it can't find a matching plugin, as it should. Kubelet will take no action on a resize call because PVC's capacity status hasn't changed. I'm not super familiar with the volume resize design so please verify.

As long as the failure of Expand Controller doesn't block anything else, we are OK. We'd need something like attacher-defaults.go for resize, though.

@xingzhou
Copy link
Contributor Author

xingzhou commented Mar 8, 2018

thanks for the review.
For capability Expand , I added it for keeping consistency with Attach capability and with the new plugin type, we don't need to add methods to the old plugin objects to enable resize. I think it can be removed, but if we remove this capability, we are going to use the following work flow:

  1. If flex volume type does not support resizing, the resize function will fail at ExpandVolumeDevice call and flex volume should directly throw out error to stop the following resize actions.

  2. If flex volume type support resizing, we need to install the volume driver on the master node(and also, might need to install on kubelet node, depends on the result of RequiresFSResize call), so that volume expand controller can call RequiresFSResize to decide whether we need to do ExpandFS call on kubelet later.

  3. If we do not install volume driver on master node, expand volume controller can not find any volume plugins to handle the resize request and therefore the resize will fail.

Please correct me if I'm wrong

@verult
Copy link
Contributor

verult commented Mar 8, 2018

Yeah I'm with you for the workflow you described. For (1) I was originally thinking of having something like attacher-defaults.go, but that doesn't work because resize will proceed as if an actual resize happened. I agree with your proposal.

Having Expand as another capability is nice as it clearly indicates which Flex drivers are expandable to Kubernetes, but we end up with a combinatoric explosion of plugin types the Flexvolume plugin has to support (e.g. AttachableExpandableFlexPlugin etc).

@verult
Copy link
Contributor

verult commented Mar 8, 2018

@xingzhou Are there specific storage systems (or anything else that uses a Flexvolume driver) that requires ExpandFS? Just trying to understand why resizefs isn't sufficient

@xingzhou
Copy link
Contributor Author

xingzhou commented Mar 9, 2018

AFAICS, some scenarios, like local storage resizing, instead of resizing on master node, require to install flex driver on worker node and do the resizing work there. So by using ExpandFS, it provides a chance for kubelet to call volume driver to do volume resizing + FS resizing on worker node. To me, it's like attach & mount, as we can also do the volume attach in the mount operation.

In addition, in our (IBM) case, we also want to resize volumes on worker node, because while resizing the volumes, we have to read some configurations which resides on the worker node.

Please also refer to @gnufied comments here

@verult
Copy link
Contributor

verult commented Mar 9, 2018

Thanks for clarifying. I have some more thoughts about ExpandFS, but I'll add them in the dedicated issue.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 15, 2018
@xingzhou
Copy link
Contributor Author

Have updated the doc to remove the Expand capability, please take a look, thanks!

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments, LGTM otherwise


`ExpandFS` will call underneath volume driver `expandfs` method to finish FS resize. The sample code looks like:
```
func (plugin *flexVolumeExpandablePlugin) ExpandFS(spec *volume.Spec, newSize resource.Quantity, oldSize resource.Quantity) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too


A sample implementation of `ExpandVolumeDevice` method is like:
```
func (plugin *flexVolumeExpandablePlugin) ExpandVolumeDevice(spec *volume.Spec, newSize resource.Quantity, oldSize resource.Quantity) (resource.Quantity, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*flexVolumePlugin

@chakri-nelluri
Copy link
Contributor

Thanks @xingzhou added few comments. Please feel free to ping me on slack, I will do a quick review.

Proposal for growing FlexVolume size, the feature ticket is at:
kubernetes/enhancements#304

The original google doc for this proposal is at:
https://docs.google.com/document/d/1dwom9xQ3Fg5F_jJrybr0slp-QsO3_CKiisxzNRoMhec/edit?usp=sharing
@xingzhou
Copy link
Contributor Author

reviewers, please take a look at the latest version

@xingzhou
Copy link
Contributor Author

@verult and @chakri-nelluri, do you think we can get this proposal merged?


#### RequiresFSResize

`RequiresFSResize` is a method to implement `ExpandableVolumePlugin` interface. The return value of this method identifies whether or not a file system resize is required once physical volume get expanded. If the return value is `true`, PV resize controller will consider the volume resize operation is done and then update the PV object’s capacity in K8s directly; If the return value is `false`, PV resize controller will leave kubelet to do the file system resize, and kubelet on worker node will call `ExpandFS` method of FlexVolume to finish the file system resize step(at present, only offline FS resize is supportted, online resize support is under community discussion [here](https://github.com/kubernetes/community/pull/1535)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If the return value is false, PV resize controller will consider the volume resize operation is done..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my mistake, need to exchange the behavior of the two return values. I'll update this in the next patch update.

@verult
Copy link
Contributor

verult commented Apr 10, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2018
@verult
Copy link
Contributor

verult commented Apr 10, 2018

One more thing: is the default value of requiresFSResize capability false? It shouldn't be a required field


### Admission Control Changes

Whether or not a specific volume plugin supports volume expansion is validated and checked in PV resize admission plugin. In general, we can list FlexVolume as the ones that support volume expansion and leave the actual expansion capability check to the underneath volume driver when PV resize controller calls the `ExpandVolumeDevice` method of FlexVolume.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that we will end up accepting a volume resize request at the Kubernetes API layer, only to possibly later discover (when ExpandVolumeDevice is executed) that the volume is not re-sizable. Leaving it in a bad state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like currently the resize admission controller has a hard-coded list of plugins that support resize. Is it worth augmenting it to dynamically check resize support?

Copy link
Member

@gnufied gnufied Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to do that, but there was a concern raised before that doing so will force us to load all volume plugins in admission controller. It may not be an issue for api-server since it probably already loads all volume plugins but this needs to be carefully considered. Last time I tried something like this - it ended up increasing size of kubectl and few other binaries quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing is, if we do not add a capability like Resize for flex volume driver, we can not tell whether or not a flex volume driver supports resize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saad-ali CSI will also likely run into this admission controller issue, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of removing static checking we are currently doing and not do any checks at all. The reasoning is - if admission controller is enabled, we only allow resizing of PVC created from storageClasses where allowVolumeExpansion property is set to true. Now if somehow - k8s admin enabled that field for a SC that does not support resizing, we will log an PVC event and that will enable k8s admin to fix the SC.

Or to rephrase - I think the fact that, we allow resizing of only dynamically provisioned PVC with allowExpansion set to true should be more than enough. if a k8s admin misconfigures the SC then it will be considered an admin error (and he should know better).

```
type DriverCapabilities struct {
Attach bool `json:"attach"`
RequiresFSResize bool `json:"requiresFSResize"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a capability for the Resize alone? Not just FSResize?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difficulty is that in order for Kubernetes to distinguish between different Flex driver types, we end up with an exponential explosion of plugin structs in the Kubernetes Flexvolume code - AttachablePlugin, ExpandablePlugin, AttachableExpandablePlugin, and the regular plugin. I'm not sure if there's a better way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if we need to add more capability for Flex Volume in the future, may produce more type of plugin.

@xingzhou
Copy link
Contributor Author

For the default value of requiresFSResize, I remembered @chakri-nelluri used to have a comment that we want to leave the resize to kubelet by default. So in the design, the default value of this capability is true. It's optional, so that if user does not set this capability, by default, flex volume plugin will set true.

@akgunjal
Copy link

akgunjal commented Jun 1, 2018

@chakri-nelluri : Can you please do the review and approve if its fine?

@sandaymin123
Copy link

@saad-ali We need someone to review and approve this

@saad-ali
Copy link
Member

/approve

Let's make this the last feature added to Flex. All new features (e.g. snapshots, etc.) should be added to CSI not Flex. I do not want the Flex API to continue to expand. Continuing to grow both the CSI and Flex interfaces will became a nightmare to manage.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 27, 2018
@k8s-ci-robot k8s-ci-robot merged commit a747cf5 into kubernetes:master Jun 27, 2018
@verult
Copy link
Contributor

verult commented Mar 8, 2019

Hey @xingzhou , I noticed there aren't any code changes inside the resize admission controller. Did we decide to remove the admission controller changes in the end? Thanks!

@gnufied
Copy link
Member

gnufied commented Mar 19, 2019

@verult yeah we ended up removing static check from admission controller because it won't work for CSI too.

MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants