Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Ultra Disk Enablement for Persistent Volumes #1852

Closed
JoelSpeed opened this issue Nov 11, 2021 · 12 comments · Fixed by #2421
Closed

Allow Ultra Disk Enablement for Persistent Volumes #1852

JoelSpeed opened this issue Nov 11, 2021 · 12 comments · Fixed by #2421
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@JoelSpeed
Copy link
Contributor

/kind feature

Describe the solution you'd like
Further to #1134 which allows the Azure ultra disks to be attached to VMs as data disks, it would be useful to enable the ultra disk capability on the VMs when no data disks are attached.

The Azure CSI driver has the ability to provision ultra disk based persistent volumes, but for this to work, the VMs that the PVs are attached to must have the AdditionalCapabilities.UltraSSDEnabled set to true.

When CAPZ creates a VM today, there is no way to explicitly enable this capability. This is only enabled when a ultra disk data disk is defined within the machine spec.

We should add a new field to the machine spec to allow this feature to be explicitly enabled, and possibly (based on suggestion from @alexeldeib) default this to true using a webhook when an ultra SSD data disk is defined on the machine.

CC @CecileRobertMichon @alexeldeib

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 11, 2021
@CecileRobertMichon
Copy link
Contributor

/help

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jan 25, 2022
@JoelSpeed
Copy link
Contributor Author

@damdo has volunteered to look into this, though not a Kube-sigs member so I don't think I can assign him

I'll take the responsibility for this for now
/assign

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2022
@damdo
Copy link
Member

damdo commented Apr 26, 2022

I'm now in the process of putting together a design proposal for this.
I will update this issue once that's ready for discussion.
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2022
@damdo
Copy link
Member

damdo commented May 11, 2022

Here is the API design and behaviour proposal for adding Azure Ultra Disks support via Persistent Volumes to CAPZ.

As a note to this, the design choices behind this proposal and the reason why we are proposing an Enum field with string values rather than a boolean (or boolean pointer) has two roots:

  1. The Kubernetes API conventions have a clear statement discouraging boolean fields: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#primitive-types
  2. This has also been through a round of reviews in downstream avenues by @deads2k and @JoelSpeed

Azure Ultra Disks via Persistent Volumes - Proposal

To allow the mounting of Persistent Volume Claims of StorageClass UltraSSD_LRS to Pods living on an arbitrary Node backed by a CAPI Machine, there is a need for the ability to attach an Ultra Disk to the Azure instance backing that Machine.

To give instances this ability an Azure Additional Capability (AdditionalCapabilities.UltraSSDEnabled) must be specified on the instance. This will allow/disallow the attachment of Ultra Disks to it.

Furthermore the UltraSSDEnabled Azure Additional Capability must be present on instances attaching Ultra Disks as Data Disks. Currently when an Ultra Disk is specified as a Data Disk the provider automatically sets this field to true.

So when coming up with a proposal for extending the API, the fact that the UltraSSDEnabled Additional Capability has the ability to govern both features and how it must change depending on what features the user wants to use, must both be taken into account.

For this purpose a new field will be added to the
AzureMachineSpec struct. The interface will be an optional UltraSSDCapability struct of type AzureUltraSSDCapabilityState which will allow specifying an Enum of states to instruct the provider whether to enable the capability for attaching Ultra Disks to an instance or not:

  • if UltraSSDCapability is Enabled then the UltraSSDEnabled Additional capability will be set to true for the Machine, which will allow Ultra Disks attachments in both ways (via PVCs)
  • if UltraSSDCapability is Disabled then the UltraSSDEnabled Additional capability will be set to false for the Machine, which will disallow Ultra Disks attachements in both ways (via PVCs)
  • if UltraSSDCapability is omitted, the provider may enable the UltraSSDEnabled capability depending on whether any Ultra Disks are specified as Data Disks
type AzureMachineSpec struct {
  // Existing fields will not be modified
  ...

  // UltraSSDCapability enables or disables Azure UltraSSD capability for a virtual machine.
  // When omitted, the platform may enable the capability based on the configuration of data disks.
  // +kubebuilder:validation:Enum:="Enabled";"Disabled"
  // +optional
  UltraSSDCapability AzureUltraSSDCapabilityState `json:"ultraSSDCapability,omitempty"`
}

// AzureUltraSSDCapabilityState defines the different states of an UltraSSDCapability
type AzureUltraSSDCapabilityState string

// These are the valid AzureUltraSSDCapabilityState states.
const (
  // "AzureUltraSSDCapabilityTrue" means the Azure UltraSSDCapability is Enabled
  AzureUltraSSDCapabilityTrue AzureUltraSSDCapabilityState = "Enabled"
  // "AzureUltraSSDCapabilityFalse" means the Azure UltraSSDCapability is Disabled
  AzureUltraSSDCapabilityFalse AzureUltraSSDCapabilityState = "Disabled"
)

PTAL @CecileRobertMichon @alexeldeib

@CecileRobertMichon
Copy link
Contributor

The Kubernetes API conventions have a clear statement discouraging boolean fields:

I want to push back a little bit on this. The API conventions do not discourage booleans, they just encourage being intentional about booleans and planning for future evolution of the fields:

"Think twice about bool fields. Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansions by describing the policy options explicitly as a string type alias (e.g. TerminationMessagePolicy)."

In this particular case, do you foresee future options for AzureUltraSSDCapabilityState besides "true" and "false"? Seems like we are using strings to implement a boolean here.

Additionally, the design should take into account future "additional capabilities" that we might want to support for VMs, such as Hibernation (so far the only other exposed option AFAIK). Would we add a separate field in the AzureMachineSpec for each capability? What if Azure eventually supports 10+ "additional" capabilities? This might be a good reason to stay closer to the Azure SDK and add an "AdditionalCapabilities" field as such:

type AzureMachineSpec struct {
  // Existing fields will not be modified
  ...

  // AdditionalCapabilities specifies additional capabilities enabled or disabled on the virtual machine.
  // +optional
  AdditionalCapabilities *AdditionalCapabilities `json:"additionalCapabilities,omitempty"`
}

// AdditionalCapabilities enables or disables a capability on the virtual machine.
type AdditionalCapabilities struct {
  // UltraSSDEnabled enables or disables Azure UltraSSD capability for the virtual machine.
  // Defaults to true if Ultra SSD data disks are specified.
  // +optional
  UltraSSDEnabled *bool `json:"ultraSSDEnabled,omitempty"`
  ...
  // other capabilities can be added later
}

@JoelSpeed
Copy link
Contributor Author

In this particular case, do you foresee future options for AzureUltraSSDCapabilityState besides "true" and "false"? Seems like we are using strings to implement a boolean here.

if UltraSSDCapability is omitted, the provider may enable the UltraSSDEnabled capability depending on whether any Ultra Disks are specified as Data Disks

In a previous discussion I mentioned the fact that the behaviour described in the second point I've quoted above should be behind a user facing option. The way I was looking at this, the UltraSSDCapability does indeed have 4 states already:

  • "": User hasn't set anything and therefore we shouldn't do anything, in case enabling the capability limits other features
  • Enabled: User has asked us to enable it, so we do
  • Disabled: User has asked us to make sure it's disabled, so we do
  • Auto: User wants us to try to work out if we think it should be enabled, ie, if we see an SSD needing the capability, we turn it on, else we leave it off

I'm not sure others felt so strongly about the auto on/off being a feature users should opt into though

This might be a good reason to stay closer to the Azure SDK and add an "AdditionalCapabilities" field as such:

This seems worthwhile to me

@damdo
Copy link
Member

damdo commented May 30, 2022

@CecileRobertMichon thanks a lot for your feedback on this.

This might be a good reason to stay closer to the Azure SDK and add an "AdditionalCapabilities" field

I agree and think it's a good idea to have an "AdditionalCapabilities" struct field to collect and support future additional capabilities.

do you foresee future options for future options for AzureUltraSSDCapabilityState besides "true" and "false"?
Seems like we are using strings to implement a boolean here.

The initial reasoning behind proposing it as a multi-state Enum is basically what @JoelSpeed said. To capture "automatic" behaviour of enabling/disabling the feature based on needs and status quo, which may be useful in situations like this one for example:

default this to true [...] when an ultra SSD data disk is defined on the machine

@damdo
Copy link
Member

damdo commented Jun 20, 2022

@JoelSpeed

I'm not sure others felt so strongly about the auto on/off being a feature users should opt into though

Yeah I think the Auto vs. "" and what default means here could be a good discussion point.

Let's see what @CecileRobertMichon thinks :)

@CecileRobertMichon
Copy link
Contributor

Not sure I see the value in having both "" and "Auto". If a user sets "" but then uses an ultra SSD data disk, would we still let the value be empty and therefore fail the user? That seems confusing to me. Why not keep it simple?

True: explicitly enabled by the user
False: explicitly disabled by the user
nil (default): if we see an SSD needing the capability, we turn it on, else we leave it off (current behavior, backward compatible)

@damdo
Copy link
Member

damdo commented Jun 24, 2022

Thanks for your suggestions @CecileRobertMichon, ok let's go with this design then 👍
I created a PR that introduces this feature with the agreed design, see: #2421.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants