Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: avoid using ' ' in availability zone #1380

Closed
wants to merge 2 commits into from

Conversation

jichenjc
Copy link
Contributor

What this PR does / why we need it:
topology.kubernetes.io/zone will be used as zone label to nodes but it does not allow blank ' '
however, blank is allowed in nova az settings thus this will lead to issue

this PR tries to solve that problem by replacing ' ' with '-'
but not sure this is the right way to go.. so get some opinion through the code change

Which issue this PR fixes(if applicable):
fixes #1379

Special notes for reviewers:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 19, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jichenjc after the PR has been reviewed.
You can assign the PR to them by writing /assign @jichenjc in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jichenjc
Copy link
Contributor Author

@seanschneeweiss @lingxiankong @ramineni need some comments ... thanks

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Jan 19, 2021

Build succeeded.

@ramineni
Copy link
Contributor

@jichenjc pls check my comment here , #1364 (comment)

Could we raise issue in k/k repo , to allow spaces? Is this issue coming only in 1.20

@jichenjc
Copy link
Contributor Author

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L166 is the code reported the error

I highly suspect it won't be accepted by k/k but anyway I opened kubernetes/kubernetes#98202

@ramineni
Copy link
Contributor

@jichenjc then this should be just be doc update , spaces not allowed in AZ names. replacing space with '-' is a problem , as what if AZ actually do exist with - , it might cause confusion.

And also, lets just keep the k/k issue open and take inputs from others as well.

@jichenjc
Copy link
Contributor Author

I can create a doc PR on this , but if we can find the issue early
for example, if it contains ' ' , how about disable something at creation stage? will dig into

@lingxiankong
Copy link
Contributor

As I commented in #1381, only described in doc that telling the k8s cluster admin that it's not possible to set the label because the az name in openstack cloud contains space. Normally, the cluster admin doesn't have access to the cloud, there is no option for the k8s cluster admins in this case. Additionally, changing az name in the cloud could involve lots of work that should be avoidable.

Being that said, for the label topology.kubernetes.io/zone, I would suggest we replacing ' ' with '-' and document the behavior.

@ramineni
Copy link
Contributor

@lingxiankong We can't change the AZ name in the label , replace ' ' or any special character with '-', that would fail scheduling when topology enabled in any CSI plugins . And we dont know , if there is already AZ named with '-'

eg: topology.kubernetes.io/zone = "test zone"

we replace it with "test-zone" , what if "test-zone" exists or we dont know if it can be created in future in openstack. That will lead to confusion and scheduling on node in wrong zone. IMO, this is used for scheduling , we should not change the value arbitrarily.

@jichenjc
Copy link
Contributor Author

I think test az and test-az is really a rare case, but AZ contains blank is not (we suffered this issue internally and we have to modify the AZ name which brings a lot of trouble)

several options maybe considered

  1. whether we can error early if we init OCCM and noticed the AZ contains blank , we error out
  2. we do a warning and modify the AZ according to this PR ,and we need check there is no conflict after this kind of name change for ALL azs

@lingxiankong
Copy link
Contributor

lingxiankong commented Jan 21, 2021

@lingxiankong We can't change the AZ name in the label , replace ' ' or any special character with '-', that would fail scheduling when topology enabled in any CSI plugins . And we dont know , if there is already AZ named with '-'

eg: topology.kubernetes.io/zone = "test zone"

we replace it with "test-zone" , what if "test-zone" exists or we dont know if it can be created in future in openstack. That will lead to confusion and scheduling on node in wrong zone. IMO, this is used for scheduling , we should not change the value arbitrarily.

Yes, you are right. That's why I mentioned if we go this way we should let the cluster admin know that 'test zone' and 'test-zone' will be treated as the same zone name (as @jichenjc said this is a very rare edge case), otherwise the cluster admin need to notify the cloud admin to change the zone name as required. Personally, I think this is an acceptable option compared to failing to initialize the cluster node. What do you think or what's your suggestion?

@jichenjc
Copy link
Contributor Author

@lingxiankong We can't change the AZ name in the label , replace ' ' or any special character with '-', that would fail scheduling when topology enabled in any CSI plugins . And we dont know , if there is already AZ named with '-'
eg: topology.kubernetes.io/zone = "test zone"
we replace it with "test-zone" , what if "test-zone" exists or we dont know if it can be created in future in openstack. That will lead to confusion and scheduling on node in wrong zone. IMO, this is used for scheduling , we should not change the value arbitrarily.

Yes, you are right. That's why I mentioned if we go this way we should let the cluster admin know that 'test zone' and 'test-zone' will be treated as the same zone name (as @jichenjc said this is a very rare edge case), otherwise the cluster admin need to notify the cloud admin to change the zone name as required. Personally, I think this is an acceptable option compared to failing to initialize the cluster node. What do you think or what's your suggestion?

I am +1 to this option with some document to describe the limtiations

@jichenjc
Copy link
Contributor Author

if folks agree, I will abandon #1381
and keep work on this

@ramineni
Copy link
Contributor

@lingxiankong We can't change the AZ name in the label , replace ' ' or any special character with '-', that would fail scheduling when topology enabled in any CSI plugins . And we dont know , if there is already AZ named with '-'
eg: topology.kubernetes.io/zone = "test zone"
we replace it with "test-zone" , what if "test-zone" exists or we dont know if it can be created in future in openstack. That will lead to confusion and scheduling on node in wrong zone. IMO, this is used for scheduling , we should not change the value arbitrarily.

Yes, you are right. That's why I mentioned if we go this way we should let the cluster admin know that 'test zone' and 'test-zone' will be treated as the same zone name (as @jichenjc said this is a very rare edge case), otherwise the cluster admin need to notify the cloud admin to change the zone name as required. Personally, I think this is an acceptable option compared to failing to initialize the cluster node. What do you think or what's your suggestion?

@lingxiankong If its only the label that is affected , then may be we could tweak it as suggested.. As I mentioned already , scheduling fails based on the value. For instance, once we change , node AZ doesnt match with volume AZ and the scheduling fails that no valid node. So we need to tweak volume AZ also accordingly. There might be other places / plugins tweaking is necessary as the value by other sidecars to do validation. Its better to fail at node initialization , rather than failing it later stage due to some issue. Its a limitation from Kubernetes side , so I think document it clearly and let the admin know beforehand is better IMHO

@lingxiankong
Copy link
Contributor

lingxiankong commented Jan 25, 2021

@lingxiankong If its only the label that is affected , then may be we could tweak it as suggested.. As I mentioned already , scheduling fails based on the value. For instance, once we change , node AZ doesnt match with volume AZ and the scheduling fails that no valid node. So we need to tweak volume AZ also accordingly. There might be other places / plugins tweaking is necessary as the value by other sidecars to do validation. Its better to fail at node initialization , rather than failing it later stage due to some issue. Its a limitation from Kubernetes side , so I think document it clearly and let the admin know beforehand is better IMHO

@ramineni Could you please remind me how we specify volume AZ when creating PV/PVC and where should the user get the volume AZ? Does that support space? If yes, you are right, a PV/PVC of az "test az" won't fit into node az "test-az".

My understanding of the steps is (could be wrong), the user:

  1. Get AZs from the node label (the AZ is set by OCCM, so the user can only see "test-az")

  2. Create PV by specifying

    allowedTopologies:
    - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
        - test-az
    

Because OCCM already converts "test az" to "test-az", so the user should never use "test az", there is no scheculing issue from what I can see (unless I've missed something).

@ramineni
Copy link
Contributor

ramineni commented Jan 27, 2021

@lingxiankong If its only the label that is affected , then may be we could tweak it as suggested.. As I mentioned already , scheduling fails based on the value. For instance, once we change , node AZ doesnt match with volume AZ and the scheduling fails that no valid node. So we need to tweak volume AZ also accordingly. There might be other places / plugins tweaking is necessary as the value by other sidecars to do validation. Its better to fail at node initialization , rather than failing it later stage due to some issue. Its a limitation from Kubernetes side , so I think document it clearly and let the admin know beforehand is better IMHO

@ramineni Could you please remind me how we specify volume AZ when creating PV/PVC and where should the user get the volume AZ? Does that support space? If yes, you are right, a PV/PVC of az "test az" won't fit into node az "test-az".

My understanding of the steps is (could be wrong), the user:

  1. Get AZs from the node label (the AZ is set by OCCM, so the user can only see "test-az")
  2. Create PV by specifying
    allowedTopologies:
    - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
        - test-az
    

Because OCCM already converts "test az" to "test-az", so the user should never use "test az", there is no scheculing issue from what I can see (unless I've missed something).

@lingxiankong yes, that can be one scenario and that requires volume to be created in correct AZ and parameters inside volume should be changed as well. The other one can be, it can be specified as storage class parameter , availability which accepts space which we create the volume in specified AZ ,so while create we have to use with space and while displaying we have to show with '-'. And if they specify with '-' instead of space , we have to validate which is correct AZ before creating vol, change it etc. Its to much tweaking and maintenance in code, which I feel is not good.(and this is for only one plugin/one scenario, we have to see what changes needed in other plugins / other scenarios etc.) And above all that, it just adds more confusion I feel, some labels it has space , some it has '-'

@lingxiankong
Copy link
Contributor

The other one can be, it can be specified as storage class parameter , availability which accepts space which we create the volume in specified AZ ,so while create we have to use with space and while displaying we have to show with '-'.

When Kubernetes user is creating StorageClass, where should she look for the correct AZ names? Isn't coming from Node label?

@ramineni
Copy link
Contributor

The other one can be, it can be specified as storage class parameter , availability which accepts space which we create the volume in specified AZ ,so while create we have to use with space and while displaying we have to show with '-'.

When Kubernetes user is creating StorageClass, where should she look for the correct AZ names? Isn't coming from Node label?

I suppose its from cinder availability-zone-list

@lingxiankong
Copy link
Contributor

I suppose its from cinder availability-zone-list

IMHO, the k8s cluster admin should always get zones from the Node label, we shouldn't have the assumption that the k8s cluster admin is able to access openstack cloud. That's the reason I said before that replacing " " with "-" in OCCM is safe because the Node label can never contain " " as that's not allowed.

@ramineni
Copy link
Contributor

I suppose its from cinder availability-zone-list

IMHO, the k8s cluster admin should always get zones from the Node label, we shouldn't have the assumption that the k8s cluster admin is able to access openstack cloud. That's the reason I said before that replacing " " with "-" in OCCM is safe because the Node label can never contain " " as that's not allowed.

Node zones can be different from volume zones ..thats the reason i mentioned it should be retrieved from cinder

@lingxiankong
Copy link
Contributor

I suppose its from cinder availability-zone-list

IMHO, the k8s cluster admin should always get zones from the Node label, we shouldn't have the assumption that the k8s cluster admin is able to access openstack cloud. That's the reason I said before that replacing " " with "-" in OCCM is safe because the Node label can never contain " " as that's not allowed.

Node zones can be different from volume zones ..thats the reason i mentioned it should be retrieved from cinder

Well, yes, for some historical reasons, one of the things that complicates use of availability zones is that each OpenStack project implements them in their own way (if at all). However, although it's possible, I've never heard of such use cases that Nova and Cinder have configured different availability zones, because that would cause lots of issues, e.g when the nova-compute service creates the volumes to attach to the server during its creation, it will request that those volumes are created in the same availability zone as the server, which must exist in the block storage (cinder) service. But I'm happy to know more if you have (or heard of) any particular reason that doing so in real deployments.

Another point is, I noticed you are using cinder command which is deprecated and should be replaced by openstack command. However, neither in openstack command nor on Cinder official API doc describes that the volume availability zone API is supported.

The reason I insist on getting this worked is to make sure OCCM can always be working for the cluster admins without too much hassle to change the underlying openstack cloud which the cluster admins usually don't necessarily have access to, otherwise, it's highly unlikely the OCCM could be deployed.

As you said, If there are use cases that the storageclass already specify space-based az (i.e. test az) in the paramters, how the PVC could be attached to a pod which is running on a node that doesn't support test az in the label? I'm not familiar with kubernetes scheduling but just curious about how that's implemented.

Changing " " to "-" in node label topology.kubernetes.io/zone may bring confusion but hopefully we could carefully describe that in the doc to let the users know the limitations without interrupting the existing functions.

@ramineni
Copy link
Contributor

ramineni commented Jan 29, 2021

@lingxiankong we do have some cases, thats the reason we support ignore-volume-az option in such cases. But I'm afraid thats not my point. All I'm saying is changing space to '-' is not that straight forward as you mentioned as we did in instance-type, it requires lot of tweaking at various parts of code, so we have to carefully consider all scenarios before changing that.

How about we keep this on hold as of now and stick to doc limitation for now, and if we have many such issues reported , and is very common, we could again relook at the permanent solution. As this is the first issue reported out of all releases (as limitation is there from the start, its not newly added one), I'm assuming its not common . wdyt?

@lingxiankong
Copy link
Contributor

Agree.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 29, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2021
@k8s-ci-robot
Copy link
Contributor

@jichenjc: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ramineni
Copy link
Contributor

closing this . As it has been decided we don't solve this issue for now but add a note in the documentation.
We will come back here if we hear more feedback in the future.

/close

@k8s-ci-robot
Copy link
Contributor

@ramineni: Closed this PR.

In response to this:

closing this . As it has been decided we don't solve this issue for now but add a note in the documentation.
We will come back here if we hear more feedback in the future.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node registration fails if nova AZ value has ' '
4 participants