Failed to stage volume to node #2269
Comments
SSH to the worker node and we can see the devices (sdd, sde) are there, but they are not staged.
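A minimal sketch of checking this on the node (the device names sdd/sde are taken from the comment above; /dev/disk/azure is the path the Azure udev rules would normally populate, so its absence is a hint about the cause discussed below):

```sh
# Sketch for the affected worker node (sdd/sde as reported above).
lsblk /dev/sdd /dev/sde            # raw SCSI devices visible to the kernel
ls -l /dev/disk/azure/ 2>/dev/null \
  || echo "no /dev/disk/azure links -> the udev rule discussed below is missing"
```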
From the error message, it seems you are missing the udev rule on the node; follow the guide here to generate it: #1631 (comment). Azure VMs should have such a built-in rule by default. Is your node type special? @xiaoping8385
There is no such rule, and not all the nodes have the same issue. For example, we found one of the nodes can mount the PV successfully, and it also has no such built-in rule.
On the problematic node,
Both nodes are created from the same template.
Here is the data disk parsing logic: azuredisk-csi-driver/pkg/azuredisk/azure_common_linux.go, lines 120 to 194 at 99a117c. The reliable way is creating that udev rule on the node, which creates the device links this logic relies on.
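For reference, a hedged sketch of what those device links look like once the rule is in place (the LUN number and the sd* mapping below are illustrative, not taken from this issue):

```sh
# With 66-azure-storage.rules installed, data disks get stable per-LUN links
# that the parsing logic above can rely on. Example layout (illustrative):
ls -l /dev/disk/azure/scsi1/
# lrwxrwxrwx 1 root root 12 ... lun2 -> ../../../sdd
```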
Who should be responsible for creating the udev rule? I thought the VM is created by the Azure cloud, and that cannot explain why some of the nodes work and some don't.
Can you try creating the udev rule on one node and check whether it works? @xiaoping8385 If it works, then I could provide you a daemonset to create the udev rule on every node.
So I need to create a rules file (66-azure-storage.rules) as described in the #1631 comment and run that?
@xiaoping8385 Just create that rules file and it will take effect automatically; then reschedule the pod with the disk volume onto that node.
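A minimal sketch of installing that rules file on a node (the rule content itself comes from the #1631 comment and is not reproduced here; the target path follows the usual udev convention):

```sh
# Sketch: install the rules file obtained from #1631 (comment) on the node.
sudo cp 66-azure-storage.rules /etc/udev/rules.d/66-azure-storage.rules
```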
@andyzhangx It still does not work. I deployed the rules file, deleted the pod, and also restarted csi-azuredisk-node, but the pod is still not running. Is it possible to have a Zoom call or a chat?
@xiaoping8385 Can you run the following command on the node and try again? The command will reload the udev rules and trigger the creation of any new device nodes that match the rules.
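The command itself did not survive in this copy of the thread; a standard way to reload udev rules and re-trigger device events, assuming udevadm is available on the node, is:

```sh
# Reload udev rules and re-trigger device events so already-attached disks
# get the new /dev/disk/azure/* links.
sudo udevadm control --reload-rules
sudo udevadm trigger
```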
It finally works, thanks. One more question: how should we prevent this issue in the future, or do we need an extra deployment? What is the root cause of this issue?
@xiaoping8385 Per this doc, the Linux VM should have the udev rules. What is your k8s cluster? Are you manually setting up a k8s cluster based on Azure Ubuntu VMs?
@andyzhangx Sorry for the late reply. Our k8s cluster is actually created by a BOSH director, which has a CPI that talks to the Azure platform to create VMs. I don't think the udev rules are present in our environments; as I mentioned before, we have two nodes without such a rule, but on one node the volume can be staged successfully while on the other it can't. Are there ways other than udev rules that help to stage the volume?
@xiaoping8385 Using udev rules is the most reliable way to detect a data disk correctly. Are you able to create a daemonset to write the udev rules on every node?
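Before rolling out such a daemonset, a hedged sketch for auditing which nodes are missing the rule (it assumes SSH access to nodes by their Kubernetes node names, which is environment-specific):

```sh
# Sketch: report which nodes already have the Azure storage udev rule.
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  if ssh "$node" test -f /etc/udev/rules.d/66-azure-storage.rules; then
    echo "$node: rule present"
  else
    echo "$node: rule MISSING"
  fi
done
```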
Sometimes, yes, since this issue only happens occasionally.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What happened:
When we run several pods on the k8s cluster, two of the pods running on the same node cannot work.
What you expected to happen:
The pods should be running.
How to reproduce it:
Happens occasionally; recreating the worker node usually resolves the issue.
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version):
Kernel (e.g. uname -a):