Blank disk with block volume mode stuck in ImportScheduled for WFFC storage class #2915
Hey @lzang, thanks for the issue! This might be a bug in our populators flow; I'll take a look and let you know when I find something.
I have opened a PR to hopefully address this, though I had no luck reproducing it without ... I would appreciate it if you could attach the CDI CR and confirm that the logs in the PR description match the ones in your cdi-deployment pod.
@akalenyu Thank you for looking into this! Sorry, I forgot to mention that in CDI we didn't enable the HonorWaitForFirstConsumer feature gate; we basically have an empty CDI config. Is that the reason why we hit this issue?
The standard CDI CR should look something like this:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    featureGates:
    - HonorWaitForFirstConsumer
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux
```
The only real difference is that any DataVolume associated with it goes into the WaitForFirstConsumer phase instead of Pending. This tells KubeVirt it is safe to start the VM, and the population will take place at that point.
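As a quick sketch, the phase can be checked directly on the DV (blank-dv is a placeholder name); with the feature gate enabled, this reports WaitForFirstConsumer until a consumer pod/VM is scheduled:

```sh
# Placeholder DV name; expect "WaitForFirstConsumer" with the gate enabled
kubectl get dv blank-dv -o jsonpath='{.status.phase}'
```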
Yes, that is what we have, except for the feature gate configuration. I would expect it to work even without the feature gate enabled.
But for the image download case, I think it makes sense to populate the disk even if no VM uses it.
Not with WaitForFirstConsumer; we want the disk image to end up in the right location so the scheduler has maximum flexibility when scheduling the VM. If we populate immediately (let's say the WFFC is because of local storage), then the VM will have to end up on whatever node the disk was downloaded to. If we wait with populating, the scheduler can schedule on any node, at which point we download to that node.
It gets even more interesting with multiple disks on a VM: if we import immediately (or create blank disks immediately on random nodes), then it is entirely possible the VM cannot run because the disks are scattered across nodes. The feature gate enables CDI to pass the WFFC status to KubeVirt, so that CDI won't populate the disks until the VM is scheduled to run on a particular node. And since it's the workload determining where the disks end up, they all end up on the same node.
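A sketch of the multi-disk case (all names, sizes, and the storage class are placeholders): a VM declaring two blank DataVolumes lets the scheduler pick one node, and both disks are then created there:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-two-disks                 # placeholder name
spec:
  running: true
  dataVolumeTemplates:               # both DVs stay in WFFC until the VM is scheduled
  - metadata:
      name: disk-a
    spec:
      source:
        blank: {}
      pvc:
        accessModes: ["ReadWriteOnce"]
        volumeMode: Block
        storageClassName: local-wffc # placeholder WFFC storage class
        resources:
          requests:
            storage: 10Gi
  - metadata:
      name: disk-b
    spec:
      source:
        blank: {}
      pvc:
        accessModes: ["ReadWriteOnce"]
        volumeMode: Block
        storageClassName: local-wffc
        resources:
          requests:
            storage: 10Gi
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 1Gi
        devices:
          disks:
          - name: disk-a
            disk:
              bus: virtio
          - name: disk-b
            disk:
              bus: virtio
      volumes:
      - name: disk-a
        dataVolume:
          name: disk-a
      - name: disk-b
        dataVolume:
          name: disk-b
```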
Note that if, for whatever reason, you do want to import the image without a VM using it (because you are going to clone it for an actual VM), you can pass an annotation to the DataVolume that forces CDI to import it to a random node.
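A minimal sketch of that override, assuming the annotation the comment refers to is cdi.kubevirt.io/storage.bind.immediate.requested (the DV name, URL, and storage class are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: golden-image                # placeholder name
  annotations:
    # Ask CDI to bind and populate immediately instead of waiting
    # for a first consumer (on local WFFC storage this means a
    # more-or-less random node).
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
spec:
  source:
    http:
      url: "https://example.com/disk.qcow2"  # placeholder URL
  pvc:
    accessModes: ["ReadWriteOnce"]
    volumeMode: Block
    storageClassName: local-wffc    # placeholder WFFC storage class
    resources:
      requests:
        storage: 10Gi
```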
Yes, we do have a loose end in that regard. You are right. I would just make sure that this WFFC override functionality is something you desire; it is usually not the best idea.
We should probably make the feature gate ...
I am aware of this issue. But in general we prefer to use non-local storage for serious use cases so that VMs can be moved around freely.
Yes, this is one of the cases where we want the disk to be prepared instead of waiting to be bound. Also, conceptually it makes sense to prepare a disk when we create it instead of deferring its creation, as disks and VMs are separate entities.
Not sure this is correct with lazy (WFFC) binding (pods & PVCs behave the same way).
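For context, the lazy behavior comes from the StorageClass itself; a PVC against a class like the following stays Pending until a pod (or VM) that uses it is scheduled (name and provisioner are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-wffc                          # placeholder name
provisioner: kubernetes.io/no-provisioner   # e.g. static local volumes
volumeBindingMode: WaitForFirstConsumer     # binding deferred until a consumer is scheduled
```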
Hmm, I would get that for stray golden disks, but we're talking about a blank disk that is only of use in a VM.
I would suggest not debating which behavior is better here, but instead fixing the issue: a blank disk with block volume mode doesn't work with a WFFC storage class unless the HonorWaitForFirstConsumer feature gate is enabled.
/assign @akalenyu
What happened:
Created a blank DV with block volume mode on a WFFC storage class and found that it is stuck in the ImportScheduled phase.
What you expected to happen:
Expected the disk to reach the Succeeded phase.
How to reproduce it (as minimally and precisely as possible):
Steps to reproduce: create a blank DV with block volume mode against a WFFC storage class, with the HonorWaitForFirstConsumer feature gate not enabled (see the sketch below).
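A minimal sketch of such a DV (the name, size, and storage class are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: blank-dv                  # placeholder name
spec:
  source:
    blank: {}                     # blank disk image
  pvc:
    accessModes: ["ReadWriteOnce"]
    volumeMode: Block             # block volume mode, as in the report
    storageClassName: local-wffc  # placeholder WFFC storage class
    resources:
      requests:
        storage: 10Gi
```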
Additional context:
Adding the annotation
cdi.kubevirt.io/storage.usePopulator: "false"
to the DV allows the import to fall back to the old (non-populator) import path and bypass the issue.
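Applying the workaround looks like this (a sketch; the DV name and storage class are placeholders):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: blank-dv                  # placeholder name
  annotations:
    cdi.kubevirt.io/storage.usePopulator: "false"  # bypass the populators flow
spec:
  source:
    blank: {}
  pvc:
    accessModes: ["ReadWriteOnce"]
    volumeMode: Block
    storageClassName: local-wffc  # placeholder WFFC storage class
    resources:
      requests:
        storage: 10Gi
```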
Environment:
CDI version (use kubectl get deployments cdi-deployment -o yaml): v1.57.0