
Blank disk with block volume mode stuck in ImportScheduled for WFFC storage class #2915

Closed
lzang opened this issue Oct 4, 2023 · 17 comments · Fixed by #2917

lzang commented Oct 4, 2023

What happened:
Created a blank DV with block volume mode for a WFFC storage class, and found that it is stuck in the ImportScheduled phase.

What you expected to happen:
Expected the disk to reach the Succeeded phase.

How to reproduce it (as minimally and precisely as possible):
Steps to reproduce the behavior.

  1. Have a storage class that supports block mode and WFFC
  2. Create a blank DV as below:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: empty-disk
spec:
  source:
    blank: {}
  storage:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: block-sc
    volumeMode: Block
  3. Observe that the DV is stuck in the ImportScheduled phase (see the check below).
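A quick way to observe the stuck phase (a minimal check, assuming the DV above was created in the current namespace):

# The PHASE column is expected to stay at ImportScheduled instead of reaching Succeeded.
kubectl get datavolume empty-disk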

Additional context:

Adding the annotation cdi.kubevirt.io/storage.usePopulator: "false" to the DV makes the import fall back to the legacy (pre-populators) flow and bypasses the issue.
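For reference, a sketch of the reproduction manifest with this workaround applied; only the annotation is added, everything else is unchanged from the manifest above:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: empty-disk
  annotations:
    # Workaround described above: opt out of the populators flow so the import
    # falls back to the legacy path.
    cdi.kubevirt.io/storage.usePopulator: "false"
spec:
  source:
    blank: {}
  storage:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: block-sc
    volumeMode: Block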

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): v1.57.0
lzang added the kind/bug label Oct 4, 2023

alromeros (Collaborator) commented:

Hey @lzang, thanks for the issue! This might be a bug in our populators flow; I'll take a look and let you know when I find something.

akalenyu (Collaborator) commented Oct 4, 2023

I have opened a PR to hopefully address this, though I had no luck reproducing it without the bind.immediate annotation.

I would appreciate it if you could attach the CDI CR and confirm that the logs in the PR description match the ones in your cdi-deployment pod.

lzang (Author) commented Oct 4, 2023

@akalenyu Thank you for looking at this! Sorry, I forgot to mention that we didn't enable the HonorWaitForFirstConsumer feature gate in CDI; we basically have an empty CDI config. Is that the reason we hit this issue?

> I have opened a PR to hopefully address this, though I had no luck reproducing it without the bind.immediate annotation.
>
> I would appreciate it if you could attach the CDI CR and confirm that the logs in the PR description match the ones in your cdi-deployment pod.

awels (Member) commented Oct 4, 2023

The standard CDI CR should look something like this:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    featureGates:
    - HonorWaitForFirstConsumer
  imagePullPolicy: IfNotPresent
  infra:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
  workload:
    nodeSelector:
      kubernetes.io/os: linux

awels (Member) commented Oct 4, 2023

The only real difference is that any DataVolume associated with it goes into the WaitForFirstConsumer phase instead of Pending. This tells KubeVirt it is safe to start the VM, and the population will take place at that point.

lzang (Author) commented Oct 4, 2023

> The standard CDI CR should look something like this:
>
> apiVersion: cdi.kubevirt.io/v1beta1
> kind: CDI
> metadata:
>   name: cdi
> spec:
>   config:
>     featureGates:
>     - HonorWaitForFirstConsumer
>   imagePullPolicy: IfNotPresent
>   infra:
>     nodeSelector:
>       kubernetes.io/os: linux
>     tolerations:
>     - key: CriticalAddonsOnly
>       operator: Exists
>   workload:
>     nodeSelector:
>       kubernetes.io/os: linux

Yes, that is what we have, except for the feature gate configuration. I would expect it to work even without the feature gate enabled.

lzang (Author) commented Oct 4, 2023

> The only real difference is that any DataVolume associated with it goes into the WaitForFirstConsumer phase instead of Pending. This tells KubeVirt it is safe to start the VM, and the population will take place at that point.

But for the image download case, I think it makes sense to populate the disk even if no VM uses it.

awels (Member) commented Oct 4, 2023

> > The only real difference is that any DataVolume associated with it goes into the WaitForFirstConsumer phase instead of Pending. This tells KubeVirt it is safe to start the VM, and the population will take place at that point.
>
> But for the image download case, I think it makes sense to populate the disk even if no VM uses it.

Not with WaitForFirstConsumer; we want the disk image to end up in the right location so the scheduler has maximum flexibility when scheduling the VM. If we populate immediately (let's say the WFFC is because of local storage), the VM will have to end up on whatever node the disk was downloaded to. If we wait to populate, the scheduler can place the VM on any node, and we download to that node at that point.

awels (Member) commented Oct 4, 2023

It gets even more interesting with multiple disks on a VM: if we import immediately (or create blank disks immediately on random nodes), it is entirely possible the VM cannot run because the disks are scattered across nodes. The feature gate lets CDI pass the WFFC status on to KubeVirt, so CDI won't populate the disks until the VM is scheduled to run on a particular node. And since it is the workload that determines where the disks end up, they all end up on the same node.

awels (Member) commented Oct 4, 2023

Note that if, for whatever reason, you do want to import the image without a VM using it (for example because you are going to clone it for an actual VM), you can pass an annotation to the DataVolume that forces CDI to import it to a random node.
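For illustration, a sketch of a blank DataVolume with such an annotation; the full annotation key is assumed here to be cdi.kubevirt.io/storage.bind.immediate.requested (the bind.immediate annotation referenced earlier in this thread), and the rest of the manifest reuses the reproduction example:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: empty-disk
  annotations:
    # Assumed key for the bind.immediate annotation mentioned in this thread;
    # it asks CDI to bind and populate immediately instead of honoring WFFC.
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
spec:
  source:
    blank: {}
  storage:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 10Gi
    storageClassName: block-sc
    volumeMode: Block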

akalenyu (Collaborator) commented Oct 4, 2023

> I would expect it to work even without the feature gate enabled.

Yes, we do have a loose end in that regard; you are right. With the feature gate disabled (or with the bind.immediate annotation, the two are equivalent) we basically don't provide the WFFC-override functionality. This differs from other CDI flows, where we do.

I would just make sure that this WFFC-override functionality is something you actually want. As @awels mentions, it is usually not the best idea. It makes sense for several use cases, like setting up a golden image where no VM will be started from it.

awels (Member) commented Oct 4, 2023

We should probably make the HonorWaitForFirstConsumer feature gate the default at this point, since all our tests assume it, and it should be a much better experience for users with it enabled.

lzang (Author) commented Oct 6, 2023

> It gets even more interesting with multiple disks on a VM: if we import immediately (or create blank disks immediately on random nodes), it is entirely possible the VM cannot run because the disks are scattered across nodes. The feature gate lets CDI pass the WFFC status on to KubeVirt, so CDI won't populate the disks until the VM is scheduled to run on a particular node. And since it is the workload that determines where the disks end up, they all end up on the same node.

I am aware of this issue, but in general we prefer to use non-local storage for serious use cases so that VMs can be moved around freely.

lzang (Author) commented Oct 6, 2023

> > I would expect it to work even without the feature gate enabled.
>
> Yes, we do have a loose end in that regard; you are right. With the feature gate disabled (or with the bind.immediate annotation, the two are equivalent) we basically don't provide the WFFC-override functionality. This differs from other CDI flows, where we do.
>
> I would just make sure that this WFFC-override functionality is something you actually want. As @awels mentions, it is usually not the best idea. It makes sense for several use cases, like setting up a golden image where no VM will be started from it.

Yes, this is one of the cases where we want the disk to be prepared instead of waiting to be bound. Also, conceptually it makes sense to prepare a disk when we create it rather than deferring its creation, since disks and VMs are separate entities.

akalenyu (Collaborator) commented Oct 8, 2023

> Also, conceptually it makes sense to prepare a disk when we create it rather than deferring its creation, since disks and VMs are separate entities.

I'm not sure that holds with lazy (WFFC) binding; pods and PVCs behave the same way.

> this is one of the cases where we want the disk to be prepared instead of waiting to be bound

Hmm, I would get that for stray golden disks, but we're talking about a blank disk that is only of use in a VM.

lzang (Author) commented Oct 9, 2023

> > Also, conceptually it makes sense to prepare a disk when we create it rather than deferring its creation, since disks and VMs are separate entities.
>
> I'm not sure that holds with lazy (WFFC) binding; pods and PVCs behave the same way.
>
> > this is one of the cases where we want the disk to be prepared instead of waiting to be bound
>
> Hmm, I would get that for stray golden disks, but we're talking about a blank disk that is only of use in a VM.

I would suggest not debating which behavior is better here, but instead fixing the issue: a blank disk with block volume mode does not work with a WFFC storage class unless the HonorWaitForFirstConsumer feature gate is enabled.

aglitke (Member) commented Oct 9, 2023

/assign @akalenyu
