Velero Pod Replicas & alternative for emptydir #475

Open
kkavin opened this issue Jul 4, 2023 · 6 comments

kkavin commented Jul 4, 2023

What steps did you take and what happened:
The Velero pod was evicted due to a full disk on the worker nodes in GKE.

We raised a support ticket with Google Cloud regarding the pod eviction due to the storage issue in the worker node. They reported that:

"Our analysis concluded that the pods are using emptyDir for scratch space. As per the product behavior, this uses storage space from the node's disk. It creates emptyDir volumes from the node's local disk, network storage, or memory-backed file system."

"Following up with the conclusion, we recommend using a Persistent Volume Claim (PVC) instead. This seems necessary because the “velero & restic” pods use a lot of storage. This results in the eviction of the pods."

Following their analysis, we plan to add persistent storage for the velero and restic pods instead of emptyDir.

We need to know if we can use a GCS bucket for the velero and restic pods. By default, the Helm chart comes with 1 replica. Is it possible to add more than 1 replica? Will Velero work with more than 1 replica?

(Screenshots attached: Velero-Error, velero-Issue)

Environment:

  • helm version (use helm version): v3.7.2
  • Kubernetes version (use kubectl version): 1.23.0
  • Cloud provider or hardware configuration: Google Cloud
@jenting jenting added the velero label Jul 5, 2023

jenting commented Jul 5, 2023

We need to know if we can use a GCS bucket for the velero and restic pods.

Yes, Velero can work with a GCS bucket.
https://github.com/vmware-tanzu/velero-plugin-for-gcp#setup
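
For reference, a rough values.yaml excerpt for the velero Helm chart with a GCS bucket as the backup storage location might look like this (the bucket name and plugin image tag are placeholders, and the exact key layout differs between chart versions, so check the chart's own values.yaml):

```yaml
# Sketch only -- key names vary between chart versions; verify against the chart's values.yaml.
initContainers:
  - name: velero-plugin-for-gcp
    image: velero/velero-plugin-for-gcp:v1.7.0   # example tag; match it to your Velero version
    volumeMounts:
      - mountPath: /target
        name: plugins

configuration:
  backupStorageLocation:
    - name: default
      provider: gcp
      bucket: my-velero-backups                  # placeholder GCS bucket name
  volumeSnapshotLocation:
    - name: default
      provider: gcp

credentials:
  useSecret: true
  secretContents:
    cloud: |
      <contents of the GCP service account key file (credentials-velero)>
```

Note that the bucket serves as the object store for backup data; it is not mounted into the velero or restic pods as a filesystem, so it does not replace the emptyDir scratch space.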

Will Velero work with more than 1 replicas?

No. Velero server does not work with more than 1 replica.

we have planned to add persistent storage for the velero and restic pods instead of emptyDir.

I have not tried it before, but I think it's possible and doable.

@navilg
Copy link
Contributor

navilg commented Jul 6, 2023

@jenting What data is written to the emptyDir path? Doesn't Velero do housekeeping of this path? I think there is only temporary data under this path.


kkavin commented Jul 12, 2023

@jenting Can you please let us know what data is stored in /scratch or the emptyDir? We often run into an issue where the velero pod is evicted due to disk pressure, or the node reports a low-ephemeral-storage error.


jenting commented Jul 12, 2023

@qiuming-best could you help with this issue?

@qiuming-best
Collaborator

@kkavin The Velero server cannot work with more than 1 replica; it would currently have concurrency issues.

The scratch dir is where Restic puts its cache, and the emptyDir is where Velero puts its third-party plugins.

The Restic cache and the third-party plugins are all temp files, so we didn't put them into a persistent volume.

But for your problem, you could put them into a persistent volume and it will work.
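
As a rough illustration (everything here is hypothetical: the PVC name is made up, and the volume name and mount path should be verified against your actual Deployment with `kubectl -n velero get deploy velero -o yaml`), backing /scratch with a PVC could look like this:

```yaml
# Hypothetical PVC to back the scratch dir instead of an emptyDir.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: velero-scratch            # hypothetical name
  namespace: velero
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi               # size it for the Restic cache you expect
```

```yaml
# scratch-pvc-patch.yaml -- strategic merge patch for the velero Deployment, applied with:
#   kubectl -n velero patch deployment velero --patch-file scratch-pvc-patch.yaml
# (assumes the scratch volume in the Deployment is named "scratch")
spec:
  template:
    spec:
      volumes:
        - name: scratch
          persistentVolumeClaim:
            claimName: velero-scratch
```

The same idea applies to the restic DaemonSet if its scratch volume is the one filling up the node.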

@DonghaopengZhu

Hi @qiuming-best and @jenting,
I just came across the same issue. As you can see, the node that velero is located on got a spike in node filesystem usage over a short period.
(screenshot: node filesystem usage spike)
And then, it was evicted by kubelet.
"kind":"Pod","namespace":"velero","name":"velero-c4844d876-bvntd","uid":"1960a28a-15e6-44da-ab2b-65bf77616020","apiVersion":"v1","resourceVersion":"452456224"},"reason":"Evicted","message":"The node was low on resource: ephemeral-storage. Threshold quantity: 5119338572, available: 4544316Ki. Container velero was using 211020Ki, request is 0, has larger consumption of ephemeral-storage.
I'm just wondering why the ephemeral storage that the emptyDir consumes grows so rapidly over this short period; I'm sure neither a restic backup (PV backup) nor an object backup was being performed. So when does velero or restic store data in the emptyDir?
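
In the meantime, since the eviction message shows "request is 0" for the velero container, I'm considering giving the pods explicit ephemeral-storage requests and limits so kubelet has something to account against. A sketch of the Helm values I have in mind (the resources / restic.resources keys are assumptions on my side; check your chart version's values.yaml):

```yaml
# Assumed values.yaml keys for the velero chart; verify against your chart version.
resources:
  requests:
    ephemeral-storage: 1Gi        # non-zero request, so eviction accounting is predictable
  limits:
    ephemeral-storage: 5Gi        # cap on node-disk usage by the velero container

restic:
  resources:
    requests:
      ephemeral-storage: 1Gi
    limits:
      ephemeral-storage: 5Gi
```

With a limit set, the pod is evicted once it exceeds its own limit rather than only when the whole node comes under disk pressure, which at least makes the failure bounded and easier to reason about.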
