OADP Operator - Backups and CSI Volume snapshots running a long time to complete the Scheduled backups #1422

Naveen-Kamagani · 2024-06-12T06:51:30Z

We use the OADP operator to back up Kubernetes resources and EBS volumes using the CSI snapshot feature.
We have created a schedule that triggers a backup daily, but each backup runs for nearly 2 hours. Is there any way to reduce the backup time to about 30 minutes, or at least to under 1 hour? Is there any way to trigger the CSI snapshots in parallel so that they complete sooner and the backup runs faster? We are open to different options.

DataProtectionApplication -

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: bcdr-data-protection-app
  namespace: oadp-velero
spec:
  backupLocations:
    - velero:
        config:
          profile: default
          region: us-east-1
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: bcdr-stg-cp-01-bcdr-us-east-1-s3
          prefix: bcdr-stg-cp-01
        provider: aws
  configuration:
    restic:
      enable: false
    velero:
      defaultPlugins:
        - openshift
        - aws
        - csi
      featureFlags:
        - EnableCSI
      podConfig:
        resourceAllocations:
          limits:
            cpu: '3'
            memory: 3Gi
          requests:
            cpu: 500m
            memory: 512Mi
  features: {}
  podDnsConfig: {}
  snapshotLocations:
    - velero:
        config:
          profile: default
          region: us-east-1
        provider: aws
status:
  conditions:
    - lastTransitionTime: '2024-06-12T05:22:58Z'
      message: Reconcile complete
      reason: Complete
      status: 'True'
      type: Reconciled

Scheduled Backup -

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: bcdr-scheduler-20240611140020
  namespace: oadp-velero
  labels:
    by-squad: mcsp-controlplane
    for-product: trbo
    velero.io/schedule-name: bcdr-scheduler
    velero.io/storage-location: bcdr-s3-location
spec:
  volumeSnapshotLocations:
    - bcdr-volumesnapshot-location
  defaultVolumesToFsBackup: false
  excludedNamespaces:
    - openshift
    - openshift-apiserver
    - openshift-apiserver-operator
    - openshift-authentication
    - openshift-authentication-operator
    - openshift-cloud-credential-operator
    - openshift-cluster-machine-approver
    - openshift-cluster-node-tuning-operator
    - openshift-cluster-samples-operator
    - openshift-cluster-storage-operator
    - openshift-cluster-version
    - openshift-config
    - openshift-config-managed
    - openshift-console
    - openshift-console-operator
    - openshift-controller-manager
    - openshift-controller-manager-operator
    - openshift-dns
    - openshift-dns-operator
    - openshift-etcd
    - openshift-image-registry
    - openshift-infra
    - openshift-ingress
    - openshift-ingress-operator
    - openshift-insights
    - openshift-kni-infra
    - openshift-kube-apiserver
    - openshift-kube-apiserver-operator
    - openshift-kube-controller-manager
    - openshift-kube-controller-manager-operator
    - openshift-kube-proxy
    - openshift-kube-scheduler
    - openshift-kube-scheduler-operator
    - openshift-machine-api
    - openshift-machine-config-operator
    - openshift-monitoring
    - openshift-multus
    - openshift-network-operator
    - openshift-node
    - openshift-openstack-infra
    - openshift-operator-lifecycle-manager
    - openshift-ovirt-infra
    - openshift-service-ca
    - openshift-service-ca-operator
    - openshift-service-catalog-apiserver-operator
    - openshift-service-catalog-controller-manager-operator
    - openshift-user-workload-monitoring
    - velero
  csiSnapshotTimeout: 10m0s
  includedResources:
    - '*'
  ttl: 504h0m0s
  itemOperationTimeout: 4h0m0s
  metadata:
    labels:
      by-squad: mcsp-controlplane
      for-product: trbo
      velero.io/schedule-name: bcdr-scheduler
  storageLocation: bcdr-s3-location
  hooks: {}
  includeClusterResources: true
  includedNamespaces:
    - '*'
  snapshotVolumes: true
  excludedResources:
    - storageclasses.storage.k8s.io
    - imagestreams.image.openshift.io
  snapshotMoveData: false
status:
  formatVersion: 1.1.0
  backupItemOperationsCompleted: 890
  backupItemOperationsAttempted: 890
  progress:
    itemsBackedUp: 189113
    totalItems: 189113
  expiration: '2024-07-02T14:00:20Z'
  csiVolumeSnapshotsCompleted: 445
  csiVolumeSnapshotsAttempted: 445
  startTimestamp: '2024-06-11T14:00:21Z'
  version: 1
  completionTimestamp: '2024-06-11T15:47:48Z'
  phase: Completed

VolumeSnapshotLocation -

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: bcdr-volumesnapshot-location
  namespace: oadp-velero
spec:
  config:
    profile: default
    region: us-east-1
  provider: aws

VolumeSnapshotClass -

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: bcdr-csi-vsc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: ebs.csi.aws.com
deletionPolicy: Retain

BackupStorageLocation -

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: bcdr-data-protection-app-1
  namespace: oadp-velero
  ownerReferences:
    - apiVersion: oadp.openshift.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: DataProtectionApplication
      name: bcdr-data-protection-app
      uid: f38b3e0f-6016-4fba-8ceb-21a9bd31f325
  labels:
    app.kubernetes.io/component: bsl
    app.kubernetes.io/instance: bcdr-data-protection-app-1
    app.kubernetes.io/managed-by: oadp-operator
    app.kubernetes.io/name: oadp-operator-velero
    openshift.io/oadp: 'True'
    openshift.io/oadp-registry: 'True'
spec:
  config:
    profile: default
    region: us-east-1
  credential:
    key: cloud
    name: cloud-credentials
  default: true
  objectStorage:
    bucket: bcdr-stg-cp-01-bcdr-us-east-1-s3
    prefix: bcdr-stg-cp-01
  provider: aws
status:
  lastSyncedTime: '2024-06-12T06:43:43Z'
  lastValidationTime: '2024-06-12T06:43:43Z'
  phase: Available

We do not want to use restic. Please suggest a way to make the backups more efficient, because we have another cluster with 300 namespaces, each of which will have 12 volumes.
The shared example covers 40 namespaces and runs for nearly 4 hours. If we have to back up the 300-namespace cluster, I don't know what the impact and the backup running time would be.
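
To put a rough number on that concern, here is a minimal sketch that takes the `startTimestamp`, `completionTimestamp` and `csiVolumeSnapshotsCompleted` values from the Backup status above and extrapolates to 300 namespaces with 12 volumes each. The helper names are hypothetical, and the linear-scaling model is an assumption rather than a description of how Velero schedules snapshots. The elapsed time also covers the backup of Kubernetes resources, so the per-snapshot average is an upper bound.

```python
from datetime import datetime, timezone

# Values copied from the Backup status shown above.
START = "2024-06-11T14:00:21Z"
END = "2024-06-11T15:47:48Z"
SNAPSHOTS_COMPLETED = 445

# Target cluster described above: 300 namespaces, 12 volumes each.
TARGET_NAMESPACES = 300
VOLUMES_PER_NAMESPACE = 12


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as it appears in the Backup status."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def backup_seconds(start: str, end: str) -> float:
    """Wall-clock duration of the backup in seconds."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


def projected_hours(elapsed_s: float, snapshots_done: int, target_volumes: int) -> float:
    """Extrapolate the backup duration to a larger volume count.

    ASSUMPTION: the duration grows linearly with the number of volumes,
    i.e. the average time per snapshot stays the same. This is only a
    back-of-the-envelope model.
    """
    seconds_per_snapshot = elapsed_s / snapshots_done
    return seconds_per_snapshot * target_volumes / 3600


elapsed = backup_seconds(START, END)
per_snapshot = elapsed / SNAPSHOTS_COMPLETED
target_volumes = TARGET_NAMESPACES * VOLUMES_PER_NAMESPACE
projection = projected_hours(elapsed, SNAPSHOTS_COMPLETED, target_volumes)

print(f"elapsed: {elapsed:.0f} s")
print(f"average per snapshot: {per_snapshot:.1f} s")
print(f"projected for {target_volumes} volumes: {projection:.1f} h")
```

Under these assumptions the shared backup took 6447 seconds, or about 14.5 seconds per snapshot on average. At the same rate, 3600 volumes would take roughly 14.5 hours, which is why we are asking about running the snapshots in parallel.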


Naveen-Kamagani added the kind/feature label on Jun 12, 2024
