OADP Operator - All scheduled backups are failing #1420

Open
Naveen-Kamagani opened this issue Jun 11, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@Naveen-Kamagani

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.26.15+4818370
    velero.io/source-cluster-k8s-major-version: '1'
    velero.io/source-cluster-k8s-minor-version: '26'
  resourceVersion: '822753579'
  name: bcdr-scheduler-20240611110024
  uid: b2f10f1c-2b86-4361-953e-95f514d09ebe
  creationTimestamp: '2024-06-11T11:00:24Z'
  generation: 3
  managedFields:
    - apiVersion: velero.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:velero.io/resource-timeout': {}
            'f:velero.io/source-cluster-k8s-gitversion': {}
            'f:velero.io/source-cluster-k8s-major-version': {}
            'f:velero.io/source-cluster-k8s-minor-version': {}
          'f:labels':
            .: {}
            'f:by-squad': {}
            'f:for-product': {}
            'f:velero.io/schedule-name': {}
            'f:velero.io/storage-location': {}
        'f:spec':
          'f:itemOperationTimeout': {}
          'f:snapshotVolumes': {}
          'f:metadata':
            .: {}
            'f:labels':
              .: {}
              'f:by-squad': {}
              'f:for-product': {}
              'f:velero.io/schedule-name': {}
          'f:excludedResources': {}
          'f:includedNamespaces': {}
          'f:volumeSnapshotLocations': {}
          .: {}
          'f:snapshotMoveData': {}
          'f:includedResources': {}
          'f:defaultVolumesToFsBackup': {}
          'f:excludedNamespaces': {}
          'f:ttl': {}
          'f:csiSnapshotTimeout': {}
          'f:storageLocation': {}
          'f:hooks': {}
          'f:includeClusterResources': {}
        'f:status':
          .: {}
          'f:completionTimestamp': {}
          'f:expiration': {}
          'f:failureReason': {}
          'f:formatVersion': {}
          'f:phase': {}
          'f:startTimestamp': {}
          'f:version': {}
      manager: velero-server
      operation: Update
      time: '2024-06-11T11:03:26Z'
  namespace: oadp-velero
  labels:
    by-squad: mcsp-controlplane
    for-product: trbo
    velero.io/schedule-name: bcdr-scheduler
    velero.io/storage-location: bcdr-s3-location
spec:
  volumeSnapshotLocations:
    - bcdr-volumesnapshot-location
  defaultVolumesToFsBackup: false
  excludedNamespaces:
    - openshift
    - openshift-apiserver
    - openshift-apiserver-operator
    - openshift-authentication
    - openshift-authentication-operator
    - openshift-cloud-credential-operator
    - openshift-cluster-machine-approver
    - openshift-cluster-node-tuning-operator
    - openshift-cluster-samples-operator
    - openshift-cluster-storage-operator
    - openshift-cluster-version
    - openshift-config
    - openshift-config-managed
    - openshift-console
    - openshift-console-operator
    - openshift-controller-manager
    - openshift-controller-manager-operator
    - openshift-dns
    - openshift-dns-operator
    - openshift-etcd
    - openshift-image-registry
    - openshift-infra
    - openshift-ingress
    - openshift-ingress-operator
    - openshift-insights
    - openshift-kni-infra
    - openshift-kube-apiserver
    - openshift-kube-apiserver-operator
    - openshift-kube-controller-manager
    - openshift-kube-controller-manager-operator
    - openshift-kube-proxy
    - openshift-kube-scheduler
    - openshift-kube-scheduler-operator
    - openshift-machine-api
    - openshift-machine-config-operator
    - openshift-monitoring
    - openshift-multus
    - openshift-network-operator
    - openshift-node
    - openshift-openstack-infra
    - openshift-operator-lifecycle-manager
    - openshift-ovirt-infra
    - openshift-service-ca
    - openshift-service-ca-operator
    - openshift-service-catalog-apiserver-operator
    - openshift-service-catalog-controller-manager-operator
    - openshift-user-workload-monitoring
    - velero
  csiSnapshotTimeout: 10m0s
  includedResources:
    - '*'
  ttl: 504h0m0s
  itemOperationTimeout: 4h0m0s
  metadata:
    labels:
      by-squad: mcsp-controlplane
      for-product: trbo
      velero.io/schedule-name: bcdr-scheduler
  storageLocation: bcdr-s3-location
  hooks: {}
  includeClusterResources: true
  includedNamespaces:
    - '*'
  snapshotVolumes: true
  excludedResources:
    - storageclasses.storage.k8s.io
    - imagestreams.image.openshift.io
  snapshotMoveData: false
status:
  completionTimestamp: '2024-06-11T11:03:26Z'
  expiration: '2024-07-02T11:00:24Z'
  failureReason: >-
    found a backup with status "InProgress" during the server starting, mark it
    as "Failed"
  formatVersion: 1.1.0
  phase: Failed
  startTimestamp: '2024-06-11T11:00:24Z'
  version: 1

OADP Operator version is v1.3.2

oc -n oadp-velero exec deployment/velero -c velero -- ./velero version
Client:
    Version: v1.12.4-OADP
    Git commit: -
Server:
    Version: v1.12.4-OADP

Velero server log:

time="2024-06-11T11:03:22Z" level=info msg="All Velero custom resource definitions exist" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:513"
time="2024-06-11T11:03:22Z" level=warning msg="Velero node agent not found; pod volume backups/restores will not work until it's created" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:585"
I0611 11:03:25.942975 1 request.go:690] Waited for 3.696252465s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/discovery.k8s.io/v1?timeout=32s
time="2024-06-11T11:03:26Z" level=warning msg="found a backup with status \"InProgress\" during the server starting, mark it as \"Failed\"" backup=bcdr-scheduler-20240611110024 logSource="/remote-source/velero/app/pkg/cmd/server/server.go:1077"
Naveen-Kamagani added the kind/feature label on Jun 11, 2024.
@shubham-pampattiwar
Member

This can happen if a backup is triggered and the Velero server pod restarts while that backup is still in progress. In that scenario, the server marks the in-progress backup as Failed when it starts back up.
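A minimal Python sketch of the startup behavior described above, assuming a simplified model: when the Velero server starts, any Backup still in the `InProgress` phase is not resumed but is marked `Failed`. The `Backup` dataclass and the `mark_in_progress_as_failed` function are hypothetical stand-ins, not Velero's actual code; only the failure-reason text is copied from the Backup status posted in this thread.

```python
from dataclasses import dataclass
from typing import List

# Failure reason copied from the Backup status shown in this thread.
RESTART_FAILURE_REASON = (
    'found a backup with status "InProgress" during the server starting, '
    'mark it as "Failed"'
)


@dataclass
class Backup:
    """Hypothetical, pared-down stand-in for a velero.io/v1 Backup object."""
    name: str
    phase: str
    failure_reason: str = ""


def mark_in_progress_as_failed(backups: List[Backup]) -> List[str]:
    """Simplified model of the startup check: a backup that is still
    InProgress when the server starts is not resumed, so it is marked
    Failed. Returns the names of the backups that were changed."""
    marked = []
    for backup in backups:
        if backup.phase == "InProgress":
            backup.phase = "Failed"
            backup.failure_reason = RESTART_FAILURE_REASON
            marked.append(backup.name)
    return marked


if __name__ == "__main__":
    backups = [
        Backup("bcdr-scheduler-20240611110024", "InProgress"),
        Backup("example-completed-backup", "Completed"),  # hypothetical name
    ]
    print(mark_in_progress_as_failed(backups))
```

In this model, no concurrent backup is needed to trigger the failure: a single server restart while one backup is running is enough.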

@Naveen-Kamagani
Author

I uninstalled and reinstalled the OADP Operator and ran the backup manually:

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"velero.io/v1","kind":"Backup","metadata":{"annotations":{},"name":"bcdr-scheduler-ondemand-vol-3","namespace":"oadp-velero"},"spec":{"csiSnapshotTimeout":"10m0s","defaultVolumesToFsBackup":false,"excludedNamespaces":["openshift","openshift-apiserver","openshift-apiserver-operator","openshift-authentication","openshift-authentication-operator","openshift-cloud-credential-operator","openshift-cluster-machine-approver","openshift-cluster-node-tuning-operator","openshift-cluster-samples-operator","openshift-cluster-storage-operator","openshift-cluster-version","openshift-config","openshift-config-managed","openshift-console","openshift-console-operator","openshift-controller-manager","openshift-controller-manager-operator","openshift-dns","openshift-dns-operator","openshift-etcd","openshift-image-registry","openshift-infra","openshift-ingress","openshift-ingress-operator","openshift-insights","openshift-kni-infra","openshift-kube-apiserver","openshift-kube-apiserver-operator","openshift-kube-controller-manager","openshift-kube-controller-manager-operator","openshift-kube-proxy","openshift-kube-scheduler","openshift-kube-scheduler-operator","openshift-machine-api","openshift-machine-config-operator","openshift-monitoring","openshift-multus","openshift-network-operator","openshift-node","openshift-openstack-infra","openshift-operator-lifecycle-manager","openshift-ovirt-infra","openshift-service-ca","openshift-service-ca-operator","openshift-service-catalog-apiserver-operator","openshift-service-catalog-controller-manager-operator","openshift-user-workload-monitoring","velero"],"excludedResources":["storageclasses.storage.k8s.io","imagestreams.image.openshift.io"],"hooks":{},"includeClusterResources":true,"includedNamespaces":["*"],"includedResources":["*"],"snapshotVolumes":true,"storageLocation":"bcdr-s3-location","ttl":"504h0m0s","volumeSnapshotLocations":["bcdr-volumesnapshot-location"]}}
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.26.15+4818370
    velero.io/source-cluster-k8s-major-version: '1'
    velero.io/source-cluster-k8s-minor-version: '26'
  resourceVersion: '825278564'
  name: bcdr-scheduler-ondemand-vol-3
  uid: 30526dba-4135-4f54-98a5-13004c567634
  creationTimestamp: '2024-06-12T13:50:33Z'
  generation: 3
  managedFields:
    - apiVersion: velero.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
        'f:spec':
          'f:snapshotVolumes': {}
          'f:excludedResources': {}
          'f:includedNamespaces': {}
          'f:volumeSnapshotLocations': {}
          .: {}
          'f:includedResources': {}
          'f:defaultVolumesToFsBackup': {}
          'f:excludedNamespaces': {}
          'f:ttl': {}
          'f:csiSnapshotTimeout': {}
          'f:storageLocation': {}
          'f:hooks': {}
          'f:includeClusterResources': {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2024-06-12T13:50:33Z'
    - apiVersion: velero.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:velero.io/resource-timeout': {}
            'f:velero.io/source-cluster-k8s-gitversion': {}
            'f:velero.io/source-cluster-k8s-major-version': {}
            'f:velero.io/source-cluster-k8s-minor-version': {}
          'f:labels':
            .: {}
            'f:velero.io/storage-location': {}
        'f:spec':
          'f:itemOperationTimeout': {}
          'f:snapshotMoveData': {}
        'f:status':
          .: {}
          'f:completionTimestamp': {}
          'f:expiration': {}
          'f:failureReason': {}
          'f:formatVersion': {}
          'f:phase': {}
          'f:startTimestamp': {}
          'f:version': {}
      manager: velero-server
      operation: Update
      time: '2024-06-12T13:53:13Z'
  namespace: oadp-velero
  labels:
    velero.io/storage-location: bcdr-s3-location
spec:
  volumeSnapshotLocations:
    - bcdr-volumesnapshot-location
  defaultVolumesToFsBackup: false
  excludedNamespaces:
    - openshift
    - openshift-apiserver
    - openshift-apiserver-operator
    - openshift-authentication
    - openshift-authentication-operator
    - openshift-cloud-credential-operator
    - openshift-cluster-machine-approver
    - openshift-cluster-node-tuning-operator
    - openshift-cluster-samples-operator
    - openshift-cluster-storage-operator
    - openshift-cluster-version
    - openshift-config
    - openshift-config-managed
    - openshift-console
    - openshift-console-operator
    - openshift-controller-manager
    - openshift-controller-manager-operator
    - openshift-dns
    - openshift-dns-operator
    - openshift-etcd
    - openshift-image-registry
    - openshift-infra
    - openshift-ingress
    - openshift-ingress-operator
    - openshift-insights
    - openshift-kni-infra
    - openshift-kube-apiserver
    - openshift-kube-apiserver-operator
    - openshift-kube-controller-manager
    - openshift-kube-controller-manager-operator
    - openshift-kube-proxy
    - openshift-kube-scheduler
    - openshift-kube-scheduler-operator
    - openshift-machine-api
    - openshift-machine-config-operator
    - openshift-monitoring
    - openshift-multus
    - openshift-network-operator
    - openshift-node
    - openshift-openstack-infra
    - openshift-operator-lifecycle-manager
    - openshift-ovirt-infra
    - openshift-service-ca
    - openshift-service-ca-operator
    - openshift-service-catalog-apiserver-operator
    - openshift-service-catalog-controller-manager-operator
    - openshift-user-workload-monitoring
    - velero
  csiSnapshotTimeout: 10m0s
  includedResources:
    - '*'
  ttl: 504h0m0s
  itemOperationTimeout: 4h0m0s
  storageLocation: bcdr-s3-location
  hooks: {}
  includeClusterResources: true
  includedNamespaces:
    - '*'
  snapshotVolumes: true
  excludedResources:
    - storageclasses.storage.k8s.io
    - imagestreams.image.openshift.io
  snapshotMoveData: false
status:
  completionTimestamp: '2024-06-12T13:53:13Z'
  expiration: '2024-07-03T13:50:33Z'
  failureReason: >-
    found a backup with status "InProgress" during the server starting, mark it
    as "Failed"
  formatVersion: 1.1.0
  phase: Failed
  startTimestamp: '2024-06-12T13:50:40Z'
  version: 1

No other backups were triggered, but the backup still went into the Failed state.

@Naveen-Kamagani
Author

@shubham-pampattiwar I made sure no other backups were triggered while this backup was running, but I still get the same issue.

@Naveen-Kamagani
Author

Naveen-Kamagani commented Jun 13, 2024

Can someone help me with this issue?

time="2024-06-13T12:03:42Z" level=info msg="All Velero custom resource definitions exist" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:513"
time="2024-06-13T12:03:42Z" level=warning msg="Velero node agent not found; pod volume backups/restores will not work until it's created" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:585"
time="2024-06-13T12:03:46Z" level=warning msg="found a backup with status \"InProgress\" during the server starting, mark it as \"Failed\"" backup=bcdr-scheduler-20240613120039 logSource="/remote-source/velero/app/pkg/cmd/server/server.go:1077"
time="2024-06-13T12:03:46Z" level=info msg="Starting controllers" logSource="/remote-source/velero/app/pkg/cmd/server/server.go:640"
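The timestamps posted in this thread can be compared directly. The sketch below assumes each backup's `startTimestamp` marks when it began and the time it was marked `Failed` (its `completionTimestamp`, or the matching log line for Jun 13) marks when the restarted server found it, and computes how far into each backup that happened. The Jun 13 start time is an assumption read from the schedule-generated name `bcdr-scheduler-20240613120039`, because that Backup object was not posted; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# (start, marked Failed) timestamps copied from the Backups and logs above.
# ASSUMPTION: the Jun 13 start time is derived from the backup name.
OBSERVATIONS = {
    "bcdr-scheduler-20240611110024": ("2024-06-11T11:00:24Z", "2024-06-11T11:03:26Z"),
    "bcdr-scheduler-ondemand-vol-3": ("2024-06-12T13:50:40Z", "2024-06-12T13:53:13Z"),
    "bcdr-scheduler-20240613120039": ("2024-06-13T12:00:39Z", "2024-06-13T12:03:46Z"),
}

# Timeouts from the Backup spec: csiSnapshotTimeout and itemOperationTimeout.
CSI_SNAPSHOT_TIMEOUT = timedelta(minutes=10)
ITEM_OPERATION_TIMEOUT = timedelta(hours=4)


def parse_utc(ts: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp ending in 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def time_until_marked_failed(start: str, marked_failed: str) -> timedelta:
    """Elapsed time between backup start and the server marking it Failed."""
    return parse_utc(marked_failed) - parse_utc(start)


if __name__ == "__main__":
    shortest_timeout = min(CSI_SNAPSHOT_TIMEOUT, ITEM_OPERATION_TIMEOUT)
    for name, (start, failed) in OBSERVATIONS.items():
        elapsed = time_until_marked_failed(start, failed)
        print(f"{name}: marked Failed {elapsed} after start; "
              f"before any configured timeout: {elapsed < shortest_timeout}")
```

Run as-is, it reports 0:03:02, 0:02:33 and 0:03:07 respectively, all well below the configured `csiSnapshotTimeout` (10m0s) and `itemOperationTimeout` (4h0m0s). That pattern matches the maintainer's explanation of a server restart a few minutes into each backup rather than a timeout; the cause of the restart itself does not appear in the posted logs.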

@kaovilai
Member

Helped via a Velero issue.

3 participants