On restore PV is not getting restored correctly in Azure AKS #6595
Comments
Could you provide the YAML of the PV after the restore? The status should contain information about why no underlying disk is created. |
Please find the YAML of the restored PV attached: pvc-41f06d44-68f8-4e82-a3e3-f479ae7c09c4.txt |
The PV is bound and its underlying disk is
|
This YAML got restored as-is from the backup. The PV name was the same at backup time and after the restore. Before the restore, the PV and the underlying disk referenced in the volumeHandle were deleted. During the restore, the PV YAML was recreated as-is without provisioning the underlying disk, so after the restore the underlying disk is missing. |
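For illustration, a restored PV in this situation looks roughly like the sketch below: the object exists and binds, but spec.csi.volumeHandle still points at the Azure disk that was deleted before the restore. The capacity, storage class name, and subscription/resource-group values here are placeholders, not taken from the attached YAML.

```yaml
# Illustrative sketch only: size, class name, and the resource ID path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  # Same name as at backup time, because the backed-up PV object was recreated as-is.
  name: pvc-41f06d44-68f8-4e82-a3e3-f479ae7c09c4
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed-csi-retain
  csi:
    driver: disk.csi.azure.com
    # The volumeHandle still references the original disk, which no longer exists,
    # so any pod using this PV fails when the disk is attached.
    volumeHandle: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/disks/pvc-41f06d44-68f8-4e82-a3e3-f479ae7c09c4
```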
@soumyapattnaik Could you upload the debug bundle by running |
I had to repro the issue again as my backup had been cleaned up. This time I created two deployments: one with a PVC using the Retain reclaim policy and the other using Delete.
Then I deleted all the data in my namespace for both types.
Post restore:
As I can see, for the PV with the Retain policy, the PV was again created with the same name as during backup (pvc-0e6b64d1-f141-4bf4-b044-ffdc02c99b3f), with the underlying Azure disk not created.
Attaching the bundle for debugging. |
@blackpiglet I reproduced this issue in my env. |
PV:
PVC:
Pod cannot start up because of the not-found disk:
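For illustration only, a pod in this state typically reports an attach failure event along the lines of the sketch below; the object names and message text are placeholders, not the actual output from this environment.

```yaml
# Illustrative sketch only: names and message text are placeholders.
apiVersion: v1
kind: Event
type: Warning
reason: FailedAttachVolume
involvedObject:
  kind: Pod
  name: my-app-7c9d6b5f4-abcde
  namespace: soumya
message: >-
  AttachVolume.Attach failed for volume "pvc-0e6b64d1-f141-4bf4-b044-ffdc02c99b3f":
  the Azure disk referenced by the PV's volumeHandle could not be found
```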
|
@soumyapattnaik |
Looks like CSI snapshot data mover is also affected |
If we want to track how PVs are snapshotted, we need a design, as this may be a breaking change. |
@Lyndon-Li will this change be relatively trivial to add, or will it require a design, as @reasonerjt commented above? |
I feel this is a critical scenario and could impact restores for end customers, and I wanted to see how we can prioritize this... |
@anshulahuja98 |
Can't we just enhance the check for |
Skipping PVCs which had the Retain policy but still had snapshots and no PVC in the target cluster |
The reason for not preferring that approach is that it's better not to read the plugin result. Also, the enhancement covers not just the skipped PVCs: the plan is to extend the summary to include handling information for all PVCs, including which method each was backed up by, status, snapshot, and other useful information, and then store that in the backup object storage as a separate metadata file. |
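As a rough illustration of that idea, a per-PVC entry in such a metadata file could look like the sketch below; the field names are made up for this example and are not the actual Velero schema.

```yaml
# Illustrative sketch only: field names are hypothetical, not the real Velero format.
- pvcName: my-app-data
  pvcNamespace: soumya
  pvName: pvc-41f06d44-68f8-4e82-a3e3-f479ae7c09c4
  backupMethod: CSISnapshot      # e.g. CSISnapshot, PodVolumeBackup, VolumeSnapshotter
  skipped: false
  snapshotStatus: Completed
```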
Are you suggesting that once this information is backed up in a separate JSON file in the storage, you'll rely on that value in the |
Currently, I believe the hasSnapshot check also covers snapshots taken by plugins of different clouds |
That makes sense. The backporting change should be more concise. |
Great that we have consensus! |
@anshulahuja98 |
@soumyapattnaik @anshulahuja98 @shubham-pampattiwar Please note that the main branch introduces new backup volume information metadata, and Velero will decide how to handle PVs according to the content of that new metadata file. The fix is already in main and will be included in the coming v1.13 release. The fix PR is #7061; it lets Velero also work with the old backup format. According to the N-2 support policy, #7061 will be kept for two releases, which means it will be removed in v1.15.x. After that, only the backup volume information metadata will be used. Is this acceptable to you? |
@blackpiglet I understand that you would prefer not to cut a release for older versions <= 1.11. Just having it on main is not sufficient, though; at least one GA'd release should ideally have the changes. |
@anshulahuja98 |
Great, thank you so much @blackpiglet. |
What steps did you take and what happened:
I created a namespace soumya and created a deployment, a PVC, and a PV (dynamically provisioned CSI disk) in that namespace. The PV was created with the Retain reclaim policy.
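A minimal sketch of that kind of setup on AKS with the Azure Disk CSI driver is shown below; the storage class name, PVC name, and size are illustrative, not the ones from the attached logs.

```yaml
# Illustrative sketch only: names and size are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-retain
provisioner: disk.csi.azure.com
reclaimPolicy: Retain              # the PV and its Azure disk are kept when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: soumya
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi-retain
  resources:
    requests:
      storage: 10Gi
```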
Then I triggered a Velero backup for the cluster.
After the backup completed, I deleted the deployment, PVC, PV, and the underlying Azure disk, and then triggered a restore.
Post restore I see that the deployment and PVC are restored. The PV is restored with the same name it had at backup time, and the underlying disk is missing.
What did you expect to happen:
On restore, the PV should have been created with a different name, as it is dynamically provisioned. To confirm this behaviour I triggered another restore from the same backup, but this time I restored the backed-up resources into a different namespace, soumya1. Post restore I could see the deployment, PVC, PV (with a different name), and the underlying disk being created successfully.
Sharing the data post restore. During backup the PV name was also
pvc-41f06d44-68f8-4e82-a3e3-f479ae7c09c4
Attaching the logs for backup and restores
clusterbackup-dataprotection-microsoft-backup-47e70f1e-c1fa-4615-bc20-b01bf772c1d5-logs - backup logs
restore-clusterbackup-dataprotection-microsoft-restore-6f926513-fc7e-439d-a20c-a8cda64973a4-logs - restore to the soumya namespace. Here a PV with the same name is getting created, which is not expected.
restore-clusterbackup-dataprotection-microsoft-restore-8aa648c4-4ac2-4729-acef-220a613e6eeb-logs - restore to namespace soumya1 with a dynamically created PV, which is the desired behavior.
restore-clusterbackup-dataprotection-microsoft-restore-6f926513-fc7e-439d-a20c-a8cda64973a4-logs.gz
clusterbackup-dataprotection-microsoft-backup-47e70f1e-c1fa-4615-bc20-b01bf772c1d5-logs.gz
restore-clusterbackup-dataprotection-microsoft-restore-8aa648c4-4ac2-4729-acef-220a613e6eeb-logs.gz