Describe the bug
VolumeSnapshot has the .status.readyToUse flag which indicates if a snapshot is ready to be used to restore a volume.
snapscheduler does not take this flag into account when deciding whether the maxCount retention limit has been reached.
This results in the loss of the last opportunity for recovery.
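For context, in the Go client types for the snapshot API (the kubernetes-csi external-snapshotter client; the v6 module path below is my assumption), both .status and .status.readyToUse are optional pointers, so "not ready" also covers "no status yet". A minimal sketch of the check:

```go
package main

import (
	"fmt"

	snapv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
)

// snapshotIsReady treats a missing status or a missing readyToUse field as
// "not ready", since the CSI driver may not have populated them yet.
func snapshotIsReady(snap *snapv1.VolumeSnapshot) bool {
	return snap.Status != nil && snap.Status.ReadyToUse != nil && *snap.Status.ReadyToUse
}

func main() {
	var snap snapv1.VolumeSnapshot      // freshly created object: no status yet
	fmt.Println(snapshotIsReady(&snap)) // prints "false"
}
```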
Steps to reproduce
In GKE (v1.28.11 in my case) with snapscheduler (v3.4.0) installed:
1. Run a pod with a new PVC in order to create the volume:
   $ kubectl -n default run -it --rm snapscheduler-test --image=gcr.io/distroless/static-debian12 --overrides='{"spec": {"restartPolicy": "Never", "volumes": [{"name": "pvc", "persistentVolumeClaim":{"claimName": "snapscheduler-test"}}]}}' -- sh
2. Create a snapshot schedule for the PVC with a maxCount: 3 retention policy.
3. Let scheduled snapshots accumulate; in this environment they can remain at readyToUse=false long enough for the maxCount: 3 retention policy to kick in.
Expected behavior
❗ The retention policy must not count VolumeSnapshots with .status.readyToUse==false toward maxCount (see the sketch below).
❔ If possible, create a new snapshot only after the previous one has entered the ready state.
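To make the ❗ item concrete, here is a purely illustrative sketch (my own function name, not snapscheduler's code) of filtering out unready snapshots before the maxCount comparison, again assuming the external-snapshotter v6 Go client:

```go
package retention

import (
	snapv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
)

// countedForRetention returns only the snapshots whose .status.readyToUse is
// true. Comparing maxCount against this filtered list keeps a backlog of
// unready snapshots from pushing the last usable snapshot out of retention.
func countedForRetention(snaps []snapv1.VolumeSnapshot) []snapv1.VolumeSnapshot {
	ready := make([]snapv1.VolumeSnapshot, 0, len(snaps))
	for i := range snaps {
		s := &snaps[i]
		if s.Status != nil && s.Status.ReadyToUse != nil && *s.Status.ReadyToUse {
			ready = append(ready, snaps[i])
		}
	}
	return ready
}
```

maxCount would then be compared against len(countedForRetention(snaps)) rather than the full snapshot list.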
Actual results
The retention policy removes the last valid snapshot, leaving no possibility of recovery.
I agree... that's not good. I'm happy to have thoughts/suggestions on a good fix.
A few ideas:
1. Only count readyToUse snapshots when implementing the cleanup policy.
   - This runs the risk of creating an unbounded number of (unready) snapshots, potentially consuming all available space (or excessive expense).
2. Skip the next snapshot if the previous one is not ready.
   - This will cause problems for environments where it takes a long time for the snapshot to become ready (e.g., AWS), causing SnapScheduler to miss intervals.
3. If the policy determines that a snapshot should be deleted, delete unready snapshots (starting with the oldest) before ready ones (see the sketch after this list).
   - This has the same problem as (2) in being unable to handle intervals that are less than the time for a snapshot to become ready.
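A minimal sketch of option 3, with my own helper and function names (nothing here is snapscheduler's actual code) and assuming the kubernetes-csi external-snapshotter v6 Go client types:

```go
package retention

import (
	"sort"
	"time"

	snapv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
)

// isReady treats a missing status or readyToUse field as "not ready".
func isReady(s *snapv1.VolumeSnapshot) bool {
	return s.Status != nil && s.Status.ReadyToUse != nil && *s.Status.ReadyToUse
}

// creationTime prefers .status.creationTime and falls back to the object's
// metadata.creationTimestamp when the status has not been populated yet.
func creationTime(s *snapv1.VolumeSnapshot) time.Time {
	if s.Status != nil && s.Status.CreationTime != nil {
		return s.Status.CreationTime.Time
	}
	return s.CreationTimestamp.Time
}

// snapshotsToDelete returns the snapshots that should be removed so that at
// most maxCount remain, choosing unready snapshots (oldest first) before
// touching ready ones.
func snapshotsToDelete(snaps []snapv1.VolumeSnapshot, maxCount int) []snapv1.VolumeSnapshot {
	excess := len(snaps) - maxCount
	if excess <= 0 {
		return nil
	}
	ordered := make([]snapv1.VolumeSnapshot, len(snaps))
	copy(ordered, snaps)
	// Sort so that deletion candidates come first: unready before ready,
	// and older before newer within each group.
	sort.SliceStable(ordered, func(i, j int) bool {
		ri, rj := isReady(&ordered[i]), isReady(&ordered[j])
		if ri != rj {
			return !ri // unready sorts ahead of ready
		}
		return creationTime(&ordered[i]).Before(creationTime(&ordered[j]))
	})
	return ordered[:excess]
}
```

In the failure mode described in this issue (maxCount unready snapshots plus one older, ready snapshot), this ordering deletes an unready snapshot and leaves the last usable one in place.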