RBD: Snapshots >450 cannot be taken on a PVC in certain cases #4069

Closed
Rakshith-R opened this issue Aug 22, 2023 · 5 comments · May be fixed by #4029
Labels: bug (Something isn't working), component/rbd (Issues related to RBD), wontfix (This will not be worked on)

Comments

@Rakshith-R
Contributor

Describe the bug

CephCSI runs the following logic before it creates a k8s snapshot or PVC-PVC clone.

// check snapshots on the rbd image, as we have limit from krbd that an image
// cannot have more than 510 snapshot at a given point of time. If the
// snapshots are more than the `maxSnapshotsOnImage` Add a task to flatten all
// the temporary cloned images and return ABORT error message. If the snapshots
// are more than the `minSnapshotOnImage` Add a task to flatten all the
// temporary cloned images.
func flattenTemporaryClonedImages(ctx context.Context, rbdVol *rbdVolume, cr *util.Credentials) error {
snaps, err := rbdVol.listSnapshots()
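
The two thresholds described in the comment above amount to a three-way decision. A minimal sketch of that decision, assuming the names `flattenAction` and `decideFlatten` (both hypothetical, not CephCSI identifiers) and illustrative limits of 250 and 450 picked only for this example:

```go
package main

import "fmt"

// flattenAction is a hypothetical type for this sketch; it is not part of CephCSI.
type flattenAction int

const (
	noFlatten       flattenAction = iota // at or below the soft limit: nothing to do
	flattenOnly                          // above the soft limit: queue flatten tasks and continue
	flattenAndAbort                      // above the hard limit: queue flatten tasks and return ABORT
)

// decideFlatten compares the snapshot count on an image against the soft
// (min) and hard (max) limits, following the wording of the quoted comment.
func decideFlatten(snapCount, minSnaps, maxSnaps int) flattenAction {
	switch {
	case snapCount > maxSnaps:
		return flattenAndAbort
	case snapCount > minSnaps:
		return flattenOnly
	default:
		return noFlatten
	}
}

func main() {
	// 250 and 450 are assumed values for illustration only.
	for _, n := range []int{100, 300, 451} {
		fmt.Println(n, decideFlatten(n, 250, 450))
	}
}
```

In the hard-limit case the request is rejected until the flatten tasks bring the count back down, which is why a flatten task that can never succeed blocks further snapshots.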

However, if a cloned image[1] is in trash[2], CephCSI still tries to add a task to flatten it, without first checking whether it is in trash, and the task fails.
The failure is only logged. Once the snapshot count on the image exceeds 450, CephCSI cannot create any more snapshots or PVC-PVC clones.

[1] cloned image: the RBD image underlying a k8s snapshot or PVC-PVC clone.
[2] image in trash: an image remains in trash while it still has live child images of its own.
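
The missing check can be illustrated by filtering trashed images out before any flatten task is queued. This is a minimal sketch, assuming a hypothetical `cloneInfo` type and `flattenCandidates` helper (neither exists in CephCSI) and assuming the caller already knows which images are in trash:

```go
package main

import "fmt"

// cloneInfo is a hypothetical stand-in for a cloned RBD image.
type cloneInfo struct {
	Name    string
	InTrash bool // assumed to be filled in by the caller
}

// flattenCandidates splits clones into those that can receive a flatten
// task and those that are skipped because they are in trash.
func flattenCandidates(clones []cloneInfo) (candidates, skipped []string) {
	for _, c := range clones {
		if c.InTrash {
			skipped = append(skipped, c.Name)
			continue
		}
		candidates = append(candidates, c.Name)
	}
	return candidates, skipped
}

func main() {
	// Image names are made up for this example.
	clones := []cloneInfo{
		{Name: "csi-snap-a"},
		{Name: "csi-snap-b", InTrash: true},
		{Name: "csi-snap-c"},
	}
	todo, skipped := flattenCandidates(clones)
	fmt.Println("flatten:", todo)
	fmt.Println("skipped (in trash):", skipped)
}
```

This only shows where the check would sit. What to do with the trashed images afterwards is the open question of this issue.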

Environment details

  • Image/version of Ceph CSI driver : all
  • Helm chart version : all
  • Kernel version : all
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for RBD it's
    krbd or rbd-nbd) : rbd
  • Kubernetes cluster version : all
  • Ceph cluster version : all

Steps to reproduce

Steps to reproduce the behavior:

  1. Create an RBD PVC.
  2. Repeat the following steps more than 450 times:
    • Create Snapshot.
    • Restore Snapshot into PVC.
    • Delete Snapshot.
  3. No further snapshots can be created once the count exceeds 450.

Actual results

The final snapshot creation fails.

Expected behavior

Snapshots should be created without error.


Proposed solution 1:

During k8s snapshot deletion:

  • Check the snapshot count on the parent RBD image (underlying the parent PVC). [Maybe skip this step and always flatten?]
  • Add a task to flatten the RBD image underlying the k8s snapshot itself.
  • Move the k8s snapshot's RBD image to trash and add a task to remove it. [current process]
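
The ordering in this proposal can be sketched as follows, taking the "always flatten" variant from the first bullet. The `imageOps` interface, `deleteSnapshotImage`, and the `recorder` fake are all hypothetical names for this example; CephCSI's actual task-manager calls are not shown here:

```go
package main

import (
	"errors"
	"fmt"
)

// imageOps is a hypothetical interface for this sketch only.
type imageOps interface {
	AddFlattenTask(image string) error
	MoveToTrash(image string) error
	AddTrashRemoveTask(image string) error
}

// deleteSnapshotImage applies the proposed order: queue a flatten first,
// then move the image to trash, then queue its removal.
func deleteSnapshotImage(ops imageOps, image string) error {
	if err := ops.AddFlattenTask(image); err != nil {
		return fmt.Errorf("flatten task for %q: %w", image, err)
	}
	if err := ops.MoveToTrash(image); err != nil {
		return fmt.Errorf("move %q to trash: %w", image, err)
	}
	if err := ops.AddTrashRemoveTask(image); err != nil {
		return fmt.Errorf("trash remove task for %q: %w", image, err)
	}
	return nil
}

// recorder is a fake imageOps that records calls in order and can inject
// a failure on one named operation.
type recorder struct {
	calls  []string
	failOn string
}

func (r *recorder) record(op, image string) error {
	r.calls = append(r.calls, op+":"+image)
	if op == r.failOn {
		return errors.New("injected failure")
	}
	return nil
}

func (r *recorder) AddFlattenTask(image string) error     { return r.record("flatten", image) }
func (r *recorder) MoveToTrash(image string) error        { return r.record("trash", image) }
func (r *recorder) AddTrashRemoveTask(image string) error { return r.record("remove", image) }

func main() {
	r := &recorder{}
	if err := deleteSnapshotImage(r, "csi-snap-a"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.calls)
}
```

Queuing the flatten before the image goes to trash is the point of the proposal: the flatten is requested while the image is still a valid target.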

@idryomov @Madhu-1 @nixpanic Can you please provide your opinions/suggestions regarding this?

@Rakshith-R added the bug and component/rbd labels on Aug 22, 2023
@Rakshith-R added this to the release-v3.10.0 milestone on Aug 22, 2023
@Rakshith-R changed the title from "RBD: flattening when max Snapshots limit is reached fails" to "RBD: Snapshots >450 cannot be taken on a PVC in certain cases" on Aug 22, 2023
@Rakshith-R self-assigned this on Sep 4, 2023
@github-actions

github-actions bot commented Oct 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions (bot) added the wontfix label on Oct 4, 2023
@Rakshith-R removed the wontfix label on Oct 5, 2023

github-actions bot commented Nov 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions (bot) added the wontfix label on Nov 4, 2023

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions (bot) closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 12, 2023
@Rakshith-R reopened this on Nov 13, 2023
@Rakshith-R removed the wontfix label on Nov 13, 2023
@nixpanic removed this from the release-v3.10.0 milestone on Nov 14, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions (bot) added the wontfix label on Dec 14, 2023

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions (bot) closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 22, 2023
2 participants