
Pageserver crash when tenant detach is in progress can lead to detach never making progress (possible inconsistency on startup) #4284

Closed
LizardWizzard opened this issue May 19, 2023 · 1 comment
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug triaged bugs that were already triaged

Comments

@LizardWizzard
Contributor

LizardWizzard commented May 19, 2023

Split out from #2238, specifically #2238 (comment).

The sequence:

Console calls delete, and the pageserver starts deleting files, then crashes and restarts. The file deletion order is not specified, and there is no special detach marker that would prevent the pageserver from loading this tenant during startup. As a result, we can end up with a half-broken tenant that won't be deleted by a subsequent retry from the console.

We'll have a mark file for tenant deletion, as described in the RFC. See https://github.com/zenithdb/zenith/blob/4158e24e60d294e0f039395ea95dd87f8ab317d9/docs/rfcs/022-pageserver-delete-from-s3.md#L76
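A minimal sketch of how a mark file can make tenant deletion resumable after a crash, written in Rust with only std and assuming a Unix filesystem (fsync on a directory). The `.___delete` suffix and the functions `mark_path`, `delete_tenant` and `classify_on_startup` are hypothetical, not the pageserver's real names or API. This sketch puts the mark next to the tenant directory rather than inside it, so the unspecified order inside `fs::remove_dir_all` can never remove the mark before the data.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical suffix for the deletion mark (not the pageserver's real name).
const DELETE_MARK_SUFFIX: &str = ".___delete";

/// The mark is a sibling of the tenant directory, e.g. `<tenant>.___delete`,
/// so removing the directory cannot take the mark with it.
fn mark_path(tenant_dir: &Path) -> PathBuf {
    let mut name = tenant_dir
        .file_name()
        .expect("tenant dir must have a file name")
        .to_os_string();
    name.push(DELETE_MARK_SUFFIX);
    tenant_dir.with_file_name(name)
}

/// fsync the parent directory so entry creation/removal is durable (Unix-only).
fn fsync_parent(path: &Path) -> io::Result<()> {
    let parent = path.parent().expect("path must have a parent");
    File::open(parent)?.sync_all()
}

/// Start or resume deletion; safe to call again after a crash at any step.
fn delete_tenant(tenant_dir: &Path) -> io::Result<()> {
    let mark = mark_path(tenant_dir);
    // 1. Durably record the intent before touching any tenant file.
    if !mark.exists() {
        File::create(&mark)?.sync_all()?;
        fsync_parent(&mark)?;
    }
    // 2. Remove the tenant data; the internal removal order no longer matters.
    match fs::remove_dir_all(tenant_dir) {
        Ok(()) => {}
        // An earlier, interrupted attempt already removed the directory.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    fsync_parent(tenant_dir)?;
    // 3. Only now drop the mark.
    fs::remove_file(&mark)?;
    fsync_parent(&mark)
}

#[derive(Debug, PartialEq)]
enum StartupAction {
    Load,
    ResumeDelete,
}

/// On startup, a present mark means "finish the deletion", never "load".
fn classify_on_startup(tenant_dir: &Path) -> StartupAction {
    if mark_path(tenant_dir).exists() {
        StartupAction::ResumeDelete
    } else {
        StartupAction::Load
    }
}
```

A startup loop built this way would also need to treat an orphaned mark (tenant directory already gone) as a deletion to finish, and skip mark files when enumerating tenants.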

Keeping this as a separate issue so we don't forget about it and can add dedicated tests for this case.


After #4855 this is only relevant for detach. Timeline deletion and tenant deletion now remove files in a specific order, so interrupted operations can be safely resumed.

There is one more possible glitch: we have an ignore mark for tenants, and because fs::remove_dir_all deletes entries in an unspecified order, the mark may be removed first. If the pageserver then crashes, the tenant leaves the ignored state and appears as active after restart.
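One way to close this window, sketched here as an illustration rather than the fix that was actually adopted, is to atomically rename the tenant directory to a temporary name before deleting it. The ignore mark and the data then leave the tenant path in a single step, so no crash can expose a mark-less tenant. The `.___temp` suffix and the functions `remove_dir_atomically` and `cleanup_on_startup` are hypothetical; the sketch assumes Unix `rename(2)` semantics within one filesystem.

```rust
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical suffix for a directory whose removal is in progress.
const TEMP_SUFFIX: &str = ".___temp";

fn is_temp(path: &Path) -> bool {
    path.file_name()
        .and_then(|n| n.to_str())
        .map_or(false, |n| n.ends_with(TEMP_SUFFIX))
}

/// Remove a tenant directory so that a crash can never leave a loadable half-state.
fn remove_dir_atomically(dir: &Path) -> io::Result<()> {
    let mut name = dir.file_name().expect("dir must have a file name").to_os_string();
    name.push(TEMP_SUFFIX);
    let tmp = dir.with_file_name(name);
    // rename(2) is atomic within one filesystem: afterwards the tenant, including
    // any ignore mark inside it, is gone from its original path.
    fs::rename(dir, &tmp)?;
    // Make the rename durable before starting the slow removal (Unix-only).
    File::open(dir.parent().expect("dir must have a parent"))?.sync_all()?;
    // Removal order inside the temp directory no longer matters.
    fs::remove_dir_all(&tmp)
}

/// On startup, finish any interrupted removal and return the directories to load.
fn cleanup_on_startup(tenants_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut loadable = Vec::new();
    for entry in fs::read_dir(tenants_root)? {
        let path = entry?.path();
        if is_temp(&path) {
            fs::remove_dir_all(&path)?;
        } else {
            loadable.push(path);
        }
    }
    Ok(loadable)
}
```

If a previous crash left a `.___temp` directory behind, a new rename onto it would fail because a non-empty directory cannot be replaced, which is why the startup cleanup removes such leftovers before any tenant is loaded or detached.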

See also #4326

@problame
Contributor

Related: #3478

@vadim2404 vadim2404 added t/bug Issue Type: Bug c/storage/pageserver Component: storage: pageserver labels May 22, 2023
@shanyp shanyp added the triaged bugs that were already triaged label Jun 1, 2023
@LizardWizzard LizardWizzard changed the title Pageserver crash during deletion of a tenant can lead to delete never making progress Pageserver crash when tenant detach is in progress can lead to detach never making progress (possible inconsistency on startup) Aug 11, 2023
@LizardWizzard LizardWizzard removed their assignment Aug 14, 2023
@jcsp jcsp closed this as completed Apr 26, 2024

5 participants