-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear topics orphan files with fix #8960
Clear topics orphan files with fix #8960
Conversation
/ci-repeat 10 |
/ci-repeat 10 |
/ci-repeat 10 |
/ci-repeat 10 |
/ci-repeat 10 |
/ci-repeat 10 |
0ecefac
to
a1a595a
Compare
/ci-repeat 10 |
a1a595a
to
9e222a1
Compare
/ci-repeat 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to do this on controller bakcned start-up, or integrate it into the general GC and space reclaimation process that is driven by log manager?
Now we assume that orphan files can appear only on redpanda restart. |
Can you please describe the decisions that are being made to determine what is and is not an orphaned file? |
Orphaned files are those files that for some reason remained on the disk, although they should have been deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm, would be great to document as it is not trivial for a future reader.
@@ -289,9 +290,69 @@ void controller_backend::setup_metrics() { | |||
}); | |||
} | |||
|
|||
static absl::flat_hash_map<model::ntp, model::revision_id> | |||
create_topic_table_snapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it will be helpful to have a comment why we make a snpashot and what exactly is in the snapshot.
ssx::spawn_with_gate( | ||
_gate, | ||
[this, bootstrap_revision, snapshot = std::move(snapshot)] { | ||
return clear_orphan_topic_files( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add some comments around what exactly orphan files and are why we do it at startup? (for readers who don't look at this as a commit).
} | ||
if ( | ||
topic_table_snapshot.contains(ntp) | ||
&& ntp_directory_data.revision_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice to document what qualifies as an orphan in the method doc.
}); | ||
} | ||
|
||
ss::future<> log_manager::remove_orphan_files( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice to mention that this walks thru the entire directory structure in an async fiber to remove...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good. You might want to catch all exceptions/failed futures in one place though. At the moment only std::filesystem::filesystem_error
is handled. If anything else bubbles up, I think the whole clean-up stops. It would make sense to attempt the clean-up for the rest of the ntps, even if one failed.
@@ -570,6 +631,65 @@ ss::future<> controller_backend::fetch_deltas() { | |||
}); | |||
} | |||
|
|||
bool topic_files_are_orphan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging in this function is a bit confusing to me. Cleaning up orphan topic files. ntp {} skip dir
makes it sound like we'll clear up.
@@ -437,6 +441,85 @@ ss::future<> log_manager::remove(model::ntp ntp) { | |||
co_await dispatch_topic_dir_deletion(topic_dir); | |||
} | |||
|
|||
ss::future<> remove_orphan_partition_files( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd move this free function into the anon namespace (namespace {}
) or mark it static.
9e222a1
to
e6facfa
Compare
e6facfa
to
5aad020
Compare
When redpanda is restarted while delete operation is not finish Partition files might be left on disk. We need to cleanup orphan partition files after bootstrap.
5aad020
to
2eb0976
Compare
/backport v22.3.x |
Failed to run cherry-pick command. I executed the below command:
|
When node is restarted while it was executing partition delete operation, this operation may be not finished and we will have orphan files for that partition left on device.
On node restart we don't have that partition data in memory so we couldn't retry partition delete operation and won't clean up partition files on reconciliation.
This pr bring garbage collector mechanism that will force delete partition files after node bootstrap.
We gather all existing ntps and then go through all data directories and check if topic directory responsible for any exisitng ntp if it don't we delete it.
#8396 pr with fix
Fixes #8919
Backports Required
UX Changes
Release Notes