-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: topic_manifest_downloader::find_manifests #20755
Conversation
Adds a couple utility methods to get the topic namespace from a given path. This will be useful in an upcoming commit where we do this to parse list results. Why not have a single method? We could, but the upcoming use case is topic recovery, which uses a specific path prefix, and will know whether it expects labeled or hash prefixed objects. In that context, it seems natural to expect the correct kind of path.
These will be useful for callers that may want to list manifests, like the list-based topic recovery.
chunked_vector<ss::sstring> prefixed_topic_manifests_roots() { | ||
constexpr static auto hex_chars = std::string_view{"0123456789abcdef"}; | ||
chunked_vector<ss::sstring> roots; | ||
roots.reserve(hex_chars.size()); | ||
for (char c : hex_chars) { | ||
roots.emplace_back(fmt::format("{}0000000", c)); | ||
} | ||
return roots; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a static list of 16 strings? seems ok unless its in the hot path in which case perhaps it'd be an easy thing to optimize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it could be static/constexpr. Opted for brevity mostly, this is roughly what's used in topic_recovery_service today. Agreed though we can optimize in the future
ss::lowres_clock::time_point deadline, | ||
model::timestamp_clock::duration backoff, | ||
std::optional<tp_ns_filter_t> tp_filter, | ||
chunked_vector<topic_manifest>*); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunked_vector<topic_manifest>*
is that output variable something that is built up iteratively, or could it be part of the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be built up iteratively, but isn't today. At least in a few places recently I've been trying a style where return types are a status/enum, and everything else is an out-parameter. I'm finding it much easier to express expected outcomes that way.
retry_chain_node& parent_retry, | ||
ss::lowres_clock::time_point deadline, | ||
model::timestamp_clock::duration backoff, | ||
std::optional<tp_ns_filter_t> tp_filter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: std::function already behaves sort of like an optional std::function::bool returns false if it doesn't contain a callable function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good to know. I'll keep this as is for now though, given it's somewhat easier to write a filter to include things, and the default value should be to include everything
topic_manifest_downloader dl(bucket, std::nullopt, tp, remote); | ||
// Not the most optimal since the downloader will check multiple paths, | ||
// even though we looked at paths above, but this is nice and tidy. | ||
auto res = co_await dl.download_manifest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe download_manifest()
could take a value/container of "hint(s)" or indicator(s) for which format of manifest should be downloaded, if the caller has an idea of which one to retrieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not that big a deal though. Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not a bad idea. I didn't spend much time optimizing, like I mentioned. All of these are great points but I don't think it's worth it just now.
7993565
to
49b02fd
Compare
This adds a find_manifests() method to the topic_manifest_downloader. This will subsume the logic currently in topic_recovery_service, that lists objects in the buckets looking for objects that look like topic manifests. With our object naming scheme changing, this will need to account for the new naming scheme. Admittedly, there's somewhat of an impedence mismatch since the topic_manifest_downloader expects to target one topic, so this just makes the method static. But it still feels natural for this logic and its tests to be coupled to topic_manifest_downloader. The code isn't as optimal as it could be in terms of number of list/get operations performed, since I wanted to rely on the downloader to duplicating some of the path-related complexity. But this endpoint is likely not going to be widely used, particularly as we begin having more clusters per bucket. Instead controller snapshot-based restore will be used.
49b02fd
to
df8b13a
Compare
This adds a find_manifests() method to the topic_manifest_downloader.
This will subsume the logic currently in topic_recovery_service, that
lists objects in the buckets looking for objects that look like topic
manifests. With our object naming scheme changing, this will need to
account for the new naming scheme.
Admittedly, there's somewhat of an impedence mismatch since the
topic_manifest_downloader expects to target one topic, so this just
makes the method static. But it still feels natural for this logic and
its tests to be coupled to topic_manifest_downloader.
The code isn't as optimal as it could be in terms of number of list/get
operations performed, since I wanted to rely on the downloader to
duplicating some of the path-related complexity. But this endpoint is
likely not going to be widely used, particularly as we begin having more
clusters per bucket. Instead controller snapshot-based restore will be
used.
Backports Required
Release Notes