Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: topic_manifest_downloader::find_manifests #20755

Merged
merged 3 commits into from
Jun 29, 2024

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Jun 28, 2024

This adds a find_manifests() method to the topic_manifest_downloader.
This will subsume the logic currently in topic_recovery_service, that
lists objects in the buckets looking for objects that look like topic
manifests. With our object naming scheme changing, this will need to
account for the new naming scheme.

Admittedly, there's somewhat of an impedence mismatch since the
topic_manifest_downloader expects to target one topic, so this just
makes the method static. But it still feels natural for this logic and
its tests to be coupled to topic_manifest_downloader.

The code isn't as optimal as it could be in terms of number of list/get
operations performed, since I wanted to rely on the downloader to
duplicating some of the path-related complexity. But this endpoint is
likely not going to be widely used, particularly as we begin having more
clusters per bucket. Instead controller snapshot-based restore will be
used.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

andrwng added 2 commits June 28, 2024 11:14
Adds a couple utility methods to get the topic namespace from a given
path. This will be useful in an upcoming commit where we do this to
parse list results.

Why not have a single method? We could, but the upcoming use case is
topic recovery, which uses a specific path prefix, and will know whether
it expects labeled or hash prefixed objects. In that context, it seems
natural to expect the correct kind of path.
These will be useful for callers that may want to list manifests, like
the list-based topic recovery.
@andrwng andrwng requested review from WillemKauf, dotnwat and abhijat and removed request for WillemKauf June 28, 2024 18:17
dotnwat
dotnwat previously approved these changes Jun 28, 2024
Comment on lines +29 to +37
chunked_vector<ss::sstring> prefixed_topic_manifests_roots() {
constexpr static auto hex_chars = std::string_view{"0123456789abcdef"};
chunked_vector<ss::sstring> roots;
roots.reserve(hex_chars.size());
for (char c : hex_chars) {
roots.emplace_back(fmt::format("{}0000000", c));
}
return roots;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a static list of 16 strings? seems ok unless its in the hot path in which case perhaps it'd be an easy thing to optimize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it could be static/constexpr. Opted for brevity mostly, this is roughly what's used in topic_recovery_service today. Agreed though we can optimize in the future

ss::lowres_clock::time_point deadline,
model::timestamp_clock::duration backoff,
std::optional<tp_ns_filter_t> tp_filter,
chunked_vector<topic_manifest>*);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunked_vector<topic_manifest>*

is that output variable something that is built up iteratively, or could it be part of the return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be built up iteratively, but isn't today. At least in a few places recently I've been trying a style where return types are a status/enum, and everything else is an out-parameter. I'm finding it much easier to express expected outcomes that way.

retry_chain_node& parent_retry,
ss::lowres_clock::time_point deadline,
model::timestamp_clock::duration backoff,
std::optional<tp_ns_filter_t> tp_filter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::function already behaves sort of like an optional std::function::bool returns false if it doesn't contain a callable function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good to know. I'll keep this as is for now though, given it's somewhat easier to write a filter to include things, and the default value should be to include everything

WillemKauf
WillemKauf previously approved these changes Jun 28, 2024
src/v/cloud_storage/topic_manifest_downloader.cc Outdated Show resolved Hide resolved
src/v/cloud_storage/topic_manifest_downloader.cc Outdated Show resolved Hide resolved
topic_manifest_downloader dl(bucket, std::nullopt, tp, remote);
// Not the most optimal since the downloader will check multiple paths,
// even though we looked at paths above, but this is nice and tidy.
auto res = co_await dl.download_manifest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe download_manifest() could take a value/container of "hint(s)" or indicator(s) for which format of manifest should be downloaded, if the caller has an idea of which one to retrieve?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not that big a deal though. Just a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not a bad idea. I didn't spend much time optimizing, like I mentioned. All of these are great points but I don't think it's worth it just now.

@andrwng andrwng dismissed stale reviews from WillemKauf and dotnwat via 49b02fd June 28, 2024 20:15
@andrwng andrwng force-pushed the topic-manifest-finder branch from 7993565 to 49b02fd Compare June 28, 2024 20:15
This adds a find_manifests() method to the topic_manifest_downloader.
This will subsume the logic currently in topic_recovery_service, that
lists objects in the buckets looking for objects that look like topic
manifests. With our object naming scheme changing, this will need to
account for the new naming scheme.

Admittedly, there's somewhat of an impedence mismatch since the
topic_manifest_downloader expects to target one topic, so this just
makes the method static. But it still feels natural for this logic and
its tests to be coupled to topic_manifest_downloader.

The code isn't as optimal as it could be in terms of number of list/get
operations performed, since I wanted to rely on the downloader to
duplicating some of the path-related complexity. But this endpoint is
likely not going to be widely used, particularly as we begin having more
clusters per bucket. Instead controller snapshot-based restore will be
used.
@andrwng andrwng force-pushed the topic-manifest-finder branch from 49b02fd to df8b13a Compare June 28, 2024 20:18
@andrwng andrwng requested review from dotnwat and WillemKauf June 28, 2024 20:19
@andrwng andrwng merged commit b529169 into redpanda-data:dev Jun 29, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants