-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: remote labels #20778
cloud_storage: remote labels #20778
Conversation
/ci-repeat |
/ci-repeat |
/cdt |
5b5661a
to
38ee5f1
Compare
/ci-repeat |
972ee44
to
1603f89
Compare
There's still a CI failure (likely a test issue), but I think this otherwise ready for review. |
965d4f2
to
83dee1f
Compare
This is a preemptive commit for the upcoming addition of remote labels to the topic properties.
This plumbs the remote label (cluster uuid) associated with a topic from the topics table into the archival meta stm, in the same fashion that vcluster-id is plumbed into the rm_stm. Conceptually, the archival STM will be the owner of the root remote path provider used throughout the cloud_storage and archival subsystems.
This replaces path generation in ntp_archiver with variants that use the path provider. Namely: - topic manifest uploads - partition manifest uploads - segment uploads - generating spillover manifests
This plumbs the archival stm's path provider into the manifest view. The manifest view serves as the underlying abstraction to the remote_partition class, so this will be necessary for getting the path provider to be used on the remote read path.
Topic recovery need to be able to find a topic manifest whose format (serde or json) is not yet known. This is handled transparently by the topic manifest downloader, while also taking into account remote labels. This commit updates the create topic path to use the downloader.
Automated topic recovery performs a list bucket operation to discover what topics can be restored. This commit updates this to account for topic manifests labeled with the cluster UUID.
Snapshot recovery needs to be able to find a partition manifest whose format (serde or json) is not yet known. This is handled transparently by the partition manifest downloader, while also taking remote labels into account. This commit updates the STM to use the downloader for recovery.
@@ -132,6 +132,7 @@ class remote_partition | |||
static ss::future<erase_result> erase( | |||
cloud_storage::remote&, | |||
cloud_storage_clients::bucket_name, | |||
const remote_path_provider& path_provider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can store remote_path_provider
inside the materialized_resources
instance. It is available through remote
. So you won't have to pass it directly into both archiver and remote_partition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can since the path provider is tied to each topic since for example a recovery topic will have a different base label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -113,6 +117,10 @@ def read_topic_properties_serde(rdr: Reader, version): | |||
'flush_ms': rdr.read_optional(Reader.read_int64), | |||
'flush_bytes': rdr.read_optional(Reader.read_int64) | |||
} | |||
if version >= 9: | |||
topic_properties |= { | |||
'remote_labels': rdr.read_optional(read_remote_label_serde) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this remote_labels
intended to be singular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
res); | ||
co_return std::make_tuple(errc::topic_operation_error, json_path.second); | ||
if ( | ||
download_res.value() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does values that aren't success (but also aren't errors?) get logged in download_manifest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good callout -- they aren't. Will add some logging in a follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -818,12 +819,12 @@ ss::future<> ntp_archiver::sync_manifest_until_term_change() { | |||
vlog( | |||
_rtclog.error, | |||
"Failed to download manifest {}", | |||
manifest().get_manifest_path()); | |||
manifest().get_manifest_path(remote_path_provider())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remote_path_prvider
is passed into every call? IIUC the remote_label
is either set for the partition or not. So it'd be possible to pass remote_path_provider
to the c-tor of the partition_manifest
in the archival STM (which also "knows" about remote_label
now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted about this offline and agreed that making the path provider a member of the manifest probably isn't the way to go, given how we use partition_manifest in many places as just a struct that knows about serialization. For now, having a singular remote_path_provider and passing it around is tentatively the least evil.
@@ -3257,6 +3261,11 @@ const storage::ntp_config& ntp_archiver::ntp_config() const { | |||
return _parent.log()->config(); | |||
} | |||
|
|||
const cloud_storage::remote_path_provider& | |||
ntp_archiver::remote_path_provider() const { | |||
return _parent.archival_meta_stm()->path_provider(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before, it's always the same path provider for the partition and it lives in archival STM so why not connect manifest and path provider on that level instead of doing this in every call? This makes code confusing because the reader might think that the path provider could be changed from upload to upload. Also, the async_manifest_view
is getting the path provider in c-tor and partition_manifest
doesn't. Which is also a bit confusing and not uniform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pasting from other PR comments:
I think the shift in mindset is that the manifest view and the partition manifest are not equivalent to each other. The manifest view is much more full-featured and must be able to access paths of an STM manifest and spillover manifests, unlike the partition_manifest class which is focused on tracking member fields and serialization.
@@ -158,6 +160,10 @@ class async_manifest_view { | |||
std::optional<size_t> size_limit, | |||
std::optional<std::chrono::milliseconds> time_limit) noexcept; | |||
|
|||
const remote_path_provider& path_provider() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing, why async_manifest_view
exposes its own path provider which potentially could be different from the one in the archival_metadata_stm
. If I'm writing the code in the remote_partition
or ntp_archiver
I can use async_manifest_view::path_provider
or archival_metadata_stm::path_provider
. What is the goal here? Why can't it be routed everywhere in c-tor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pasting from another PR comment:
I think it makes sense for the remote partition to depend on async_manifest_view's -- the hierarchy at least makes sense because the remote_partition uses the view for both getting segments and cursoring over spillover manifests (done by the view internally with the view's path provider)
@@ -1602,8 +1602,8 @@ remote_manifest_path async_manifest_view::get_spillover_manifest_path( | |||
.base_ts = meta.base_timestamp, | |||
.last_ts = meta.max_timestamp, | |||
}; | |||
return generate_spillover_manifest_path( | |||
get_ntp(), _stm_manifest.get_revision_id(), comp); | |||
return remote_manifest_path{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here the code actually uses internal path provider to return correct name instead of accepting path provider as a parameter which is different from the way partition_manifest is implemented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, I think the shift in mindset is that the manifest view and the partition manifest are not equivalent to each other. The manifest view is much more full-featured and must be able to access paths of an STM manifest and spillover manifests, unlike the partition_manifest class which is focused on tracking member fields and serialization.
@@ -199,7 +200,8 @@ remote_partition::borrow_result_t remote_partition::borrow_next_segment_reader( | |||
} | |||
} | |||
if (iter == _segments.end()) { | |||
auto path = manifest.generate_segment_path(*mit); | |||
auto path = manifest.generate_segment_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case the manifest
is a spillover manifest, you're using _manifest_view->path_provider()
here but in order to generate the spillover manifest the code is picking up the right path provider automatically. It's not incorrect but it's confusing to route all these path_providers
to some calls and not other calls. It will be difficult for someone new to understand the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline, I'll pass the path provider directly to the remote_partition
rather than relying on the view's path provider. Though FWIW, I think it makes sense for the remote partition to depend on async_manifest_view's -- the hierarchy at least makes sense because the remote_partition uses the view for both getting segments and cursoring over spillover manifests (done by the view internally with the view's path provider)
@@ -1068,7 +1070,8 @@ remote_partition::aborted_transactions(offset_range offsets) { | |||
// up front at the start of the function. | |||
auto segment_unit = co_await materialized().get_segment_units( | |||
std::nullopt); | |||
auto path = stm_manifest.generate_segment_path(*it); | |||
auto path = stm_manifest.generate_segment_path( | |||
*it, _manifest_view->path_provider()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the path provider from _manifest_view
is used with stm_manifest
which has its own path_provider
(I know that there is one path provider per partition but the code is written in such way as if everything could have a different instance of path_provider
but on the other hand it doesn't respect this in all cases, like here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed a bit and agreed that we could plumb a path provider directly into remote_partition.
I'll at least defend this current impl a bit, but I'm happy to make that change if you still think it's less confusing.
The async manifest view is the root of the read path, in the same way that ntp archiver is the root of the write path. Ultimately they’re both rooted in the STM’s path provider, but at least there is a conceptual hierarchy , not just "dumb plumbing"
ssx::task_local_ptr<const partition_manifest> manifest) { | ||
for (auto it = manifest->segment_containing(offsets.begin); | ||
it != manifest->end(); | ||
++it) { | ||
if (it->base_offset > offsets.end_rp) { | ||
return ss::stop_iteration::yes; | ||
} | ||
auto path = manifest->generate_segment_path(*it); | ||
auto path = manifest->generate_segment_path( | ||
*it, _manifest_view->path_provider()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the manifest
could be an STM
manifest which has different path provider instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this offline. The intent here isn't for readers to think "we're passing path providers around because they can be different path providers" but instead think "we're passing path providers around because we need to generate paths" with the understanding that there is a singular path provider per partition.
I'll make the path provider non-copyable, hopefully that will help part of the confusion
@@ -57,9 +57,6 @@ class base_manifest { | |||
/// \return asynchronous input_stream with the serialized json | |||
virtual ss::future<serialized_data_stream> serialize() const = 0; | |||
|
|||
/// Manifest object format and name in S3 | |||
virtual remote_manifest_path get_manifest_path() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious about this one, why is it removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some manifests now fully have the ability to generate their paths, but some will rely on the path provider. An alternative is to lean into this more and make the partition manifest have a path provider as a member, but as discussed, this isn't a great option given all the places partition_manifest
is constructed with simple args (no args, or just an NTPR)
The downloader has a lot of context but reduces results into an outcome or error enum. This updates the downloader to log before returning early from the class, logging errors at ERROR log level, and anything else at INFO level (since it should be up to callers whether such code paths are actually problematic). This is review follow-up to redpanda-data#20778
Fixes a typo in how 'remote_label' was logged. This is review follow-up to redpanda-data#20778
The fact that we can copy around path providers is potentially error-prone and confusing. The primary path provider meant to be used by most subsystems (i.e. the archival metadata STM) may not be clear if callers end up making copies. This removes most special constructors, and exposes an explicit copy() method to force callers to think twice before copying. The only existing callsite where such a copy happens has been updated to use this method. This is review follow-up to redpanda-data#20778
Adds a comment clarifying the usage of a path provider at partition startup time. This is review follow-up to redpanda-data#20778
Adds a comment explaining the condition with which we set a remote label. This is follow-up to redpanda-data#20778
The downloader has a lot of context but reduces results into an outcome or error enum. This updates the downloader to log before returning early from the class, logging errors at ERROR log level, and anything else at INFO level (since it should be up to callers whether such code paths are actually problematic). This is review follow-up to redpanda-data#20778
Fixes a typo in how 'remote_label' was logged. This is review follow-up to redpanda-data#20778
The fact that we can copy around path providers is potentially error-prone and confusing. The primary path provider meant to be used by most subsystems (i.e. the archival metadata STM) may not be clear if callers end up making copies. This removes most special constructors, and exposes an explicit copy() method to force callers to think twice before copying. The only existing callsite where such a copy happens has been updated to use this method. This is review follow-up to redpanda-data#20778
Adds a comment clarifying the usage of a path provider at partition startup time. This is review follow-up to redpanda-data#20778
Adds a comment explaining the condition with which we set a remote label. This is follow-up to redpanda-data#20778
The downloader has a lot of context but reduces results into an outcome or error enum. This updates the downloader to log before returning early from the class, logging errors at ERROR log level, and anything else at INFO level (since it should be up to callers whether such code paths are actually problematic). This is review follow-up to redpanda-data#20778
Fixes a typo in how 'remote_label' was logged. This is review follow-up to redpanda-data#20778
The fact that we can copy around path providers is potentially error-prone and confusing. The primary path provider meant to be used by most subsystems (i.e. the archival metadata STM) may not be clear if callers end up making copies. This removes most special constructors, and exposes an explicit copy() method to force callers to think twice before copying. The only existing callsite where such a copy happens has been updated to use this method. This is review follow-up to redpanda-data#20778
Adds a comment clarifying the usage of a path provider at partition startup time. This is review follow-up to redpanda-data#20778
Adds a comment explaining the condition with which we set a remote label. This is follow-up to redpanda-data#20778
This PR introduces remote labels as a new topic property. Remote labels today
only contain the originating cluster's UUID, though in the future this may be
extended to also include something user-provider. Once a remote label is set
for a topic, it cannot be changed.
When set, subcomponents within the partition of the topic will use this label
as the basis for generating paths for remote objects via a new
remote_path_provider that is owned by the archival_metadata_stm.
A large chunk of this PR entails clearing out vestiges of path-decision-making
from the cloud_storage::remote, instead relying on recently added
topic/partition manifest downloader classes, which take into account remote
labels.
The high level structure of this PR is:
(needed for tests, which use the viewer to decode topic manifests)
defaulting to the old style of naming for now
appropriate
(remote_partition, purger, segment merger, anomaly detector, etc)
grouping metadata per label
A new configuration is also added as a part of this PR that will be useful in
general for testing, to disable the addition of the remote label for new
topics.
Backports Required
Release Notes