Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: remote labels #20778

Merged
merged 34 commits into from
Jul 8, 2024
Merged

cloud_storage: remote labels #20778

merged 34 commits into from
Jul 8, 2024

Conversation

andrwng
Copy link
Contributor

@andrwng andrwng commented Jun 29, 2024

This PR introduces remote labels as a new topic property. Remote labels today
only contain the originating cluster's UUID, though in the future this may be
extended to also include something user-provider. Once a remote label is set
for a topic, it cannot be changed.

When set, subcomponents within the partition of the topic will use this label
as the basis for generating paths for remote objects via a new
remote_path_provider that is owned by the archival_metadata_stm.

A large chunk of this PR entails clearing out vestiges of path-decision-making
from the cloud_storage::remote, instead relying on recently added
topic/partition manifest downloader classes, which take into account remote
labels.

The high level structure of this PR is:

  • preemptive change to offline_log_viewer to be able to read the remote label
    (needed for tests, which use the viewer to decode topic manifests)
  • introduce the remote label and plug it into the archival metadata stm,
    defaulting to the old style of naming for now
  • plug in the recently added topic_manifest_downloader code where appropriate
  • plug in the recently added partition_manifest_downloader code where
    appropriate
  • remove newly dead code subsumed by the downloaders
  • plug in path providers to the rest of the calls within partition
    (remote_partition, purger, segment merger, anomaly detector, etc)
  • update ducktape test utilities to support the new naming scheme, and support
    grouping metadata per label
  • plug in the actual cluster UUID into new topics, so they use new paths
  • remove newly dead code / rid manifest code of path generation
  • simple ducktape test to show things working

A new configuration is also added as a part of this PR that will be useful in
general for testing, to disable the addition of the remote label for new
topics.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@andrwng
Copy link
Contributor Author

andrwng commented Jun 29, 2024

/ci-repeat

@andrwng
Copy link
Contributor Author

andrwng commented Jun 29, 2024

/ci-repeat

@andrwng
Copy link
Contributor Author

andrwng commented Jun 29, 2024

/cdt

@andrwng andrwng force-pushed the remote-labels branch 2 times, most recently from 5b5661a to 38ee5f1 Compare July 1, 2024 16:32
@andrwng
Copy link
Contributor Author

andrwng commented Jul 1, 2024

/ci-repeat

@andrwng andrwng force-pushed the remote-labels branch 2 times, most recently from 972ee44 to 1603f89 Compare July 1, 2024 22:31
@andrwng andrwng marked this pull request as ready for review July 1, 2024 23:30
@andrwng andrwng requested a review from a team as a code owner July 1, 2024 23:30
@andrwng andrwng added this to the v24.2.1-rc2 milestone Jul 2, 2024
@andrwng
Copy link
Contributor Author

andrwng commented Jul 2, 2024

There's still a CI failure (likely a test issue), but I think this otherwise ready for review.

@andrwng andrwng requested review from Lazin, dotnwat and WillemKauf July 2, 2024 04:00
@andrwng andrwng force-pushed the remote-labels branch 4 times, most recently from 965d4f2 to 83dee1f Compare July 3, 2024 01:35
kbatuigas
kbatuigas previously approved these changes Jul 3, 2024
src/v/config/configuration.cc Outdated Show resolved Hide resolved
andrwng added 8 commits July 3, 2024 22:26
This is a preemptive commit for the upcoming addition of remote labels
to the topic properties.
This plumbs the remote label (cluster uuid) associated with a topic from
the topics table into the archival meta stm, in the same fashion that
vcluster-id is plumbed into the rm_stm.

Conceptually, the archival STM will be the owner of the root remote path
provider used throughout the cloud_storage and archival subsystems.
This replaces path generation in ntp_archiver with variants that use the
path provider. Namely:
- topic manifest uploads
- partition manifest uploads
- segment uploads
- generating spillover manifests
This plumbs the archival stm's path provider into the manifest view. The
manifest view serves as the underlying abstraction to the
remote_partition class, so this will be necessary for getting the path
provider to be used on the remote read path.
Topic recovery need to be able to find a topic manifest whose format
(serde or json) is not yet known. This is handled transparently by the
topic manifest downloader, while also taking into account remote labels.

This commit updates the create topic path to use the downloader.
Automated topic recovery performs a list bucket operation to discover
what topics can be restored. This commit updates this to account for
topic manifests labeled with the cluster UUID.
Snapshot recovery needs to be able to find a partition manifest whose
format (serde or json) is not yet known. This is handled transparently
by the partition manifest downloader, while also taking remote labels
into account.

This commit updates the STM to use the downloader for recovery.
@andrwng
Copy link
Contributor Author

andrwng commented Jul 4, 2024

Latest pushes:

  • Rebase here
  • Updated config doc string here

@@ -132,6 +132,7 @@ class remote_partition
static ss::future<erase_result> erase(
cloud_storage::remote&,
cloud_storage_clients::bucket_name,
const remote_path_provider& path_provider,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can store remote_path_provider inside the materialized_resources instance. It is available through remote. So you won't have to pass it directly into both archiver and remote_partition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can since the path provider is tied to each topic since for example a recovery topic will have a different base label

Copy link
Contributor

@WillemKauf WillemKauf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aanthony-rp aanthony-rp modified the milestones: v24.2.1-rc2, v24.1.x-next Jul 5, 2024
@@ -113,6 +117,10 @@ def read_topic_properties_serde(rdr: Reader, version):
'flush_ms': rdr.read_optional(Reader.read_int64),
'flush_bytes': rdr.read_optional(Reader.read_int64)
}
if version >= 9:
topic_properties |= {
'remote_labels': rdr.read_optional(read_remote_label_serde)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this remote_labels intended to be singular?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Yeah.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/v/archival/ntp_archiver_service.cc Outdated Show resolved Hide resolved
src/v/archival/ntp_archiver_service.cc Show resolved Hide resolved
res);
co_return std::make_tuple(errc::topic_operation_error, json_path.second);
if (
download_res.value()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does values that aren't success (but also aren't errors?) get logged in download_manifest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout -- they aren't. Will add some logging in a follow-up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/v/cluster/topic_recovery_service.cc Show resolved Hide resolved
src/v/cluster/topic_recovery_service.cc Show resolved Hide resolved
src/v/cluster/partition_recovery_manager.cc Show resolved Hide resolved
src/v/cluster/topics_frontend.cc Show resolved Hide resolved
@andrwng andrwng merged commit 17776b2 into redpanda-data:dev Jul 8, 2024
19 checks passed
@@ -818,12 +819,12 @@ ss::future<> ntp_archiver::sync_manifest_until_term_change() {
vlog(
_rtclog.error,
"Failed to download manifest {}",
manifest().get_manifest_path());
manifest().get_manifest_path(remote_path_provider()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remote_path_prvider is passed into every call? IIUC the remote_label is either set for the partition or not. So it'd be possible to pass remote_path_provider to the c-tor of the partition_manifest in the archival STM (which also "knows" about remote_label now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted about this offline and agreed that making the path provider a member of the manifest probably isn't the way to go, given how we use partition_manifest in many places as just a struct that knows about serialization. For now, having a singular remote_path_provider and passing it around is tentatively the least evil.

@@ -3257,6 +3261,11 @@ const storage::ntp_config& ntp_archiver::ntp_config() const {
return _parent.log()->config();
}

const cloud_storage::remote_path_provider&
ntp_archiver::remote_path_provider() const {
return _parent.archival_meta_stm()->path_provider();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before, it's always the same path provider for the partition and it lives in archival STM so why not connect manifest and path provider on that level instead of doing this in every call? This makes code confusing because the reader might think that the path provider could be changed from upload to upload. Also, the async_manifest_view is getting the path provider in c-tor and partition_manifest doesn't. Which is also a bit confusing and not uniform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pasting from other PR comments:

I think the shift in mindset is that the manifest view and the partition manifest are not equivalent to each other. The manifest view is much more full-featured and must be able to access paths of an STM manifest and spillover manifests, unlike the partition_manifest class which is focused on tracking member fields and serialization.

src/v/cloud_storage/async_manifest_view.cc Show resolved Hide resolved
@@ -158,6 +160,10 @@ class async_manifest_view {
std::optional<size_t> size_limit,
std::optional<std::chrono::milliseconds> time_limit) noexcept;

const remote_path_provider& path_provider() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing, why async_manifest_view exposes its own path provider which potentially could be different from the one in the archival_metadata_stm. If I'm writing the code in the remote_partition or ntp_archiver I can use async_manifest_view::path_provider or archival_metadata_stm::path_provider. What is the goal here? Why can't it be routed everywhere in c-tor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pasting from another PR comment:

I think it makes sense for the remote partition to depend on async_manifest_view's -- the hierarchy at least makes sense because the remote_partition uses the view for both getting segments and cursoring over spillover manifests (done by the view internally with the view's path provider)

@@ -1602,8 +1602,8 @@ remote_manifest_path async_manifest_view::get_spillover_manifest_path(
.base_ts = meta.base_timestamp,
.last_ts = meta.max_timestamp,
};
return generate_spillover_manifest_path(
get_ntp(), _stm_manifest.get_revision_id(), comp);
return remote_manifest_path{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here the code actually uses internal path provider to return correct name instead of accepting path provider as a parameter which is different from the way partition_manifest is implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I think the shift in mindset is that the manifest view and the partition manifest are not equivalent to each other. The manifest view is much more full-featured and must be able to access paths of an STM manifest and spillover manifests, unlike the partition_manifest class which is focused on tracking member fields and serialization.

@@ -199,7 +200,8 @@ remote_partition::borrow_result_t remote_partition::borrow_next_segment_reader(
}
}
if (iter == _segments.end()) {
auto path = manifest.generate_segment_path(*mit);
auto path = manifest.generate_segment_path(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the manifest is a spillover manifest, you're using _manifest_view->path_provider() here but in order to generate the spillover manifest the code is picking up the right path provider automatically. It's not incorrect but it's confusing to route all these path_providers to some calls and not other calls. It will be difficult for someone new to understand the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed offline, I'll pass the path provider directly to the remote_partition rather than relying on the view's path provider. Though FWIW, I think it makes sense for the remote partition to depend on async_manifest_view's -- the hierarchy at least makes sense because the remote_partition uses the view for both getting segments and cursoring over spillover manifests (done by the view internally with the view's path provider)

@@ -1068,7 +1070,8 @@ remote_partition::aborted_transactions(offset_range offsets) {
// up front at the start of the function.
auto segment_unit = co_await materialized().get_segment_units(
std::nullopt);
auto path = stm_manifest.generate_segment_path(*it);
auto path = stm_manifest.generate_segment_path(
*it, _manifest_view->path_provider());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the path provider from _manifest_view is used with stm_manifest which has its own path_provider (I know that there is one path provider per partition but the code is written in such way as if everything could have a different instance of path_provider but on the other hand it doesn't respect this in all cases, like here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed a bit and agreed that we could plumb a path provider directly into remote_partition.

I'll at least defend this current impl a bit, but I'm happy to make that change if you still think it's less confusing.

The async manifest view is the root of the read path, in the same way that ntp archiver is the root of the write path. Ultimately they’re both rooted in the STM’s path provider, but at least there is a conceptual hierarchy , not just "dumb plumbing"

ssx::task_local_ptr<const partition_manifest> manifest) {
for (auto it = manifest->segment_containing(offsets.begin);
it != manifest->end();
++it) {
if (it->base_offset > offsets.end_rp) {
return ss::stop_iteration::yes;
}
auto path = manifest->generate_segment_path(*it);
auto path = manifest->generate_segment_path(
*it, _manifest_view->path_provider());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the manifest could be an STM manifest which has different path provider instance

Copy link
Contributor Author

@andrwng andrwng Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline. The intent here isn't for readers to think "we're passing path providers around because they can be different path providers" but instead think "we're passing path providers around because we need to generate paths" with the understanding that there is a singular path provider per partition.

I'll make the path provider non-copyable, hopefully that will help part of the confusion

@@ -57,9 +57,6 @@ class base_manifest {
/// \return asynchronous input_stream with the serialized json
virtual ss::future<serialized_data_stream> serialize() const = 0;

/// Manifest object format and name in S3
virtual remote_manifest_path get_manifest_path() const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about this one, why is it removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some manifests now fully have the ability to generate their paths, but some will rely on the path provider. An alternative is to lean into this more and make the partition manifest have a path provider as a member, but as discussed, this isn't a great option given all the places partition_manifest is constructed with simple args (no args, or just an NTPR)

andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 8, 2024
The downloader has a lot of context but reduces results into an outcome
or error enum. This updates the downloader to log before returning early
from the class, logging errors at ERROR log level, and anything else at
INFO level (since it should be up to callers whether such code paths are
actually problematic).

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 8, 2024
Fixes a typo in how 'remote_label' was logged.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 8, 2024
The fact that we can copy around path providers is potentially
error-prone and confusing. The primary path provider meant to be used by
most subsystems (i.e. the archival metadata STM) may not be clear if
callers end up making copies.

This removes most special constructors, and exposes an explicit copy()
method to force callers to think twice before copying.

The only existing callsite where such a copy happens has been updated to
use this method.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 8, 2024
Adds a comment clarifying the usage of a path provider at partition
startup time. This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 8, 2024
Adds a comment explaining the condition with which we set a remote
label.

This is follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
The downloader has a lot of context but reduces results into an outcome
or error enum. This updates the downloader to log before returning early
from the class, logging errors at ERROR log level, and anything else at
INFO level (since it should be up to callers whether such code paths are
actually problematic).

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Fixes a typo in how 'remote_label' was logged.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
The fact that we can copy around path providers is potentially
error-prone and confusing. The primary path provider meant to be used by
most subsystems (i.e. the archival metadata STM) may not be clear if
callers end up making copies.

This removes most special constructors, and exposes an explicit copy()
method to force callers to think twice before copying.

The only existing callsite where such a copy happens has been updated to
use this method.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Adds a comment clarifying the usage of a path provider at partition
startup time. This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Adds a comment explaining the condition with which we set a remote
label.

This is follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
The downloader has a lot of context but reduces results into an outcome
or error enum. This updates the downloader to log before returning early
from the class, logging errors at ERROR log level, and anything else at
INFO level (since it should be up to callers whether such code paths are
actually problematic).

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Fixes a typo in how 'remote_label' was logged.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
The fact that we can copy around path providers is potentially
error-prone and confusing. The primary path provider meant to be used by
most subsystems (i.e. the archival metadata STM) may not be clear if
callers end up making copies.

This removes most special constructors, and exposes an explicit copy()
method to force callers to think twice before copying.

The only existing callsite where such a copy happens has been updated to
use this method.

This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Adds a comment clarifying the usage of a path provider at partition
startup time. This is review follow-up to
redpanda-data#20778
andrwng added a commit to andrwng/redpanda that referenced this pull request Jul 9, 2024
Adds a comment explaining the condition with which we set a remote
label.

This is follow-up to
redpanda-data#20778
@BenPope
Copy link
Member

BenPope commented Jul 19, 2024

@andrwng The milestone suggests a backport was intended?

@piyushredpanda piyushredpanda modified the milestones: v24.1.x-next, v24.1.11 Jul 19, 2024
@andrwng
Copy link
Contributor Author

andrwng commented Jul 20, 2024

@andrwng The milestone suggests a backport was intended?

Oops! Not intended, thanks Piyush for removing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants