CORE-2157: cloud_storage_clients: add support for path-style addressing #17806

Merged
merged 7 commits into from
Apr 24, 2024

Conversation

WillemKauf
Contributor

@WillemKauf WillemKauf commented Apr 11, 2024

Fixes #2183.

This PR adds support for path-style requests to S3 cloud storage.

cloud_storage_url_style is a configurable option in redpanda. It defaults to the existing virtual_host style, but can be set to path. The s3_client will now have its headers for requests generated based on the configured URL style.

Support for ducktape has been included, and path-style requests have been verified as working using the s3_test_client_main test. Ideally, the cloud storage self-test in PR #17586 would be integrated with these changes so as to verify both virtual-hosted style requests and path style requests.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Features

  • Adds path-style addressing to S3 client requests.
  • This option is configurable through rpk cluster config edit:
# The addressing style to use for S3 requests. (one of virtual_host, path, restart required)
cloud_storage_url_style: path

@@ -0,0 +1,28 @@
/*
Contributor

I think stuff like this could go into the cloud_storage_clients/types.h.

Contributor Author

Agreed. Fixed!

return fmt::format("{}.{}", name(), _ap());
case s3_url_style::path:
// Host: s3.region-code.amazonaws.com
return fmt::format("{}", _ap());
Contributor

Why is the bucket name not included?

Contributor Author

@WillemKauf WillemKauf Apr 11, 2024

Following the S3 User Guide's examples of path-style requests, this seems to be the proper form for the host in the header.

The bucket name is instead used in the target of the header.

In commit 8edf7d5, I added some comments in the various request_creator::make_*() functions to reflect this style.

Member

Following the S3 User Guide's examples of path-style requests, this seems to be the proper form for the host in the header.

The bucket name is instead used in the target of the header.

In commit 8edf7d5, I added some comments in the various request_creator::make_*() functions to reflect this style.

I was also confused here. But the answer is that this is make_host, not make_target, right?

Contributor Author

Yes, correct. This is make_host.
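
To make the host/target split discussed above concrete, here is a minimal standalone sketch. It is not the code from this PR: the free functions, the `url_style` enum, and the plain `std::string` parameters are hypothetical stand-ins for `request_creator::make_host()`, `request_creator::make_target()`, `s3_url_style`, `bucket_name`, and `object_key`. It only illustrates where the bucket name ends up in each style.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the s3_url_style enum discussed in this PR.
enum class url_style { virtual_host, path };

// Builds the value of the Host header.
// `endpoint` is the access point, e.g. "s3.us-east-1.amazonaws.com".
std::string make_host(
  url_style style, const std::string& bucket, const std::string& endpoint) {
    assert(!endpoint.empty());
    switch (style) {
    case url_style::virtual_host:
        // Host: {bucket-name}.s3.{region}.amazonaws.com
        return bucket + "." + endpoint;
    case url_style::path:
        // Host: s3.{region}.amazonaws.com
        return endpoint;
    }
    return endpoint; // not reached for valid enum values
}

// Builds the request target (the path part of the request line).
std::string make_target(
  url_style style, const std::string& bucket, const std::string& key) {
    switch (style) {
    case url_style::virtual_host:
        // GET /{key}
        return "/" + key;
    case url_style::path:
        // GET /{bucket-name}/{key}
        return "/" + bucket + "/" + key;
    }
    return "/" + key; // not reached for valid enum values
}
```

With path style the bucket moves from the host into the target, which is why `make_host` alone does not mention it.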

@WillemKauf WillemKauf requested a review from Lazin April 16, 2024 15:35
@WillemKauf WillemKauf marked this pull request as ready for review April 16, 2024 15:46
@WillemKauf WillemKauf force-pushed the path_style_url branch 5 times, most recently from 7727b0c to f27b536 Compare April 16, 2024 21:45
Lazin
Lazin previously approved these changes Apr 17, 2024
Contributor

@Lazin Lazin left a comment

LGTM

@@ -76,6 +77,11 @@ void cli_opts(boost::program_options::options_description_easy_init opt) {
po::value<std::string>()->default_value("us-east-1"),
"aws region");

opt(
"url_style",
po::value<std::string>()->default_value("virtual_host"),
Contributor

This is good for development, but we also need a proper unit test. In src/v/cloud_storage/remote.h we have a remote class that serves as a proxy for code that has to interact with cloud storage. All code paths that download, upload, or delete use this class instead of the client. The unit test has to show that when path style is enabled, no REST API request uses a virtual-host-style URL. cloud_storage::remote has a unit-test suite. It invokes every method of the remote and checks the results using the s3_imposter. The imposter allows you to examine the parameters of the request and also validate the URL. So basically, all of these tests need to be run with virtual-host-style URLs and also with path-style URLs.

The motivation here is that some cloud storage providers may not support one style or the other. In this situation we don't want to accidentally use the wrong style.
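
As a rough illustration of the kind of per-request check such a test could make, here is a standalone sketch. The function name and its string parameters are hypothetical, and it does not use the real `s3_imposter` API (which is not shown in this conversation); it only assumes the test can read the `Host` header and the request target of each captured request.

```cpp
#include <cassert>
#include <string>

// Returns true when a request addresses the bucket through the request
// target ("/bucket", "/bucket/..." or "/bucket?...") and not through the
// Host header ("bucket.endpoint").
bool is_path_style_request(
  const std::string& host,
  const std::string& target,
  const std::string& bucket) {
    assert(!bucket.empty());

    // Virtual-host style puts "<bucket>." in front of the endpoint.
    const std::string host_prefix = bucket + ".";
    const bool bucket_in_host
      = host.compare(0, host_prefix.size(), host_prefix) == 0;

    // Path style puts "/<bucket>" at the start of the target. The next
    // character must end the bucket segment so that "/my-bucket-2/..."
    // does not match bucket "my-bucket".
    const std::string target_prefix = "/" + bucket;
    bool bucket_in_target = false;
    if (target.compare(0, target_prefix.size(), target_prefix) == 0) {
        bucket_in_target = target.size() == target_prefix.size()
                           || target[target_prefix.size()] == '/'
                           || target[target_prefix.size()] == '?';
    }

    return bucket_in_target && !bucket_in_host;
}
```

A test run with path style enabled could then assert that this predicate holds for every captured request, and that it is false for every request when virtual-host style is configured.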

Contributor

Looks like all existing REST APIs that we're using are supported. But we may add new requests in the future, so some check is nice to have.

Contributor

Could be done in a followup.

Member

Could be done in a followup.

@Lazin Yes, and we are also going to wire up ducktape with minio running in both modes (pretty sure it supports it). You can see all the follow-up work in CORE-725.

@WillemKauf can you capture Evgeny's suggestions about additional unit testing in a new or existing jira ticket as a subtask for the overall project?

@WillemKauf
Contributor Author

WillemKauf commented Apr 17, 2024

Removed a commit that attempted to add virtual-host style configuration to the boto3.client.

Contrary to what the understanding was, minIO by default works with path style requests, unless MINIO_DOMAIN is set.

This environment variable appears to be set in docker/docker_compose.yml, but ducktape tests utilizing the boto3.client are still failing to make requests with the virtual-host style to minIO.

I have removed the broken code for now, and will follow up in a future PR to add virtual-host support to minIO and boto3.client in our ducktape tests once I figure out why it is failing.

Redpanda currently supports only virtual-hosted style URLs
for use with S3.

Add `s3_url_style` to `cloud_storage_clients/types.h` to allow for
future work towards supporting path-style URLs.

To support path-style requests in addition to the currently supported
virtual-host style, 's3_url_style' has been added to 's3_configuration',
as well as 'request_creator'.

New methods 'make_host()' and 'make_target()' in 'request_creator'
will generate host and target strings according to the url style set
for 's3_client' requests (currently defaulted to virtual-host).

Allow user to set s3_url_style in their redpanda configuration
file. New setting 'cloud_storage_url_style' can be set to
'virtual_host' or 'path'.
@WillemKauf WillemKauf requested a review from a team as a code owner April 17, 2024 19:36
Member

@dotnwat dotnwat left a comment

This looks great. There are a few small nits that look like they could be a commit tacked onto the end. Then I think we should approve this, merge it right after we branch 24.1, and move on to the other tasks in CORE-725 like expanded testing.

BTW, please update this PR so that it isn't setup to close CORE-725 when it merges, which is the high-level task. Instead, it looks like CORE-2157 is the right one for this PR. Wdyt?

Comment on lines +69 to +74
switch (us) {
case s3_url_style::virtual_host:
return os << "virtual_host";
case s3_url_style::path:
return os << "path";
}
Member

can you stick the implementation in a .cc file?

Contributor Author

@WillemKauf WillemKauf Apr 17, 2024

It's a bit tricky, because the config library is going to depend on this implementation. Having it inline and accessible from the header saves us from having to add a dependency on the cloud_storage_clients library from config (and adding this would create a cyclic dependency).

Member

ahh, yeh, then don't worry about it. we'll fix those dependencies as we are switching modules over to be new-style. it's a lot of work no need to do that here.
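
For readers unfamiliar with the trade-off, a minimal sketch of the header-only approach follows. The enum and `operator<<` mirror the diff fragment quoted above; the `to_string` helper is hypothetical and only shows that a consumer needs nothing beyond the header.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

enum class s3_url_style { virtual_host, path };

// Marking the definition `inline` lets it live in a header that several
// libraries include. Each consumer compiles its own copy, so none of them
// has to link against the library that owns the header.
inline std::ostream& operator<<(std::ostream& os, const s3_url_style& us) {
    switch (us) {
    case s3_url_style::virtual_host:
        return os << "virtual_host";
    case s3_url_style::path:
        return os << "path";
    }
    return os; // not reached for valid enum values
}

// Hypothetical helper: what a consumer such as a config library could do
// with only the header available.
inline std::string to_string(s3_url_style us) {
    std::ostringstream ss;
    ss << us;
    return ss.str();
}
```

Moving the body into a `.cc` file would instead require every consumer to link the object file that contains it, which is the dependency the author wants to avoid here.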


@@ -343,6 +346,30 @@ request_creator::make_delete_objects_request(
std::make_unique<delete_objects_body>(std::move(body))}}};
}

std::string
request_creator::make_host([[maybe_unused]] const bucket_name& name) const {
Member

you don't need maybe_unused -- it's only needed if the code generated doesn't use it (e.g. in conjunction with constexpr or templates etc...). in this case it is used in the virtual host case.
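
A small sketch of the distinction, using hypothetical functions that are not part of this PR. The second case uses `assert` under `NDEBUG` as one well-known situation where the compiled code stops referencing a parameter; it is chosen for illustration and is not the only such situation.

```cpp
#include <cassert>
#include <string>

// `bucket` is referenced on at least one runtime path, so the parameter
// counts as used and no attribute is needed, even though the path-style
// branch ignores it.
std::string host_for(
  bool virtual_host, const std::string& bucket, const std::string& endpoint) {
    return virtual_host ? bucket + "." + endpoint : endpoint;
}

// Here `bucket` appears only inside assert(), which expands to nothing when
// NDEBUG is defined. In such builds the parameter is never referenced and
// -Wunused-parameter would fire, so [[maybe_unused]] is appropriate.
std::string path_style_host(
  [[maybe_unused]] const std::string& bucket, const std::string& endpoint) {
    assert(!bucket.empty());
    return endpoint;
}
```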

}

std::string request_creator::make_target(
[[maybe_unused]] const bucket_name& name, const object_key& key) const {
Member

you can drop maybe_unused.

Comment on lines 110 to 113
std::string make_host([[maybe_unused]] const bucket_name& name) const;

std::string make_target(
[[maybe_unused]] const bucket_name& name, const object_key& key) const;
Member

you can drop maybe_unused.

Comment on lines +1633 to +1644
, cloud_storage_url_style(
*this,
"cloud_storage_url_style",
"The addressing style to use for S3 requests.",
{.needs_restart = needs_restart::yes,
.example = "virtual_host",
.visibility = visibility::user},
cloud_storage_clients::s3_url_style::virtual_host,
{
cloud_storage_clients::s3_url_style::virtual_host,
cloud_storage_clients::s3_url_style::path,
})
Member

Should we add an auto option? From what I can tell the strategy is basically to try virtual-host and then fallback to path-style, and cache the decision.

See https://github.com/boto/botocore/blob/develop/botocore/utils.py#L1375-L1381

@Lazin what do you think?

@WillemKauf if we all think this is a good idea, let's create a new jira ticket to capture it--no need to expand the scope of this PR.

Contributor Author

I think this is a great idea for future work! +1
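
A minimal sketch of the strategy as described in the comment above (try virtual-host, fall back to path style, cache the decision). This is neither botocore's implementation nor anything in Redpanda: the `url_style_detector` class and its `probe` callback are hypothetical, and the probe stands in for whatever real request would be used to test a style.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <optional>
#include <utility>

enum class url_style { virtual_host, path };

class url_style_detector {
public:
    // The probe returns true if a request made with the given style works.
    using probe_fn = std::function<bool(url_style)>;

    explicit url_style_detector(probe_fn probe)
      : _probe(std::move(probe)) {
        assert(_probe);
    }

    // Returns the cached style when one is known. Otherwise probes
    // virtual-host first, then path, and caches the first style that works.
    // Returns std::nullopt (and caches nothing) if neither style works.
    std::optional<url_style> detect() {
        if (_cached) {
            return _cached;
        }
        for (auto style : {url_style::virtual_host, url_style::path}) {
            if (_probe(style)) {
                _cached = style;
                return _cached;
            }
        }
        return std::nullopt;
    }

private:
    probe_fn _probe;
    std::optional<url_style> _cached;
};
```

Because a failed detection is not cached, a later call probes again, which may or may not be desirable depending on how expensive the probe is.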

Comment on lines +292 to +298
// Virtual Style:
// POST /?delete HTTP/1.1
// Host: <Bucket>.s3.amazonaws.com
// Host: {bucket-name}.s3.{region}.amazonaws.com
// Path Style:
// POST /{bucket-name}/?delete HTTP/1.1
// Host: s3.{region}.amazonaws.com
//
Member

Thanks for the comments!


@WillemKauf WillemKauf changed the title CORE-725: cloud_storage_clients: add support for path-style addressing CORE-2157: cloud_storage_clients: add support for path-style addressing Apr 17, 2024
Added the option for `url_style` to `s3_test_client_main.cc`,
allowing for testing of both virtual-style and path-style requests.

Also make sure the `url_style` is initialized correctly everywhere
the `s3_configuration` object is created.
Member

@dotnwat dotnwat left a comment

Approved -- let's just wait a couple days for 24.1 branching before merging.

@dotnwat
Member

dotnwat commented Apr 23, 2024

/ci-repeat 1

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Apr 23, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Apr 23, 2024
@dotnwat dotnwat requested review from Lazin and dotnwat April 23, 2024 18:57
@vbotbuildovich
Collaborator

vbotbuildovich commented Apr 23, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c7f-4a56-ae78-5ca002edee99:

"rptest.tests.upgrade_test.UpgradeFromPriorFeatureVersionCloudStorageTest.test_rolling_upgrade.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c7c-438e-9ff7-2734d2d8c424:

"rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c83-4834-9456-7d5fc0d51a8e:

"rptest.tests.license_upgrade_test.UpgradeMigratingLicenseVersion.test_license_upgrade.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f91c-456b-81a5-9e99f5925d52:

"rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.license_upgrade_test.UpgradeMigratingLicenseVersion.test_license_upgrade.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f919-4df8-a841-e6f389e3a455:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f914-4373-ae01-97032c260087:

"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f917-47dc-9d33-9c5d0e34f7b9:

"rptest.tests.upgrade_test.UpgradeFromPriorFeatureVersionCloudStorageTest.test_rolling_upgrade.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.workload_upgrade_runner_test.RedpandaUpgradeTest.test_workloads_through_releases.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48189#018f0dd7-4e8b-4c86-8133-375b14f250c7:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

new failures in https://buildkite.com/redpanda/redpanda/builds/48189#018f0ddf-027e-41c8-b10c-d4b050167d86:

"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"

micheleRP
micheleRP previously approved these changes Apr 23, 2024
@micheleRP micheleRP left a comment

lgtm from docs side!

Member

@dotnwat dotnwat left a comment

changes look great, we should merge this as soon as we take care of what look like some related failures.

looks like a few ducktape tests are complaining about a new configuration option.

INFO  2024-04-23 20:34:59,789 [shard 0] redpanda::main - application.cc:269 - Failure during startup: std::invalid_argument (Unknown property cloud_storage_url_style)

The addressing style used by `redpanda` for ducktape tests can now
be set with the variable `cloud_storage_url_style` in `SISettings`,
or injected into a test similar to `cloud_storage_type` with a decorator.

e.g: `@matrix(cloud_storage_url_style=['path','virtual_host'])`
Member

@dotnwat dotnwat left a comment

Failures are from the 24.1 branching issue in upgrade tests, and #17920.

@dotnwat dotnwat merged commit b9adac6 into redpanda-data:dev Apr 24, 2024
13 of 18 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for path-style cloud storage addressing
5 participants