-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CORE-2157: cloud_storage_clients: add support for path-style addressing #17806
Conversation
fbb50b4
to
f8240dd
Compare
@@ -0,0 +1,28 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think stuff like this could go into the cloud_storage_clients/types.h
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Fixed!
return fmt::format("{}.{}", name(), _ap()); | ||
case s3_url_style::path: | ||
// Host: s3.region-code.amazonaws.com | ||
return fmt::format("{}", _ap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the bucket name is not included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the S3 User Guide's examples of path-style requests, this seems to be the proper form for the host in the header.
The bucket name is instead used in the target of the header.
In commit 8edf7d5, I added some comments in the various request_creator::make_*()
functions to reflect this style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the S3 User Guide's examples of path-style requests, this seems to be the proper form for the host in the header.
The bucket name is instead used in the target of the header.
In commit 8edf7d5, I added some comments in the various
request_creator::make_*()
functions to reflect this style.
I was also confused here. But the answer is that this is make_host
, not make_target
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, correct. This is make_host
.
f8240dd
to
8d815c7
Compare
7727b0c
to
f27b536
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -76,6 +77,11 @@ void cli_opts(boost::program_options::options_description_easy_init opt) { | |||
po::value<std::string>()->default_value("us-east-1"), | |||
"aws region"); | |||
|
|||
opt( | |||
"url_style", | |||
po::value<std::string>()->default_value("virtual_host"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good for development but we also need a proper unit-test. In the src/v/cloud_storage/remote.h
we have a remote
class which serves as a proxy for the code that has to interact with the cloud storage. All code paths which are downloading or uploading or deleting are using this class instead of the client. The unit-test has to show that when the path-style is enabled no REST API request uses virtual host style URL. The cloud_storage::remote
has a unit-test suite. It invokes every method of the remote
and checks the results using the s3_imposter
. The imposter allows you to examine parameters of the request and also validate the URL. So basically, we need to have all these tests to be invoked with virtual host style URLs and also with path-style URLs.
The motivation here is that some cloud storage providers may not support one style or the other. In this situation we don't want to accidentally use the wrong style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all existing REST API's that we're using are supported. But we can add new request in the future so some check is nice to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done in a followup.
@Lazin Yes, and we are also going to wire up ducktape wiht minio running in both modes (pretty sure it supports it). You can see all teh follow up work in CORE-725.
@WillemKauf can you capture Evgeny's suggestions about additional unit testing in a new or existing jira ticket as a subtask for the overall project?
f27b536
to
df9d817
Compare
Removed a commit that attempted to add virtual-host style configuration to the Contrary to what the understanding was, minIO by default works with path style requests, unless This environmental variable appears to be set in I have removed the broken code for now, and will follow up in a future PR to add virtual-host support to minIO and |
Redpanda currently supports only virtual-hosted style URLs for use with S3. Add `s3_url_style` to `cloud_storage_clients/types.h" to allow for future work towards supporting path-style URLs.
To support path-style requests in addition to the currently supported virtual-host style, 's3_url_style' has been added to 's3_configuration', as well as 'request_creator'. New methods 'make_host()' and 'make_target()' in 'request_creator' will generate host and target strings according to the url style set for 's3_client' requests (currently defaulted to virtual-host).
Allow user to set s3_url_style in their redpanda configuration file. New setting 'cloud_storage_url_style' can be set to 'virtual_host' or 'path'.
df9d817
to
e02e84c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. There are a few small nits that looks like they could be a commit tacked onto the end. Then I think we should approve this, merge it right after we branch 24.1, and move on to the other tasks in CORE-725 like expanded testing.
BTW, please update this PR so that it isn't setup to close CORE-725 when it merges, which is the high-level task. Instead, it looks like CORE-2157 is the right one for this PR. Wdyt?
switch (us) { | ||
case s3_url_style::virtual_host: | ||
return os << "virtual_host"; | ||
case s3_url_style::path: | ||
return os << "path"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you stick the implementation in a .cc file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit tricky, because the config
library is going to depend on this implementation. Having it inline
and accessible from the header saves us from having to adding a dependency on the cloud_storage_clients
library from config
(and adding this would create a cyclic dependency).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, yeh, then don't worry about it. we'll fix those dependencies as we are switching modules over to be new-style. it's a lot of work no need to do that here.
return fmt::format("{}.{}", name(), _ap()); | ||
case s3_url_style::path: | ||
// Host: s3.region-code.amazonaws.com | ||
return fmt::format("{}", _ap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the S3 User Guide's examples of path-style requests, this seems to be the proper form for the host in the header.
The bucket name is instead used in the target of the header.
In commit 8edf7d5, I added some comments in the various
request_creator::make_*()
functions to reflect this style.
I was also confused here. But the answer is that this is make_host
, not make_target
, right?
@@ -343,6 +346,30 @@ request_creator::make_delete_objects_request( | |||
std::make_unique<delete_objects_body>(std::move(body))}}}; | |||
} | |||
|
|||
std::string | |||
request_creator::make_host([[maybe_unused]] const bucket_name& name) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do don't need maybe_unused
-- it's only needed if the code generated doesn't use it (e.g. in conjunction with constexpr or templates etc...). in this case it is used in the virtual host case.
} | ||
|
||
std::string request_creator::make_target( | ||
[[maybe_unused]] const bucket_name& name, const object_key& key) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can drop maybe_unused.
std::string make_host([[maybe_unused]] const bucket_name& name) const; | ||
|
||
std::string make_target( | ||
[[maybe_unused]] const bucket_name& name, const object_key& key) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can drop maybe_unused.
, cloud_storage_url_style( | ||
*this, | ||
"cloud_storage_url_style", | ||
"The addressing style to use for S3 requests.", | ||
{.needs_restart = needs_restart::yes, | ||
.example = "virtual_host", | ||
.visibility = visibility::user}, | ||
cloud_storage_clients::s3_url_style::virtual_host, | ||
{ | ||
cloud_storage_clients::s3_url_style::virtual_host, | ||
cloud_storage_clients::s3_url_style::path, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an auto
option? From what I can tell the strategy is basically to try virtual-host and then fallback to path-style, and cache the decision.
See https://github.com/boto/botocore/blob/develop/botocore/utils.py#L1375-L1381
@Lazin what do you think?
@WillemKauf if we all think this is a good idea, let's create a new jira ticket to capture it--no need to expand the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great idea for future work! +1
// Virtual Style: | ||
// POST /?delete HTTP/1.1 | ||
// Host: <Bucket>.s3.amazonaws.com | ||
// Host: {bucket-name}.s3.{region}.amazonaws.com | ||
// Path Style: | ||
// POST /{bucket-name}/?delete HTTP/1.1 | ||
// Host: s3.{region}.amazonaws.com | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments!
@@ -76,6 +77,11 @@ void cli_opts(boost::program_options::options_description_easy_init opt) { | |||
po::value<std::string>()->default_value("us-east-1"), | |||
"aws region"); | |||
|
|||
opt( | |||
"url_style", | |||
po::value<std::string>()->default_value("virtual_host"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done in a followup.
@Lazin Yes, and we are also going to wire up ducktape wiht minio running in both modes (pretty sure it supports it). You can see all teh follow up work in CORE-725.
@WillemKauf can you capture Evgeny's suggestions about additional unit testing in a new or existing jira ticket as a subtask for the overall project?
Added the option for `url_style` to `s3_test_client_main.cc`, allowing for testing of both virtual-style and path-style requests. Also make sure the `url_style` is initialized correctly everywhere the `s3_configuration` object is created.
e02e84c
to
080ae70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved -- let's just wait a couple days for 24.1 branching before merging.
/ci-repeat 1 |
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c7f-4a56-ae78-5ca002edee99:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c7c-438e-9ff7-2734d2d8c424:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c84-6c83-4834-9456-7d5fc0d51a8e:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f91c-456b-81a5-9e99f5925d52:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f919-4df8-a841-e6f389e3a455:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f914-4373-ae01-97032c260087:
new failures in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f917-47dc-9d33-9c5d0e34f7b9:
new failures in https://buildkite.com/redpanda/redpanda/builds/48189#018f0dd7-4e8b-4c86-8133-375b14f250c7:
new failures in https://buildkite.com/redpanda/redpanda/builds/48189#018f0ddf-027e-41c8-b10c-d4b050167d86:
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48181#018f0c8a-f917-47dc-9d33-9c5d0e34f7b9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm from docs side!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look great, we should merge this as soon as we take care of a what looks like some related failures.
looks like a few ducktape tests are complaining about a new configuration option.
INFO 2024-04-23 20:34:59,789 [shard 0] redpanda::main - application.cc:269 - Failure during startup: std::invalid_argument (Unknown property cloud_storage_url_style)
The addressing style used by `redpanda` for ducktape tests can now be set with the variable `cloud_storage_url_style` in `SISettings`, or injected into a test similar to `cloud_storage_type` with a decorator. e.g: `@matrix(cloud_storage_url_style=['path','virtual_host'])`
1e09d37
to
5e9ac31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failures are from the 24.1 branching issue in upgrade tests, and #17920.
Fixes #2183.
This PR adds support for path-style requests to S3 cloud storage.
cloud_storage_url_style
is a configurable option inredpanda
. It defaults to the existingvirtual_host
style, but can be set topath
. Thes3_client
will now have its headers for requests generated based on the configured URL style.Support for ducktape has been included, and path-style requests have been verified as working using the
s3_test_client_main
test. Ideally, the cloud storage self-test in PR #17586 would be integrated with these changes so as to verify both virtual-hosted style requests and path style requests.Backports Required
Release Notes
Features
rpk cluster config edit
: