Skip to content

Commit

Permalink
cloud_storage: respect max_keys in s3_client::list_objects()
Browse files Browse the repository at this point in the history
Previously, the `max_keys` value was erroneously added to the header
of a `list_objects` request in the `s3_client`. This is, in fact,
a URI request parameter, not a request header.

See:
 * https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html
 * https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

If `max_keys` is specified in a call to `remote::list_objects()`,
the onus is now on the user to deal with a possibly truncated value
at the call site, likely in a while loop.

The idea is that the user will be able to check for this case using
`list_result.is_truncated`, and then pass `list_result.next_continuation_token`
to `remote::list_objects()` in future requests.
  • Loading branch information
WillemKauf committed May 7, 2024
1 parent 11c0cb8 commit 39c48f4
Show file tree
Hide file tree
Showing 3 changed files with 29 additions and 6 deletions.
14 changes: 14 additions & 0 deletions src/v/cloud_storage/remote.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1364,6 +1364,12 @@ ss::future<remote::list_result> remote::list_objects(
// Gathers the items from a series of successful ListObjectsV2 calls
cloud_storage_clients::client::list_bucket_result list_bucket_result;

const auto caller_handle_truncation = max_keys.has_value();

if (caller_handle_truncation) {
vassert(max_keys.value() > 0, "Max keys must be greater than 0.");
}

// Keep iterating while the ListObjectsV2 calls has more items to return
while (!_gate.is_closed() && permit.is_allowed && !result) {
auto res = co_await lease.client->list_objects(
Expand Down Expand Up @@ -1403,6 +1409,14 @@ ss::future<remote::list_result> remote::list_objects(

// Continue to list the remaining items
if (items_remaining) {
// But, return early if max_keys was specified (caller will
// handle truncation)
if (caller_handle_truncation) {
list_bucket_result.is_truncated = true;
list_bucket_result.next_continuation_token
= continuation_token.value();
co_return list_bucket_result;
}
continue;
}

Expand Down
13 changes: 11 additions & 2 deletions src/v/cloud_storage/remote.h
Original file line number Diff line number Diff line change
Expand Up @@ -448,8 +448,17 @@ class remote
/// \param prefix Optional prefix to restrict listing of objects
/// \param delimiter A character to use as a delimiter when grouping list
/// results
/// \param item_filter Optional filter to apply to items before
/// collecting
/// \param max_keys The maximum number of keys to return. If left
/// unspecified, all object keys that fulfill the request will be collected,
/// and the result will not be truncated (truncation not allowed). If
/// specified, it will be up to the user to deal with a possibly-truncated
/// result (using list_result.is_truncated) at the call site, most likely in
/// a while loop. The continuation-token generated by that request will be
/// available through list_result.next_continuation_token for future
/// requests. It is also important to note that the value for max_keys will
/// be capped by the cloud provider default (which may vary between
/// providers, e.g AWS has a limit of 1000 keys per ListObjects request).
/// \param item_filter Optional filter to apply to items before collecting
ss::future<list_result> list_objects(
const cloud_storage_clients::bucket_name& name,
retry_chain_node& parent,
Expand Down
8 changes: 4 additions & 4 deletions src/v/cloud_storage_clients/s3_client.cc
Original file line number Diff line number Diff line change
Expand Up @@ -206,6 +206,9 @@ request_creator::make_list_objects_v2_request(
if (delimiter.has_value()) {
key = fmt::format("{}&delimiter={}", key, *delimiter);
}
if (max_keys.has_value()) {
key = fmt::format("{}&max-keys={}", key, *max_keys);
}
auto target = make_target(name, object_key{key});
header.method(boost::beast::http::verb::get);
header.target(target);
Expand All @@ -220,9 +223,6 @@ request_creator::make_list_objects_v2_request(
if (start_after) {
header.insert(aws_header_names::start_after, (*start_after)().string());
}
if (max_keys) {
header.insert(aws_header_names::max_keys, std::to_string(*max_keys));
}
if (continuation_token) {
header.insert(
aws_header_names::continuation_token,
Expand Down Expand Up @@ -829,7 +829,7 @@ ss::future<s3_client::list_bucket_result> s3_client::do_list_objects_v2(
name,
std::move(prefix),
std::move(start_after),
max_keys,
std::move(max_keys),
std::move(continuation_token),
delimiter);
if (!header) {
Expand Down

0 comments on commit 39c48f4

Please sign in to comment.