Skip to content

Commit

Permalink
cloud_storage/remote_partition: handle race retention+timequery
Browse files Browse the repository at this point in the history
log a warning if when trying to get a async_manifest_view_cursor for a
timequery the result is out_of_range.
such error can happen if retention is kicking in and deleting some or
all spillover manifests: in this case there is a window where the
manifest are reclaimable but still kept in memory, and the timequery
hits one of the manifest in the reclaimable range.

handling this as a warning and no result is acceptable because the kafka
client issuing the request can handle this failure, and otherwise
handling this edge cases would increase the complexity of the callstack

(cherry picked from commit 4ca7281)
  • Loading branch information
andijcr authored and vbotbuildovich committed Jan 26, 2024
1 parent 46b3225 commit c85d3a6
Showing 1 changed file with 27 additions and 1 deletion.
28 changes: 27 additions & 1 deletion src/v/cloud_storage/remote_partition.cc
Original file line number Diff line number Diff line change
Expand Up @@ -521,6 +521,7 @@ class partition_record_batch_reader_impl final
cur.error() == error_outcome::out_of_range
&& ss::visit(
query,
[&](model::offset) { return false; },
[&](kafka::offset query_offset) {
// Special case queries below the start offset of the log.
// The start offset may have advanced while the request was
Expand All @@ -540,7 +541,32 @@ class partition_record_batch_reader_impl final
}
return false;
},
[](auto) { return false; })) {
[&](model::timestamp query_ts) {
// Special case, it can happen when a timequery falls below
// the clean offset. Caused when the query races with
// retention/gc. log a warning, since the kafka client can
// handle a failed query
auto const& spillovers = _partition->_manifest_view
->stm_manifest()
.get_spillover_map();
if (
spillovers.empty()
|| spillovers.get_max_timestamp_column()
.last_value()
.value_or(model::timestamp::max()())
>= query_ts()) {
vlog(
_ctxlog.debug,
"Manifest query raced with retention and the result "
"is below the clean/start offset for {}",
query_ts);
return true;
}

// query was not meant for archive region. fallthrough and
// log an error
return false;
})) {
// error was handled
co_return;
}
Expand Down

0 comments on commit c85d3a6

Please sign in to comment.