archival: don't stop the upload loop on sync fails
Previously, we'd bail out of the upload loop if the initial sync call
failed. The upload loop wouldn't restart until forced by re-setting the
remote write topic config or restarting the node or changing leadership.
This leads to the disk filling up since the collectable offset isn't
advancing due to the lack of uploads.

The fix is to swap the `co_return` with `continue` to allow for retries.
I've also added some error logging for when the loops quit unexpectedly.

(cherry picked from commit 6576e32)
Vlad Lazar authored and vbotbuildovich committed Oct 18, 2023
1 parent 5883ef0 commit 57cc25c
Showing 1 changed file with 24 additions and 4 deletions.
28 changes: 24 additions & 4 deletions src/v/archival/ntp_archiver_service.cc
@@ -240,10 +240,30 @@ const cloud_storage::partition_manifest& ntp_archiver::manifest() const {
 
 ss::future<> ntp_archiver::start() {
     if (_parent.get_ntp_config().is_read_replica_mode_enabled()) {
-        ssx::spawn_with_gate(
-          _gate, [this] { return sync_manifest_until_abort(); });
+        ssx::spawn_with_gate(_gate, [this] {
+            return sync_manifest_until_abort().then([this] {
+                if (!_as.abort_requested()) {
+                    vlog(
+                      _rtclog.error,
+                      "Sync loop stopped without an abort being requested. "
+                      "Please disable and re-enable "
+                      "redpanda.remote.readreplica "
+                      "the topic in order to restart it.");
+                }
+            });
+        });
     } else {
-        ssx::spawn_with_gate(_gate, [this] { return upload_until_abort(); });
+        ssx::spawn_with_gate(_gate, [this] {
+            return upload_until_abort().then([this]() {
+                if (!_as.abort_requested()) {
+                    vlog(
+                      _rtclog.error,
+                      "Upload loop stopped without an abort being requested. "
+                      "Please disable and re-enable redpanda.remote.write "
+                      "the topic in order to restart it.");
+                }
+            });
+        });
     }
 
     return ss::now();
@@ -302,7 +322,7 @@ ss::future<> ntp_archiver::upload_until_abort() {
         bool is_synced = co_await _parent.archival_meta_stm()->sync(
           sync_timeout);
         if (!is_synced) {
-            co_return;
+            continue;
         }
         vlog(_rtclog.debug, "upload loop synced in term {}", _start_term);
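
To illustrate why replacing `co_return` with `continue` matters, here is a minimal standalone sketch in plain C++ (no Seastar, no coroutines). It is not the Redpanda code: `on_sync_failure`, `loop_result`, `run_upload_loop` and `unexpected_exit_message` are hypothetical names made up for the example. The sketch assumes that a vector of booleans can stand in for successive sync attempts and that running out of input stands in for an abort request.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical policy switch, used only to compare old and new behaviour.
enum class on_sync_failure { stop, retry };

// Outcome of one simulated run of the loop.
struct loop_result {
    int uploads = 0;              // upload rounds that actually ran
    bool abort_requested = false; // true only if the loop ended via shutdown
};

// Simulated upload loop. Each element of sync_results is the outcome of one
// sync attempt; running out of elements stands in for an abort request.
loop_result run_upload_loop(
  const std::vector<bool>& sync_results, on_sync_failure policy) {
    loop_result res;
    for (bool is_synced : sync_results) {
        if (!is_synced) {
            if (policy == on_sync_failure::stop) {
                return res; // old behaviour: the loop ends without an abort
            }
            continue; // new behaviour: retry the sync on the next iteration
        }
        ++res.uploads; // stand-in for uploading the next batch of segments
    }
    res.abort_requested = true; // input exhausted: treat as a clean shutdown
    return res;
}

// Mirrors the check added in start(): report only exits that were not
// caused by an abort request.
std::optional<std::string> unexpected_exit_message(const loop_result& res) {
    if (!res.abort_requested) {
        return "Upload loop stopped without an abort being requested.";
    }
    return std::nullopt;
}
```

With the input `{false, true, true}`, the `stop` policy performs no uploads and yields the diagnostic message, while the `retry` policy performs two uploads and exits cleanly. In the real archiver the stalled loop is what keeps the collectable offset from advancing, as the commit message describes.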
