r/recovery_stm: stop recovery when follower was already updated

Recovery and replicate stms are not synchronized. It may be the case when both of stms are active at the same time that the same batch is delivered to the follower twice. In general this batch duplication is harmless as Raft is not vulnerable for messages redelivery but it may cause unnecessary truncation and latency increase. Added a check validating expected log end offset right before sending recovery append entries request. This will prevent sending the same set of batches twice to the follower. Fixes: #14413 Signed-off-by: Michal Maslanka <michal@redpanda.com>
redpanda-data · Nov 20, 2023 · f10017d · f10017d
1 parent bf9a11d
commit f10017d
Showing 1 changed file with 12 additions and 0 deletions.
diff --git a/src/v/raft/recovery_stm.cc b/src/v/raft/recovery_stm.cc
@@ -557,6 +557,18 @@ ss::future<> recovery_stm::replicate(
         _stop_requested = true;
         return ss::now();
     }
+    if (meta.value()->expected_log_end_offset >= _last_batch_offset) {
+        vlog(
+          _ctxlog.trace,
+          "follower expected log end offset is already updated, stopping "
+          "recovery. Expected log end offset: {}, recovery range last offset: "
+          "{}",
+          meta.value()->expected_log_end_offset,
+          _last_batch_offset);
+
+        _stop_requested = true;
+        return ss::now();
+    }
     /**
      * Update follower expected log end. It is equal to the last batch in a set
      * of batches read for this recovery round.