-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tx/rm_stm: remove mem_state #18076
tx/rm_stm: remove mem_state #18076
Conversation
the method factors out logic to update transactional state, no logic changes.
/dt |
/dt |
/dt |
/ci-repeat 3 |
'estimated' was used as an estimate for the first data batch offset (so LSO cannot exceed it) while replication is in progress. This is the last in memory state that is pending for cleanup. This commit redefines the transacation boundary to begin from fence batch rather the first data batch which allows us to get rid of this additional state which lays the foundation for producer state move.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48318#018f193e-897e-404f-970c-ce4a80c92930 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48346#018f1b8e-c5fb-41ff-826a-7e4e98dc82e0 |
@@ -561,6 +541,8 @@ class rm_stm final : public raft::persisted_stm<> { | |||
ss::gate _gate; | |||
// Highest producer ID applied to this stm. | |||
model::producer_id _highest_producer_id; | |||
// for monotonicity of computed LSO. | |||
model::offset _last_known_lso{-1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do not need to be initialized, it will be default initialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made a small change in sync() related to this.. retained this to be -1 to be compatible with previous code.
Chaos logs showed that there can be duplicate (back to back) begin_tx requests due to some certain client quirks beyond our control. We had some checks in place to ignore them but they were invalidated by redpanda-data#18076 which redefined the transaction boundries. After redpanda-data#18076 an entry in the ongoing_map does not mean the data has been replicated as a part of the transaction. This PR adjusts the checks accordingly. Note: This code is going away soon, so we will have better API in the producer_state to make the code more easily understandable. This PR is to make chaos happy until then.
In earlier versions we were not considering transactions without data batches as candidates for expiry, thats not the case any more with redpanda-data#18076, adding a regression test to future proof.
In earlier versions we were not considering transactions without data batches as candidates for expiry, thats not the case any more with redpanda-data#18076, adding a regression test to future proof.
In earlier versions we were not considering transactions without data batches as candidates for expiry, thats not the case any more with redpanda-data#18076, adding a regression test to future proof.
In earlier versions we were not considering transactions without data batches as candidates for expiry, thats not the case any more with redpanda-data#18076, adding a regression test to future proof.
Another step towards consolidated state management.
This PR removes
mem_state
struct from rm_stm entirely. Main change in this PR is to get rid of mem_state::estimated map that is supposed to hold the estimated offset of the first data batch in a transaction while its replication is in progress. This estimate is in place so that any racy LSO computation on the partition does not exceed the (future) replicated offset. We remove that by redefining the transaction boundary to include the begin/fence batch in the transaction. first data batch is guaranteed tohappen after
the fence batch, so clamping the LSO at the fence batch by redefining the transaction boundaries ensures that in replication batches are not accounted for LSO.The advantage of removing mem state is that the entire transactional state left in the stm is replayed from the log, so it is easy it to port to the new consolidated producer state definitions.
Backports Required
Release Notes