Implemented raft::state_machine_manager #12685

Merged
merged 14 commits into redpanda-data:dev from stm-manager
Sep 5, 2023


@mmaslankaprv mmaslankaprv commented Aug 9, 2023

State Machine Manager

Implemented raft::state_machine_manager. The state machine manager is intended as the entry point for creating state machines built on top of a data partition. All managed state machines must implement the state_machine_base interface. The manager uses a single fiber to apply batches to all managed state machines, so each model::record_batch is read only once and is never copied. The manager is also responsible for applying and taking Raft snapshots.
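The single-fiber dispatch described here can be pictured with a small standard-library sketch. Everything in it is an assumption made for illustration: `record_batch` is a stand-in for `model::record_batch`, the `state_machine_base` shown is a hypothetical reduction of the real interface, `byte_counter` is an invented example state machine, and the loop is synchronous rather than a Seastar fiber.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for model::record_batch (illustration only).
struct record_batch {
    int64_t base_offset;
    int64_t last_offset;
    std::string payload;
};

// Hypothetical, reduced state machine interface (not the real one).
class state_machine_base {
public:
    virtual ~state_machine_base() = default;
    // The batch is borrowed by const reference, never copied.
    virtual void apply(const record_batch& batch) = 0;
    // Next offset this state machine expects to apply.
    int64_t next() const { return _next; }

protected:
    int64_t _next = 0;
};

// Invented example state machine: counts the payload bytes it has seen.
class byte_counter final : public state_machine_base {
public:
    void apply(const record_batch& batch) override {
        bytes += batch.payload.size();
        _next = batch.last_offset + 1;
    }
    std::size_t bytes = 0;
};

// Sketch of the manager: one loop visits each batch once and hands the
// same batch object to every registered state machine.
class state_machine_manager {
public:
    void add(std::shared_ptr<state_machine_base> stm) {
        _stms.push_back(std::move(stm));
    }

    void apply_all(const std::vector<record_batch>& log) {
        for (const auto& batch : log) {
            for (auto& stm : _stms) {
                stm->apply(batch);
            }
        }
    }

private:
    std::vector<std::shared_ptr<state_machine_base>> _stms;
};
```

Registering two `byte_counter` instances and calling `apply_all` on a two-batch log leaves both at the same `next()` offset, which is the property the shared apply loop is meant to provide.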

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@mmaslankaprv mmaslankaprv changed the title from "Stm manager" to "Implemented raft::state_machine_manager" Aug 9, 2023

@dotnwat dotnwat left a comment


yeh this is starting to look nice!

@mmaslankaprv mmaslankaprv force-pushed the stm-manager branch 4 times, most recently from b15dea0 to cd233bc Compare August 17, 2023 06:32
@mmaslankaprv mmaslankaprv force-pushed the stm-manager branch 8 times, most recently from eb81b62 to 063fcba Compare August 22, 2023 16:21
@mmaslankaprv mmaslankaprv force-pushed the stm-manager branch 7 times, most recently from 8b6762e to c67fd3f Compare August 25, 2023 12:05
@mmaslankaprv mmaslankaprv force-pushed the stm-manager branch 2 times, most recently from 0661a06 to 278b82d Compare August 30, 2023 10:28
@mmaslankaprv mmaslankaprv requested a review from dotnwat August 30, 2023 12:41
@mmaslankaprv mmaslankaprv marked this pull request as ready for review August 30, 2023 12:41
rockwotj previously approved these changes Sep 5, 2023
* If an apply timed out (took a long time), the stm may have already advanced
* past what was requested to read
*/
if (state.stm_entry->stm->next() > batch.base_offset()) {

Should this assumption be documented anywhere? Maybe in state_machine_base.h?

Signed-off-by: Michal Maslanka <michal@redpanda.com>
The state machine manager is an entry point for registering state machines
built on top of a replicated log. The state machine manager uses a single
fiber to read and apply record batches to all managed state machines.

When a state machine throws an exception or times out while applying
batches to its state, subsequent applies are executed in a separate apply
fiber dedicated to that STM.
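A rough sketch of this fallback, under stated assumptions: the model is synchronous, `flaky_stm`, `entry`, `apply_shared`, and `background_catch_up` are hypothetical names, and a plain function stands in for the per-STM apply fiber. The skip test mirrors the `next() > batch.base_offset()` check discussed in the review above; how the real manager schedules the fiber is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a record batch (offsets only).
struct record_batch {
    int64_t base_offset;
    int64_t last_offset;
};

// Hypothetical state machine that fails once at a chosen base offset.
class flaky_stm {
public:
    explicit flaky_stm(int64_t fail_at) : _fail_at(fail_at) {}

    void apply(const record_batch& b) {
        if (b.base_offset == _fail_at && !_failed_once) {
            _failed_once = true;
            throw std::runtime_error("apply failed");
        }
        _next = b.last_offset + 1;
        ++applied_batches;
    }

    int64_t next() const { return _next; }
    int applied_batches = 0;

private:
    int64_t _fail_at;
    bool _failed_once = false;
    int64_t _next = 0;
};

// Per-STM bookkeeping kept by the (hypothetical) manager.
struct entry {
    flaky_stm* stm;
    bool in_background;
};

// Shared apply step: a failing STM is handed over to its own apply path
// and no longer takes part in the shared loop.
void apply_shared(std::vector<entry>& entries, const record_batch& batch) {
    for (auto& e : entries) {
        if (e.in_background) {
            continue;
        }
        try {
            e.stm->apply(batch);
        } catch (const std::exception&) {
            e.in_background = true;
        }
    }
}

// Stand-in for the STM-specific apply fiber: re-reads the log and skips
// batches the STM has already advanced past.
void background_catch_up(entry& e, const std::vector<record_batch>& log) {
    for (const auto& batch : log) {
        if (e.stm->next() > batch.base_offset) {
            continue; // already applied
        }
        e.stm->apply(batch);
    }
}
```

In this sketch the healthy state machine keeps advancing with the shared loop while the failed one stays at the offset of the batch that threw, until its own catch-up pass applies the remaining batches.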

The state machine manager also takes care of snapshot consistency. It
wraps state machine snapshots in its own snapshot format, which is a map
in which each individual STM snapshot is stored separately. When a
snapshot is taken, the manager guarantees that all the STMs have applied
the same offset, so that the snapshot is consistent across the state
machines.
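As an illustration of the wrapped snapshot format, the sketch below keeps one entry per state machine in a map keyed by name. The names `counter_stm` and `managed_snapshot`, the string encoding, and the choice to throw when the STMs disagree on the applied offset are all assumptions of this sketch; the actual manager guarantees the common offset itself rather than rejecting the request.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical managed state machine with a trivially serializable state.
struct counter_stm {
    std::string name;
    int64_t last_applied = -1;
    int64_t value = 0;

    void apply(int64_t offset, int64_t delta) {
        value += delta;
        last_applied = offset;
    }
    std::string take_snapshot() const { return std::to_string(value); }
    void apply_snapshot(const std::string& data, int64_t offset) {
        value = std::stoll(data);
        last_applied = offset;
    }
};

// Hypothetical wrapper format: one offset plus a map of per-STM snapshots.
struct managed_snapshot {
    int64_t offset;
    std::map<std::string, std::string> per_stm;
};

// Takes a snapshot only if every STM has applied exactly `offset`, so all
// entries in the map describe the same point in the log.
managed_snapshot
take_snapshot(const std::vector<counter_stm*>& stms, int64_t offset) {
    managed_snapshot snap{offset, {}};
    for (const auto* stm : stms) {
        if (stm->last_applied != offset) {
            throw std::logic_error("state machines are not at the same offset");
        }
        snap.per_stm.emplace(stm->name, stm->take_snapshot());
    }
    return snap;
}

// Hands each STM its own entry from the map, if one is present.
void apply_snapshot(
  const std::vector<counter_stm*>& stms, const managed_snapshot& snap) {
    for (auto* stm : stms) {
        auto it = snap.per_stm.find(stm->name);
        if (it != snap.per_stm.end()) {
            stm->apply_snapshot(it->second, snap.offset);
        }
    }
}
```

Storing each STM snapshot under its own key means a state machine only ever parses its own data, while the single `offset` field records the point at which all of them were captured.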

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Added a simple test fixture that allows creating a Raft group containing
multiple nodes. The fixture uses an in-memory protocol implementation that
will, in the future, allow us to inject arbitrary failures into the
communication layer.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Replaced `ss::shared_promise` with `ss::condition_variable`

Signed-off-by: Michal Maslanka <michal@redpanda.com>
The base class, i.e. `raft::state_machine`, already has a raft pointer, so
it is pointless to store it again in `persisted_stm`.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Made `cluster::persisted_stm` manageable with `state_machine_manager`.
This way all the state machines share the same fiber for reading and
applying batches.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
Removed `_insync_offset` from `cluster::persisted_stm`. Tracking this
offset was redundant with `_next`, as it is always the offset immediately
before `_next` and can be replaced with `last_applied_offset()`.
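The redundancy can be shown with a tiny sketch. The class `stm_offsets` and the method `on_batch_applied` are hypothetical, and plain `int64_t` arithmetic stands in for the offset types and helpers used in the real code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction of the offset bookkeeping: only `_next` is stored,
// and the last applied offset is derived from it on demand.
class stm_offsets {
public:
    // Called after a batch ending at `batch_last_offset` has been applied.
    void on_batch_applied(int64_t batch_last_offset) {
        _next = batch_last_offset + 1;
    }

    // Next offset to apply.
    int64_t next() const { return _next; }

    // Always the offset immediately before `_next`, so no second field
    // needs to be kept in sync.
    int64_t last_applied_offset() const { return _next - 1; }

private:
    int64_t _next = 0;
};
```

With a single stored field there is no window in which the two values can disagree, which is the practical benefit of dropping the duplicate.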

Signed-off-by: Michal Maslanka <michal@redpanda.com>
The error may happen as a natural consequence of a race condition
between two segment uploads. It shouldn't cause the tests to fail.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
@mmaslankaprv

ci-failure: #13181

@mmaslankaprv mmaslankaprv merged commit e16ff68 into redpanda-data:dev Sep 5, 2023
9 checks passed
@mmaslankaprv mmaslankaprv deleted the stm-manager branch September 5, 2023 15:10
6 participants