
Linearizable barrier improvements #11939

Merged
merged 4 commits into redpanda-data:dev from fix-11062 on Jul 13, 2023

Conversation

mmaslankaprv
Member

@mmaslankaprv mmaslankaprv commented Jul 7, 2023

Improved the way the linearizable barrier is handled.

Fixes: #11062

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Made linearizable barrier semantics more strict.

@mergify

mergify bot commented Jul 7, 2023

⚠️ The sha of the head commit of this PR conflicts with #11905. Mergify cannot evaluate rules on this PR. ⚠️

1 similar comment

@mmaslankaprv mmaslankaprv marked this pull request as draft July 7, 2023 13:35
When a linearizable barrier is requested we want the follower to flush its log
to make sure that all possible entries are committed with traditional
Raft semantics. Added handling to flush the log on the follower when the leader
requests it and the append entries request is empty.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
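As a rough illustration of the follower-side behaviour described in this commit message, here is a minimal, self-contained sketch; the types and names (append_entries_request, flush_requested, follower_state) are hypothetical stand-ins, not Redpanda's actual internals.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical request shape: the leader can ask the follower to fsync,
// and a barrier round may carry no entries at all.
struct append_entries_request {
    bool flush_requested{false};
    std::vector<int64_t> entries;
};

struct follower_state {
    int64_t last_appended{0};
    int64_t last_flushed{0};

    // Returns the offset the leader may treat as durable on this follower.
    int64_t handle(const append_entries_request& req) {
        for (auto e : req.entries) {
            last_appended = e; // simplified append
        }
        // The interesting case: even if the request is empty, a flush
        // requested by the leader still fsyncs the log, so the follower
        // reports a durable offset the leader can use to advance the
        // committed offset.
        if (req.flush_requested) {
            last_flushed = last_appended;
        }
        return last_flushed;
    }
};

int main() {
    follower_state f;
    f.handle({.flush_requested = false, .entries = {1, 2, 3}});
    // Barrier: an empty request with flush requested still advances durability.
    std::cout << f.handle({.flush_requested = true, .entries = {}}) << "\n";
}
```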
When a linearizable barrier is set we want to move the committed offset
forward. In this case followers must flush their logs to allow the leader
to commit its entries.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
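Why the follower flush matters can be shown with a tiny, self-contained sketch of majority-based commit advancement; this is a generic Raft illustration under assumed names, not Redpanda's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

// The leader can only move the committed offset to a value that a majority
// of replicas has made durable (flushed).
int64_t committed_offset(std::vector<int64_t> flushed_offsets) {
    std::sort(flushed_offsets.begin(), flushed_offsets.end(), std::greater<>());
    return flushed_offsets[flushed_offsets.size() / 2];
}

int main() {
    // Leader flushed up to offset 10, followers have not flushed past 5:
    std::cout << committed_offset({10, 5, 5}) << "\n";   // prints 5
    // After the barrier forces follower flushes, commit can advance:
    std::cout << committed_offset({10, 10, 10}) << "\n"; // prints 10
}
```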
For the STM linearizable barrier to make sense we must wait for the
offset to be applied to the STM. Otherwise the linearizable barrier
gives no guarantees about the state machine state.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
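A minimal sketch of the ordering this commit enforces, using plain standard-library primitives; the names (state_machine, wait_applied) are illustrative, and Redpanda's real STMs are Seastar-based rather than thread/condvar-based.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct state_machine {
    std::mutex m;
    std::condition_variable cv;
    int64_t applied_offset{0};

    // Called by the apply path as entries are applied to the STM.
    void mark_applied(int64_t o) {
        {
            std::lock_guard<std::mutex> g(m);
            applied_offset = o;
        }
        cv.notify_all();
    }

    // Barrier helper: block until the STM has applied up to `offset`.
    void wait_applied(int64_t offset) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return applied_offset >= offset; });
    }
};

// Intended usage pattern (pseudo-code):
//   auto offset = raft.linearizable_barrier();  // confirmed committed offset
//   stm.wait_applied(offset);                   // the new step: wait for apply
//   read_stm_state();                           // now reflects all prior writes

int main() {
    state_machine stm;
    stm.mark_applied(5);
    stm.wait_applied(5); // returns immediately; offset 5 is already applied
}
```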
In order to prevent contention, implemented sharing of the linearizable barrier
result between contended callers. Instead of calling the linearizable
barrier multiple times, a caller will wait for the result of a barrier
that is already being executed. This doesn't change the current
semantics of the linearizable barrier, as either way a caller must check the
returned offset if they want to wait for the whole history to be
applied. Sharing results helps in a situation where multiple parallel
fibers try to set up a linearizable barrier.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
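A rough sketch of the result-sharing idea follows; Redpanda uses Seastar futures and fibers, so this std::shared_future version only illustrates the shape of the optimization, and do_barrier() is a placeholder rather than the real barrier implementation.

```cpp
#include <cstdint>
#include <future>
#include <mutex>
#include <optional>

class barrier_coordinator {
public:
    int64_t linearizable_barrier() {
        std::unique_lock<std::mutex> lk(_m);
        if (_inflight) {
            // A barrier round is already running: share its result instead
            // of issuing another append-entries + flush round trip.
            auto f = *_inflight;
            lk.unlock();
            return f.get();
        }
        std::promise<int64_t> p;
        _inflight = p.get_future().share();
        auto f = *_inflight;
        lk.unlock();

        const int64_t offset = do_barrier();
        p.set_value(offset);

        lk.lock();
        _inflight.reset();
        return f.get();
    }

private:
    // Placeholder for the real barrier: replicate/flush and return the
    // confirmed committed offset.
    int64_t do_barrier() { return 42; }

    std::mutex _m;
    std::optional<std::shared_future<int64_t>> _inflight;
};

int main() {
    barrier_coordinator c;
    // Callers still have to check the returned offset: a shared result may
    // predate a caller's own replicate(), as discussed in the review below.
    return c.linearizable_barrier() == 42 ? 0 : 1;
}
```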
@mmaslankaprv mmaslankaprv marked this pull request as ready for review July 10, 2023 14:30
  }

  // wait for the returned offset to be applied
- return wait(r.value(), timeout).then([r] {
+ return wait(r.value(), timeout).then([this, r] {
      return result<model::offset>(r.value());
Contributor

I think there may be some undesirable side effects to this optimization. In the happy path the shared future resolves when the wait() succeeds; consider the following sequence of actions:

  1. linearizable_barrier at offset o
  2. usual replicate() at offset o + 1
  3. linearizable_barrier at offset o + 2

(3) can potentially resolve with the same shared future as (1), so we have a problem? Normally I don't think this is a problem if we validate that the offset returned from linearizable_barrier in (3) is > o + 1, but we don't in many places (a sketch of that check follows below). I think we have this pattern in topics_frontend, e.g. if we create too many topics back to back, the topics STM may not have applied the changes by the time the barrier returns.
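A sketch of the validation being suggested, with a hypothetical helper name: read-your-own-writes only holds if the barrier's returned offset covers the offset produced by the caller's own replicate().

```cpp
#include <cstdint>

// Hypothetical helper: only trust read-your-own-writes if the barrier's
// returned offset is at least the offset of our earlier replicate().
bool barrier_covers_write(int64_t barrier_offset, int64_t write_offset) {
    // With the shared-result optimization, barrier_offset can come from a
    // barrier that started before our write; a caller that skips this
    // check may read STM state that does not yet include its own write.
    return barrier_offset >= write_offset;
}
```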

Member Author

So the barrier semantics were like that before, i.e. it could always race with replicate(); this is why I decided to do it this way, and this is why we return an offset that is confirmed.

Contributor

so the barrier semantics was like that before

But 3bcf038 tightens that behavior, and this optimization brings it back, no?

it could always race with the replicate()

Just to be clear, I'm talking about read-your-own-writes here, so it's not just any replicate, it's the replicate before the barrier.

Member Author

@mmaslankaprv mmaslankaprv Jul 12, 2023

You are right, it may happen that the barrier called after the replicate returns the value of a previous barrier. It won't hurt in this case, as that was the previous semantics, but it is definitely confusing.

@mmaslankaprv mmaslankaprv merged commit 8dde461 into redpanda-data:dev Jul 13, 2023
@mmaslankaprv mmaslankaprv deleted the fix-11062 branch July 13, 2023 05:37
@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-11939-v23.1.x-903 remotes/upstream/v23.1.x
git cherry-pick -x 9cd72e50af8180a2e6c69a07be281b4770a6cfb1 f936b6d5335d0e580b1e040aca711aede1b1d034 3bcf0388c0de9d44ed0f953e93c19a4e59bfc44b 7cb919c0dda574ddab1595bd5b5423fcb2c65283

Workflow run logs.

Development

Successfully merging this pull request may close these issues.

CI Failure (startup failure) in ScalingUpTest.test_adding_multiple_nodes_to_the_cluster
4 participants