[v23.1.x] Backport of #9380 #8750 #10923 #11164 #11350 #11838 #11905 #11840 #11691 #11726 #10860 #12073 #12075

Merged
mmaslankaprv merged 38 commits into redpanda-data:v23.1.x from the v23.1.x-backports branch on Jul 20, 2023

Conversation

@mmaslankaprv mmaslankaprv changed the title from "[v23.1.x] Backport" to "[v23.1.x] Backport of #9380 #8750 #10923 #11164 #11350 #11838 #11905 #11840 #11691 #11726 #10860" Jul 13, 2023
@mmaslankaprv mmaslankaprv requested a review from dotnwat July 13, 2023 14:12
@mmaslankaprv mmaslankaprv force-pushed the v23.1.x-backports branch 2 times, most recently from 3b4409d to 35b996b Compare July 14, 2023 05:54
@mmaslankaprv mmaslankaprv requested a review from bharathv July 14, 2023 05:54
@mmaslankaprv mmaslankaprv changed the title from "[v23.1.x] Backport of #9380 #8750 #10923 #11164 #11350 #11838 #11905 #11840 #11691 #11726 #10860" to "[v23.1.x] Backport of #9380 #8750 #10923 #11164 #11350 #11838 #11905 #11840 #11691 #11726 #10860 #12073" Jul 17, 2023
@BenPope BenPope added the kind/backport PRs targeting a stable branch label Jul 18, 2023
@BenPope BenPope added this to the v23.1.x-next milestone Jul 18, 2023
@piyushredpanda
Contributor

/ci-repeat 1

1 similar comment
@mmaslankaprv
Member Author

/ci-repeat 1

@ztlpn
Contributor

ztlpn commented Jul 19, 2023

hmm, also contains commits from #11597 but looks like it is already backported?

@mmaslankaprv
Member Author

CI failure: #12310

@mmaslankaprv
Member Author

The failure is not a regression

@mmaslankaprv mmaslankaprv requested a review from ztlpn July 20, 2023 06:33
rockwotj and others added 7 commits July 20, 2023 08:50
Followup from redpanda-data#10810, this moves values that were default constructed
when reading json.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
(cherry picked from commit 5906483)
Added Admin API retries on timeout to nodes decommissioning test.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit d59b9a7)
Since all decommissioning tests rely on the assumption that the
decommissioning operation succeeded, added safe versions of the
decommission/recommission operations that retry when required.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit f842a90)
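
The retrying decommission helper described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the test suite: `AdminTimeout`, `retry_on_timeout`, `safe_decommission`, and the `admin.decommission_broker` method on the client object are all hypothetical names chosen for the example.

```python
import time


class AdminTimeout(Exception):
    """Hypothetical error standing in for an Admin API request timeout."""


def retry_on_timeout(operation, attempts=5, backoff_s=0.0):
    """Call `operation` until it returns or `attempts` calls have timed out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except AdminTimeout as e:
            # Only timeouts are retried; any other error propagates at once.
            last_error = e
            time.sleep(backoff_s)
    raise last_error


def safe_decommission(admin, node_id, attempts=5):
    """Decommission `node_id`, retrying when the request times out.

    `admin` is an assumed client object exposing `decommission_broker`.
    """
    return retry_on_timeout(lambda: admin.decommission_broker(node_id), attempts)
```

A recommission variant would wrap the corresponding call in the same way, so tests can assume the operation went through before they start asserting on cluster state.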
When a configuration is being applied, we gate updates with the
`_last_seen_version` of the configuration frontend. Previously the version
was updated after the configuration update was applied to the
`configuration_manager`, which could lead to the configuration being
overridden by a previous update.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 7771db9)
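
The ordering issue in the commit above can be modelled with a small asyncio toy. It assumes that applying an update yields to other fibers, so a stale update can arrive while a newer one is still being applied. `ConfigFrontend` and `apply_update` are hypothetical names, and the sleep is only a stand-in for asynchronous work.

```python
import asyncio


class ConfigFrontend:
    """Toy model of version-gated configuration updates (names are hypothetical)."""

    def __init__(self):
        self._last_seen_version = -1
        self.config = None

    async def apply_update(self, version, config):
        if version <= self._last_seen_version:
            return False  # stale or duplicate update, ignore it
        # Record the version BEFORE any asynchronous work, so an older
        # update arriving while this one is in progress fails the gate.
        self._last_seen_version = version
        await asyncio.sleep(0)  # stand-in for the asynchronous apply step
        self.config = config
        return True


async def demo():
    frontend = ConfigFrontend()
    # Version 2 starts first; version 1 arrives while it is still applying.
    results = await asyncio.gather(
        frontend.apply_update(2, "new"),
        frontend.apply_update(1, "old"),
    )
    return results, frontend.config
```

If the version were recorded only after the `await`, the stale update would pass the gate and could overwrite the newer configuration, which is the situation the commit message describes.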
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 53cf8dc)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit e7ccef2)
Made documentation clear about what is being returned for each API and
matched the declared returned type with the one that actually is being
returned.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 52722e3)
Sometimes the controller log dirty offset may be helpful to understand the
gap between what is known to be committed and what is available in the
log.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit ed47391)
Controller erasure test is supposed to validate if there is a mismatch
between the last appended entry in kvstore and controller max offset. In
order for the test to work correctly we must wait for all the messages
to be committed as we only delete the last segment that contains a
single message (new replicated configuration). In order to make the test
reliable change the condition to wait for the applied offset on the node
where controller log is going to be removed to be equal to the leader
dirty offset.

Fixes: redpanda-data#8217

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 57fb4c0)
In Kafka, `log_end_offset` is defined as the next offset
assigned to a record produced to a given partition.

When the local log is empty, its `dirty_offset` is equal to `start_offset -
1`. In this case `log_end_offset` should return the Kafka offset
corresponding to `start_offset`, as this is the next offset assigned
to a record produced to the given topic partition.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 1d682b9)
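
The `log_end_offset` rule from the commit above, written out as a small sketch. The `to_kafka` callable is a hypothetical stand-in for translating a log offset into a Kafka offset; the real translation is not shown on this page.

```python
def log_end_offset(start_offset, dirty_offset, to_kafka):
    """Return the Kafka offset the next produced record would receive.

    `to_kafka` is an assumed callable mapping a log offset to a Kafka offset.
    """
    if dirty_offset < start_offset:
        # Empty log (dirty_offset == start_offset - 1): there is no last
        # record to translate, so translate start_offset itself.
        return to_kafka(start_offset)
    # Non-empty log: one past the Kafka offset of the last record.
    return to_kafka(dirty_offset) + 1
```

For example, with a translation that subtracts a fixed delta of 3, an empty log with `start_offset=10` and `dirty_offset=9` yields 7, while a log with `dirty_offset=14` yields 12.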
Added test that validates if a consumer is able to continue consuming a
log that has been completely removed by delete retention.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 89a8d97)
When forcibly aborting reconfiguration we should wait for the new leader
to be elected in the configuration that the partition was forced to.
This way we can be certain that the new configuration will eventually be
replicated to the majority of nodes even though the leader may not
exist at the time when the configuration is replicated.

Fixes: redpanda-data#9243

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 3948312)
Change the `raft::state_machine::apply()` to always read only committed
entries. Previously it might happen that with `acks=1` some of the
entries that were not yet committed were applied to the stm.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 1aaf98f)
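
A minimal sketch of the rule in the commit above: the state machine only ever reads entries at or below the committed offset, even if the log already contains later entries (as can happen with `acks=1`). The function name and the `(offset, data)` tuple layout are assumptions made for the example.

```python
def entries_to_apply(log, last_applied, committed_offset):
    """Select entries that are committed but not yet applied.

    `log` is assumed to be an iterable of (offset, data) tuples.
    """
    return [
        (offset, data)
        for offset, data in log
        if last_applied < offset <= committed_offset
    ]
```

With a log holding offsets 0 to 2 and a committed offset of 1, only offsets 0 and 1 are returned; offset 2 stays unapplied until it is committed.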
…pshot`

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit a8e1a59)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 4daab41)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit baabe8f)
If the `last_applied` offset persisted in kvstore is greater than the log
dirty offset, the state is inconsistent. The inconsistency may be the
result of intentional removal of log segments from the Redpanda data
directory. In this scenario the `last_applied` offset must be removed
from kvstore to prevent it from updating the committed offset, which may
result in uncommitted batches being applied to the stm.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 8989a1a)
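
The consistency check from the commit above can be sketched like this. The kvstore is modelled as a plain dict and the key name is an assumption; the real store and its API are not shown on this page.

```python
def recover_last_applied(kvstore, dirty_offset, key="last_applied"):
    """Return a usable `last_applied` offset, or None if there is none.

    `kvstore` is modelled as a dict; `key` is a hypothetical key name.
    """
    last_applied = kvstore.get(key)
    if last_applied is not None and last_applied > dirty_offset:
        # The persisted offset points past the end of the log, for example
        # because segments were deleted by hand. Drop it so that it cannot
        # move the committed offset forward.
        del kvstore[key]
        return None
    return last_applied
```

A store holding `last_applied=20` next to a log whose dirty offset is 10 has the entry removed; a store holding `last_applied=5` is left alone.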
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 114daef)
Deltas were previously stored as a vector per `ntp`. The delta access
pattern (iteration, inserting and popping elements at the front and at
the back) makes it a perfect candidate for `std::deque`. A
`std::deque` doesn't use a large contiguous allocation, so it will not
contribute to memory fragmentation.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit da0958c)
Use `ss::chunked_fifo` to return deltas processed by the controller
backend. The previously used `std::vector` could lead to large allocations, as
it allocates one contiguous chunk of memory.

Fixes: redpanda-data#11673

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit fddcb30)
Made log entries that indicate receipt of append entries and vote
requests more obvious.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 4ff810b)
Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 7d68e46)
When a voter receives a vote request and votes for the candidate, it
updates the last heartbeat timeout. If this happens during the prevote
phase and in a deployment with an even number of nodes, it may lead to a
temporary livelock in which no leader can be elected.

Fixes: redpanda-data#11657

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 995c775)
Added a test verifying that a controller is elected in a timely fashion when
some of the cluster nodes are down.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit c6555b6)
When a linearizable barrier is requested we want the follower to flush its log
to make sure that all possible entries are committed with traditional
raft semantics. Added handling of flushing the log on the follower if the leader
requested it and the append entries request is empty.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 9cd72e5)
When a linearizable barrier is set we want to move the committed offset
forward. In this case followers must flush their offsets to allow the leader
to commit its entries.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit f936b6d)
For the STM linearizable barrier to make sense we must wait for the
offset to be applied to the stm. Otherwise the linearizable barrier
gives no guarantees about the state machine state.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 3bcf038)
To prevent contention, the linearizable barrier result is now shared
between contending callers. Instead of calling the linearizable
barrier multiple times, a caller waits for the result of a barrier
that is already being executed. This doesn't change the current
semantics of the linearizable barrier, as either way a caller must check the
returned offset if they want to wait for the whole history to be
applied. Sharing results helps in a situation where multiple parallel
fibers try to set up a linearizable barrier.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit 7cb919c)
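
The result sharing described in the commit above can be illustrated with asyncio, where concurrent callers await one in-flight task instead of each starting their own barrier. `SharedBarrier` and the `do_barrier` coroutine function are hypothetical; this shows the general pattern, not the Redpanda code.

```python
import asyncio


class SharedBarrier:
    """Lets concurrent callers share the result of one in-flight barrier."""

    def __init__(self, do_barrier):
        # `do_barrier` is an assumed coroutine function returning an offset.
        self._do_barrier = do_barrier
        self._in_flight = None

    async def linearizable_barrier(self):
        if self._in_flight is None:
            # No barrier is running: start one and remember its task.
            self._in_flight = asyncio.create_task(self._run())
        # Every caller, including the one that started it, awaits the
        # same task and therefore sees the same result.
        return await self._in_flight

    async def _run(self):
        try:
            return await self._do_barrier()
        finally:
            # Allow the next caller to start a fresh barrier.
            self._in_flight = None
```

Callers still have to compare the returned offset with the offset they care about, as the commit message notes; sharing only removes the redundant barrier calls.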
Upstream Kafka validates that the fetch offset is between the start offset and
the log end offset. Fixed validation in Redpanda, which was validating the
end of the range against the high watermark.

Signed-off-by: Michal Maslanka <michal@redpanda.com>
(cherry picked from commit bd73f10)
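
The range check from the commit above as a one-line sketch. The function name is hypothetical; the point is that the upper bound is the log end offset rather than the high watermark.

```python
def is_fetch_offset_in_range(fetch_offset, start_offset, log_end_offset):
    """Return True when a fetch at `fetch_offset` should be accepted.

    The upper bound is the log end offset, not the high watermark, so a
    fetch positioned between the two is still considered in range.
    """
    return start_offset <= fetch_offset <= log_end_offset
```

With a start offset of 0, a high watermark of 5 and a log end offset of 8, a fetch at offset 7 is in range under this check, whereas a check against the high watermark would have rejected it.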
- This change reverts a previous attempt to fix rm_stm reporting
invalid_lso (in change 20285df), which
sets `bootstrap_committed_offset` in apply_snapshot() to fix the
case where invalid_lso is indefinitely returned on a node that had
just restarted, applied a snapshot, but no data was produced onto it.

- The change was however incorrect, as the bootstrap committed offset is
expected to be the value of a complete read of the log, and the value of
the offset in the rm_stm snapshot at the time of reading it at startup,
does not necessarily reflect this.

- The solution is to wait at startup until the consensus layer has
modified the committed offset.

(cherry picked from commit 88cfa56)
@mmaslankaprv mmaslankaprv merged commit 9a9ea29 into redpanda-data:v23.1.x Jul 20, 2023
@mmaslankaprv mmaslankaprv deleted the v23.1.x-backports branch July 20, 2023 13:33
@BenPope BenPope modified the milestones: v23.1.x-next, v23.1.14 Aug 7, 2023
@andrewhsu
Member

fyi i updated the PR description to remove the Release Notes section since this is a backport PR so rpchangelog will use the referenced PRs in the list to generate release notes lines.

Labels
area/redpanda kind/backport PRs targeting a stable branch