
[v24.1.x] [CORE-7803] Audit Log Manager: Refactoring - use and reduce retries. #23868

Conversation

@oleiman (Member) commented Oct 21, 2024

Backport of PR #23775

Closes #23863

Useful in audit_log_manager as well as transform logging

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit 090f41b)
Conflicts:
  No bazel anywhere
And adds make_batch_of_one

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit 11b8a23)
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit 37414a7)
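For context, here is a minimal, standard-library-only sketch of what a `make_batch_of_one` helper can look like: wrap a single key/value record in a one-element batch so call sites that log one audit or transform record at a time don't build a batch by hand. The `record` and `record_batch` types below are illustrative stand-ins, not Redpanda's actual model types.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for the real record/batch types.
struct record {
    std::optional<std::string> key;
    std::string value;
};

struct record_batch {
    std::vector<record> records;
};

// Build a batch containing exactly one record.
record_batch make_batch_of_one(std::optional<std::string> key, std::string value) {
    record_batch b;
    b.records.push_back(record{std::move(key), std::move(value)});
    return b;
}
```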
@oleiman oleiman added this to the v24.1.x-next milestone Oct 21, 2024
@oleiman oleiman added the kind/backport (PRs targeting a stable branch) label Oct 21, 2024
@github-actions github-actions bot added the area/redpanda and area/wasm (WASM Data Transforms) labels Oct 21, 2024
@oleiman oleiman self-assigned this Oct 21, 2024
@oleiman oleiman marked this pull request as ready for review October 21, 2024 21:35
We need these for sorting out which partitions are locally led

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit 5c7ed51)
Conflicts:
  audit_log_manager.cc: kafka/server/handlers/topics/types.h
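The kind of filtering this enables, sketched with an invented `partition_info` type rather than Redpanda's metadata API: keep only the partitions whose current leader is the local node, so the batcher can bias its round-robin toward them.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative per-partition leadership info (not Redpanda's metadata types).
struct partition_info {
    int32_t partition_id;
    std::optional<int32_t> leader_node; // nullopt if leadership is unknown
};

// Collect the partitions led by the local node.
std::vector<int32_t> locally_led_partitions(
  const std::vector<partition_info>& partitions, int32_t self_node) {
    std::vector<int32_t> local;
    for (const auto& p : partitions) {
        if (p.leader_node.has_value() && *p.leader_node == self_node) {
            local.push_back(p.partition_id);
        }
    }
    return local;
}
```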
The previous implementation used a very high retry count on the
internal kafka client, which prevented the client from recovering
from certain types of errors.

Instead, we batch up drained records on the manager side, allowing
us to hold a copy of each batch in memory and retry failed produce
calls from "scratch".

This also allows us to be _much_ more aggressive about batching.
The internal kafka client calculates a destination partition for
each record, round-robin style across the topic's partitions.
In the new scheme, we shoot for a maximally sized batch first, then
select a destination, still round-robin style, but biased heavily
toward locally led partitions. In this way, given the default audit
per-shard queue limit and the default max batch size (both 1MiB), the
most common drain operation should result in exactly one produce
request.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
(cherry picked from commit e5fd326)
Conflicts:
  No bazel anywhere
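A standalone sketch of the drain scheme described in this commit message, with invented names throughout (`audit_record`, `pack_batches`, `next_partition`, `drain`, `produce_fn`); it is not the audit_log_manager implementation. Drained records are packed into batches up to a size limit and kept in memory, the destination partition is chosen round-robin with a preference for locally led partitions (a simplification of the heavy bias described above), and a failed produce is retried from scratch with the retained copy instead of relying on deep retries inside the kafka client.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A single drained audit record; payload stands in for the serialized event.
struct audit_record {
    std::string payload;
};

// An in-memory batch retained so a failed produce can be retried from scratch.
struct batch {
    std::vector<audit_record> records;
    size_t bytes = 0;
};

// Pack queued records into batches no larger than max_batch_bytes
// (mirroring the 1MiB default mentioned above).
std::vector<batch>
pack_batches(std::deque<audit_record>& queue, size_t max_batch_bytes) {
    std::vector<batch> out;
    batch current;
    while (!queue.empty()) {
        audit_record rec = std::move(queue.front());
        queue.pop_front();
        if (!current.records.empty()
            && current.bytes + rec.payload.size() > max_batch_bytes) {
            out.push_back(std::move(current));
            current = batch{};
        }
        current.bytes += rec.payload.size();
        current.records.push_back(std::move(rec));
    }
    if (!current.records.empty()) {
        out.push_back(std::move(current));
    }
    return out;
}

// Pick the next destination partition: prefer a locally led partition when
// one is known, otherwise fall back to plain round-robin over all partitions
// (assumes all_partitions is non-empty).
int32_t next_partition(
  const std::vector<int32_t>& all,
  const std::vector<int32_t>& local,
  size_t& rr_cursor) {
    const auto& pool = local.empty() ? all : local;
    return pool[rr_cursor++ % pool.size()];
}

// Caller-supplied produce call; returns true on success.
using produce_fn = std::function<bool(int32_t partition, const batch&)>;

// Drain the queue: one produce call per batch, retrying a failed produce
// from scratch with the retained copy of the batch.
bool drain(
  std::deque<audit_record>& queue,
  const std::vector<int32_t>& all_partitions,
  const std::vector<int32_t>& local_partitions,
  size_t& rr_cursor,
  const produce_fn& produce,
  size_t max_batch_bytes,
  int max_attempts) {
    for (const auto& b : pack_batches(queue, max_batch_bytes)) {
        bool ok = false;
        for (int attempt = 0; attempt < max_attempts && !ok; ++attempt) {
            int32_t p = next_partition(all_partitions, local_partitions, rr_cursor);
            ok = produce(p, b);
        }
        if (!ok) {
            return false; // caller decides what to do with unsent batches
        }
    }
    return true;
}
```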
@oleiman oleiman force-pushed the vbotbuildovich/backport-23775-v24.1.x-221 branch from 30c139c to 7e4fef8 on October 21, 2024 22:22
@vbotbuildovich (Collaborator) commented Oct 22, 2024

@oleiman oleiman merged commit 0ee9a52 into redpanda-data:v24.1.x Oct 22, 2024
17 checks passed
@BenPope BenPope modified the milestones: v24.1.x-next, v24.1.18 Nov 18, 2024