archival: consistent log size probes across replicas (pull) #24342

nvartolomei
Contributor

Previously, the probe was updated only on leaders, and only after exiting the upload loop, which led to inconsistent and stale metrics.

Fix this by introducing a subscription mechanism to the STM, which is the source of truth for the manifest state and must be consistent across all replicas.
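
To illustrate the idea, here is a minimal sketch of such a subscription mechanism. It is not the code in this PR: `manifest_stm`, `log_size_probe`, `subscribe`, `unsubscribe`, and `apply_add_segment` are hypothetical names, and the real STM is asynchronous and considerably more involved. The sketch only shows the shape: the state machine notifies subscribers whenever it applies a change, so a probe on any replica, leader or follower, tracks the same value.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for the archival STM. Every replica applies the same
// replicated commands, so every replica ends up with the same state.
class manifest_stm {
public:
    using callback = std::function<void(std::uint64_t cloud_log_size)>;
    using subscription_id = std::uint64_t;

    // Register a callback that fires after each applied state change.
    subscription_id subscribe(callback cb) {
        auto id = _next_id++;
        _subscribers.emplace(id, std::move(cb));
        return id;
    }

    // Must be called before the subscriber is destroyed (lifetime safety).
    void unsubscribe(subscription_id id) { _subscribers.erase(id); }

    // Runs on leaders and followers alike when a command is applied.
    void apply_add_segment(std::uint64_t size_bytes) {
        _cloud_log_size += size_bytes;
        for (auto& entry : _subscribers) {
            entry.second(_cloud_log_size);
        }
    }

    std::uint64_t cloud_log_size() const { return _cloud_log_size; }

private:
    std::uint64_t _cloud_log_size{0};
    subscription_id _next_id{0};
    std::map<subscription_id, callback> _subscribers;
};

// Hypothetical probe that mirrors the STM state into a gauge value.
class log_size_probe {
public:
    explicit log_size_probe(manifest_stm& stm)
      : _stm(stm)
      , _sub(stm.subscribe([this](std::uint64_t s) { _cloud_log_size = s; }))
      , _cloud_log_size(stm.cloud_log_size()) {}

    // Drop the subscription so the STM never calls into a dead probe.
    ~log_size_probe() { _stm.unsubscribe(_sub); }

    log_size_probe(const log_size_probe&) = delete;
    log_size_probe& operator=(const log_size_probe&) = delete;

    std::uint64_t cloud_log_size() const { return _cloud_log_size; }

private:
    manifest_stm& _stm;
    manifest_stm::subscription_id _sub;
    std::uint64_t _cloud_log_size;
};
```

In this sketch the probe seeds itself from the current STM state on construction and unsubscribes in its destructor, so the callback cannot outlive the object it captures.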

The first attempt was in #24257, but the feedback suggested that the approach in this commit is better.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Ensure the redpanda_cloud_storage_cloud_log_size metric is consistent across all replicas. Previously it was updated infrequently and only from the leader replica, which led to inconsistent or stale values.

@vbotbuildovich
Collaborator

Member

@dotnwat dotnwat left a comment

the lifetimes look ok to me. do you see any issues @Lazin ?

@nvartolomei nvartolomei changed the title archival: consistent log size probes across replicas archival: consistent log size probes across replicas (pull) Nov 28, 2024
@nvartolomei nvartolomei merged commit f31bbb1 into redpanda-data:dev Nov 28, 2024
19 checks passed
@vbotbuildovich
Collaborator

/backport v24.3.x

@vbotbuildovich
Collaborator

/backport v24.2.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24342-v24.2.x-71 remotes/upstream/v24.2.x
git cherry-pick -x ab1dd5371b

Workflow run logs.

@nvartolomei
Contributor Author

/cdt
tests/rptest/scale_tests/many_partitions_test.py

@nvartolomei
Contributor Author

/cdt
rp_version=pr
tests/rptest/scale_tests/many_partitions_test.py

4 participants