Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storcon: don't take any Service locks in /status and /ready #9944

Merged
merged 2 commits into from
Dec 1, 2024

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Nov 29, 2024

Problem

We saw unexpected container terminations when running in k8s with with small CPU resource requests.

The /status and /ready handlers called maybe_forward, which always takes the lock on Service::inner.

If there is a lot of writer lock contention, and the container is starved of CPU, this increases the likelihood that we will get killed by the kubelet.

It isn't certain that this was a cause of issues, but it is a potential source that we can eliminate.

Summary of changes

  • Revise logic to return immediately if the URL is in the non-forwarded list, rather than calling maybe_forward

@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Nov 29, 2024
@jcsp jcsp requested a review from VladLazar November 29, 2024 16:07
@jcsp jcsp marked this pull request as ready for review November 29, 2024 16:08
@jcsp jcsp requested a review from a team as a code owner November 29, 2024 16:08
@jcsp jcsp enabled auto-merge November 29, 2024 16:43
Copy link

github-actions bot commented Nov 29, 2024

6952 tests run: 6644 passed, 0 failed, 308 skipped (full report)


Flaky tests (2)

Postgres 17

Code coverage* (full report)

  • functions: 30.3% (8185 of 27046 functions)
  • lines: 47.7% (64863 of 135981 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
b003de1 at 2024-11-29T19:21:23.566Z :recycle:

@jcsp jcsp added this pull request to the merge queue Dec 1, 2024
Merged via the queue into main with commit 14853a3 Dec 1, 2024
80 checks passed
@jcsp jcsp deleted the jcsp/storcon-nonblocking-hooks branch December 1, 2024 18:10
awarus pushed a commit that referenced this pull request Dec 5, 2024
## Problem

We saw unexpected container terminations when running in k8s with with
small CPU resource requests.

The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.

If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.

It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.

## Summary of changes

- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/controller Component: Storage Controller t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants