proxy+pageserver: shared leaky bucket impl #8539
3853 tests run: 3747 passed, 0 failed, 106 skipped (full report)
Flaky tests (7): Postgres 16, Postgres 15, Postgres 14
Code coverage* (full report)
* collected from Rust tests only
The comment gets automatically updated with the latest test results
2aa98e3 at 2024-08-29T11:36:07.181Z
I think all of the waits are for one token, so no.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
It should now be 100% compatible by tracking the start time, and only adjusting empty bucket position based on that start time.
…er-leaky-bucket/rebase
Conflicts:
    Cargo.lock
    proxy/src/rate_limiter.rs
    proxy/src/rate_limiter/leaky_bucket.rs
Due to split-up of leaky_bucket.rs, the merge would have lost the (minor) changes that were made to proxy/src/rate_limiter/leaky_bucket.rs since #8539 was created. Backported them manually, see the commits in the first parent.
    git log -p 2416da3..origin/main -- proxy/src/rate_limiter/
I just resolved the conflicts by merging. See the commit message for how I resolved them: 5b9d371
Context on why I'm looking into this: https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641
@conradludgate please review my recent pushes and let's get this merged. Here's my proposed updated PR description
Reviewed libs/utils/src/leaky_bucket.rs. Didn't know GCRA before and only skimmed the Wikipedia article. Some comments
Call with conrad:
Did a perf test after the recent changes pushed by Conrad. All looking good.
LGTM
Not sure why state and config are kept completely separate instead of keeping both inside a LeakyBucket type.
In proxy, we have 1 config globally (32 bytes), then 1 state per endpoint (16 bytes), hence the split
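To illustrate the split described above, here is a minimal sketch of one shared config plus a small state per endpoint. All names (SharedConfig, EndpointState, EndpointLimiter, check) are hypothetical and not taken from the PR, the sketch does not try to reproduce the 32-byte and 16-byte sizes mentioned, and the real proxy code may be organized differently.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// One instance shared by every bucket (hypothetical layout).
#[derive(Clone, Copy)]
pub struct SharedConfig {
    /// Time one request occupies in the bucket (1 / rate).
    pub cost: Duration,
    /// Time needed to drain a completely full bucket.
    pub bucket_width: Duration,
}

/// One instance per endpoint: only the time at which the bucket is empty.
#[derive(Clone, Copy, Default)]
pub struct EndpointState {
    /// Offset from a common start time.
    pub empty_at: Duration,
}

pub struct EndpointLimiter {
    config: SharedConfig,
    states: HashMap<String, EndpointState>,
}

impl EndpointLimiter {
    pub fn new(config: SharedConfig) -> Self {
        Self { config, states: HashMap::new() }
    }

    /// Returns true if one request for `endpoint` is admitted at offset `now`.
    pub fn check(&mut self, endpoint: &str, now: Duration) -> bool {
        let state = self.states.entry(endpoint.to_string()).or_default();
        // A bucket that drained in the past counts as empty right now.
        let new_empty_at = state.empty_at.max(now) + self.config.cost;
        // Admit only if the bucket would not overflow after this request.
        let admitted = new_empty_at.saturating_sub(self.config.bucket_width) <= now;
        if admitted {
            state.empty_at = new_empty_at;
        }
        admitted
    }

    /// Number of endpoints that currently have state allocated.
    pub fn tracked_endpoints(&self) -> usize {
        self.states.len()
    }
}
```

With this shape, adding an endpoint costs only one more EndpointState entry, while the rate and burst parameters live in a single place.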
…he rollout to prod is safe
I fixed the issue regardless, so it should not matter
I was replying on the basis of the page_service requests done, which I think are the only throttled ones. I think my answer still holds. Please let me know if that is wrong.
Dashboard still doesn't show the 44 second elapsed time for query-1 that we had before (see green line in )
@Bodobolero let's continue the discussion in the investigation issue https://github.com/neondatabase/cloud/issues/16886#issuecomment-2324742299
In proxy I switched to a leaky-bucket impl using the GCRA algorithm. I figured I could share the code with pageserver and remove the leaky_bucket crate dependency with some very basic tokio timers and queues for fairness.
How the underlying algorithm works should be fairly clear from the comments I have left in the code.
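For readers who have not seen GCRA before, the following is a minimal, standalone sketch of the idea, not the code from this PR. The names BucketConfig, BucketState and add_tokens are assumptions made for illustration, time is modelled as an offset from a fixed start time, and the tokio timers and fairness queues mentioned above are left out entirely.

```rust
use std::time::Duration;

/// Immutable parameters of the bucket (hypothetical names).
#[derive(Clone, Copy, Debug)]
pub struct BucketConfig {
    /// Time it takes for one token to drain (1 / rate).
    pub cost: Duration,
    /// Time it takes for a full bucket to drain (cost * capacity).
    pub bucket_width: Duration,
}

impl BucketConfig {
    pub fn new(cost: Duration, capacity: u32) -> Self {
        Self { cost, bucket_width: cost * capacity }
    }
}

/// Mutable state: the offset from the start time at which the bucket is empty.
#[derive(Clone, Copy, Debug, Default)]
pub struct BucketState {
    pub empty_at: Duration,
}

impl BucketState {
    /// Try to add `n` tokens at offset `now`.
    /// Returns Ok(()) if admitted, or Err(offset) with the earliest offset
    /// at which the same request would be admitted.
    pub fn add_tokens(
        &mut self,
        cfg: &BucketConfig,
        now: Duration,
        n: u32,
    ) -> Result<(), Duration> {
        // If the bucket already drained in the past, it is simply empty now.
        let empty_at = self.empty_at.max(now);
        // Adding n tokens pushes the "empty" time further into the future.
        let new_empty_at = empty_at + cfg.cost * n;
        // The bucket overflows if it would hold more than bucket_width
        // worth of tokens at `now`.
        let allow_at = new_empty_at.saturating_sub(cfg.bucket_width);
        if now < allow_at {
            Err(allow_at)
        } else {
            self.empty_at = new_empty_at;
            Ok(())
        }
    }
}
```

In this sketch a rejected caller learns how long to wait (allow_at minus now), which is the value an async implementation could sleep on before retrying.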
In benchmarking pageserver, @problame found that the new implementation fixes a getpage throughput discontinuity in pageserver under the pagebench get-page-latest-lsn benchmark with the clickbench dataset (test_perf_olap.py).
The discontinuity is that for any of --num-clients={2,3,4}, getpage throughput remains 10k. With --num-clients=5 and greater, getpage throughput then jumps to the configured 20k rate limit.
With the changes in this PR, the discontinuity is gone, and we scale throughput linearly to --num-clients until the configured rate limit.
More context in https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641.
closes https://github.com/neondatabase/cloud/issues/16886