-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: revert flush backpressure (#8550) #10135
Conversation
7110 tests run: 6813 passed, 0 failed, 297 skipped (full report)Flaky tests (7)Postgres 17
Postgres 16
Postgres 15
Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
9f3f060 at 2024-12-13T14:32:24.415Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me but I want to see if Christian is okay with this (I think he is the one who originally suggested it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a few tests that fail, they need fixing.
I know, working my way through them. |
In #10144, I add parallel layer uploads by only scheduling index uploads (which act as upload queue barriers) every 8 layers. If we want to derisk this, a halfway solution might be to keep the flush backpressure for now, but only trigger it every time we perform an index upload (every 8 layers) -- this would provide some parallelism and pipelining without building up an unbounded upload queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving under the condition that this time we actually implement compact debt based backpressure.
Is this in Q1 planning?
I like your plan in #10095
Merging this to get some benchmark runtime.
I took this to be part of "Write path throttling", which is right on the edge of Q1 inclusion. I agree that it's worth bumping this, certainly more so than S3 throttling. cc @jcsp
Sharding complicates #10095 and can lead to deadlock. I have an updated proposal in #8390. |
This reverts commit f3ecd5d. It is [suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759) to have caused significant read amplification in the [ingest benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs) (specifically during index creation). We will revisit an intermediate improvement here to unblock [upload parallelism](#10096) before properly addressing [compaction backpressure](#8390).
This reverts commit f3ecd5d. It is [suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759) to have caused significant read amplification in the [ingest benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs) (specifically during index creation). We will revisit an intermediate improvement here to unblock [upload parallelism](#10096) before properly addressing [compaction backpressure](#8390).
Problem
In #8550, we made the flush loop wait for uploads after every layer. This was to avoid unbounded buildup of uploads, and to reduce compaction debt. However, the approach has several problems:
disk_consistent_lsn
), and will build up in-memory layers.An alternative solution to these problems is proposed in #8390.
In the meanwhile, we revert the change to reduce the impact on ingest throughput. This does reintroduce some risk of unbounded upload/compaction buildup. Until #8390, this can be addressed in other ways:
max_replication_apply_lag
(akaremote_consistent_lsn
), which will more directly limit upload debt.Touches #10095.
Summary of changes
Remove waiting on the upload queue in the flush loop.