-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(VirtualFile::crashsafe_overwrite): avoid Handle::block_on in callers #6731
refactor(VirtualFile::crashsafe_overwrite): avoid Handle::block_on in callers #6731
Conversation
…ath/crashsafe_overwrite/switch-to-spawn-blocking
2436 tests run: 2319 passed, 0 failed, 117 skipped (full report)Flaky tests (4)Postgres 15Code coverage (full report)
The comment gets automatically updated with the latest test results
d0f2ab3 at 2024-02-14T14:25:41.325Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to remove the REVIEW: tags. I see no issues apart from the cutpasted comment.
Ok, so, you don't see a problem with us now doing spawn_blocking in those places? |
…ath/crashsafe_overwrite/switch-to-spawn-blocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. In hindsight, we should have done this way back when the prototype of crashsafe_overwrite was changed to async
.
No issues. |
…ck_on in callers" (#6765) Reverts #6731 On high tenant count Pageservers in staging, memory and CPU usage shoots to 100% with this change. (NB: staging currently has tokio-epoll-uring enabled) Will analyze tomorrow. https://neondb.slack.com/archives/C03H1K0PGKH/p1707933875639379?thread_ts=1707929541.125329&cid=C03H1K0PGKH
…dle::block_on in callers"" (#6775) Reverts #6765 , bringing back #6731 We concluded that #6731 never was the root cause for the instability in staging. More details: https://neondb.slack.com/archives/C033RQ5SPDH/p1708011674755319 However, the massive amount of concurrent `spawn_blocking` calls from the `save_metadata` calls during startups might cause a performance regression. So, we'll merge this PR here after we've stopped writing the metadata #6769).
Some callers of
VirtualFile::crashsafe_overwrite
call it on the executor thread, thereby potentially stalling it.Others are more diligent and wrap it in
spawn_blocking(..., Handle::block_on, ... )
to avoid stalling the executor thread.However, because
crashsafe_overwrite
uses VirtualFile::open_with_options internally, we spawn a new thread-localtokio-epoll-uring::System
in the blocking pool thread that's used for thespawn_blocking
call.This PR refactors the situation such that we do the
spawn_blocking
insideVirtualFile::crashsafe_overwrite
. This unifies the situation for the better:spawn_blocking(..., Handle::block_on, ...)
before no longer stall the executor.block_on
, resolving the problem with the short-livedtokio-epoll-uring::System
s in the blocking pool threads.A future PR will build on top of this and divert to tokio-epoll-uring if it's configures as the IO engine.
Changes
Convert implementation to std::fs and move it into
crashsafe.rs
durable_rename
andfsync_async_opt
recently. However,crashsafe_overwrite
is different in the sense that it's higher level, i.e., it's more likestd::fs::write
and the Safekeeper team's code is more building block style.Use
tokio::task::spawn_blocking
inVirtualFile::crashsafe_overwrite
to call the newcrashsafe::overwrite
API.Inspect all callers to remove any double-
spawn_blocking
spawn_blocking requires the captures data to be 'static + Send. So, refactor the callers. We'll need this for future tokio-epoll-uring support anyway, because tokio-epoll-uring requires owned buffers.
Related Issues
spawn_blocking+Handle::block_on