-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pageserver: add no_sync
for use in regression tests (1/2)
#9677
Conversation
no_sync
and use it in regression tests
5337 tests run: 5115 passed, 0 failed, 222 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
2aa794a at 2024-11-07T16:55:01.305Z :recycle: |
9e43b43
to
d3b7850
Compare
no_sync
and use it in regression testsno_sync
for use in regression tests (1/2)
Maybe rename it to |
I was a bit intentionally vague with the name, as in future we might extend it to fsyncs in general, and I wouldn't want to have two separate configs |
## Problem Followup to #9677 which enables `no_sync` in tests. This can be merged once the next release has happened. ## Summary of changes - Always run pageserver with `no_sync = true` in tests.
## Problem `no_sync` initially just skipped syncfs on startup (#9677). I'm also interested in flaky tests that time out during pageserver shutdown while flushing l0s, so to eliminate disk throughput as a source of issues there, ## Summary of changes - Drive-by change for test timeouts: add a couple more ::info logs during pageserver startup so it's obvious which part got stuck. - Add a SyncMode enum to configure VirtualFile and respect it in sync_all and sync_data functions - During pageserver startup, set SyncMode according to `no_sync`
Problem
In test environments, the
syncfs
that the pageserver does on startup can take a long time, as other tests running concurrently might have many gigabytes of dirty pages.Summary of changes
no_sync
option to the pageserver's config.no_sync
for use in regression tests (2/2) #9678) will enable this by default in tests. We need to wait until after the next release to avoid breaking compat tests, which would fail if we set no_sync & use an old pageserver binary.Q: Why is this a different mechanism than safekeeper, which as a --no-sync CLI?
A: Because the way we manage pageservers in neon_local depends on the pageserver.toml containing the full configuration, whereas safekeepers have a config file which is neon-local-specific and can drive a CLI flag.
Q: Why is the option no_sync rather than sync?
A: For boolean configs with a dangerous value, it's preferable to make "false" the safe option, so that any downstream future config tooling that might have a "booleans are false by default" behavior (e.g. golang structs) is safe by default.
Q: Why only skip the syncfs, and not all fsyncs?
A: Skipping all fsyncs would require more code changes, and the most acute problem isn't fsyncs themselves (these just slow down a running test), it's the syncfs (which makes a pageserver startup slow as a result of other tests)
Checklist before requesting a review
Checklist before merging