pageserver: default to 4MiB stack size and add env var to control it #8862
Motivation
In #8832 I get tokio runtime worker stack overflow errors in debug builds.
In a similar vein, I had tokio runtime worker stack overflows when trying to eliminate async_trait (#8296).
The 2MiB default is kind of arbitrary, so this PR bumps it to 4MiB.
It also adds an env var to control it.
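For illustration only, here is a minimal sketch of how an env-var override with a 4MiB default could look. It is not the code from this PR: the helper names are hypothetical, the PR text does not name the variable, and the sketch uses std::thread::Builder rather than the tokio runtime builder so that it stays dependency-free.

```rust
use std::thread;

/// Default stack size (4 MiB), matching the new default described above.
const DEFAULT_STACK_SIZE: usize = 4 * 1024 * 1024;

/// Hypothetical helper: turn the raw value of an override variable into a
/// stack size in bytes. Falls back to the default when the value is unset,
/// unparsable, or zero.
fn stack_size_from(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_STACK_SIZE)
}

/// Hypothetical helper: spawn a thread with an explicit stack size.
/// A tokio runtime would take the same number through its builder instead.
fn spawn_with_stack_size<F, T>(stack_size: usize, f: F) -> std::io::Result<thread::JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new().stack_size(stack_size).spawn(f)
}
```

A caller would read the variable once at startup, for example `stack_size_from(std::env::var("EXAMPLE_STACK_SIZE").ok().as_deref())`, where `EXAMPLE_STACK_SIZE` is a placeholder name, and pass the result to whatever creates the threads.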
Risk Assessment
With our 4 runtimes, the worst-case stack memory usage is 4 (runtimes) * ($num_cpus (executor threads) + 512 (blocking pool threads)) * 4MiB.
On i3en.3xlarge, that's 8384 MiB.
On im4gn.2xlarge, that's 8320 MiB.
Before this change, it was half that.
Looking at production metrics, we do have the headroom to accommodate this worst case.
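The arithmetic behind these figures can be checked with a small sketch. The function name is hypothetical, and the vCPU counts used below (12 for i3en.3xlarge, 8 for im4gn.2xlarge) are assumptions inferred from the totals quoted above, not values stated in this PR.

```rust
const MIB: u64 = 1024 * 1024;

/// Worst-case stack memory in bytes: every executor thread and every
/// blocking pool thread of every runtime uses its full stack.
fn worst_case_stack_bytes(
    runtimes: u64,
    executor_threads: u64,
    blocking_threads: u64,
    stack_size_bytes: u64,
) -> u64 {
    runtimes * (executor_threads + blocking_threads) * stack_size_bytes
}
```

With 4 runtimes, 512 blocking pool threads, and a 4MiB stack, `worst_case_stack_bytes(4, 12, 512, 4 * MIB) / MIB` gives 8384 and `worst_case_stack_bytes(4, 8, 512, 4 * MIB) / MIB` gives 8320. Passing `2 * MIB` instead halves both results, which matches the statement about the previous default.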
Alternatives
The problems only occur with debug builds, so technically we could only raise the stack size for debug builds.
However, it would be another configuration where debug != release.
Future Work
If we ever enable single runtime mode in prod (=> #7312), the worst case will drop to 25% of its current value.
Eliminating the use of tokio::spawn_blocking / tokio::fs in favor of tokio-epoll-uring (=> #7370) would reduce the worst case to 4 (runtimes) * $num_cpus (executor threads) * 4 MiB.