Describe your changes
The motivation behind using `mmap()` on Linux (and the equivalent feature on Windows, not yet supported) is to allow the memory of the (quite large) coroutine stacks to be reclaimed immediately. If heap allocations are used, the dynamic memory allocator does not immediately return freed blocks to the system, because they may be reused later (for obvious performance reasons). If the system is very busy, these blocks may become fragmented, and new coroutine stack allocations will then require further expansion from system memory (via `sbrk()` or `mmap()`) because large contiguous areas won't necessarily be available. The result is that the application's RSS keeps increasing even though all coroutines may have finished processing.
`mmap()` is just as fast as a call to `new()`, without the drawback of freed memory being retained by the allocator. An alternative to using `mmap()` is to tinker with the `mallopt()` parameters `M_MMAP_THRESHOLD` and `M_TRIM_THRESHOLD`, but that is not as reliable as `mmap()`. Furthermore, with `mmap()` we can now protect the lowest page of the coroutine stack (only for pre-allocated blocks) so that we can catch stack overflows. These overflows are hard to debug and fail in very different (often confusing) ways.
One reason for not pre-allocating a single large contiguous block for all pre-allocated coroutines (rather than allocating piecemeal) is to prevent one stack from trampling another. Even with page protection, a function may write beyond the 4096-byte protected range and hit the next stack. This is easily reproducible.
Testing performed
Ran the entire test suite. Any coroutine allocation issue would make the tests fail immediately.
Signed-off-by: Alexander Damian <adamian@bloomberg.net>