Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit stack size #9104

Merged
merged 12 commits into from
Dec 16, 2022
Merged

Limit stack size #9104

merged 12 commits into from
Dec 16, 2022

Conversation

arnaud-lb
Copy link
Member

@arnaud-lb arnaud-lb commented Jul 22, 2022

This is an alternative to #5135 (see #5135 (comment)).

By checking the stack size in the VM, we can throw an exception before the stack overflows.

We can also check the stack size in recursive C code. As an example, this PR adds a check in zend_compile_expr() (see #8315).

By default the max allowed stack size is auto detected, but it can be controlled with the following settings:

  • zend.max_allowed_stack_size
  • zend.reserved_stack_size: Subtracted from the max allowed stack size, as a buffer, when checking for overflow

For fibers, we conveniently use fiber.stack_size as max allowed stack size.

Platform support

Stack size checking support

Platforms with stacks that grow downwards are supported (probably all platforms supported by PHP). This assumes contiguous stacks.

On other platforms, this change has no effect.

zend.max_allowed_stack_size detection

We can auto-detect the right value for zend.max_allowed_stack_size on the following OSes:

  • Linux (glibc, musl)
  • FreeBSD
  • Windows (arm untested)
  • MacOS

Otherwise, zend.max_allowed_stack_size is set to 2MiB by default. Should this default value be too high, this change will have no effect.

Not bullet proof

Stack overflows can still happen in the following cases:

  • zend.max_allowed_stack_size is configured to a too large value
  • Some C code is called that will use more than zend.reserved_stack_size bytes of stack when the stack size is already close to zend.max_allowed_stack_size-zend.reserved_stack_size. The default value of zend.reserved_stack_size accounts for the maximum alloca() size, known functions using a lot of stack, and some margin, but it will not be enough for all cases.
  • Some deeply recursive C code is called

Cases 2 and 3 can be discovered and mitigated over time.

This will still catch most cases of stack overflows, and improve the developer experience (which is the goal of this change).

Implementation

We take note of the stack start when initializing the executor, and compute the stack limit. We then compare the current stack position to the limit when entering the VM.

We can get the stack start with some non-standard pthread extensions or by looking at the process memory mappings. If determining the stack start is not possible, we fallback to the current stack position.

We can get an approximation of the current stack position with __builtin_frame_address(0), which is supported by GCC and clang on all platforms. We can fallback to the address of a local variable.

This is very close to what the V8 engine is doing to prevent stack overflows.

Risks

There is a possibility of mis-detecting the maximum stack size (when zend.reserved_stack_size=0), with the effect of throwing the exception too early or not at all.

This could happen on versions of the supported platforms that we didn't test, or if some configuration changes the stack layout.

The user could mitigate this by setting zend.max_allowed_stack_size to a fixed size, or to -1 to disable stack size checking.

We could also reduce the risk by enabling stack size checking only on the platform versions that are known to be safe.

BC breaks

  • Programs whose stack size was less than zend.reserved_stack_size away from the stack limit may stop working
  • On platforms where the stack size can not be detected, the default limit is 2MiB. Programs whose maximum call stack size is larger than that may stop working on these platforms.
  • Minimum supported Windows version bumped to Windows 8 or Windows Server 2012 (Edit: this previously mentioned Windows 2008 instead of Windows 8)
  • Internals: Custom Fiber implementations have to initialize EG(stack_limit) and EG(stack_base).
  • Internals: SAPIs that may execute PHP in alternative stacks have to set EG(stack_limit) and EG(stack_base)
  • Internals: Extensions must check the value of EG(opline_before_exception) before de-referencing it

TODO:

  • Improve the non-pthread implementation
  • Windows implementation (tested on windows 8.1, 11)
    • Windows arm64 (untested)
  • Test on FreeBSD (tested on 13.1, 13.0, 12.3)
  • Test on MacOS (tested on Monterey, Big Sur, Catalina)
  • Test a ZTS build
  • Test with musl
  • Check lifecycles / detect stack limits at the right time
  • Check exception implications
  • JIT
  • Benchmarks [result.md] (no performance impact)


?>
--EXPECTF--
Maximum call stack size reached. Infinite recursion?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to print here more details, especially the stack size and a reentry count?

@arnaud-lb arnaud-lb force-pushed the stack-limit branch 2 times, most recently from 841c974 to 1604381 Compare July 24, 2022 13:15
@@ -16,6 +16,15 @@ ZEND_API void {%EXECUTOR_NAME%}_ex(zend_execute_data *ex)
LOAD_OPLINE();
ZEND_VM_LOOP_INTERRUPT_CHECK();

#ifdef ZEND_CHECK_STACK_LIMIT
if (zend_call_stack_overflowed(EG(stack_limit))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (zend_call_stack_overflowed(EG(stack_limit))) {
if (zend_call_stack_overflowed(EG(stack_limit)) && !EG(exception)) {

I think... Otherwise we're going to enter a loop if any ZEND_HANDLE_EXCEPTION overload or destructor in leave_helper invokes the executor again.

#ifdef ZEND_CHECK_STACK_LIMIT
if (zend_call_stack_overflowed(EG(stack_limit))) {
zend_call_stack_size_error();
EG(opline_before_exception) = NULL;
Copy link
Member

@bwoebi bwoebi Jul 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy with this hack here requiring extra checks everywhere where EG(opline_before_exception) is accessed. AFAIK all functions do have a final return or throw op (which obviously is not in any try block (as try blocks precede their catch or finally blocks) nor having temps needing to be freed (being the last instruction)). Can we just point the EG(opline_before_exception) to &EX(func)->op_array.opcodes[EX(func)->op_array.last - 1] ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I added this as a quick fix so that I could move forward, but I will revisit this part. This may end up being the right solution though. The thing is that ZEND_HANDLE_EXCEPTION assumes that EG(opline_before_exception) was executed, which is not the case here, and tries to cleanup after it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise, ZEND_HANDLE_EXCEPTION assumes that everything what the opcode at EG(opline_before_exception) consumes already has been cleaned up. ZEND_HANDLE_EXCEPTION only cares about garbage having a longer lifetime than the exception. On the last exception, there will be no such garbage.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK all functions do have a final return or throw op (which obviously is not in any try block (as try blocks precede their catch or finally blocks) nor having temps needing to be freed (being the last instruction)). Can we just point the EG(opline_before_exception) to &EX(func)->op_array.opcodes[EX(func)->op_array.last - 1] ?

I think that this would work, but I'm not happy about this solution.

EG(opline_before_exception) = NULL; seems to be the natural thing to do. No opline was executed before the exception, so it's natural to have EG(opline_before_exception) set to NULL.

Most usages of EG(opline_before_exception) already have non-null checks before de-referencing it (except in JIT code).

@bwoebi
Copy link
Member

bwoebi commented Jul 24, 2022

I generally think this approach is superior to the fixed vm reentry limit. Thanks for tackling this.

@arnaud-lb arnaud-lb force-pushed the stack-limit branch 4 times, most recently from 45b898d to fd4666a Compare September 17, 2022 13:04
@arnaud-lb arnaud-lb changed the title [wip] Limit stack size Limit stack size Sep 30, 2022
* | | v
* Mapping start --> |-------------------| <-- Current stack end
* (adjusted : :
* downards as the . .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downwards (missing a w).

@mvorisek
Copy link
Contributor

in #5135 (comment) I have shown infinite recursion is possible by fibers - would such demo with millions of nesting still pass with this PR?

@arnaud-lb
Copy link
Member Author

Your segmented stack demo still passes. This PR marginally reduces the available stack space by about zend.reserved_stack_size bytes. Code that would crash with a SIGSEGV after using n bytes of stack in master may now throw an Error exception when using n-zend.reserved_stack_size bytes of stack.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution looks too complex and expensive.

It's possible to create guard page(s) at the end of the stack, catch SIGSEGV signal, check the stack pointer, unwind/longjump, emit error message and die. Most probably system already creates guard pages for processes and threads, and "manual" guard pages are necessary only for fibers. The similar technique is used to extend thread stack in HotSpot JVM. See https://pangin.pro/posts/stack-overflow-handling

Also, I wouldn't allow different stack limits for different requests/directories/etc. They should be INI_SYSTEM.

I would

@arnaud-lb
Copy link
Member Author

arnaud-lb commented Oct 24, 2022

I think the solution looks too complex and expensive.

The performance impact is negligible according to these benchmark results: https://gist.github.com/arnaud-lb/02cf91ad0aa84a0caa3d349eab194a8b.

We could still move the stack detection elsewhere so that it would be done only once per process, though.

Agreed that the part that detects the stack base and maximum size adds some complexity.

It's possible to create guard page(s) at the end of the stack, catch SIGSEGV signal, check the stack pointer, unwind/longjump, emit error message and die. Most probably system already creates guard pages for processes and threads, and "manual" guard pages are necessary only for fibers. The similar technique is used to extend thread stack in HotSpot JVM. See https://pangin.pro/posts/stack-overflow-handling

I like this solution, but I fear that it may be more complex than that in our case. We can find the stack pointer or the address that caused the fault, but it seems to me that we need to know the stack layout to decide if it's a stack overflow?

Also, I don't know what is safe to do when handling the SIGSEGV. Is it safe to call zend_error() (we don't know if we are in a consistent state; also, can we generate a backtrace?), or can we only print to stderr and abort ?

@mvorisek
Copy link
Contributor

simple code https://3v4l.org/IpRv7/rfc#vgit.master still stops on (unrecoverable) OOM error

is that intended and should it be addressed by #5135? Or is #5135 too complicated /w this PR? If yes, then the (php) stack depth should be simply checked before a call into next frame/function is made.

@arnaud-lb
Copy link
Member Author

@mvorisek this is expected, as this code snippet is not triggering a stack overflow. This PR attempts to avoid stack overflows because these result in crashes that are hard to debug for most people: the process crashes with no context related to the PHP code that was executing.

OOMs result in nicer errors, even though they are not recoverable: The error message specifies what happened and the filename / lineno that triggered it.

#5135 limits the number of VM reentries to prevent stack overflows, and does not prevent infinite recursion. Xdebug has a setting to limit the recursion depth.

OOMs are not specific to recursion. Making them recoverable would require to change every code that calls emalloc() to gracefully handle allocation failures, which may not be worth it because of the extra complexity.

@mvorisek
Copy link
Contributor

Thank you for the explanation - #5135 should be then closed (as it is an alternative impl. of this PR).

OOMs are not specific to recursion. Making them recoverable would require to change every code that calls emalloc() to gracefully handle allocation failures, which may not be worth it because of the extra complexity.

what about checking free memory (in sense of memory limit) before common operations with some arbitrary reserve like 1MB or 5% or total memory limit?

it would make 99.9% OOM errors recoverable as small unchecked allocations between the checked one would typically fix into the reserve

@Jan-E
Copy link
Contributor

Jan-E commented Jul 8, 2023

@arnaud-lb @cmb69 Somebody noticed that the use of GetCurrentThreadStackLimits also implies that PHP 8.3 will not run on Windows 7 anymore. I tested this as well on a Windows 2008 R2 Web server: PHP 8.3 will not run there either.

IMHO it should be fixed or mentioned in UPGRADING.

The user report: https://www.apachelounge.com/viewtopic.php?p=42097#42097

@arnaud-lb
Copy link
Member Author

Thank you for reporting this, but this is intentional: #9104 (comment)

I will mention this in UPGRADING

arnaud-lb added a commit that referenced this pull request Jul 8, 2023
@pronskiy pronskiy mentioned this pull request Nov 18, 2023
7 tasks
cmb69 added a commit to cmb69/php-src that referenced this pull request Sep 21, 2024
As of PHP 8.3.0, Windows 8/Server 2012 are the minimum requirement.
However, PR php#9104 only updated `_WIN32_WINNT`, but not `WINVER`[1]`
`NTDDI_VERSION`[2] nor the manifest[3].

[1] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[2] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[3] <https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests>
cmb69 added a commit to cmb69/php-src that referenced this pull request Sep 21, 2024
As of PHP 8.3.0, Windows 8/Server 2012 are the minimum requirement.
However, PR php#9104 only updated `_WIN32_WINNT`, but not `WINVER`[1],
`NTDDI_VERSION`[2] nor the manifest[3].

[1] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[2] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[3] <https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests>
cmb69 added a commit that referenced this pull request Sep 22, 2024
As of PHP 8.3.0, Windows 8/Server 2012 are the minimum requirement.
However, PR #9104 only updated `_WIN32_WINNT`, but not `WINVER`[1],
`NTDDI_VERSION`[2] nor the manifest[3].

[1] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[2] <https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers>
[3] <https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests>

Closes GH-15975.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants