Builtin function to tell you the maximum stack size of a given function #157

andrewrk · 2016-07-10T15:47:59Z

I propose a @stack_size(function) builtin which returns the number of bytes that function - which must be a compile time constant function - will ever possibly use for the deepest call invocation.

This function causes it to become a compile error if the function named - or any function called therein, invokes direct or indirect recursion on a non-compile time constant parameter. ~~Similarly, if any function causes a runtime stack allocation to occur, such as var foo: [x]u8 = undefined;, where x is a parameter to the function invoked with a non compile time constant.~~ This feature has been removed from zig

This builtin function could be used when creating a new thread, to determine the stack size. This stack would never have to grow or shrink, and it would be the maximum number of bytes that could possibly be used, likely very small.

It would also force the programmer to not use recursion on user input, a limitation that some categories of software require, and violating this would be caught by the compiler, since it makes returning a correct number from @stack_size() impossible.

Users may even want to introduce this limitation on the main thread, and they could do this by adding a top level assert such as comptime { assert(@stack_size(main) <= 16 * 1024 * 1024); } or some such.

If they do this, then we can modify the binary's stack size to be only as large as necessary, which is probably a minuscule improvement in memory footprint and startup time, but it's a strong guarantee of robustness at least in this one regard.

Another thing that would cause this function to cause a compile error is calls to an external library. External library functions could be annotated with a maximum stack size attribute, or we can have another threading abstraction that does not use @stack_size() for when we don't know the upper bound of how big a stack will grow.

Tangential idea: when Zig outputs a shared library, in the .h file it generates, it can annotate functions with additional information in the comments, such as maximum stack size, and Zig can look for these annotations when importing .h files.

This function also introduces some weird interdependencies. For example, I could do:

fn a() {
    var bytes: [@stack_size(b)] = undefined;
}
fn b() {
    a();
}

Zig would have to detect this dependency loop and report a compile error.

Implementation of this would be a bit tricky since currently it is LLVM that determines the stack size of everything, and optimizations can increase and decrease the stack size of various functions. However, I think it is possible to have a sound upper bound.

@thejoshwolfe thoughts?

The text was updated successfully, but these errors were encountered:

thejoshwolfe · 2016-07-10T17:34:48Z

My first interpretation of @stack_size(function) was that it was the immediate stack requirement just for that function, not the transitive worse case. The most concise and complete name i can think of for what you're describing is @worst_case_transitive_stack_size(function), but that's probably too long. Hmm.

Some of your wording sounds like you're going to impose this limitation everywhere, which is a bad idea. Having this feature available for new threads and for the binary's main function is cool, but we can't force users to never recurse on user input. That would make a recursive-descent parser impossible in Zig, which seems like a silly limitation.

Speaking of recursive descent parsers, the v8 javascript engine has a runtime check for a stack overflow in its recursive descent parser. This is done with pointer arithmetic on local variables before and during the recursive descent. This is pretty elegant in the sense that it results in simple machine code, but i don't like how it means it has to be limited to bytes instead of to recursion levels. Consider you were writing the Linux kernel code that limits symlink recursion to 40 levels. In that case, you would probably want to allocate a stack of something + something * 40 (rounded up to a page boundary), which brings me to my next proposal.

What if we could expose the innerworkings of this function to the user at compiletime-constant time? What if we could give the non-recursive stack requirement of each individual function, and additionally give the user a way to walk the callgraph. Then the user could detect all the paths where their recursive descent parser recurses, calculate the maximum stack requirement for a recursion level, and then translate between maximum recursion levels and bytes required for the the stack (calculating either one from the other).

We can also have the compiler do all this for us with the limitation that there needs to be a deterministic, finite answer (your proposal), but that special case might miss out on a lot of other potential from this kind of functionality. Or maybe I'm getting too excited about YAGNI-cream pies in the sky, but i think it's worth discussing at least.

kyle-github · 2017-09-16T03:19:56Z

I am not sure, but I think this is undecidable if you allow function pointers to be called. You can manually set up recursion that way. So, I think you would need to restrict any call to any function that is not a compile time constant that can be checked against the function in question.

What is the purpose for this restriction? Having done some embedded coding, I can see a use in that field.

IIRC, LLVM supports tail recursion. In that case, you could have an unknown amount of recursion at compile time yet still have a constant stack size.

andrewrk · 2017-09-16T17:59:27Z

I am not sure, but I think this is undecidable if you allow function pointers to be called. You can manually set up recursion that way. So, I think you would need to restrict any call to any function that is not a compile time constant that can be checked against the function in question.

In order for this feature to work, zig would have to know the set of functions a function pointer could potentially call. Difficult, but perhaps not impossible.

What is the purpose for this restriction?

Statically guarantee that a program will not have a stack overflow
Threads can have smaller stack sizes, lowering the cost of creating a thread

IIRC, LLVM supports tail recursion. In that case, you could have an unknown amount of recursion at compile time yet still have a constant stack size.

Yes, when performing the stack size requirements, zig should identify tail recursion and not count it.

kyle-github · 2017-09-16T22:33:33Z

(Can't figure out how to quote a list)
@andrewrk said:

Statically guarantee that a program will not have a stack overflow
Threads can have smaller stack sizes, lowering the cost of creating a thread

I think the most common use cases here will be:

Embedded systems. A very restrictive set of requirements to enable static stack size calculations would be perfectly fine in this field. Also, note that embedded is unlikely to want the whole standard library at all. Often that is completely dropped and a very small skeleton or even no standard library is used. So, making the standard lib functions conform to the restrictive requirements is really not needed. It would be better to have a tiny, very modular library instead (think musl or uLibC).
big systems like containers, phones, desktops, and servers (in order of capacity). Even a container is going to have hundreds of MB of RAM and more than one CPU in the common case. Since I can have 1000 threads easily on a 2-core laptop with a few GB of RAM, the scalability is already there with Linux and Windows. MacOS and iOS less thread friendly, but bluntly, macOS is not a server OS and neither is iOS. Android is Linux+Java both of which are thread-friendly.

If you want to be able to have a threading model like Erlang or Go, then you would need to do a lot of gymnastics anyway in the run time (and forget embedded). It is unclear that you can scale to that many independent threads without GC, realistically. (As an aside, in the embedded world if you want to retain many threads of control, people tend to use protothreads anyway. Adam Dunkels is amazing.)

My arguments boil down to two things: in the embedded space you already have so many restrictions that having harsh ones in order to calculate static stack size is fine and the result of proving the static stack size is very, very desirable and you really do not care about the standard library. In the rest of computing, you have enough RAM and CPU to not worry about it.

I have run tests on Linux and created 1000 threads (that did something, not just looping) and found that each used about 32-64k for the stack in real physical RAM. That was 1000 threads in less than 64MB of space. Is that really a problem? With the recent massive increase in physical cores (thank you, AMD) that is happening, having a pile of real threads is not a problem. In Linux I ran the same kind of test, but fork()ed and created new processes and did not get much of a difference. It is very efficient.

What kinds of restrictions would be necessary to have a static stack size calculated? I think almost anything that could be compile time could also have this property. There are surely some edge cases I am missing though :-(

andrewchambers · 2017-10-20T06:15:18Z

I think this is feature is neat for coroutine's as a library when combined with long jump. If you look at bearssl, the author wrote his own statically checked language just to get this feature. It let him do zero allocations in the library, but still use coroutines to let him yield control flow.

andrewrk · 2017-10-21T17:47:20Z

I started a discussion on the llvm-dev mailing list.
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118357.html

skyfex · 2017-12-14T20:36:36Z

I've wanted this feature from a low-lever programming language a long time. It would be very useful in some embedded systems.

As thejoshwolfe mentions, it would probably a good to have both the immediate stack requirement (not sure what the use-case is though) and the worst case requirement.

There are obviously a lot of things you can't do in these functions (I see you covered everything I thought about and then some in the llvm-dev thread andrewrk). But that's OK. Sometimes it's worth it to work within certain restrictions to be able to prove certain things (like pure/functional functions and global state)

binary132 · 2018-05-05T03:22:24Z

you have enough RAM and CPU to not worry about it

This is never true at sufficient scale.

kyle-github · 2018-07-18T04:53:29Z

A group is working on this on top of LLVM:

static stack depth analysis tool

fixes test failures See #157

milkowski · 2019-11-18T11:50:56Z

Isn't it just undecidable https://en.wikipedia.org/wiki/Halting_problem in general? What makes it different?

shawnl · 2019-11-18T11:54:00Z

It is in the general case. With this feature on, We will limit call graphs to make it not turing complete. That is the idea, but I myself am not sure how that will actually work, especially with function pointers. El lun., 18 nov. 2019 15:51, Wojciech Miłkowski <notifications@github.com> escribió:

…

Isn't it just undecidable https://en.wikipedia.org/wiki/Halting_problem in general? What makes it different? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#157?email_source=notifications&email_token=AAD4W4RXW2EHT6O5LQ4RTALQUJ6SVA5CNFSM4CJDOB3KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEKF4SI#issuecomment-554982985>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAD4W4W22YCYPZRDJUNQZU3QUJ6SVANCNFSM4CJDOB3A> .

matu3ba · 2021-01-20T20:55:30Z

I have run tests on Linux and created 1000 threads (that did something, not just looping) and found that each used about 32-64k for the stack in real physical RAM. That was 1000 threads in less than 64MB of space. Is that really a problem?

Anything that uses shared cache (L3) delays execution for other threads and programs, when they need to fetch back the dropped memory into cache. I think its mainly about one of the language guarantees zig wants to offer: robust (segfaulting on stack overflow is no option), optimal (dont run-time check stuff that you dont need to).

One can use pthread_attr_setstack to manually manage the stack size on Linux systems. For processes, there is setrlimit.

What kinds of restrictions would be necessary to have a static stack size calculated? I think almost anything that could be compile time could also have this property. There are surely some edge cases I am missing though :-(

The whole problem of an exact computation can be reduced to the halting problem,
since we must reason over the reachable paths of any input to estimate which functions
are used and which are unused to get the accurate stack size.

For a relative strict computation (only over-approximate arithmetic results to get the reachable paths for the function calls)
we have the following constrains

Main thread

only comptime-bounded recursion (recursion result needs to be known at comptime) => or ban recursion
no input dependencies on heap to decide function calls => simplification
no influence of aliased variables on function calls => simplification
only arithmetic or boolean operations have influence on function calls => simplification (verifying bitmasks or floats is hard)
function pointers are banned => would require precomputing all possible execution points of the function
gotos or jumps outside the current functions are banned => at least force the usage of (naked) functions as entrypoints
assembly on stack manipulation or manipulating variables that modify control flow related to function execution => less pain

Additional constraints for threads besides the main thread:

comptime-known bound of maximum threads => simplification
~~2. no detach => only when threads are finished, stack is freed. Operating system can mess this up (pending writes)~~

Interrupt size also needs to be analysed on top: The Kernel can interrupt programs at any point and add to the stack
what the Kernel thinks is necessary to resume later on the program.

Ideas

Ask the Kernel and hope it can reply
use a "safe from experience default" and hope for the best
The Kernel is written in Zig or we are on freestanding, so we know or can define the constrains for program execution:
3.1. interrupts may only use comptime-computable variable sizes
3.2. interrupts may not call non-interrupt code (functions, labels, etc forbidden)
3.3. interrupt logic must define possible overlaps of interrupts (only feasible for no operating system/no software interrupts)
3.4. retrieving interrupt routine code (this is C and assembly code) will be hard

References
embedded source,
arm KEIL claims its too complex due to asynchronous events and interrupts
and @EleanorNB and I suspect this is due to Kernel complexity.
asynchronous events are defined here

Even without the Kernel I am only aware of stackanalyzer from absint which supports this use case.
Knowledge in Astrée or AstréeA or PC-Lint from Gimpel should be helpful, but the hardest parts for accurate results are getting useful arithmetic overapproximations.

matu3ba · 2021-01-20T21:17:19Z

After quite long discussion with @EleanorNB and some afterthought I get to the following possible simplifications:
A more simple overapproximation may lead to very significant inaccuracies, but for completeness the requirements
are sketched here

The idea is to always assume the worst case in the call graph.
For programs intended to terminate eventually, we can parallize the analysis
by starting a new analysis for each subfunction.

pub fn main() {
  fn1(); // parallel run of analysis1
  fn2(); // parallel run of analysis2
  ..
} //main analysis waits for threads to finish to sum up the maximum stack size

The most important part is analyzing the Kernel implementation or getting overapproximation results from the Kernel
how much additional space on the stack is required during interrupts.
copied from above in the section Ideas:

Ask the Kernel and hope it can reply
use a "safe from experience default" and hope for the best
The Kernel is written in Zig or we are on freestanding, so we know or can define the constrains for program execution:
3.1. interrupts may only use comptime-computable variable sizes
3.2. interrupts may not call non-interrupt code (functions, labels, etc forbidden)
3.3. interrupt logic must define possible overlaps of interrupts (only feasible for no operating system/no software interrupts)
3.4. retrieving interrupt routine code (this is C and assembly code) will be hard

Additionally we need:

ban recursion, which can not be computed/an uppper bound of executions can be detected at comptime (unbounded stack size)
assembly on stack manipulation, gotos or jumps outside the current function are banned (or are annotated)

matu3ba · 2022-04-09T11:30:39Z

AdaCore has GNATstack as tool. However, the description in the link is lacking some aforementioned assumptions and/or is unprecise on what type of arithmetic properties with resulting call graph information can be derived https://www.adacore.com/gnatpro/toolsuite/gnatstack.

matu3ba · 2023-06-14T19:20:34Z

Related is https://interrupt.memfault.com/blog/measuring-stack-usage as state of art description:

1. clang/gcc -c -fstack-usage stack-usage-example.c

// stack-usage-example.c
#include <stdio.h>

int foo_2(int c) {
  int array[4];
  array[1] = c;
  array[3] = c* c;
  return array[3] - array[1];
}

int foo(int a, int b) {
  int array[10];
  array[1] = a;
  array[9] = b;

  return array[1] * array[9];
}

int main() {
  printf("%d\n", foo(1,2));
}

cat stack-usage-example.su
example/stack-usage/stack-usage-example.c:3:5:foo_2     64      static
example/stack-usage/stack-usage-example.c:10:5:foo      80      static
example/stack-usage/stack-usage-example.c:18:5:main     16      static

However, a bunch of different tooling for analysis appears to be needed.

1. Runtime analysis with stack painting (very common in embedded systems, and is used by both FreeRTOS and Zephyr RTOS): Use fill pattern, in here are also Zigs AAAAAAAA used.
- System executes code to get a reasonable estimation of stack usage
1. gdb single stepping

andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Jul 10, 2016

andrewrk mentioned this issue Aug 25, 2016

solve concurrency #174

Closed

8 tasks

andrewrk added this to the 0.3.0 milestone May 7, 2017

kyle-github mentioned this issue Oct 2, 2017

usecase: memory-bound design using fixed-size allocators #518

Closed

andrewrk mentioned this issue Dec 7, 2017

stack traces for errors #651

Closed

andrewrk mentioned this issue Dec 15, 2017

Std sort fails simple test case #657

Closed

thejoshwolfe mentioned this issue Jan 16, 2018

explicit tail call syntax #694

Open

andrewrk mentioned this issue Feb 21, 2018

Coroutines (async/await/promise) language support #727

Closed

andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018

This was referenced May 11, 2018

safe recursion #1006

Open

Proposal: ability to execute a function on a different stack WebAssembly/design#1207

Open

shawnl mentioned this issue Aug 19, 2018

reintroduce VLAs and alloca() on auxillary stack #1374

Closed

andrewrk mentioned this issue Oct 1, 2018

in debug builds, detect stack overflow at runtime and output a partial stack trace #1616

Open

andrewrk added a commit that referenced this issue Oct 1, 2018

increase stack size on windows for all executables

d4d22df

fixes test failures See #157

This was referenced Oct 2, 2018

syntax for guaranteed coroutine frame allocation elision #1260

Closed

eliminate stack overflow #1639

Open

andrewrk mentioned this issue Oct 24, 2018

[WIP] FreeBSD support #1661

Closed

andrewrk mentioned this issue Nov 23, 2018

Self-referential function calls segfault #1772

Closed

andrewrk modified the milestones: 0.4.0, 0.5.0 Jan 31, 2019

andrewrk mentioned this issue Feb 10, 2020

Plan to make @Frame work/error on call chain involving inline assembly, extern fns, etc? #4387

Closed

andrewrk modified the milestones: 0.6.0, 0.7.0 Feb 10, 2020

andrewrk mentioned this issue Feb 29, 2020

Buffer Overflow / Stack Canary #4542

Closed

andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020

andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021

rohlem mentioned this issue Feb 23, 2022

Recursion detection is over aggressive #10967

Closed

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

andrewrk modified the milestones: 0.13.0, 0.12.0 Jun 29, 2023

andrewrk mentioned this issue Dec 29, 2023

passing --stack to zig test makes the getrlimit/setrlimit test case fail #18395

Closed

rohlem mentioned this issue May 17, 2024

BoundedArray segfaults for items larger than stack #19954

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Builtin function to tell you the maximum stack size of a given function #157

Builtin function to tell you the maximum stack size of a given function #157

andrewrk commented Jul 10, 2016 •

edited

Loading

thejoshwolfe commented Jul 10, 2016

kyle-github commented Sep 16, 2017

andrewrk commented Sep 16, 2017

kyle-github commented Sep 16, 2017 •

edited

Loading

andrewchambers commented Oct 20, 2017

andrewrk commented Oct 21, 2017

skyfex commented Dec 14, 2017

binary132 commented May 5, 2018

kyle-github commented Jul 18, 2018

milkowski commented Nov 18, 2019

shawnl commented Nov 18, 2019 via email

matu3ba commented Jan 20, 2021 •

edited

Loading

matu3ba commented Jan 20, 2021 •

edited

Loading

matu3ba commented Apr 9, 2022 •

edited

Loading

matu3ba commented Jun 14, 2023

Builtin function to tell you the maximum stack size of a given function #157

Builtin function to tell you the maximum stack size of a given function #157

Comments

andrewrk commented Jul 10, 2016 • edited Loading

thejoshwolfe commented Jul 10, 2016

kyle-github commented Sep 16, 2017

andrewrk commented Sep 16, 2017

kyle-github commented Sep 16, 2017 • edited Loading

andrewchambers commented Oct 20, 2017

andrewrk commented Oct 21, 2017

skyfex commented Dec 14, 2017

binary132 commented May 5, 2018

kyle-github commented Jul 18, 2018

milkowski commented Nov 18, 2019

shawnl commented Nov 18, 2019 via email

matu3ba commented Jan 20, 2021 • edited Loading

matu3ba commented Jan 20, 2021 • edited Loading

matu3ba commented Apr 9, 2022 • edited Loading

matu3ba commented Jun 14, 2023

andrewrk commented Jul 10, 2016 •

edited

Loading

kyle-github commented Sep 16, 2017 •

edited

Loading

matu3ba commented Jan 20, 2021 •

edited

Loading

matu3ba commented Jan 20, 2021 •

edited

Loading

matu3ba commented Apr 9, 2022 •

edited

Loading