Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low recursion limit on Windows port with MSVC #2927

Closed
ARF1 opened this issue Mar 2, 2017 · 21 comments
Closed

Low recursion limit on Windows port with MSVC #2927

ARF1 opened this issue Mar 2, 2017 · 21 comments

Comments

@ARF1
Copy link

ARF1 commented Mar 2, 2017

I ran into a Maximum recursion depth exceeded error while developing on micropython on Windows. I was somewhat surprised as I did not think that my code was excessively complex.

I used the code from this forum post to test the recursion depth:

MicroPython v1.8.7 on 2017-03-02; win32 version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== a = 0
=== fail = False
=== def foo():
===     global a, fail
===     a += 1
===     if not fail:
===         try:
===             foo()
===         except:
===             fail = True
===
=== foo()
=== print(a)
12
>>>

It seems the recursion depth is 12 on Windows? Can that be right? How is it controlled? The same forum post mentions that on the Unix build, the recursion depth is 166. Even on the esp8266 it seems to be 19.

@robert-hh
Copy link
Contributor

Cannot confirm.
My test reports 130
Fresh build of today's repository:
MicroPython v1.8.7 on 2017-03-02; win32 version
Use Ctrl-D to exit, Ctrl-E for paste mode

@ARF1
Copy link
Author

ARF1 commented Mar 2, 2017

@robert-hh I assume you used the same code I used to test. That would only leave the compiler / build-environment to account for the difference. What compiler did you use to build micropython? I used a MSVC 2015 build.

@robert-hh
Copy link
Contributor

gcc (i686-win32-dwarf-rev1, Built by MinGW-W64 project) 6.2.0
Copyright (C) 2016 Free Software Foundation, Inc.

@ARF1
Copy link
Author

ARF1 commented Mar 2, 2017

@robert-hh I do not have MinGW installed and I do not think you are happy to test building with MSVC... Would you mind uploading your binary so I can test it on my machine? E.g. using reep.io .

@pfalcon
Copy link
Contributor

pfalcon commented Mar 2, 2017

Recursion depth is controlled by stack limit - whatever fits into it, that you get.

@stinos
Copy link
Contributor

stinos commented Mar 2, 2017

For x86 the stack limit set in main() is 40000 so that would mean every iteration of foo() in the sample code costs like 3000 bytes (if I understand the units correctly). That sounds like a lot, @ARF1 are you using a debug build maybe? Or maybe the stack check code makes assumptions about stack which depend on gcc but I doubt that, then nothing would work at all for msvc build I guess. Maybe msvc builds are just more stack hungry due to whatever optimizations applied, that's certainly possible no?

@robert-hh
Copy link
Contributor

@ARF1 : The Binary is at https://github.com/robert-hh/Shared-Stuff

@ARF1
Copy link
Author

ARF1 commented Mar 2, 2017

@robert-hh Thank you very much.

It apparently really is the MSVC build environment that causes the low recursion depth:

MicroPython v1.8.7 on 2017-03-02; win32 version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>>
paste mode; Ctrl-C to cancel, Ctrl-D to finish
=== a = 0
=== fail = False
=== def foo():
===     global a, fail
===     a += 1
===     if not fail:
===         try:
===             foo()
===         except:
===             fail = True
===
=== foo()
=== print(a)
===
130
>>>

Even after having looked into the MSVC build environment for quite a bit, I am absolutely stumped as to the cause of the issue.

@stinos
Copy link
Contributor

stinos commented Mar 2, 2017

Btw release msvc builds gives 29 iterations so you most likely are using a debug build.

@ARF1
Copy link
Author

ARF1 commented Mar 2, 2017

@stinos You are right. With the MSVC release build I now also get 30 iterations.

That solves my immediate problem, though I am struggling to understand where this discrepancy between the gcc build and the MSVC build originates.

I observed that after starting micropython, micropython.mem_info() on the gcc build gives:
stack: 752 out of 40000 --> 130 iterations
On the MSVC release build:
stack: 1812 out of 40000 --> 30 iterations
While on the MSVC debug build:
stack: 5488 out of 40000 --> 12 iterations

Does this have anything to do with it?

@pfalcon
Copy link
Contributor

pfalcon commented Mar 2, 2017

Look how much stack one function call takes, and you'll see how bad (good at consuming stack space?) msvc is.

@ARF1
Copy link
Author

ARF1 commented Mar 2, 2017

@pfalcon Using your code from this post, I found:

gcc: 288 bytes per call
MSVC release: 1264 bytes
MSVC debug build: 3040 bytes
esp8266 build: 304 bytes (if I remember correctly)

This very nicely accounts for the difference in maximum recursion depth. Thanks you very much!

Now the only question that remains: What is MSVC doing??? What is it storing on the stack that gcc isn't?

@stinos
Copy link
Contributor

stinos commented Mar 2, 2017

Well it would certainly be inetersting to figure out the differences in stack layout with gcc. I did some quick searches but didn't find any relevant info.

Anyway, I don't think this really is an issue. I had this problem before for larger scale applications and I now just set MICROPY_STACK_CHECK to 0 as I don't have any code relying on an exception being raised when some arbitrary limit is reached. If things really go out of hand you'll just get a stack overflow exception (in this case, after 700 iterations). If that is still not sufficient you could increase the stack size (add msvc/user.props with options as found in https://msdn.microsoft.com/en-us/library/tdkhxaks.aspx). If that's overkill, alternatively you could increase the limit used in main() or call mp_stack_set_limit() somewhere.

@pfalcon
Copy link
Contributor

pfalcon commented Mar 2, 2017

Now the only question that remains: What is MSVC doing??? What is it storing on the stack that gcc isn't?

Yeah, I'd definitely be interested to hear if someone told me. Though you may consider why you're so interested in MSVC and wouldn't just using Mingw be a better option. Maybe even tell other folks, so they know what you're trying to do. For example, @stinos works with MSVC and MicroPython at his dayjob, and that's why he maintains MSVC port.

As a bit of extended trivia, uPy also supports "stackless" mode (similar to http://www.stackless.com) which trades stack allocation for heap allocation.

@stinos
Copy link
Contributor

stinos commented Mar 3, 2017

I'm a bit intrigued by this so I thought if I could make a minimal C sample reproducing this (i.e. big difference in msvc vs gcc stack layout) it would be a good start to investigate this further. However as it turns out I don't know enough about this to do so. This:

#include <stdint.h>
#include <stdio.h>
#include <malloc.h>

char* stack_top;

void init_stack_top(void) {
  volatile int stack_dummy;
  stack_top = (char*)&stack_dummy;
}

size_t stack_usage(void) { // Assumes descending stack
  volatile int stack_dummy;
  return stack_top - (char*)&stack_dummy;
}

int recurse = 0;

void foo(void) {
    recurse += 1;
    void *p = alloca(100);
    printf("%03d %d %p\n", recurse, stack_usage(), p);
    foo();
}

int main(int argc, char **argv) {
  init_stack_top();
  foo();
}

when compiled with the same options as used when building uPy, shows 112 bytes per call for msvc but 176 bytes per call for gcc, so that's the opposite of what we get for python calls.

Also interesting: when building uPy with VS2013 I get 25 iterations out of the uPy sample code from ARF1. Now just changing the optimization to minimize size (/O1) instead of favouring speed (/O2 or higher) this is almost doubled to 43 iterations.

Another thing: uPy built with mingw gives me 136 iterations. However adding a #define MICROPY_NLR_SETJMP (1), which is what msvc uses, brings this down to 90 iterations.

Conclusion: still no idea what causes the actual differences, but multiple factors are at play here.

@dpgeorge
Copy link
Member

dpgeorge commented Mar 3, 2017

The way I usually analyse the change in stack usage for any given patch is to look at the disassembly of the function in question, look at the prelude to see how much stack is being allocated. So if you really want to spend some time on this then you can do just that: for all the C functions that participate in a recursive Python function call you'd disassemble them (on both compilers you're comparing) and work out the stack use for each function.

@stinos
Copy link
Contributor

stinos commented Mar 22, 2017

@ARF1 I went over the compiler switches again and just disabling /GL (whole program optimization, will allow scenarios like inlining of functions defined in other translation units ect) gives 52 iterations (adding /O1 on top of that only adds one extra iteration). So that is definitely the switch which has the most influence on stack usage. The trade-off is of course speed: it does produce code which is about 20% slower (only measured for call-heavy code like the sample code).

@pfalcon pfalcon changed the title Low recursion limit on Windows port Low recursion limit on Windows port with MSVC Apr 27, 2017
@pfalcon
Copy link
Contributor

pfalcon commented Apr 27, 2017

Anything can be done here or this can be closed?

@stinos
Copy link
Contributor

stinos commented Apr 27, 2017

Well I laid out the possible workarounds (loosen check, disable check or play with compiler flags) including some numbers, advantages and disadvantages. As far as I'm concerned, I think this should be done in an 'on demand' way (meaning anyone experiencing problems can refer to this issue and pick a suitable workaround if needed) and not as some general solution applied to uPy (see last comment: trading of 20% of performance can be quite the deal). So +1 for closing this.

@ARF1
Copy link
Author

ARF1 commented Apr 29, 2017

@pfalcon I probably should have been clearer in my last posts on this issue: From my side, this can be closed.

Stinos has even worked out beautifully where this oddity originates. Thank you for this!

It might be worth considering whether his findings are something that deserves a short heads-up in the docs to avoid loosing the knowledge.

@pfalcon
Copy link
Contributor

pfalcon commented Apr 29, 2017

@ARF1: Thanks, closing.

It might be worth considering whether his findings are something that deserves a short heads-up in the docs to avoid loosing the knowledge.

There's https://github.com/micropython/micropython/blob/master/windows/README.md , feel free to authorize together with @stinos a concise description of the issue, perhaps linking to this ticket.

@pfalcon pfalcon closed this as completed Apr 29, 2017
stinos added a commit to stinos/micropython that referenced this issue May 1, 2017
Add information as discussed in micropython#2927 to the readme to make the easier
to discover.
@stinos stinos mentioned this issue May 1, 2017
toolmacher pushed a commit to toolmacher/micropython that referenced this issue May 11, 2017
Add information as discussed in micropython#2927 to the readme to make the easier
to discover.
tannewt pushed a commit to tannewt/circuitpython that referenced this issue May 20, 2020
…n-master

Translations update from Weblate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants