How to debug StackOverflowException #9195

Petermarcu · 2017-10-27T16:35:14Z

@Daniel15 commented on Wed Oct 25 2017

I'm getting this error while moving a site from ASP.NET Core 1.1 on Mono to ASP.NET Core 2.0 on .NET Core 2.0:

dbug: Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker[2]
      Executed action method Daniel15.Web.Controllers.ShortUrlController.Index (Daniel15.Web), returned result Microsoft.AspNetCore.Mvc.ContentResult.
Process is terminating due to StackOverflowException.
[1]    12976 abort      LD_LIBRARY_PATH=/tmp/ssltest ASPNETCORE_ENVIRONMENT=Development =

How do I get a full stack trace for the StackOverflowException to determine where it's coming from?

danmoseley · 2017-11-02T20:28:50Z

@janvorli I think your stack overflow work was in 2.0?

ayende · 2018-01-21T13:07:38Z

Any news on this? Any suggestions for how to get at least some idea of what is going on?

janvorli · 2018-01-22T09:29:02Z

The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

ayende · 2018-01-22T09:50:02Z

@janvorli Any suggestions for doing this on Windows?
We are trying with procdump right now.
The problem is that this is happening in production, and the kind of things we can do there are limited.

cdmihai · 2018-06-01T17:45:14Z

Stack Overflow questions suggest either using WinDbg or reproducing it in VS under the debugger. That is hard when the issue is difficult to reproduce and happens in processes spawned by the entry process (or when it's not happening on Windows). Just printing out the stack trace would be so helpful ...

ayende · 2018-06-01T17:51:59Z

@cdmihai Presumably at this point it would be hard to print the stack trace (there is no stack with which to work, after all).
But I want to join in and comment that anything would be good here. Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

patricksuo · 2018-10-18T07:04:50Z

> The only thing that could work is to run the app under lldb and when it hits the stack overflow, load the libsosplugin.so and run "clrstack -f".

@janvorli How do Microsoft developers debug this kind of bug in prod?
Not every bug can be reproduced easily in a local environment.

patricksuo · 2018-10-18T07:13:44Z

> Having even a small portion of the stack trace should usually be enough to tell us what is recursing and narrow down investigation times considerably.

This is exactly what Go does. (In the stack trace below, I elided some frames manually.)

supei@sandbox-dev-hk:~$ cat a.go
package main

func foo()() {
	foo()
}

func main(){
	foo()
}

supei@sandbox-dev-hk:~$ go run a.go
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow

runtime stack:
runtime.throw(0x46d1a8, 0xe)
	/home/supei/go/src/runtime/panic.go:608 +0x72
runtime.newstack()
	/home/supei/go/src/runtime/stack.go:1008 +0x729
runtime.morestack()
	/home/supei/go/src/runtime/asm_amd64.s:429 +0x8f

goroutine 1 [running]:
main.foo()
	/home/supei/a.go:3 +0x2e fp=0xc020086378 sp=0xc020086370 pc=0x44e9fe
main.foo()
	/home/supei/a.go:4 +0x20 fp=0xc020086388 sp=0xc020086378 pc=0x44e9f0
main.foo()
	/home/supei/a.go:4 +0x20 fp=0xc020086398 sp=0xc020086388 pc=0x44e9f0
main.foo()
	/home/supei/a.go:4 +0x20 fp=0xc0200863a8 sp=0xc020086398 pc=0x44e9f0
main.foo()
	/home/supei/a.go:4 +0x20 fp=0xc020086998 sp=0xc020086988 pc=0x44e9f0
main.foo()
	/home/supei/a.go:4 +0x20 fp=0xc0200869a8 sp=0xc020086998 pc=0x44e9f0
...additional frames elided...
exit status 2

ayende · 2018-10-18T09:03:58Z

In other words, just as CoreCLR allocates an OutOfMemoryException instance up front, could we allocate some space in advance (1 KB should be more than enough) and print the stack trace from there?

patricksuo · 2018-10-18T17:04:02Z

Go has dynamic goroutine stacks, which live on the heap. The Go runtime grows and shrinks them as needed.
In the stack overflow scenario, the runtime will preempt the goroutine just before it requires abnormal stack growth.

I'm not familiar with .NET. I guess managed code runs on the native thread stack.
Maybe the thread stack guard page mechanism is something that could help.

janvorli · 2018-10-18T17:25:22Z

> I guess managed code runs on the native thread stack.

That's right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message and not just die silently. This alternate stack is kept as small as possible, since we need to allocate one for each thread. That size would likely not be enough to run the code needed to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.
I've created #825 assigned to myself to track it.
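
The reserve-then-commit idea described above can be sketched roughly as follows. This is only an illustration in Go (the language already used for the example earlier in this thread), not CoreCLR's code. The helper names reserveAndCommit, commitMore, and demo are hypothetical, the page counts are arbitrary, and it assumes a Linux-like system where syscall.Mmap and syscall.Mprotect are available. It does not install a signal stack; it only shows reserving address space with PROT_NONE and making more of it usable later.

```go
package main

import (
	"fmt"
	"syscall"
)

const rw = syscall.PROT_READ | syscall.PROT_WRITE

// reserveAndCommit reserves `reserve` bytes of address space with no access
// rights, then makes only the first `commit` bytes readable and writable.
// Both sizes are assumed to be multiples of the page size.
func reserveAndCommit(reserve, commit int) ([]byte, error) {
	region, err := syscall.Mmap(-1, 0, reserve,
		syscall.PROT_NONE, syscall.MAP_PRIVATE|syscall.MAP_ANON)
	if err != nil {
		return nil, err
	}
	if err := syscall.Mprotect(region[:commit], rw); err != nil {
		syscall.Munmap(region)
		return nil, err
	}
	return region, nil
}

// commitMore makes the first `upTo` bytes of the reservation usable.
// Simplification: a real stack grows downward, so a runtime would extend
// the usable range at the other end of the region.
func commitMore(region []byte, upTo int) error {
	return syscall.Mprotect(region[:upTo], rw)
}

// demo runs the whole sequence and returns the sum of two marker bytes,
// one written before and one written after the extra commit.
func demo() (int, error) {
	page := syscall.Getpagesize()
	region, err := reserveAndCommit(64*page, page)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(region)

	region[0] = 1 // inside the initially committed page

	// Simulated "stack overflow detected": commit more of the reservation.
	if err := commitMore(region, 16*page); err != nil {
		return 0, err
	}
	region[16*page-1] = 2 // usable only after commitMore

	return int(region[0]) + int(region[16*page-1]), nil
}

func main() {
	sum, err := demo()
	if err != nil {
		fmt.Println("mmap/mprotect failed:", err)
		return
	}
	fmt.Println("marker sum:", sum)
}
```

The point of the split is that the large reservation costs only address space per thread, while the memory that is actually usable stays small until the rare overflow case needs it.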

markusschaber · 2018-11-15T09:14:34Z

I currently have a problem where I cannot even get a stack trace with the Visual Studio debugger... So anything that could help us get a clue would be welcome... :-)

[Edit: We solved this problem in the meantime via "print-debugging" - we used log entries to nail down the exact place where the code crashes, so it's not urgent any more...]
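
A related workaround, when the suspect recursive code is already known, is to add an explicit depth limit so the failure becomes an ordinary error with a stack trace instead of a process crash. The sketch below shows the idea in Go, the language already used earlier in this thread; visit, depthError, and maxDepth are hypothetical names, and the limit is an arbitrary value assumed to be well below the depth at which the real stack would overflow. In .NET, RuntimeHelpers.EnsureSufficientExecutionStack() serves a related purpose by throwing an ordinary exception before the stack runs out.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// maxDepth is a hypothetical limit, assumed to be far below the depth at
// which the real stack would overflow.
const maxDepth = 10000

// depthError is returned when the guarded recursion nests too deeply.
// It carries the stack trace captured at the point of detection.
type depthError struct {
	depth int
	stack []byte
}

func (e *depthError) Error() string {
	return fmt.Sprintf("recursion depth %d exceeds limit %d", e.depth, maxDepth)
}

// visit stands in for the suspect recursive code. It has no base case of
// its own (the bug), so only the depth guard stops it.
func visit(depth int) error {
	if depth > maxDepth {
		return &depthError{depth: depth, stack: debug.Stack()}
	}
	return visit(depth + 1)
}

func main() {
	err := visit(0)
	fmt.Fprintln(os.Stderr, "error:", err)
	if de, ok := err.(*depthError); ok {
		os.Stderr.Write(de.stack) // shows which function is recursing
	}
}
```

This only helps for code you can modify, so it complements rather than replaces a trace printed by the runtime.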

facundofarias · 2018-12-05T07:17:19Z

+1 :|

BrunoJuchli · 2019-01-11T08:26:31Z

Does using WinDbg and SOS still work with .NET Core?

As described here: https://stackoverflow.com/a/49882734/684096

fwanggg · 2019-02-28T19:41:35Z

> That's right. We already run the SIGSEGV handler on an alternate stack so that we can at least print the message and not just die silently. This alternate stack is kept as small as possible, since we need to allocate one for each thread. That size would likely not be enough to run the code needed to dump the stack trace. But since we've recently switched to allocating the alternate stack space using mmap, we could reserve a larger VM space and commit just the size needed by the regular SIGSEGV handling. On stack overflow, we could commit more of the space so that we have enough to dump the stack trace.

Where is the stack trace dumped to, stderr or stdout? I am debugging in an orchestrated, containerized environment. When the app crashes because of a StackOverflowException, the container goes away and all that is left is stderr and stdout:
2019-02-28T14:33:34.98-0500 [APP/PROC/WEB/0] ERR Process is terminating due to StackOverflowException.
What's the best way to debug a StackOverflowException in this kind of environment?

jhudsoncedaron · 2019-04-15T17:29:39Z

Wait ... you're already outputting Process is terminating due to a StackOverflowException ... Too bad we can't walk down the frames and output them. This can be done in a constant amount of RAM.
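
To illustrate the bounded-memory point, here is a minimal sketch in Go (the language already used earlier in this thread) that captures only the innermost frames into a fixed-size buffer, no matter how deep the recursion is. The names topFrames, recurse, countSuffix, and maxFrames are hypothetical. A real runtime would have to do the equivalent from its overflow handler, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// maxFrames bounds the capture buffer regardless of how deep the stack is.
const maxFrames = 8

// topFrames returns the function names of at most maxFrames innermost
// frames of the calling goroutine, starting at the caller of topFrames.
//
//go:noinline
func topFrames() []string {
	var pcs [maxFrames]uintptr      // fixed-size buffer for return addresses
	n := runtime.Callers(2, pcs[:]) // skip runtime.Callers and topFrames
	frames := runtime.CallersFrames(pcs[:n])
	names := make([]string, 0, n)
	for {
		f, more := frames.Next()
		if f.Function != "" {
			names = append(names, f.Function)
		}
		if !more {
			break
		}
	}
	return names
}

// recurse nests `depth` calls deep, then captures the innermost frames.
func recurse(depth int) []string {
	if depth == 0 {
		return topFrames()
	}
	return recurse(depth - 1)
}

// countSuffix counts the names that end with the given suffix.
func countSuffix(names []string, suffix string) int {
	count := 0
	for _, name := range names {
		if strings.HasSuffix(name, suffix) {
			count++
		}
	}
	return count
}

func main() {
	names := recurse(1000)
	for _, name := range names {
		fmt.Fprintln(os.Stderr, name)
	}
	fmt.Fprintf(os.Stderr, "%d of %d captured frames are recurse\n",
		countSuffix(names, ".recurse"), len(names))
}
```

Even a handful of frames is enough here to see that the same function repeats, which is usually the information needed to find the runaway recursion.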

TehWardy · 2019-07-09T18:07:57Z

Got this from the console ...

Api> Route matched with {action = "Get", controller = "App"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get(Microsoft.AspNet.OData.Query.ODataQueryOptions`1[Core.Objects.Entities.CMS.App]) on controller Api.Controllers.AppController (Api).
Api>
Api> Process is terminating due to StackOverflowException.

I put a breakpoint in the action, but execution never gets that far. So how do I debug stack overflows in DI?

daiplusplus · 2021-05-26T04:11:20Z

I'd like to add that when running ASP.NET Core in an Azure App Service it's even more painful because the EventLog.xml file that Azure App Services maintains for you doesn't record any mention of the process being killed due to a stack-overflow. That's maddening. This means that every unexpected stack-overflow causes 2-3 hours of figuring out "why isn't the website working?" because there's no indication the entire process is crashing in the first place.

It seems in Azure the only solution is to enable short-term crash monitoring, then reproduce the issue (assuming you can even consistently and reliably reproduce it in the first place!), then download the multi-gigabyte-sized .dmp file that Azure Portal saves to your blob storage account, and then wait over 30 minutes for Visual Studio to chew through the .dmp file (all while VS shows an ugly pop-up informing me that a background process is "taking too long" and only giving me a (very tempting) "Terminate" button).

So I'd describe the issue more broadly as: the overall developer UX for diagnosing and investigating stack-overflow crashes in .NET Core is abysmal and this is especially disappointing given Microsoft has a generally good reputation for developer-tooling - and we never had this problem in .NET Framework 1.x, where we could at least catch( StackOverflowException ).

Out of curiosity (and I know it's off-topic), but why doesn't EventLog.xml record app-crashes due to stack-overflows?

danmoseley · 2021-05-26T04:40:41Z

@tommcdon do you know who writes this xml file? The work @janvorli did to emit the stack was a game changer but it sounds like the scenario doesn't quite work E2E here.

tommcdon · 2021-05-26T17:10:53Z

EventLog.xml is part of the Application Event Log feature in Azure App Services. I'll find out who the owners are and try out the E2E scenario with a stack overflow. It sounds like we might have a scenario gap here.

jhudsoncedaron · 2021-05-26T17:14:58Z

Or you can fix #8947 to at least allow catch (StackOverflowException) to work. The original reason for denying it is long gone.

Stack overflow has been theoretically recoverable forever. Once you have rolled back 4 KB of stack, you can call the native function _resetstkoflw: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/resetstkoflw?view=msvc-160

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

yosbeleg89 mentioned this issue Feb 25, 2020

NLog.Targets.ElasticSearch (7.1.0) asp.net core 3.0 application exited with stack overflow markmcdowell/NLog.Targets.ElasticSearch#109

Closed

k15tfu mentioned this issue Apr 24, 2020

Missing call site when getting stack trace on stack overflow on Linux #35391

Closed
