-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proposal] Change the default behavior for StackOverflowException #4113
Comments
tldr: We get this request a lot but we do not believe that it is a recoverable scenario. Further, making it "easy" to do the wrong thing seemed more harmful than tearing down from a dangerous situation. While undergoing a stack overflow, your process is a hairs width from doing all sorts of Really, Really bad things(tm). For instance, causing any more stack to be consumed can start scribbling over heap space (since you’ve lost the hard guard page at the end of the stack). In addition as it’s basically an asynchronous exception, object state needs to be assumed trashed (how many user objects are really designed to operate correctly if their execution is terminated while manipulating state? And for value types it’s even worse, since their copy is non-atomic). "Just re add the guard page and move on", you may say. However, it’s not possible to restore the guard page until after you’ve decided how you’re dispatching the exception so you can unwind some frames and do your work. When you get the exception, you’re already in the last page and you can’t restore the frame until you’ve taken stuff off. And you still have to run user code in the guard page to figure out how to handle it. Sure, that could be solved by ignoring filters during Stack Overflow, but you’re just building special case on top of special case at that point. (stuff gets really weird when you have managed->native transitions there as well). And what about any other SEH handlers that have registered on your thread? For scenarios that want to run code beyond this exception, the path forward is "host clr from native" or better, "move the disruptive chunk of code to another process". |
This is a rather bad argument. One of the CLR's greatest features is the way it uses processor exceptions for things like dynamic reallocation of the evaluation stack and omitting
[Subjective content here:] This is the wrong question. The better question is whether or not developers are out there who could write objects that are designed for these scenarios. While individual programming languages are an ideal platform for eliminating features because they could be abused or are likely to be misunderstood, this is not that place. The CLR should only eliminate the ability to control application behavior if its own internal state would be corrupted and (potentially) unrecoverable, or if the implementation cost of a feature exceeds the benefits it would provide. [Objective:] In this case, the CLR already has implemented behavior allowing applications to recover (in some manner) from a |
Okay I think we've made progress toward agreement. I was perhaps thrown off by the use of the word "default". It seems obvious to me that changing the out of the box experience isn't ideal. However, if we want to have a mode that is "I'm a developer who promises to think about these things in a serious way to my serious application," that's probably fine. If can agree to that then you can achieve your goals vai hosting the clr. Check out: https://msdn.microsoft.com/en-us/library/ms164395(v=vs.110).aspx |
Except I can't do that from within a third-party extension to another application 😦 |
The solution to this is to change your code/algorithm from recursive to iterative. This gives you complete control over conditions of when and what happens when you exhaust your limits. |
As a library developer you cannot protect your consumers from using you improperly. If the high level of stack use in your library is overly problematic, it's likely that you'll need to do the work to re-implement things iteratively. Unfortunately, there's no magic here. We've spent a great deal of effort over the years trying to make it possible for teams to be resilient these kinds of failures (hosting API changes, CriticalFinazerObjects etc etc). It's potentially possible that one could write a collection of objects that can handle these challenges but it's the classic "thar be dragons here". IMO: In essentially all cases it's better to re-architect your code than to expect this to work coherently. |
@sharwell Is any of that helpful/useful? Just discouraging? :-) I'm somewhat at a loss for other good ideas. For what it's worth it spawned a bunch of hallway conversations... If you'd like, I can tag it as hard problem. Thoughts? |
For what my $0.02 are worth, I would suggest only offering a configurable change to the API through the hosting API. That is to say, if someone wants to break the contract in their use case, they need to explicitly opt-in from the native side, and own the fallout from that. Not that it resolves the 'hard problem' aspect, but I think it reaches a nice middle ground where people that are willing to say 'we'll break it and pay for it' can opt-in, but its non-default. Thoughts? |
Typically our stance is "we don't light up features we don't intend to fully support." This generally entails thinking about things like: how likely are we to take patches to work around issues exposed by this feature? Will this feature incur maintenance costs because of it's interaction with current/as yet envisioned features? etc etc. @kangaroo I believe that the hosting APIs already have enough surface to do more or less what is requested (but I'm no expert). However, @sharwell has another constraint in that he would prefer it exposed in an already running managed process. FWIW other applications internally that try to have stackoverflow guarantees use stack probing. @sharwell indicated that this caused an unacceptable amount of overhead. That's a bit surprising to me but I have no reason to assume he hasn't profiled his scenario (or some prototype) before saying as much. :-) |
It's a bit more complicated than that. Much of the overhead can be avoided, but it would be nice to avoid all of the overhead and just let the guard page exception handler take care of the problem if and only if it becomes one. A good solution for me would be changing the default policy for |
How much overhead do you think 'pinging' (a single wasted read/write) Edit: this is a change in behavior, possibly allow it as an option in e.g. |
Is there at least a way to check what is the current stack depth / how much stack space remains in an efficient manner? |
That would allow recursive algorithms to at least throw StackOverflowImmenientExcpetion |
Is it possible to change the behavior while managed code is executing from the current Unix hosting API? If so, it might be possible for managed code to P/Invoke the CLR's own hosting API. Note: This could be Undefined Behavior – please let me know if it is. |
One of the worst things about |
I've provided the picture above as a simple example. The exception will be caught by a debugger and viewing either the Parallel Stacks or the Call Stack window will display the appropriate information. However, the first place people may check ( |
In many cases, this is actually not working.
It is common for this exception to actually kill the debugger.
*Hibernating Rhinos Ltd *
Oren Eini* l CEO l *Mobile: + 972-52-548-6969
Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811
…On Fri, Nov 17, 2017 at 9:17 PM, Tanner Gooding ***@***.***> wrote:
[image: image]
<https://user-images.githubusercontent.com/10487869/32964520-5f7701c2-cb88-11e7-9281-8fc202b96775.png>
I've provided the picture above as a simple example.
The exception will be caught by a debugger and viewing either the Parallel
Stacks or the Call Stack window will display the appropriate information.
However, the first place people may check ($exception.StackTrace) lists
the stack trace as null. I agree with @masonwheeler
<https://github.com/masonwheeler> that ensuring the exception stacktrace
information is properly populated would be ideal.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/dotnet/coreclr/issues/652#issuecomment-345339255>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAHIs8LEfwecEKhR1tfUIzIh6GfkRZcUks5s3dvIgaJpZM4D7TXI>
.
|
@tannergooding Not sure what to say. This is not what happens to me, on a vanilla VS2017 installation, when I get a |
I want to resurrect the thread by adding that, besides the debuggability issue mentioned above, what's more prominent for online services are that you would like to have the chance to dump some important information before the app crashes, you don't always get to hook up the debugger and repro the issue |
Java has some level of support for this, right? Is there something .NET could learn from that? If not, what is wrong with what Java does in .NET's view? |
I'm proposing to consider all "fatal" exceptions and get rid of the term "fatal" which in reality can not be determined. We can only say, that the situation is fatal when OS closes the application and it's nothing to do here with exceptions |
The only inherent fatal fault is memory corruption, and that should not through an exception at all, but fail fast. Everything else (including out of memory and stack overflow) can and should be recoverable. |
For instance, an HTTP server can recover by aborting the request and returning 500 Internal Server Error to the user. |
@jinek unfortunately some of these exceptions are fatal for practical reasons. For example, how would .Net recover from an out-of-memory situation? It has already attempted to do so with a full GC collection (which always happens before an OOM), and can't run finalizers (except those that have constrained execution) because they might (and generally will) attempt to allocate. This is why .Net allocates the OOM exception before running any user code, because in an OOM condition it wouldn't be able to. Stack overflow is unlike OOM because you can computationally anticipate a stack overflow and throw an exception prior to actually reaching the stack overflow condition (effectively making it a
This would require fallible allocators in .Net, similar to how var foo = new Bar();
if (foo != null) {
} Even worse: var foo = new Bar();
if (foo != null) {
var ex = new Http500Exception();
if (ex == null) { /* Now what? */ }
throw ex;
} |
@jcdickinson The server would look like try {
await handleRequest();
} catch (Object e) {// I know this isn’t valid C#, but it is valid in IL
log(e); // native code that will not allocate
returnInternalServerError();
} |
This depends on the source of the allocation that triggers the exception. Application-specific logic can recover from OutOfMemoryException under controlled scenarios. Here's an example:
This generally makes the assumption that if one allocation fails, the next one will also fail. There are many cases where this assumption breaks down, some of which lead to error scenarios that an application could treat as recoverable:
In general, the view of OutOfMemoryException here is faulty for the same reason the CLR's current view of StackOverflowException is faulty: given that it is difficult to reason about these exceptions (and generally impossible without additional application-specific information) it incorrectly draws a conclusion that it is not possible to code an application that exposes behavior a user would expect following an internal condition that produced this exception. |
Now, please, show me how you recover from System.Exception ? (It's theoretical question, because as you don't know the type of exception here I can bring up my own type which meaning would be to break your idea of recovery) |
We can not guarantee success execution of any catch:
|
Heap and stack historically grew in opposite directions to be flexible on its sizes.
|
MessageBox allocates. That's the problem. How would it allocate when there is no more memory to allocate? It would re-throw OutOfMemoryException. |
@jcdickinson by the time |
Only not garbage collector, but stack will get unwind (GC works for heap only). Still, what I'm saying is that you can never guarantee success of catch block. Never for any exception! Still because of this we don't deny the catch block, do we? |
GC is for heap only, but the stack unwinding will remove some GC roots. |
I think this topics are very relevant #45256 |
I can confirm that stack overflow debugging does not work, as described in this thread. Some ideas on how to make this better (includes ideas collected from other contributors):
It seems that (1) would be a great way to get a production-friendly solution. If a production crash happens, the developer would enable the special mode to obtain actionable logging. And most importantly, this would fix the outage quickly! |
Another idea for an emergency mode: At each function call, probe remaining stack space. If it falls below 64 KB, throw. This overhead would be tolerable as part of an emergency mode useful for production troubleshooting and unit testing. |
I would like to propose an unspecified modification to the CLR which allows code to be executed in a manner where a
StackOverflowException
will not cause the process to terminate. To get the discussion started, here are some possibilities for alternative behavior:AppDomain
.HandleProcessCorruptedStateExceptionsAttribute
which allows specialized handlers to be added.The primary use case for this I've encountered is recursive-descent algorithms for creating and visiting parse trees, e.g. in ANTLR. In these cases, I am able to write all of the behavior for a background thread such that immediate termination of a thread does not result in an invalid execution state. For example, no operation on the thread calls
Monitor.Enter
. These algorithms are performance-sensitive, so stack depth checking throughout the code is not a desired solution. Providing the ability to handle an exceptional stack overflow situation without terminating the process (e.g. Visual Studio, if this was running as an extension) would be a much improved situation.The text was updated successfully, but these errors were encountered: