Wrong stacktrace shown when rethrowing an exception after catching another one #19979

martinholters · 2017-01-11T13:32:40Z

function foo()
    try
        error("error in foo")
    catch e
    end
end

function bar()
    try
        error("error in bar")
    catch e
        foo()
        rethrow(e)
    end
end

bar()

produces the misleading

ERROR: error in bar
Stacktrace:
 [1] foo() at ./REPL[1]:3
 [2] bar() at ./REPL[2]:5

This just hit me in a really wicked situation: The stacktrace actually lead me to a line erroneously throwing the reported exception. I fixed it (to throw the correct exception), but the same error kept showing, leaving me completely puzzled for quite some time...

JeffBezanson · 2017-04-13T21:36:49Z

We should somehow clarify that if you care about the backtrace of an exception, you need to call catch_backtrace immediately.

samoconnor · 2017-11-14T23:50:13Z

I've been trying to figure out why the backtraces are mostly corrupted when my Retry.jl package is used. I have always assumed that my naive macro hacking was messing up the line number metadata in the AST somehow. I spent a long time chasing that down, but I now think the root of the problem is this issue.

It seems there are three problems: the backtrace can be empty; the rethrown backtrace can be wrong; or the rethrown error and backtrace can be wrong.

The minimal cases I've boiled it down to follow:

`test1()` -- works

Julia Version 0.7.0-DEV.2098 (2017-10-10 11:37 UTC)

function f()
    throw("Error in f()!")
end

function test1()
    try
        f()
    catch e
        if e == "foo"
            println("Got foo!")
        end
        rethrow(e)
    end
end

ERROR: LoadError: "Error in f()!"
Stacktrace:
 [1] f() at /Users/sam/julia/v0.5/Retry/src/extest.jl:2
 [2] test1() at /Users/sam/julia/v0.5/Retry/src/extest.jl:8
...
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50

`test2()` -- No stacktrace

function test2()
    try
        f()
    catch e
        println("catch!") # Update:, yield() is enough to loose the backtrace
        rethrow(e)
    end
end

catch!
ERROR: LoadError: "Error in f()!"
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50

`test3()` -- Wrong stacktrace

function isfoo(x)
    try
        x.code == "foo"
    catch e
        false
    end
end


function test3()
    try
        f()
    catch e
        if isfoo(e)
            println("Got foo!")
        end
        rethrow(e)
    end
end

ERROR: LoadError: "Error in f()!"
Stacktrace:
 [1] isfoo(::String) at /Users/sam/julia/v0.5/Retry/src/extest.jl:29
 [2] test3() at /Users/sam/julia/v0.5/Retry/src/extest.jl:43
...
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50

`test4()` -- Wrong error rethrown

The doc for rethrow() says "The default argument is the current exception (if called within a catch block)", but in this case rethrow is throwing the exception that was caught and ignored by isfoo():

function test4()
    try
        f()
    catch e
        if isfoo(e)
            println("Got foo!")
        end
        rethrow()
    end
end

ERROR: LoadError: type String has no field code
Stacktrace:
 [1] isfoo(::String) at /Users/sam/julia/v0.5/Retry/src/extest.jl:29
 [2] test3() at /Users/sam/julia/v0.5/Retry/src/extest.jl:43

samoconnor · 2017-11-15T00:41:23Z

@JeffBezanson I'm not sure how to apply your advice that "you need to call catch_backtrace immediately".

I see that the runtime calls catch_backtrace() in _start().

It seems like maybe I need to do tmp = catch_backtrace() at the top of catch and set_global_catch_backtrace(tmp) before rethrow() to undo any backtrace corruption that happens in the catch block, but I can't see anything like set_global_catch_backtrace.

I found that exec_program in repl.c does a manual save and restore of the ptls->bt_data.

Maybe record_backtrace should save the old ptls->bt_data before overwriting it if there is already a current ptls->exception_in_transit ?

samoconnor · 2017-11-15T00:59:36Z

related:
#12541
#12485

samoconnor · 2017-11-15T01:09:28Z

Based on reading the issues references above, and looking at the code,
It seems that rethrow only works if the catch block does not:

cause a task switch,
throw an exception or
call anything that might cause a task switch or an exception.

Is that correct? @vtjnash @yuyichao @Keno @s2maki

samoconnor · 2017-11-15T08:31:42Z

related to #20784 ?
#13947

@ignore

Retry.jl used to use try/catch inside the catch block's "@ignore if ..." condition. The purpose of this was to treat if conditions that throw errors as false. e.f. @ignore if e.reason == "Foo" end might throw an exception if e does not have a reason field. Retry.jl used to allow this and treat it as false. At present doing a nested try/catch in a catch block causes the Julia stacktrace to be corrupted: JuliaLang/julia#19979 (comment) After this commit, Retry.jl not requires @ignore if conditions to return true or false and not throw exceptions. The ecode() function has been exported to allow replacing @ignore if e.code == "404" end with @ignore if ecode(e) == "404" end

@ignore

JuliaWeb/Retry.jl@6c565a3 JuliaLang/julia#19979 (comment) Retry.jl no longer allows @ignore if conditions to throw errors. To support this: - Dynamic generation of exception types is removed from AWSException.jl (All exceptions are now type AWSException with fields: code, message, info and cause. - Exception handling conditions have added type checks.

s2maki · 2017-11-27T16:53:36Z

I've definitely seen the case for test4 cause the wrong exception to be thrown, and the missing stack trace problem from test2. Our code that exhibits this behavior definitely seems to fit these examples.

JeffBezanson · 2017-11-27T22:51:38Z

I think the fastest way to fix this is to add a method of rethrow that accepts a backtrace; that will at least allow it to be preserved manually via

catch e
    bt = catch_backtrace()
    ...
    rethrow(e, bt)
end

JeffBezanson · 2017-11-27T23:20:25Z

Here's a patch for 0.6:

--- a/src/task.c
+++ b/src/task.c
@@ -576,6 +576,16 @@ JL_DLLEXPORT void jl_rethrow_other(jl_value_t *e)
     throw_internal(e);
 }
 
+JL_DLLEXPORT void jl_rethrow_other_bt(jl_value_t *e, jl_value_t *bt)
+{
+    size_t btsz = jl_array_len(bt);
+    size_t sz = btsz > JL_MAX_BT_SIZE ? JL_MAX_BT_SIZE : btsz;
+    jl_ptls_t ptls = jl_get_ptls_states();
+    memcpy(ptls->bt_data, jl_array_data(bt), sz * sizeof(void*));
+    ptls->bt_size = sz;
+    throw_internal(e);
+}
+
 JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, size_t ssize)

With wrapper:

rethrow(e, bt::Array{Ptr{Void},1}) = ccall(:jl_rethrow_other_bt, Union{}, (Any,Any), e, bt)

This can be added to 0.7 as well though the patch will be significantly different. In fact we should probably fully deprecate rethrow to throw (allowing 1 or 2 arguments).

c42f · 2018-01-14T01:08:53Z

Naively it seems possible to fix this and make catch_backtrace/rethrow reliable by saving and restoring the backtrace data appropriately - somewhat similarly to jl_apply_with_saved_exception_state but with various complications.

Is it worth considering this approach or is there some fundamental reason why it won't work?

Keno · 2018-01-14T01:12:46Z

The problem is that collecting the backtrace is expensive, so you only want to do that if you're actually gonna save it. It is quite possible however to have some sort of magic symbol that the compiler looks for to see if it needs to save the backtrace (which could but need not be syntactically exposed). @JeffBezanson said there was an issue about that at some point, but I couldn't find it.

c42f · 2018-01-14T10:37:00Z

I spent a few hours poking around in the runtime. The backtrace may be expensive, but it looks like the cost of recording it is unconditionally borne inside jl_throw. The only way to prevent that is to somehow have a flag set in the error handler state. A syntactic hint like this old proposal: https://gist.github.com/vtjnash/9cd53434cc7f7e2f7a29 would allow that.

But that seems like a distraction - what I'm proposing is that if there's a current backtrace, we should record it when entering a try block, and restore it when leaving. Seems like a bit of pointer shuffling may be all that's required in jl_enter_handler/jl_pop_handler? And it should only hurt if you've already thrown an exception.

For context, I'm just mulling over what to do about Jeff's comment at #25370 (comment) and wondering if a solution to the underlying problem is within reach.

c42f · 2018-01-15T06:19:12Z

all that's required

Hah, I should have known better :-) . I started prototyping a solution, but restoring the exception can't happen inside jl_pop_handler / jl_eh_restore_state because that occurs before the code in the catch block (duh!). There's also some careful gc rooting required.

I now think we should generalize the exception_in_transit to be a stack of (exception,backtrace) states in transit. If you throw Exception2 while handling Exception1, you'd like to know about Exception1 as the root cause while further handling Exception2. This would mean jl_throw needs to push the new exception onto the stack. The lowering of catch blocks would need a final step added to pop the exception stack back to the state it was when the try was entered. I think I can see a trick to give JL_CATCH the same behavior.

To address the task switching problems, the current exception stack would need to be saved and restored during context switching which means more copies of bt_data floating around.

If we did all this, I believe catch_backtrace() can be made reliable and there would be no measurable performance penalty outside a catch block.

samoconnor · 2018-01-19T22:55:42Z

@c42f a stack of backtraces is what I had in mind #19979 (comment) ...

Maybe record_backtrace should save the old ptls->bt_data before overwriting it if there is already a current ptls->exception_in_transit ?

JeffBezanson · 2018-01-19T23:17:50Z

Yes, a stack of backtraces is a good idea. That would let us keep catch_backtrace and rethrow, and fix finally as well. rethrow() is also more efficient than having the user (or compiler) call catch_backtrace() and pass the result manually, since that requires allocating more julia objects to reflect the data safely.

samoconnor · 2018-01-19T23:34:23Z

This reminds me of an old niggling thought: can we do better that the catch + rethrow API?
Can we do it in a way that discourages accidental loss (non-rethrowing) of important errors?
Can it be done in a way that makes requests for type-based catch redundant?

peek e as an alternative to catch e but with no need to rethrow. i.e. the error still propagates up the stack.

try
    ...
peek e
    @warn "my code caused an error" e
end

catch e where automatic rethrow is the default unless delete!(e) is called

try
    ...
catch e
    e isa IOError && e.code == :FOO && delete!(e)
end

c42f · 2018-01-25T02:03:04Z

@samoconnor that's an interesting idea; I think the key thing you're doing with the peek is capturing state from the scope where the peek occurs, and communicating it to an outer scope.

I think a better way to do this might be to add a new exception to the stack - that is, instead of the peek, to simply have:

try
    ...
catch
    throw(MyException("my code caused an error"))
end

Notably, because this throw occurs before the end of the catch block, the previous exceptions in the stack are preserved. So in the outer scope you have the full context, but you can also capture and transport arbitrary information in MyException.

In general I think the pattern of logging an exception and rethrowing it isn't great: Somewhere at an outer scope the exception will also be caught and reported in some way (probably via logging but without rethrowing). Logging and rethrowing just leads to duplicate error information due to several catch/log/rethrows down the stack. This is very confusing in log files. I'd say it's much better to preserve the information and make it available as a full stack of exceptions at outer scope instead.

samoconnor · 2018-02-19T22:30:47Z

I now think we should generalize the exception_in_transit to be a stack of (exception,backtrace) states in transit.

Hi @c42f,
Have you been able to make any progress towards this?
I have a protocol layer stack where there are multiple layers that need to: catch errors, inspect error type, clean up layer-local resources, then retrhow. As things stand, the stack traces are almost always corrupted or missing by the time the exceptions reach the application layer. As things stand, I often find myself commenting-out all the intermediate try/catches to get a clean look at the original stack trace. This is especially frustrating because of catch catching all exception types. I have this hierarchy of fine grained IO exception handlers, but I have to disable it all if I make a typo in some code that leads to an argument error, or and undefined var error or a bounds error.

It would be really awesome to have rethrow working :)

(This reminds me of: #15906)

c42f · 2018-02-22T03:22:24Z

It's an interesting problem but I haven't made any implementation progress. My original motivation was to solve #25370 (comment), but in the end it didn't seem that fixing catch_backtrace() would help much. (Not to detract from the host of other good reasons to do this.)

c42f · 2018-07-02T05:06:53Z

I had a 24 hour plane journey so I started looking into this in more detail. Here's some notes I made for myself about how exceptions and backtraces are implemented in the runtime (perhaps could become devdocs if there's interest?)

Capturing backtraces

In the runtime, backtraces are recorded by calling the low level function
jl_unw_stepn. jl_unw_stepn can fill in two C arrays:

ip which is an array of native instruction pointers mixed with interpreter
state where necessary. This array tells us which code was being executed
at the time of the backtrace.
sp which is an array of pointers to native stack frames containing the
program data. This optional array tells us which data was in use at the
time of the backtrace.

For use from julia code, jl_unw_stepn is wrapped by jl_backtrace_from_here
in order to return arrays of julia types. For use internally by the runtime,
jl_unw_stepn is called directly from various functions. For instance,
throwing an exception using jl_throw records the backtrace into a raw ip
buffer attached to the thread local storage. This is a bare minimum of raw
information for efficiency and needs to be further converted into julia-level
information when calling catch_backtrace and associated functions.

Format of the raw instruction pointer array

The julia runtime executes programs as a mixture of interpreted and compiled
code. Therefore, a comprehensive instruction pointer array must contains a
mixture of interpreter program counters and native instruction pointers:

Interpreter program counters are coded as a triple consisting of a special
sentinel value (uintptr_t)-1, followed by a jl_value_t pointer to the
MethodInstance being interpreted, and an integer program counter indexing
into the code for that method.
Native program instruction pointers are a single void pointer to the native
code being executed.

To get stack traces to work when interpreted and compiled code are mixed
requires a bit of black magic to detect the interpreter frames using the raw
instruction pointer, and extract the interpreter state from the native stack.
This is done by always entering the interperter via a special thunk
enter_interpreter_frame, whose code address can be known to the runtime. The
stack layout in this hand written assembly function is also carefully
controlled so that the interpreter state can be extracted in a well defined
way.

Exception handling

Try-catch mechanism

Julia exception handling is built on top of the C setjmp/longjmp mechanism
with some additional logic in jl_eh_restore_state to restore the state of the
task when a longjmp occurs. This additional logic includes:

Removing GC roots in abandoned frames by popping the GC stack
Restoring task local GC state
Unlocking any mutexes which were held in the abandoned frames
Restoring the signal mask

The state to be restored must be recorded at the try site and be accessible at
the catch site both in C and in julia. In both cases it's ultimately stored on
the stack in a jl_handler_t which is initialized by jl_enter_handler and
consumed by jl_eh_restore_state.

The C mechanism is as follows:

Use of the JL_TRY macro introduces a local variable of type jl_handler_t,
initializes it before calling setjmp, and calls jl_eh_restore_state if no
exception is thrown.
JL_CATCH adds the else clause for entry when an exception triggers a
longjmp. This includes a call to jl_eh_restore_state which uses the
local jl_handler_t variable to restore task local state.

The julia codegen mechnism essentially mirrors the C version, but is a little
more complicated as lowering, code generation, and an LLVM optimization pass
are involved. Briefly, lowering turns try/catch into
Expr(:enter) and Expr(:leave) in the lowered linearized code. These are
then translated during LLVM codegen into calls to the C runtime functions given
above, by way of an intermediate dummy function @julia.enter and the LLVM
pass LowerExcHandlers. (Note that local jumps caused by break and @goto
also abandon a try block and are also lowered to Expr(:leave).)

Throwing exceptions

From julia

When an exception is raised with jl_throw:

A backtrace is recorded into the thread local state
Some GC safe point stuff happens which I don't understand
The exception object is recorded in the thread local state
The jl_handler_t for the current task is used to get the approriate
jmp_buf which will reset execution to the current exception handler up the
stack.

From the runtime

The runtime can raise exceptions using jl_throw; this works the same way as
from julia.

There's also some special circumstances which require extra care:

Stack overflows
Divide by zero errors
User initiated interruptions (^C)
Attempts to write into read-only memory.

These are caught by the runtime using signal handlers (for example, stack
overflows are detected by protecting a memory page on the known stack boundary,
and catching the SIGSEGV which results if the stack spills onto this page).
Crucially, all code inside a signal handler must be async-signal-safe - for
example, we cannot allocate memory using malloc or call any function which
does so.

c42f · 2018-07-02T05:08:16Z

To follow on from the above, a possible implementation plan:

First, fix the task-vs-thread local state bugs in the current implementation. This should give experience going to the next step:

Fix thread local exception problems. Either
- Move exception_in_transit from thread local to task local storage in _jl_task_t, or
- Leave as is, but persist the exception state into the task right after the catch, or during task switching.
  What's the difference here? There's two things:
- A potential perf difference, trading task switching perf for task startup time and memory usage.
- Putting exception state directly on the task, we can't throw exceptions before the root task is initialized. Is this a problem?
Fix thread local backtrace buffer. See above.

To tie things off, replace the task-based exception/backtrace with a stack:

Add stack of (exception,backtrace) pairs to the task, perhaps based on a linked list? Allow the first node to be reused so that perf doesn't crater when try is entered but no exception occurs?

c42f · 2018-07-20T02:04:03Z

I've been slowly poking away at this. It looks reasonably easy to preserve the exception and backtrace data during task switching - that's just a few changes to the runtime C code.

The more tricky problem is to insert an exception stack pop intrinsic (which I've currently called Expr(:pop_exc)) into the lowered code and more particularly to get it flowing through the various layers of optimization correctly. The desired lowering is something along the lines of

Before()
try
    InTry()
catch
    InCatch()
end
After()

lowers to

  Before()
  enter c1
  InTry()
  leave 1
  goto e1
c1:
  leave 1
  InCatch()
  pop_exc    #<- Needs to pop exception stack back to the state at associated `enter`
e1:
  After()

The state of the stack at enter can be recorded in the jl_handler_t and will be dynamically available when the associated leave executes. We need a little of this state available in the associated pop_exc.

I've got three ideas:

I've tried storing the label of the associated enter in the pop_exc expression but it ended up feeling slightly unnatural to propagate that label during SSAIR processing, particularly since it's not used as a jump target.
Another idea is to use a token which is yielded by the enter, and make pop_exc take that token as an argument. As I understand it, this is the kind of thing that the LLVM Token type was introduced for, and would then have a nice lowering to LLVM IR before we get to the LowerExcHandlers pass.
We could just re-deduce the association from the nesting context in LowerExcHandlers, provided the IR isn't reordered by any optimizations. I'm starting to think this would be the easiest as LowerExcHandlers already has similar assumptions.

@Keno any thoughts?

c42f · 2018-07-27T15:11:03Z

It turns out that of the ideas above, producing a token from enter plays extremely nicely with the implicit SSA variables in SSAIR, but also is easy to work with in both the interpreter and in codegen. So I'm going with that for now.

c42f · 2018-08-24T16:53:58Z

Here's a PR which fixes this: #28878

c42f · 2018-10-24T07:14:10Z

Fixed in #28878

kshyatt added the error handling Handling of exceptions by Julia or the user label Jan 11, 2017

martinholters mentioned this issue Apr 13, 2017

Mistaken inference error on first call #21369

Closed

yuyichao mentioned this issue Aug 6, 2017

rm(force=true) clobbers exception #23150

Closed

JeffBezanson mentioned this issue Nov 29, 2017

RFC: deprecate rethrow(), add 2-argument throw(e, bt) #24846

Closed

JeffBezanson mentioned this issue Jan 8, 2018

ConsoleLogger for more fully featured log printing #25370

Merged

c42f mentioned this issue Jan 15, 2018

Attach backtraces explicitly in error logging #25569

Merged

samoconnor mentioned this issue Jan 19, 2018

BoundsError when closing MbedTLS.SSLContext JuliaWeb/HTTP.jl#148

Closed

JeffBezanson added this to the 1.0.x milestone Jan 23, 2018

samoconnor mentioned this issue Feb 18, 2018

WIP: Add more context to server logs JuliaWeb/HTTP.jl#200

Closed

samoconnor mentioned this issue Feb 23, 2018

no stack trace for some errors outside @test #26155

Closed

KristofferC mentioned this issue Apr 5, 2018

missing backtrace after task switch #13947

Closed

JeffBezanson mentioned this issue Jun 8, 2018

finally doesn't rethrow the stacktrace #27484

Closed

c42f mentioned this issue Aug 10, 2018

Segfault with stack overflow and nested exception handlers #28577

Closed

c42f mentioned this issue Aug 24, 2018

Exception stacks #28878

Merged

10 tasks

c42f mentioned this issue Oct 16, 2018

Improve server error handling and logging. JuliaWeb/HTTP.jl#189

Closed

c42f closed this as completed Oct 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong stacktrace shown when rethrowing an exception after catching another one #19979

Wrong stacktrace shown when rethrowing an exception after catching another one #19979

martinholters commented Jan 11, 2017

JeffBezanson commented Apr 13, 2017

samoconnor commented Nov 14, 2017 •

edited

Loading

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017 •

edited

Loading

s2maki commented Nov 27, 2017

JeffBezanson commented Nov 27, 2017

JeffBezanson commented Nov 27, 2017

c42f commented Jan 14, 2018

Keno commented Jan 14, 2018

c42f commented Jan 14, 2018

c42f commented Jan 15, 2018

samoconnor commented Jan 19, 2018

JeffBezanson commented Jan 19, 2018

samoconnor commented Jan 19, 2018

c42f commented Jan 25, 2018

samoconnor commented Feb 19, 2018

c42f commented Feb 22, 2018

c42f commented Jul 2, 2018 •

edited

Loading

c42f commented Jul 2, 2018 •

edited

Loading

c42f commented Jul 20, 2018

c42f commented Jul 27, 2018

c42f commented Aug 24, 2018

c42f commented Oct 24, 2018

Wrong stacktrace shown when rethrowing an exception after catching another one #19979

Wrong stacktrace shown when rethrowing an exception after catching another one #19979

Comments

martinholters commented Jan 11, 2017

JeffBezanson commented Apr 13, 2017

samoconnor commented Nov 14, 2017 • edited Loading

test1() -- works

test2() -- No stacktrace

test3() -- Wrong stacktrace

test4() -- Wrong error rethrown

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017

samoconnor commented Nov 15, 2017 • edited Loading

s2maki commented Nov 27, 2017

JeffBezanson commented Nov 27, 2017

JeffBezanson commented Nov 27, 2017

c42f commented Jan 14, 2018

Keno commented Jan 14, 2018

c42f commented Jan 14, 2018

c42f commented Jan 15, 2018

samoconnor commented Jan 19, 2018

JeffBezanson commented Jan 19, 2018

samoconnor commented Jan 19, 2018

c42f commented Jan 25, 2018

samoconnor commented Feb 19, 2018

c42f commented Feb 22, 2018

c42f commented Jul 2, 2018 • edited Loading

Capturing backtraces

Format of the raw instruction pointer array

Exception handling

Try-catch mechanism

Throwing exceptions

From julia

From the runtime

c42f commented Jul 2, 2018 • edited Loading

c42f commented Jul 20, 2018

c42f commented Jul 27, 2018

c42f commented Aug 24, 2018

c42f commented Oct 24, 2018

samoconnor commented Nov 14, 2017 •

edited

Loading

`test1()` -- works

`test2()` -- No stacktrace

`test3()` -- Wrong stacktrace

`test4()` -- Wrong error rethrown

samoconnor commented Nov 15, 2017 •

edited

Loading

c42f commented Jul 2, 2018 •

edited

Loading

c42f commented Jul 2, 2018 •

edited

Loading