Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Abbreviated stack traces in REPL #40537

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

BioTurboNick
Copy link
Contributor

@BioTurboNick BioTurboNick commented Apr 20, 2021

I went ahead and implemented a version of my suggestion #40138

Post updated with current state 5/5/22

The philosophy here is that for 95% of purposes, Julia Base, stdlib, and registry packages are black boxes to the user. While it's an important part of Julia that their internals are accessible, they don't need to be shown by default. Plots.jl and BenchmarkTools are excellent example of where cluttered stack traces get in the way and fill the screen:

using Plots, BenchmarkTools
@btime plot([1,2,3], seriestype=:blah);

ERROR: The backend must not support the series type Val{:blah}, and there isn't a series recipe defined.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] macro expansion
    @ ~/.julia/packages/RecipesPipeline/CirY4/src/type_recipe.jl:9 [inlined]
  [3] apply_recipe(plotattributes::AbstractDict{Symbol, Any}, #unused#::Type{Val{:blah}}, x::Any, y::Any, z::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesBase/92zOw/src/RecipesBase.jl:282
  [4] _process_seriesrecipe(plt::Any, plotattributes::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/series_recipe.jl:50
  [5] _process_seriesrecipes!(plt::Any, kw_list::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/series_recipe.jl:27
  [6] recipe_pipeline!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/CirY4/src/RecipesPipeline.jl:97
  [7] _plot!(plt::Plots.Plot, plotattributes::Any, args::Any)
    @ Plots ~/.julia/packages/Plots/kyYZF/src/plot.jl:172
  [8] plot(args::Any; kw::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
    @ Plots ~/.julia/packages/Plots/kyYZF/src/plot.jl:58
  [9] var"##core#278"()
    @ Main ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:371
 [10] var"##sample#279"(__params::BenchmarkTools.Parameters)
    @ Main ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:377
 [11] _run(b::BenchmarkTools.Benchmark{Symbol("##benchmark#277")}, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Pairs{Symbol, Integer, NTuple{4, Symbol}, NamedTuple{(:samples, :evals, :gctrial, :gcsample), Tuple{Int64, Int64, Bool, Bool}}})
    @ Main ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:405
 [12] #invokelatest#2
    @ ./essentials.jl:723 [inlined]
 [13] #run_result#39
    @ ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:32 [inlined]
 [14] run(b::BenchmarkTools.Benchmark{Symbol("##benchmark#277")}, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Pairs{Symbol, Integer, NTuple{5, Symbol}, NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample), Tuple{Bool, Int64, Int64, Bool, Bool}}})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:94
 [15] #warmup#47
    @ ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:141 [inlined]
 [16] warmup(item::BenchmarkTools.Benchmark{Symbol("##benchmark#277")})
    @ BenchmarkTools ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:141
 [17] top-level scope
    @ ~/.julia/packages/BenchmarkTools/Kt6kv/src/execution.jl:481

What does this do?
- Adds a global variable err that is set in the REPL when an exception is encountered, similar to ans, which is of type ExceptionInfo, which is just a wrapper around the array, to dispatch to the correct show method. Anything hidden in the default view will be shown here. Was added in separate PR

  • Adds a default compact stack trace display that elides all internal stack frames found in Base, an Stdlib, or an added registry package, except for the first one called into by a visible stack frame, which is the public surface of the other package
  • Does not affect non-REPL Julia sessions or remote workers, so CI and HPC runs should display full stack traces

After this PR, the above example looks like this:
image

The "Unknown" module comes from inlined stack frames, which are currently missing information. This may be removed with PRs like #41099

@KristofferC
Copy link
Member

KristofferC commented Apr 20, 2021

Adds a default compact stack trace display that selects the top-most frame that isn't in base Julia (frame.file starts with something other than ./, or starts with ./REPL)

That seems very arbitrary. Many times the error will be thrown going through a package and you need to figure out where in your code that call happened. For example:

julia> df = DataFrame(a = [1,2,3]);

julia> df[5]
ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Stacktrace:
 [1] getindex
   @ ~/.julia/packages/DataFrames/GtZ1l/src/dataframe/dataframe.jl:429 [inlined]
 [2] getindex(df::DataFrame, col_ind::Int64)
   @ DataFrames ./deprecated.jl:72
 [3] top-level scope
   @ REPL[5]:1

would show you

ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Stacktrace:
 [2] getindex(df::DataFrame, col_ind::Int64)
   @ DataFrames ./deprecated.jl:72

Pretty useless?

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 20, 2021

Any decision would be arbitrary, the question IMO is where a signal/noise tradeoff can reasonably be set.

And if one does need the full stacktrace, either because this decision is imperfect or they really need the full trace, it just takes one more command to see it.

Could add "~/.julia/packages" to the list. Most of the time a registry package's internal stack frames won't be interesting to a user. Though in the case of something like BenchmarkTools, sometimes the evaluated code is shown as originating from inside BenchmarkTools. (e.g. @btime a[3]). Maybe the right choice there is to not show any stack frames by default, because the problem, as in your DataFrames example, is really at the top level.

@vchuravy
Copy link
Member

I am wondering if we could make this dependent on whether I started Julia interactively or from as a script? My horror scenario is that I am several hours into a HPC run and all I get is a stacktrace that is truncated.

@BioTurboNick
Copy link
Contributor Author

@vchuravy FWIW, this is what happens in this current PR when an error occurs in pmap:

using Distributed
pmap(x -> 4y, [1, 2, 3])

#=
ERROR: On worker 2:
UndefVarError: y not defined
Stacktrace:
 [1] #6
   @ ./REPL[10]:1
 [2] #106
   @ /mnt/c/Users/ncb20/source/repos/julia/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:278
 [3] run_work_thunk
   @ /mnt/c/Users/ncb20/source/repos/julia/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:63
 [4] macro expansion
   @ /mnt/c/Users/ncb20/source/repos/julia/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:278 [inlined] [5] #105
   @ ./task.jl:411
Location:
 [7] pmap(f::Function, p::WorkerPool, c::Vector{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
   @ Distributed /mnt/c/Users/ncb20/source/repos/julia/usr/share/julia/stdlib/v1.7/Distributed/src/pmap.jl:126
Use `lasterr()` to retrieve the full stack trace.
=#

Appears the worker doesn't use the abbreviated code path. Though it also doesn't use the relative paths like the REPL process, which is good to know.

@antoine-levitt
Copy link
Contributor

antoine-levitt commented Apr 20, 2021

This proposal is extreme but goes in the right direction IMO.

+1 on only doing this in interactive contexts. CI is another example where you want full stacktraces.

lasterr() is a bit annoying to type, but we already use ctrl q for interacting with stack traces, we could use it here to trigger lasterr().

One intermediate option would be to display two frames (the first and the last) for each module. That way you truncate long intra-module call chains, but you keep the flow between the different packages.

@PaulSoderlind
Copy link
Contributor

PaulSoderlind commented Apr 20, 2021

From the perspective of an ordinary user, I believe that something like this is needed.

Motivation: I teach a rather basic Julia class for finance/statistics students - and one of the most common complaints from the students is that the error messages are very hard to understand: information overload and not geared towards where their errors are likely to be (in their own code, not in Base or packages).

...and they write scripts, and use the REPL only for trivial things

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 20, 2021

lasterr() is a bit annoying to type, but we already use ctrl q for interacting with stack traces, we could use it here to trigger lasterr().

If the error information is packed into a specific struct with a show method, instead of a vector, then this could be changed to just typing err, I think? Yep, done

One intermediate option would be to display two frames (the first and the last) for each module. That way you truncate long intra-module call chains, but you keep the flow between the different packages.

I just figured out how to look up the module of a stack frame, so that's possible. An issue there is that inlined frames don't retain that information. (At least, via frame.linfo.def.module)

Adding: Confirmed that the abbreviated code path is not used here, as well: ./julia --eval "a() = f; a()"

@BioTurboNick BioTurboNick marked this pull request as draft April 20, 2021 22:58
@antoine-levitt
Copy link
Contributor

Indeed module information is quite brittle, and not detected reliably. A simpler and more robust method is to just use the file information. Eg collapse callchains that take place inside a single folder. That already takes care of long callchains, and keeps track of the flow among the various folders of more complex packages, which is quite nice. I tested it on some moderately complex stacktraces I had on hand, it did cut the stacktrace by a factor of ~2 and increased readability. Still results in quite large stacktraces though.

@KristofferC
Copy link
Member

Though in the case of something like BenchmarkTools, sometimes the evaluated code is shown as originating from inside BenchmarkTools. (e.g. @Btime a[3])

Yes, with e.g. callbacks it is often the case that user code takes the path through a package or Base before getting executed.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 21, 2021

julia> df = DataFrame(a = [1,2,3]);

julia> df[5]
ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Stacktrace:
 [1] getindex
   @ ~/.julia/packages/DataFrames/GtZ1l/src/dataframe/dataframe.jl:429 [inlined]
 [2] getindex(df::DataFrame, col_ind::Int64)
   @ DataFrames ./deprecated.jl:72
 [3] top-level scope
   @ REPL[5]:1

would show you

ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Stacktrace:
 [2] getindex(df::DataFrame, col_ind::Int64)
   @ DataFrames ./deprecated.jl:72

Pretty useless?

After filtering out packages and stdlib, now it just shows this:

ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Use `err` to retrieve the full stack trace.

If it's in function a from your local repository:

ERROR: LoadError: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Location:
 [2] a()
   @ Main /mnt/c/Users/<user>/source/repos/MyPackage.jl/src/test.jl:1
Use `err` to retrieve the full stack trace.
in expression starting at /mnt/c/Users/<user>/source/repos/MyPackage.jl/src/test.jl:2

If instead you were deving DataFrames:

ERROR: BoundsError: attempt to access 3×1 DataFrame at index [!, 5]
Location:
 [1] getindex(df::DataFrame, #unused#::typeof(!), col_ind::Int64)
   @ DataFrames ~/.julia/dev/DataFrames/src/dataframe/dataframe.jl:502
Use `err` to retrieve the full stack trace.

I do understand the friction here, but I think it's possible to minimize that and I'd like to try.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 22, 2021

Perhaps it would be worth stepping back here, after thinking about this some more. This could instead default to showing all stack frames in code you're controlling, and just hide base/stdlibs/added packages frames. (But they could still be revealed with err).

So it could look like this instead:

ERROR: BoundsError: attempt to access 3-codeunit String at index [4]
Stacktrace:
 [4] a
   @ .\REPL[6]:1 [inlined]
 [5] b
   @ .\REPL[7]:1 [inlined]
 [6] c
   @ .\REPL[8]:1 [inlined]
 [7] d
   @ .\REPL[9]:1 [inlined]
 [8] e()
   @ Main .\REPL[10]:1
Use `err` to retrieve the full stack trace.

Would that be a better balance than just showing the top one? It drops 8 of the 18 lines that are currently output (6 in Base, 2 for top level scope)

Then, if stack frames in the middle are omitted, that can be indicated with vertical dots:

using BenchmarkTools
o = [1;2;3]
a() = @btime ppp()
ppp() = o[4]

e()
ERROR: BoundsError: attempt to access 3-element Vector{Int64} at index [4]
Stacktrace:
  [2] ppp()
    @ Main ./REPL[52]:1
    ⋮
 [12] a()
    @ Main ./REPL[60]:1
 [13] b
    @ ./REPL[17]:1 [inlined]
 [14] c
    @ ./REPL[18]:1 [inlined]
 [15] d
    @ ./REPL[19]:1 [inlined]
 [16] e()
    @ Main ./REPL[20]:1

This example cuts number of lines output by 23.

@antoine-levitt
Copy link
Contributor

antoine-levitt commented Apr 22, 2021

That looks very nice! I would print an ellipsis every time something is omitted (ie in your example, also before [2]). Also I wonder if it would not be good to print the first and last frame skipped (in your example, [3] and [11]), so the user knows what's going on in between, possibly with a special printing to make it easier to distinguish? Eg something like

ERROR: LoadError: BoundsError: attempt to access 3-element Vector{Int64} at index [4]
Stacktrace:
  (1) getindex @ Base
  [2] ppp()
    @ Main ~/scratch.jl:4
  (3) var"##core#278"() @ Main
    ⋮
 (11) macro expansion @ 
 [12] a()
    @ Main ~/scratch.jl:3
 [13] b
    @ ~/scratch.jl:6 [inlined]
 [14] c
    @ ~/scratch.jl:7 [inlined]
 [15] d
    @ ~/scratch.jl:8 [inlined]
 [16] e()
    @ Main ~/scratch.jl:9
 [17] top-level scope
    @ ~/scratch.jl:10
 (18) include(fname::String) @ Base.MainInclude
    ⋮
in expression starting at /home/antoine/scratch.jl:10

(of course here the benchmarktools macro expansion is not detected as belonging to the benchmarktools module...)

@BioTurboNick
Copy link
Contributor Author

That looks very nice! I would print an ellipsis every time something is omitted (ie in your example, also before [2]). Also I wonder if it would not be good to print the first and last frame skipped (in your example, [3] and [11]), so the user knows what's going on in between, possibly with a special printing to make it easier to distinguish?

Maybe... My thinking is that most of the time, the problems will be in code someone is actively developing. If they need the intermediate stack frames at all, wouldn't they want to see the whole thing anyway?

Here's a more real-world example:

a() = first(findall([1;2;3]) do x
       x[2]
       end)

ERROR: BoundsError
Stacktrace:
  [2] #10
    @ ./REPL[24]:2 [inlined]
    ⋮
 [13] a()
    @ Main ./REPL[24]:1

The missing stack frames are:

 [3] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
    ⋮
 [12] findall(testf::Function, A::Vector{Int64})
    @ Base ./array.jl:2175

There's definitely value to [12], but maybe not to [3]?

@BioTurboNick
Copy link
Contributor Author

It's too bad the stack frame doesn't show the name of the macro being expanded.

@antoine-levitt
Copy link
Contributor

There's definitely value to [12], but maybe not to [3]?

Agreed, but kind of hard to tell in advance. I agree it takes up space and makes it harder to see that it's not user code. So maybe just have the name of the modules involved within the ellipsis? So like

    ⋮ (Base, BenchmarkTools)

which has the merit of showing why the frames are omitted. Maybe even the function names? In your last example that would be

    ⋮ Base: getindex, #5, #97, iterate, iterate, grow_to!, collect, findall

which is kind of nice?

It's too bad the stack frame doesn't show the name of the macro being expanded.

Yes, the stacktraces still have a lot of rough edges around kwargs, closures and macros unfortunately...

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 22, 2021

The function names may expose too much internal stuff? I think perhaps the external surface is the most important, which would be the first call from "your" code into someone else's code. But willing to hear from others if they'd find that useful.

Showing the omitted modules is a great idea though. Inlined functions are an issue... could try to extract the package name from the location, but that may not be the best idea?

image

Another example of a typical error from Plots:
image
This PR eliminates the stack trace when the error is at the toplevel, saving the user from 18 lines of unnecessary, likely confusing output. (The lines happen when I resize my terminal window, sorry about that.)

@oscardssmith
Copy link
Member

The hard thing to get to have reasonable behavior is things like map(f, [1,2,3]) where f is a user defined function. In cases like this, the stack frame you want is probably the one where map gets applied to a value, even if that doesn't let you see how you got there.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 23, 2021

The hard thing to get to have reasonable behavior is things like map(f, [1,2,3]) where f is a user defined function. In cases like this, the stack frame you want is probably the one where map gets applied to a value, even if that doesn't let you see how you got there.

I'd be happy to test out edge cases. Let's see if I can construct one that does poorly.

(Two issues I found checking this: Really do need to include top-level scope because a top-level macro will show it's own code as the top-level vs. the REPL, and should see the first frame after top-level.)

f() = map([1;2]) do x
       x[2]
       end
f()
ERROR: BoundsError
Stacktrace:
 [1] getindex
   @ ./number.jl:96 [inlined]
 [2] #100
   @ ./REPL[63]:2 [inlined]
   ⋮
   Base
   ⋮
 [6] map
   @ ./abstractarray.jl:2301 [inlined]
 [7] f()
   @ Main ./REPL[63]:1
 [8] top-level scope
   @ REPL[64]:1
Use `err` to retrieve the full stack trace.

Those seem fine to me? I don't think the missing frames for collect_similar, _collect, and iterate help. But, here's a case that might be confusing:

struct Foo end
map(Foo()) do x
       x[2]
       end
ERROR: MethodError: no method matching iterate(::Foo)
Closest candidates are:
  iterate(::Union{LinRange, StepRangeLen}) at range.jl:737
  iterate(::Union{LinRange, StepRangeLen}, ::Int64) at range.jl:737
  iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at dict.jl:693
  ...
Stacktrace:
   Base
   ⋮
 [3] map(f::Function, A::Foo)
   @ Base ./abstractarray.jl:2330
 [4] top-level scope
   @ REPL[65]:1
Use `err` to retrieve the full stack trace.

Here it isn't immediately clear why it's complaining about not having an iterate method, but it's clear enough about needing one to be defined on Foo to work. If someone wanted to understand the reason it's looking for iterate, four keystrokes reveals the exact call site.

@BioTurboNick
Copy link
Contributor Author

@mcabbott suggested this would be easier to demonstrate for people to try out in a separate package, so I'll do that.

@rfourquet
Copy link
Member

Maybe this feature could be controlled via an option in the options field of LineEditREPL ?

@MichaelHatherly
Copy link
Member

https://github.com/MichaelHatherly/InteractiveErrors.jl might be of some relevance since it does a similar thing with abbreviating stacktraces. (warning, self-promotion alert 😆)

@BioTurboNick
Copy link
Contributor Author

That's cool @MichaelHatherly ! Maybe too complex for a default though? :-)

@MichaelHatherly
Copy link
Member

MichaelHatherly commented Apr 23, 2021

That's cool @MichaelHatherly ! Maybe too complex for a default though? :-)

Thanks, yes too complex as a default, or to be included in base in any way. Take whatever inspiration you want from there.

The one thing that I personally find unimportant in stacktraces is the printout of the complete method signature. Maybe that's a controversial view though, does it actually provide anything more useful than just looking at the file and line itself for the typical user? When some highly parameterised signature deep inside the diffeq ecosystem (just to pick a specific example) gives you an error it just gets in the way of more useful info, like source location. Since moving to solely using InteractiveErrors for everything I've really not missed big stacktraces or found that I actually needed the method signature to diagnose things.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Apr 23, 2021

Okay, demonstration code is available in this package: https://github.com/BioTurboNick/AbbreviatedStackTraces.jl

I may not register it because it's not really polished.

That version doesn't get bypassed by headless execution or worker processes though, because I wasn't able to successfully overwrite the REPL methods needed to do so. (Advice there would be appreciated!) But it does show the default REPL error printing behavior, for those who want to see it in action. Fixed, has the full behavior of this PR.

@antoine-levitt
Copy link
Contributor

Thanks for making it easier to try it out!

The one thing that I personally find unimportant in stacktraces is the printout of the complete method signature. Maybe that's a controversial view though, does it actually provide anything more useful than just looking at the file and line itself for the typical user?

I think this has been discussed before. I believe the main argument against removing it is that types are important to know exactly which method has been called, and so it might be hard to eg debug a stacktrace from a user which does not have this information. If however a mechanism like the one in this PR is merged with a default, simpler stacktrace, and the full one available (with a clear indication to the user how to get it), then it would make a lot of sense to omit type information in the simple stacktrace.

@simeonschaub simeonschaub added error handling Handling of exceptions by Julia or the user speculative Whether the change will be implemented is speculative labels Apr 27, 2021
@timholy
Copy link
Member

timholy commented Jul 6, 2023

This form of abbreviation omits certain frames, and is both less and more aggressive than #49102. Was more abbreviation deemed unnecessary?

@BioTurboNick
Copy link
Contributor Author

This is definitely not completed.

@timholy timholy reopened this Jul 7, 2023
@BioTurboNick
Copy link
Contributor Author

In 1.10, the sum(p[]) stacktrace drops from 18 frames to 14.

The @btime plot([1, 2, 3], seriestype = :blah) stacktrace of 17 frames is unchanged.

@ParadaCarleton
Copy link
Contributor

Is the plan still to merge this in 1.10?

@BioTurboNick
Copy link
Contributor Author

No, 1.10 has already had feature freeze. I believe the idea was that more work would start being done here for 1.10, which is currently exemplified by truncated types and some removed frames that are essentially duplicates. I hope more can be done for 1.11.

@BioTurboNick
Copy link
Contributor Author

Just updated to reflect changes in AbbreviatedStackTraces.jl:

  • Made internal lines less distracting by omitting the line number, using italics, making commas gray
  • Added capability to add back all public frames by checking names(module) with ENV["JULIA_STACKTRACE_PUBLIC"] = true

Example:
image

@JeffBezanson
Copy link
Member

From triage:

  • Please summarize the design of what this PR currently does.
  • It doesn't really work for me to have a complex list of rules to determine what frames to show. It's just too hard to have a mental model of what it's showing.
  • On triage we liked the idea of doing something very simple, like showing just the top ~3 frames along with an easy way to see the full thing. The reason is that I can at least understand what it is doing. We also discussed reversing the order ala python.
  • Environment variables don't seem useful to me, since errors happen unexpectedly. I don't know in advance what files will be involved in errors so that I can set up env vars.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Sep 28, 2023

Summary of PR

1. Which frames has the user indicated are important to them, and which are not?

We can use already-available information to determine a default: Any modules in code with a root path of ./ or located in .julia (other than dev) are Julia Base, Stdlibs, and packages that have been added from the registry. This is external code the user is consuming as black boxes.

By contrast, any modules in code in .julia/dev, which the user has explicitly indicated is under development, any code located on the REPL, and any code located outside of these locations is likely in some stage of development or custom use.

I can remove this aspect, but it seemed natural that if someone wanted to turn on debug log information using ENV["JULIA_DEBUG"], that could inform stacktraces as well. However, I do agree that this is not an ideal mechanism, and it makes the decision more complex than necessary.

An include/exclude apparatus that could be accessed by a user or package to customize the behavior would be better. VSCodeServer, for instance, would need to mark its frames as for exclusion using these rules; that is fine, the package would know it is doing something nonstandard and exclude itself. One could also imagine a package having a debug mode, which could register itself as to be included when that has been turned on. You could also imagine that Revise could add inclusion rules when e.g. Base/Core.Compiler is tracked. Currently there's just a method an IDE can overwrite to provide a test for its paths, but I haven't considered that a final design.

2. Once we know which frames are the user's, which other frames are informative and should be included, or are included but shouldn't be?

  • Include: The immediate lower frame after a user frame indicates what the user code did that caused a problem.
  • Include: With broadcasting, the broadcasting internals need to be skipped to get to the function the user called into.
  • Include: OPTIONAL (but could be made default if you like), bring back all frames belonging to public functions in their respective modules.
  • Exclude: Code evaluated into Main (deemed a user module) but really belongs to another package. BenchmarkTools macros do this.
  • Exclude: If the top-level frame is the REPL, or a broadcast, it's not necessary.

3. Alternatives

  • Reversing the order - Sure, that's not bad; though if I'm on the REPL I might still need to scroll up anyway, and reducing screen flooding is its own good.
  • First three frames - I can see the logic; could easily combine with what I've proposed, too. But is too much being lost if it's only three frames?
  • Verbosity control - could consider several levels of verbosity and a default, would be willing to build the options if we could settle on what's desired.

@JeffBezanson
Copy link
Member

What does the internal printing mean? I guess it means frames were omitted, and it summarizes which modules they are from?

The example above is certainly shorter and more manageable, but it does seem kind of random to me. I'm not convinced it's better than just limiting the printing to one screen and calling it a day.

@ParadaCarleton
Copy link
Contributor

  • Verbosity control - could consider several levels of verbosity and a default, would be willing to build the options if we could settle on what's desired.

I think the ideal is 3 options--full, package, and user--with user as the default. Packages are less likely to have bugs than user code, but more likely than Base, so I might want to see that too.

@BioTurboNick
Copy link
Contributor Author

What does the internal printing mean? I guess it means frames were omitted, and it summarizes which modules they are from?

The example above is certainly shorter and more manageable, but it does seem kind of random to me. I'm not convinced it's better than just limiting the printing to one screen and calling it a day.

I admit I'm not sure what makes it seem random?

In my original proposal, the intent was to show the user frames from code they wrote, and the entry point into code they didn't write. In the screenshot just above, frames 1, 6, and 19 would not be present in this mode. Only top level and plot would be visible. I think from the point of view of the user, this would be fairly intuitive in practice - frames from code they're working with is shown, other code isn't.

With the new public frames option, the trace can be read as a trace of public API calls, so a user could still see the gist of how the code flows between modules in the code they didn't write. It wouldn't be my preference, but for people uncomfortable with hiding everything, I thought this might be a reasonable compromise.

Calling out the omitted frames as internal to my mind communicates that the omitted code consists of implementation details of foreign code that the user doesn't typically need to be concerned with. And yes, it summarizes the modules included in the omitted set; it seemed important to at least ensure that a user could know what packages are involved in an error.

I'd like to recall that the original motivation for this work was attempting to reconcile 1) the strong opinion among part of the community that hiding frames was very bad and would make it harder to debug user errors, with 2) the strong desire among another part of the community to have shorter traces to make day-to-day work on the REPL less annoying, and so users wouldn't have massive traces to report.

If opinions on that have changed to the point that everything in the middle could just be dropped entirely... I'd be surprised but I wouldn't strongly object. I just thought I'd found a nice compromise that gives most people most of what they want most of the time :-).

@fatteneder
Copy link
Member

  • Verbosity control - could consider several levels of verbosity and a default, would be willing to build the options if we could settle on what's desired.

I think the ideal is 3 options--full, package, and user--with user as the default. Packages are less likely to have bugs than user code, but more likely than Base, so I might want to see that too.

Perhaps one could attach those options as fields to the err variable so that typing err.<level> allows quick inspection of other levels.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Jun 27, 2024

I would like to propose something to get this PR (possibly with the discussed tweaks) into Julia for 1.12.

This functionality should be:

  1. disabled by default
  2. when a stacktrace exceeds a certain length in an interactive context, the trace will be appended with a note about controlling stacktrace display.

This way, people could obtain the benefits of having this available natively and automatically, without needing an external package that invalidates Base methods. Because it isn't default, it doesn't invalidate any current documentation or doctests that show stacktraces. The user is in control of enabling it and won't be surprised by a change that then has to be supported. The functionality could be adjusted freely in new versions since it is labeled as experimental. It wouldn't conflict with the more... careful... approach that some of the core devs want to take to shortening stack traces. And it could even be stripped out without much disruption.

Would this proposal satisfy everyone?

And if it proves itself, later on it could be made default.

@timholy
Copy link
Member

timholy commented Jun 29, 2024

Since I haven't had time to work on any of the alternatives, I won't stand in the way of this. But I'll let others chime in more specifically.

@JeffBezanson
Copy link
Member

I'm sorry, I still don't like having such a complex and ugly algorithm for determining which frames are visible. The idea of a hidden stack frame clearly does not have any obvious or canonical definition. I think this code would be very hard to maintain and the complexity is just too much. Nobody should have to update this code if e.g. the implementation of broadcasting changes.

I would greatly prefer just truncating it to the screen size. In fact we might want to do what we do for arrays, and omit items from the middle rather than either end, since that will include the line from a script where the error started plus where it ended up.

@antoine-levitt
Copy link
Contributor

I would greatly prefer just truncating it to the screen size. In fact we might want to do what we do for arrays, and omit items from the middle rather than either end, since that will include the line from a script where the error started plus where it ended up.

This would likely be quite a lot worse than the current behavior, as most often the interesting frames are about 1/4th (which nontrivial operation failed, often coming after several layers of boring indirection) and 3/4th in the stack trace (the general location in the user code, coming before several layers of loading). Cutting the middle part would probably be a great way to demonstrate the need for this PR though, as it would emphasize exactly what this feature aims to cut :)

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Jul 19, 2024

I would greatly prefer just truncating it to the screen size. In fact we might want to do what we do for arrays, and omit items from the middle rather than either end, since that will include the line from a script where the error started plus where it ended up.

This would likely be quite a lot worse than the current behavior, as most often the interesting frames are about 1/4th (which nontrivial operation failed, often coming after several layers of boring indirection) and 3/4th in the stack trace (the general location in the user code, coming before several layers of loading). Cutting the middle part would probably be a great way to demonstrate the need for this PR though, as it would emphasize exactly what this feature aims to cut :)

Yes, I believe it would. :-)

I'm sorry, I still don't like having such a complex and ugly algorithm for determining which frames are visible. The idea of a hidden stack frame clearly does not have any obvious or canonical definition. I think this code would be very hard to maintain and the complexity is just too much. Nobody should have to update this code if e.g. the implementation of broadcasting changes.

This is why I asked about putting it in as an "experimental" feature. There may well be ways to simplify the algorithm going forward by identifying formal ways to mark what can be included/excluded. public may help. I'm sorry @JeffBezanson , but it feels like you're nitpicking a feature that completely solves a major pain point with Julia UX just because it isn't perfect yet and you don't personally see how to get there from here.

@PaulSoderlind
Copy link
Contributor

I surely don't know how to solve this (technically), but I believe it is important to get something done. New users complain a lot (yes, a lot) about the stack traces. I believe they want something that is targeted to their own code, not base or packages. Some kind of filtering should be possible, or?

@BioTurboNick
Copy link
Contributor Author

Alright, let me lay this out.

I love Julia and want everyone else to love it too.

Stack traces are a major problem with Julia usage that turns people off of using it.

I put in significant effort in to find a complete solution to the problem. I listened to everyone's needs and implemented something that meets all of them. Unfortunately, solving all those concerns means it is complex, so hearing that complexity is now a barrier is... frustrating.

I'm willing to put more effort into it going forward. You have someone who can maintain it right here. And I'm happy to find ways to simplify it for better maintainability if that's really the major objection. I'm also happy to find better solutions for e.g. the broadcast frames than what's present here. And making it experimental and off by default means it could be stripped out at any time.

I'm willing to bet users will be so much happier with Julia with this feature available. And it will buy you time to maybe solve it in some perfect way 5 years from now.

Please help me help you.

@antoine-levitt
Copy link
Contributor

I haven't seen it mentioned in this thread, but the PR is available as a package here : https://github.com/BioTurboNick/AbbreviatedStackTraces.jl. I've been using it (with the "minimal" setting) for years now, I've extremely rarely needed to look at the complete stack trace, and I bless this package every time I have to look at a default stack trace eg on a student computer. To people who have reservations about the tradeoff (which I understand completely, adding complexity and arbitrariness is never a great solution), try it for a few weeks and see it for yourselves.

@JeffBezanson
Copy link
Member

This is just a philosophical difference; I like things to be simple (as opposed to "easy"). This also highlights how it is too bad there is so much stuff in this repo; ideally this could be developed elsewhere as part of e.g. the REPL and I may never even see this PR :) It's not your fault. I'm not holding out for some future more-perfect version of this, because I doubt such a thing exists or is desirable. In fact, if one wants this feature I bet this PR is about as good as it can get.

try it for a few weeks and see it for yourselves

OK, I guess I owe at least that. I'm trying this branch and I think I see some things that are bugs:

julia> sum([])
ERROR: MethodError: no method matching zero(::Type{Any})

Closest candidates are:
  zero(::Type{Union{Missing, T}}) where T
   @ Base missing.jl:105
  zero(::Type{Union{}}, Any...)
   @ Base number.jl:310
  zero(::Type{Missing})
   @ Base missing.jl:104
  ...

Stacktrace:
         ⋮ internal @ Base, Unknown
    [13] sum(a::Vector{Any})
       @ Base ./reducedim.jl:996
Use `err` to retrieve the full stack trace.

This does not include the location of my REPL input. It also prints Unknown a fair amount; I'm not sure why but I'd guess it's due to some inlined frames? I don't think it's helpful to print Unknown in that case, or possibly ever.

julia> 1 .+ [2,""]
ERROR: MethodError: no method matching +(::Int64, ::String)

Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:587
  +(::Real, ::Complex{Bool})
   @ Base complex.jl:319
  +(::Number, ::LinearAlgebra.UniformScaling)
   @ LinearAlgebra ~/src/julia-1.11/julia/usr/share/julia/stdlib/v1.11/LinearAlgebra/src/uniformscaling.jl:145
  ...


In this case there is no stack trace and no hint about err.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error handling Handling of exceptions by Julia or the user speculative Whether the change will be implemented is speculative triage This should be discussed on a triage call
Projects
None yet
Development

Successfully merging this pull request may close these issues.