Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server: Remove panics in tab module #1748

Merged
merged 14 commits into from
Oct 6, 2022

Conversation

har7an
Copy link
Contributor

@har7an har7an commented Sep 22, 2022

Removes all calls to unwrap() in the tab module of zellij-server. Makes all relevant functions return Results instead, attaching error context where possible to trace error origins better. Also updates the code in server::screen to accept the new Results and attach context there, too.

Related to #1734.

I want to test what it looks like first by provoking some errors in the code to see whether it's really any good.

@har7an har7an marked this pull request as draft September 22, 2022 11:39
@har7an har7an temporarily deployed to cachix September 22, 2022 11:40 Inactive
@har7an
Copy link
Contributor Author

har7an commented Sep 22, 2022

Note to self: Must replace calls to context() with with_context(|| format!(...)) to get the implicit formatting.

@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from 33537e8 to 54d7903 Compare September 22, 2022 12:46
@har7an har7an temporarily deployed to cachix September 23, 2022 05:26 Inactive
@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from d3b691b to 445a51c Compare September 23, 2022 05:32
@har7an har7an temporarily deployed to cachix September 23, 2022 05:32 Inactive
@har7an har7an marked this pull request as ready for review September 23, 2022 05:34
@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from 445a51c to 835b198 Compare September 23, 2022 06:19
@har7an har7an temporarily deployed to cachix September 23, 2022 06:19 Inactive
@har7an
Copy link
Contributor Author

har7an commented Sep 23, 2022

@imsnif I changed the dependencies here in that I added the backtrace feature to anyhow. Is that okay from your side? The benefit for us is that we now get a backtrace from where we first converted an error to a anyhow::Error (See "Stack backtrace" in the crash dump), rather than a backtrace from where the panic originated (See "Panic backtrace" in the crash dump). This is helpful because since now we "bubble up" the errors, the panic happens in an entirely different location than where the error originates. As a consequence we have two backtraces in the output now (if we run with RUST_BACKTRACE=1):

zellij crash dump

❯ RUST_BACKTRACE=1 ./target/debug/zellij

Error occurred in server:

  × Thread 'screen' panicked.
  ├─▶ Originating Thread(s)
  │   	1. stdin_handler_thread: AcceptInput
  │   	2. screen_thread: ScrollUpAt
  │   
  ├─▶ At zellij-server/src/lib.rs:695:18
  ╰─▶ Program terminates: a fatal error occured
      
      Caused by:
          0: failed to handle scrollwheel up at position Position { line: Line(18), column: Column(52) } for client 1
          1: failed to get pane at position Position { line: Line(18), column: Column(52) }
          2: Ouch
      
      Stack backtrace:
         0: anyhow::private::format_err
                   at /var/home/andi/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/anyhow-1.0.57/src/lib.rs:665:13
         1: zellij_server::tab::Tab::get_pane_id_at
                   at ./zellij-server/src/tab/mod.rs:1961:9
         2: zellij_server::tab::Tab::get_pane_at
                   at ./zellij-server/src/tab/mod.rs:1933:32
         3: zellij_server::tab::Tab::handle_scrollwheel_up
                   at ./zellij-server/src/tab/mod.rs:1867:29
         4: zellij_server::screen::screen_thread_main::{{closure}}
                   at ./zellij-server/src/screen.rs:1258:64
         5: zellij_server::screen::screen_thread_main
                   at ./zellij-server/src/screen.rs:1258:17
         6: zellij_server::init_session::{{closure}}
                   at ./zellij-server/src/lib.rs:689:17
         7: std::sys_common::backtrace::__rust_begin_short_backtrace
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:122:18
         8: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:498:17
         9: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panic/unwind_safe.rs:271:9
        10: std::panicking::try::do_call
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40
        11: __rust_try
        12: std::panicking::try
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19
        13: std::panic::catch_unwind
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14
        14: std::thread::Builder::spawn_unchecked_::{{closure}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:497:30
        15: core::ops::function::FnOnce::call_once{{vtable.shim}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
        16: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
            <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
            std::sys::unix::thread::Thread::new::thread_start
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys/unix/thread.rs:108:17
        17: start_thread
        18: __clone3
      
      Panic backtrace:
         0: zellij_utils::errors::Panic::show_backtrace
                   at zellij-utils/src/errors.rs:130:21
         1: <zellij_utils::errors::Panic as core::fmt::Display>::fmt
                   at zellij-utils/src/errors.rs:112:18
         2: <T as alloc::string::ToString>::to_string
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/string.rs:2405:9
         3: miette::handlers::graphical::GraphicalReportHandler::render_causes
                   at /var/home/andi/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/miette-3.3.0/src/handlers/graphical.rs:206:51
         4: miette::handlers::graphical::GraphicalReportHandler::render_report
                   at /var/home/andi/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/miette-3.3.0/src/handlers/graphical.rs:112:9
         5: zellij_utils::errors::not_wasm::fmt_report
                   at zellij-utils/src/errors.rs:536:9
         6: zellij_utils::errors::not_wasm::handle_panic
                   at zellij-utils/src/errors.rs:524:42
         7: zellij_server::start_server::{{closure}}
                   at zellij-server/src/lib.rs:227:13
         8: std::panicking::rust_panic_with_hook
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:702:17
         9: std::panicking::begin_panic_handler::{{closure}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:588:13
        10: std::sys_common::backtrace::__rust_end_short_backtrace
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:138:18
        11: rust_begin_unwind
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
        12: core::panicking::panic_fmt
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
        13: core::result::unwrap_failed
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
        14: core::result::Result<T,E>::expect
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1035:23
        15: <core::result::Result<T,anyhow::Error> as zellij_utils::errors::FatalError<T>>::fatal
                   at zellij-utils/src/errors.rs:244:13
        16: zellij_server::init_session::{{closure}}
                   at zellij-server/src/lib.rs:689:17
        17: std::sys_common::backtrace::__rust_begin_short_backtrace
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:122:18
        18: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:498:17
        19: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panic/unwind_safe.rs:271:9
        20: std::panicking::try::do_call
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40
        21: __rust_try
        22: std::panicking::try
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19
        23: std::panic::catch_unwind
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14
        24: std::thread::Builder::spawn_unchecked_::{{closure}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:497:30
        25: core::ops::function::FnOnce::call_once{{vtable.shim}}
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
        26: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
            <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
            std::sys::unix::thread::Thread::new::thread_start
                   at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys/unix/thread.rs:108:17
        27: start_thread
        28: __clone3
      
  help: If you are seeing this message, it means that something went wrong.
        Please report this error to the github issue.
        (https://github.com/zellij-org/zellij/issues)
        
        Also, if you want to see the backtrace, you can set the `RUST_BACKTRACE` environment variable to `1`.

We could drop the second stack trace (since IMO it doesn't really contain anything hinting towards the error source), but I thought we keep it in case the anyhow backtrace is garbage or non-existent.

Another bonus is that since the backtrace is part of the Error we unwrap, it is now also printed to the logs like so:

zellij error log

ERROR  |zellij_utils::errors::not| 2022-09-23 08:01:02.602 [screen    ] [zellij-utils/src/errors.rs:506]: Panic occured:
             thread: screen
             location: At zellij-server/src/lib.rs:695:18
             message: Program terminates: a fatal error occured

Caused by:
    0: failed to handle scrollwheel up at position Position { line: Line(18), column: Column(52) } for client 1
    1: failed to get pane at position Position { line: Line(18), column: Column(52) }
    2: Ouch

Stack backtrace:
   0: anyhow::private::format_err
             at /var/home/andi/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/anyhow-1.0.57/src/lib.rs:665:13
   1: zellij_server::tab::Tab::get_pane_id_at
             at ./zellij-server/src/tab/mod.rs:1961:9
   2: zellij_server::tab::Tab::get_pane_at
             at ./zellij-server/src/tab/mod.rs:1933:32
   3: zellij_server::tab::Tab::handle_scrollwheel_up
             at ./zellij-server/src/tab/mod.rs:1867:29
   4: zellij_server::screen::screen_thread_main::{{closure}}
             at ./zellij-server/src/screen.rs:1258:64
   5: zellij_server::screen::screen_thread_main
             at ./zellij-server/src/screen.rs:1258:17
   6: zellij_server::init_session::{{closure}}
             at ./zellij-server/src/lib.rs:689:17
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:122:18
   8: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:498:17
   9: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panic/unwind_safe.rs:271:9
  10: std::panicking::try::do_call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40
  11: __rust_try
  12: std::panicking::try
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19
  13: std::panic::catch_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14
  14: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/mod.rs:497:30
  15: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
  16: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1861:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys/unix/thread.rs:108:17
  17: start_thread
  18: __clone3

So we have all useful information in the logs directly and it doesn't get lost. I like it much better now, what's your opinion?

@har7an
Copy link
Contributor Author

har7an commented Sep 28, 2022

ping, @imsnif

@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from 835b198 to 452d030 Compare October 4, 2022 12:16
@har7an har7an temporarily deployed to cachix October 4, 2022 12:16 Inactive
@har7an
Copy link
Contributor Author

har7an commented Oct 4, 2022

I'll merge this now (if CI passes) since with the recent activity around error handling I expect merge conflicts to become much more frequent. We can still revert it before the next release when necessary, or I'll patch it as required.

Note to self: Discuss whether we should drop strip = yes in Cargo.toml to keep stack traces useful.

@imsnif
Copy link
Member

imsnif commented Oct 4, 2022

@har7an - please please please wait for the KDL PR before merging!
I have a very long TODO list before we can release the next version and merging this will very likely block a lot of it.

for converting `Result` types that don't satisfy `anyhow`s trait
constraints (`Display + Send + Sync + 'static`) conveniently.

An example of such a Result is the `SendError` returned from
`send_to_plugins`, which sends `PluginInstruction`s as message type.
One of the enum variants can contain a `mpsc::Sender`, which is `!Sync`
and hence makes the whole `SendError` be `!Sync` in this case. Add an
implementation for this case that takes the message and converts it into
an error containing the message formatted as string, with the additional
`ErrorContext` as anyhow context.
and apply error reporting via `anyhow` instead. Make all relevant
functions return `Result`s where previously a panic could occur and
attach error context.
to accept an optional 4th parameter, a literal "?". If present, this
will append a `?` to the given closure verbatim to handle/propagate
errors from within the generated macro code.
and apply appropriate error context and propagate errors further up.
created with `anyhow!`. Since these errors don't have an underlying
cause, we describe the cause in the macro instead and then attach the
error context as usual before `?`ing the error back up.
to capture error backtraces at the error origins (i.e. where we first
receive an error and convert it to a `anyhow::Error`). Since we
propagate error back up the call stack now, the place where we `unwrap`
on errors doesn't match the place where the error originated. Hence, the
callstack, too, is quite misleading since it contains barely any
references of the functions that triggered the error.

As a consequence, we have 2 backtraces now when zellij crashes: One from
`anyhow` (that is implicitly attached to anyhows error reports), and one
from the custom panic handler (which is displayed through `miette`).
in the output of miette. Since we record backtraces with `anyhow` now,
we end up having two backtraces in the output: One from the `anyhow`
error and one from the actual call to `panic`. Adds a comment explaining
the situation and another "section" to the error output of miette: We
print the backtrace from anyhow as "Stack backtrace", and the output
from the panic handler as "Panic backtrace". We keep both for the
(hopefully unlikely) case that the anyhow backtrace isn't existent, so
we still have at least something to work with.
and leave the `panic`ing to the calling function instead.
which extended `active_tab!` by passing the client IDs to the closure.
However, this isn't necessary because closures capture their environment
already, and the client_id needn't be mutable.
@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from 452d030 to e4d8d6a Compare October 5, 2022 13:34
@har7an har7an temporarily deployed to cachix October 5, 2022 13:34 Inactive
that defaults to some default client_id if it isn't valid (e.g. when the
ScreenInstruction is sent via CLI).
@har7an har7an temporarily deployed to cachix October 6, 2022 05:06 Inactive
@har7an har7an temporarily deployed to cachix October 6, 2022 05:44 Inactive
@har7an har7an force-pushed the errors/dont-panic-in-server-tab branch from 332b4b2 to 1f610a7 Compare October 6, 2022 05:53
@har7an har7an temporarily deployed to cachix October 6, 2022 05:53 Inactive
@har7an har7an merged commit 6715f46 into zellij-org:main Oct 6, 2022
@har7an har7an deleted the errors/dont-panic-in-server-tab branch October 6, 2022 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants