Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server pty_thread panicked at failure to spawn for new pane #979

Closed
Tracked by #1100
matu3ba opened this issue Jan 1, 2022 · 3 comments
Closed
Tracked by #1100

Server pty_thread panicked at failure to spawn for new pane #979

matu3ba opened this issue Jan 1, 2022 · 3 comments

Comments

@matu3ba
Copy link
Contributor

matu3ba commented Jan 1, 2022

Zellij should not crash, if it cant spawn another instance.

  1. It should log this with ideally sufficient information and
  2. report the problem to the user so that the user can check the logs and diagnose the possible cause.
  3. Even if the error is unrecoverable, the panic handler should dump at least some information, ie the bubbled up error somewhere.

The offensive code is this

    let mut child = unsafe {
        let command = &mut Command::new(cmd.command);
        if let Some(current_dir) = cmd.cwd {
            command.current_dir(current_dir);
        }
        command
            .args(&cmd.args)
            .pre_exec(move || -> std::io::Result<()> {
                if libc::login_tty(pid_secondary) != 0 {
                    panic!("failed to set controlling terminal");
                }
                close_fds::close_open_fds(3, &[]);
                Ok(())
            })
            .spawn()
            .expect("failed to spawn")
    };

I think one should not crash the server with expect, as a failure to spawn is not an unrecoverable situation. Unrecoverable would be messed up file descriptors or broken tty.

/tmp/zellij-1000/zellij-log/ showed nothing useful.

Basic information

zellij --version: zellij 0.23.0
tput lines: 57
tput cols: 240
uname -av or ver(Windows): Linux pc 5.15.11-arch2-1 #1 SMP PREEMPT Wed, 22 Dec 2021 09:23:54 +0000 x86_64 GNU/Linu

List of programs you interact with as, PROGRAM --version: output cropped meaningful, for example:
nvim --version: 3 or 4 tabs, 4 panes on tab1 one I tried to create a new pane with 2xmpv, firejail+firefox, okular. tab2 with another instance of nvim, tab3 with a second instance of nvim. Not sure if tab4 had something else running.
alacritty --version: alacritty 0.9.0 (fed349aa)

Further information
Reproduction steps, noticeable behavior, related issues, etc
Let zellij run for a long time. Swap in between panes and split them. Eventually spawning might not be successful and zellij crashes with the following output:

Error occurred in server:
Originating Thread(s):
1. stdin_handler_thread: AcceptInput
2. pty_thread: SpawnTerminal

Error: thread 'pty' panicked at 'failed to spawn: Os { code: 2, kind: NotFound, message: "No such file or directory" }': /home/user/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/zellij-server-0.23.0/src/os_input_output.rs:128
   0: zellij_utils::errors::handle_panic
   1: std::panicking::rust_panic_with_hook
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:628:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:521:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
   5: core::panicking::panic_fmt
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
   6: core::result::unwrap_failed
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1616:5
   7: <zellij_server::os_input_output::ServerOsInputOutput as zellij_server::os_input_output::ServerOsApi>::spawn_terminal
   8: zellij_server::pty::Pty::spawn_terminal
   9: std::sys_common::backtrace::__rust_begin_short_backtrace
  10: core::ops::function::FnOnce::call_once{{vtable.shim}}
  11: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys/unix/thread.rs:106:17
  12: start_thread
  13: __GI___clone
@jaeheonji
Copy link
Member

I agree there is some offensive code (especially panic!) in the zellij-server code.
So I'm getting ready to refactor the code that causes the crash.

The purpose of the refactoring is to return all panic codes as an Err, and the client expects to show an error with a graceful exit (exit 1). (of course, the goal is to leave a detailed log in zellij-log.)

Thanks for finding the dangerous code! This will definitely help we write better safety code.

@imsnif
Copy link
Member

imsnif commented Jan 1, 2022

We also need an error reporting system though. Bubbling the errors up is important work, but won't solve this completely.

@raphCode
Copy link
Contributor

Can this be closed since #995 should have fixed it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants