zellij start up extremely slow on WSL2 opensuse tumbleweed (since v0.41.1) #3722

Open
milanglacier opened this issue Nov 4, 2024 · 27 comments

Comments


milanglacier commented Nov 4, 2024

Basic information

I downloaded Zellij from the GitHub releases page: version 0.41.1, x86-64.

stty size: 64 254

uname -av or ver(Windows): WSL2 opensuse tumbleweed

Issue description

Starting Zellij is extremely slow on WSL2 openSUSE Tumbleweed.

This has only happened since v0.41.1. I previously used v0.40.1 with no issues; it started instantly.

When I launch Zellij, the start screen hangs on this:

Loading Zellij
done

I then need to wait more than 10 seconds until Zellij actually starts.

Note that the slow startup is intermittent: it happens on roughly a third to a half of my attempts to launch Zellij.

Minimal reproduction

No configuration is needed. With the default configuration (a blank file at ~/.config/zellij/config.kdl), the problem can be observed.

Other relevant information

There is no log in the runtime directory ($XDG_RUNTIME_DIR), only the socket file. No error is reported; startup is just extremely slow.

I am using openSUSE Tumbleweed on WSL2; the distribution version is 20241103-0.

milanglacier changed the title from "zellij start up extremely slow on WSL2 opensuse tumbleweed" to "zellij start up extremely slow on WSL2 opensuse tumbleweed (v0.41.1)" Nov 4, 2024
milanglacier changed the title from "zellij start up extremely slow on WSL2 opensuse tumbleweed (v0.41.1)" to "zellij start up extremely slow on WSL2 opensuse tumbleweed (since v0.41.1)" Nov 4, 2024
Author

milanglacier commented Nov 4, 2024

I suspect this is related to loading the cached WASM files located at ~/.cache/zellij.

In fact, if I remove ~/.cache/zellij every time I quit Zellij (so that Zellij has to compile the WASM plugins on every launch), startup is actually not that slow (usually 2-3 s), and even significantly faster than the normal startup with the cached WASM files!

That is, something like this:

alias zellij="rm -rf $HOME/.cache/zellij && zellij"

Member

imsnif commented Nov 5, 2024

Hey, thank you for the report! Just to be clear: I do not have access to a windows machine (this is actually the main reason Zellij does not itself run on windows), so I cannot troubleshoot this properly. Just to say: the Zellij startup is supposed to be pretty much instantaneous. Even 2-3 seconds means something odd is going on.

Zellij usually logs to /tmp/zellij-<gid>/zellij-log/zellij.log. Maybe there's something useful there. Otherwise - how did you install Zellij?

Author

milanglacier commented Nov 5, 2024

> Otherwise - how did you install Zellij?

Just from the GitHub release. My CPU architecture is x86-64, btw.

Author

milanglacier commented Nov 5, 2024

Hi, I just shared the zellij log file:

zellij.log

The problem happens here:


[server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction. 

[zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue... 

It basically repeats these two log messages over and over until it finally succeeds.

Thank you for your time and effort digging into this rabbit hole.

Member

imsnif commented Nov 5, 2024

Seems like for some reason it takes a very long time to create the IPC pipe on the filesystem. Does it maybe work better if you change $XDG_RUNTIME_DIR to something else before running Zellij? (make sure Zellij sees $XDG_RUNTIME_DIR as the new value ofc).
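One way to try this suggestion, as a minimal sketch: the helper below runs any command with $XDG_RUNTIME_DIR pointing at a fresh private directory. The function name `with_temp_runtime_dir` is made up for illustration and is not part of Zellij.

```shell
# Run a command with XDG_RUNTIME_DIR set to a fresh private directory.
# The function name is hypothetical; only mktemp and env are used.
with_temp_runtime_dir() {
    runtime_dir=$(mktemp -d) || return 1
    # env applies the override to this one command only,
    # leaving the calling shell's environment untouched.
    env XDG_RUNTIME_DIR="$runtime_dir" "$@"
}

# Example (not run here): with_temp_runtime_dir zellij
```

If startup is fast with the override and slow without it, that would point at the original runtime directory rather than at Zellij itself.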

Member

imsnif commented Nov 5, 2024

Putting this together with what you wrote above about the cache folder - could it be that filesystem access (and specifically writing) is not very fast in this case for some reason?
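A rough way to probe this guess is to time the creation of many small files in the directories involved (for example ~/.cache and the runtime directory). This is only a sketch: the function name `time_small_writes` is made up, the timing has one-second resolution, and it measures file creation only, not whatever Zellij actually does at startup.

```shell
# Time how long it takes to create N small files inside a directory.
# Hypothetical helper; resolution is one second (date +%s).
time_small_writes() {
    target_dir=$1
    file_count=${2:-200}
    # Work inside a scratch subdirectory so nothing else is touched.
    work_dir=$(mktemp -d "$target_dir/fs-probe.XXXXXX") || return 1
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$file_count" ]; do
        printf 'probe\n' > "$work_dir/file-$i"
        i=$((i + 1))
    done
    end=$(date +%s)
    rm -rf "$work_dir"
    echo "$file_count files in $((end - start))s under $target_dir"
}

# Example (not run here): time_small_writes "$HOME/.cache" 1000
```

A result of several seconds for a few hundred files would support the slow-filesystem theory; a result near zero would argue against it.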

Author

milanglacier commented Nov 5, 2024

I previously used Zellij v0.40.1 which started instantly, but after updating, the startup speed has degraded significantly.

> Does it maybe work better if you change $XDG_RUNTIME_DIR to something else before running Zellij?

I tried modifying $XDG_RUNTIME_DIR (which WSL sets to /mnt/wslg/runtime-dir by default) by removing this environment variable to force Zellij to use its default socket file location (aka /tmp/zellij-1000/v0.41.1). However, this didn't improve the startup performance.

> could it be that filesystem access (and specifically writing) is not very fast in this case for some reason

I don't believe this is a filesystem I/O issue since ~/.cache resides within the VM itself rather than on a mounted directory. Additionally, the previous version worked without any performance problems, suggesting the slowdown may be related to recent changes in Zellij rather than filesystem access.

Member

imsnif commented Nov 8, 2024

Friend - while I totally believe your experience, I feel it's not useful to reduce it to saying that it was different in the previous version and so it must be something in the version upgrade. Other than the usual correlation/causation maxim, this doesn't give us a lot. I could just as easily say that the new version works well in other instances of WSL and so it must be something in the system.

The new version includes a great deal of changes. It could definitely be something in the new version, but without access to the problem to troubleshoot it, the best I can do is offer my guesses as I did above - trusting that you are the expert on your system, you have access to it, and so you might be able to use these educated guesses to troubleshoot this further.

If I get new information, maybe I'll have more ideas. But right now the best idea I have is to drill down into the code, place some extra logs in order to find out where this is coming from. This is not something that's practical to do in an issue thread, unfortunately. So I'm left with "Right now, with the info I have, this seems like an issue with slow access to the fs". It could have something to do with Zellij, it could have something to do with some sort of security layer gone awry, it could be a matter of some sort of priority queue - I honestly don't know. I am not there and do not know this system.

Again - given more information, I might have more ideas. But right now I'm unfortunately out of them. My apologies.

Member

imsnif commented Nov 8, 2024

One last idea btw is to try installing Zellij in a different way: either compiling from source (cargo install --locked zellij) or downloading a precompiled version from the releases (not sure which one you did).

This could happen (in either direction) if there's some weird linking behavior going on. But really, I'm just guessing.


rnbguy commented Nov 8, 2024

I am having the same issue on ArchLinux x86_64.

I tried:

  • pacman -S zellij
  • github release binary zellij-x86_64-unknown-linux-musl
  • cargo build --release from source

The problem persists regardless of the console/terminal application: gnome-console, kitty, rio, etc.

Looking at the log, it is the same issue as mentioned above:

...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.262 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.268 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.268 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.273 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.273 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.278 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.278 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.283 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.283 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.289 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.289 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.294 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.294 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.299 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.299 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.304 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
WARN   |zellij_server::route     | 2024-11-08 21:17:05.304 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:17:05.310 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
...
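To put a number on how long this loop lasts, the retry lines can be counted and their first and last timestamps printed. This is a minimal sketch that assumes the log layout shown above (LEVEL |module| DATE TIME [thread] ...); the function name `retry_summary` is made up for illustration.

```shell
# Summarise the "Server not ready" retry loop in a Zellij log file.
# Assumes the |-separated layout seen in the excerpt above.
retry_summary() {
    log_file=$1
    awk -F'|' '
        /Server not ready, trying to place instruction in retry queue/ {
            # The third |-separated field begins with " DATE TIME [thread]".
            split($3, parts, " ")
            stamp = parts[1] " " parts[2]
            if (count == 0) first = stamp
            last = stamp
            count++
        }
        END {
            printf "retries=%d first=%s last=%s\n", count, first, last
        }
    ' "$log_file"
}

# Example (not run here; pass the path to your own zellij.log):
# retry_summary /path/to/zellij.log
```

The gap between the first and last timestamps gives a lower bound on how long startup was stuck in the retry queue.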

Also, I noticed that I get some other errors when I quit Zellij:

...
ERROR  |zellij_utils::input::layo| 2024-11-08 21:25:31.527 [screen    ] [zellij-utils/src/input/layout.rs:1120]: Failed to read layout dir: Os { code: 2, kind: NotFound, message: "No such file or directory" } 
ERROR  |zellij_utils::input::layo| 2024-11-08 21:25:32.492 [screen    ] [zellij-utils/src/input/layout.rs:1120]: Failed to read layout dir: Os { code: 2, kind: NotFound, message: "No such file or directory" } 
INFO   |zellij_client            | 2024-11-08 21:25:32.492 [main      ] [zellij-client/src/lib.rs:584]: Bye from Zellij! 
INFO   |zellij_server::plugins::w| 2024-11-08 21:25:32.493 [wasm      ] [zellij-server/src/plugins/wasm_bridge.rs:318]: Bye from plugin 0 
INFO   |zellij_server::plugins::w| 2024-11-08 21:25:32.493 [wasm      ] [zellij-server/src/plugins/wasm_bridge.rs:318]: Bye from plugin 1 
INFO   |zellij_server::plugins   | 2024-11-08 21:25:32.493 [wasm      ] [zellij-server/src/plugins/mod.rs:894]: wasm main thread exits 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:917]: Failed to apply cached resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:906]: Failed to cache resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:917]: Failed to apply cached resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:906]: Failed to cache resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:917]: Failed to apply cached resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:906]: Failed to cache resizes: failed to send message to pty writer 
ERROR  |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen    ] [zellij-server/src/os_input_output.rs:917]: Failed to apply cached resizes: failed to send message to pty writer

Update: I just stumbled on a panic at zellij startup

...
WARN   |zellij_server::route     | 2024-11-08 21:52:02.100 [server_router] [zellij-server/src/route.rs:1228]: Server ready, retrying sending instruction.
WARN   |zellij_server::route     | 2024-11-08 21:52:02.105 [server_router] [zellij-server/src/route.rs:1141]: Server not ready, trying to place instruction in retry queue...
ERROR  |zellij_utils::errors::not| 2024-11-08 21:52:02.105 [server_router] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: server_router
             location: At zellij-server/src/lib.rs:588:38
             message: Program terminates: a fatal error occured

Caused by:
    0: failed to handle instruction for client 1
    1: couldn't get reference to read-locked session
INFO   |zellij_server::plugins   | 2024-11-08 21:52:02.106 [wasm      ] [zellij-server/src/plugins/mod.rs:894]: wasm main thread exits
...

Author

milanglacier commented Nov 9, 2024

> Again - given more information, I might have more ideas. But right now I'm unfortunately out of them. My apologies.

I completely understand and appreciate your work on this project. Please take your time and do what you think is best for the project.

I hope I can find more information, but that is just a hopeful thought, as I have no clue which direction I should dig into.

Author

milanglacier commented Nov 9, 2024

updates:

I think ConPTY is less likely to be the issue.

I installed some Linux-native GUI terminal emulators and ran them directly using WSLg. In that case I believe the standard Unix pty is used instead of Windows' ConPTY, but I still observe the same slow startup.

initial comment:

ERROR |zellij_server::os_input_o| 2024-11-08 21:25:32.493 [screen ] [zellij-server/src/os_input_output.rs:917]: Failed to apply cached resizes: failed to send message to pty writer

Is this possibly due to the implementation of Windows' ConPTY?

Windows' ConPTY is notorious for its subpar implementation compared to a standard Unix pty.

There are a lot of bugs related to ConPTY. If you search for "conpty" in the GitHub issues of some popular terminal emulators (wezterm, alacritty, Windows Terminal), you will see them. The last time I faced a ConPTY issue, it was an incompatibility with Nerd Fonts v3 (ConPTY made a fix release after Nerd Fonts v3 was published).

So maybe Zellij, or a library Zellij uses, is calling some API that behaves well on a Unix pty but badly on Windows' ConPTY? Then again, I think ConPTY is a low-level component that should not interact directly with an app like Zellij.

Member

imsnif commented Nov 9, 2024

@rnbguy - I guess this is not inside WSL but that arch is directly installed on the machine?

How did you both install Zellij? cargo, precompiled binary from the releases, or?


rnbguy commented Nov 9, 2024

I am using Arch Linux natively on a machine. I tried all the binaries I can think of -- arch repo, source compile, GitHub release 😄 the issue still persists.

Member

imsnif commented Nov 9, 2024

@rnbguy - are you comfortable installing from main? If so, and you can still reproduce it with main, maybe I'll do a debug branch and we can try some back and forth debugging here. Knowing that this also happens on arch narrows down the problem a lot.


rnbguy commented Nov 9, 2024

I actually compiled from main before 😄

I did a git pull origin main now and compiled once again. The problem persists. ☹️


rnbguy commented Nov 9, 2024

Another update. I have a VPS running ArchLinux x86_64. Zellij hadn't been updated on it and was still at version 0.40.1. I just tried 20 times on 0.40.1 and didn't see the above logs. Then I updated Zellij to 0.41.1 and hit the problem on the 3rd attempt.

Also, I see these panics when I quit Zellij on the VPS. I saw them in 0.40.1 too.

INFO   |zellij_server::plugins   | 2024-11-09 13:17:14.538 [wasm      ] [zellij-server/src/plugins/mod.rs:894]: wasm main thread exits
ERROR  |zellij_utils::errors::not| 2024-11-09 13:17:14.539 [async-std/runti] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: async-std/runtime
             location: At zellij-server/src/plugins/wasm_bridge.rs:637:46
             message: called `Result::unwrap()` on an `Err` value: failed to send message to screen

Caused by:
    0: Originating Thread(s)

    1: failed to send message to channel
ERROR  |zellij_utils::errors::not| 2024-11-09 13:17:14.539 [async-std/runti] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: async-std/runtime
             location: At zellij-server/src/plugins/wasm_bridge.rs:600:72
             message: called `Result::unwrap()` on an `Err` value: PoisonError { .. }
ERROR  |zellij_utils::errors::not| 2024-11-09 13:17:14.539 [async-std/runti] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: async-std/runtime
             location: At zellij-server/src/plugins/wasm_bridge.rs:696:80
             message: called `Result::unwrap()` on an `Err` value: PoisonError { .. }
ERROR  |zellij_utils::errors::not| 2024-11-09 13:17:14.540 [async-std/runti] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: async-std/runtime
             location: At zellij-server/src/plugins/wasm_bridge.rs:600:72
             message: called `Result::unwrap()` on an `Err` value: PoisonError { .. }
ERROR  |zellij_utils::errors::not| 2024-11-09 13:17:14.540 [async-std/runti] [zellij-utils/src/errors.rs:690]: Panic occured:
             thread: async-std/runtime
             location: At zellij-server/src/plugins/wasm_bridge.rs:600:72
             message: called `Result::unwrap()` on an `Err` value: PoisonError { .. }

Member

imsnif commented Nov 9, 2024

@rnbguy - I'm also on arch and do not experience these. So we have something to go on. I'll start with some guesses: are you running with a custom config? Does ~/.config/zellij and ~/.config/zellij/config.kdl exist? If not, could you try to create them and see if it does something?


rnbguy commented Nov 9, 2024

Hey, the config.kdl exists. I also removed it and reinitialized Zellij, but that doesn't resolve the issue. Note that this issue does not happen all the time, but I do come across it often.

Member

imsnif commented Nov 11, 2024

Hey @rnbguy - I just created a debug branch with some more log statements to help us narrow down the problem. Could you compile and run debug-slow-start (happy to lend a hand or answer questions doing this), reproduce the problem and paste the logs here? It would be easiest if you clear the logs before each attempt so that things won't get mixed up.
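Clearing the log between attempts could look roughly like the sketch below. The helper name `clear_zellij_log` is made up, and the default path is an assumption based on the log location mentioned earlier in this thread (with the numeric id taken from `id -u`); pass the real path explicitly if yours differs.

```shell
# Truncate the Zellij log so each reproduction attempt starts clean.
# Hypothetical helper; the default path is an assumption, see above.
clear_zellij_log() {
    log_file=${1:-/tmp/zellij-$(id -u)/zellij-log/zellij.log}
    if [ -f "$log_file" ]; then
        # Truncate in place rather than deleting the file.
        : > "$log_file"
    fi
}

# Possible workflow (not run here; repository URL is an assumption):
#   git clone https://github.com/zellij-org/zellij && cd zellij
#   git checkout debug-slow-start
#   clear_zellij_log && cargo run --release
```

Truncating instead of removing the file keeps the log path stable, so the same file can be attached after each run.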


rnbguy commented Nov 12, 2024

Thanks 🙂 I attached the /tmp/zellij-1000/zellij-log/zellij.log from debug-slow-start branch after hitting the above issue by running cargo run --release.
zellij.log

Member

imsnif commented Nov 12, 2024

Thanks @rnbguy ! This narrows it down quite a bit. I hope you don't mind a few more rounds of back-and-forth? I added some log lines to help find the issue. Could you try pulling the latest changes in the branch and testing again?


rnbguy commented Nov 12, 2024

I am ok with back-and-forth 🙂

Here is the new log zellij.log.

Member

imsnif commented Nov 12, 2024

Hey @rnbguy - thanks for the super fast turnaround. I pushed some more logs and a potential fix to the branch. Want to check it out and see if it still happens to you? Either way, if you could attach the logs that would be great.


rnbguy commented Nov 12, 2024

It looks like the latest commit resolved the issue! I don't see it anymore 👌 🙌

I am still sharing the log file - as it contains the other panic that I was referring to when I quit zellij: zellij.log

Again, many thanks for the quick work ! 🙏

Member

imsnif commented Nov 13, 2024

Amazing - I just merged the fix: #3767

Hope to release a patch version with this and other fixes soon. Thanks for the help troubleshooting!

@milanglacier
Author

I can confirm the latest release v0.41.2 fixes the issue


3 participants