Blank screen crash #5290

Closed
harryausten opened this issue May 1, 2020 · 14 comments

Comments

@harryausten

harryausten commented May 1, 2020

After upgrading from sway version 1:1.4-8 to 1:1.4-9 on Arch Linux, launching sway results in a blank grey screen with a frozen cursor. My system is then completely locked up and I can't even back out to a tty (with Ctrl+Alt+F#).

Note: the only change between these versions of sway is that it was rebuilt with a newer json-c library (0.13.1-3 -> 0.14-1).

I opened a bug report on the Arch Linux Bugtracker (https://bugs.archlinux.org/task/66482), where @maximbaz was kind enough to run through a few initial debugging steps. After discovering that the issue still occurred after I built the latest master branches of sway and wlroots, he suggested that I open this GitHub issue.

Since then I have been trying to obtain any useful information about the crash on my machine (note: lshw and pacman -Qm output, along with an image of my frozen display can be seen attached to my messages on the aforementioned Arch Linux bug report). Since the crash completely locks up my machine, trying to output sway's stdout and stderr to a file doesn't seem to work (i.e. running sway &> sway.log). It seems that the file is not being properly flushed before the crash and therefore upon reboot the file is empty 😢

This crash is reproducible using the default /etc/sway/config config file. I did manage to discover that commenting out the bar block allowed sway to start up properly; however, running the swaymsg reload command afterwards appeared to trigger the same crash.

Please let me know if there is anything else I can do to debug this issue. Any help would be greatly appreciated.

@emersion
Member

emersion commented May 1, 2020

sway &> sway.log

Logs are written to stderr. Try sway -d >sway.log 2>&1

It seems that the file is not being properly flushed before the crash and therefore upon reboot the file is empty 😢

Can you ssh into your machine to obtain the debug log?
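
A minimal sketch of that ssh workflow, assuming sway was started from a tty on the affected machine with `sway -d >sway.log 2>&1` in the user's home directory. The function name `fetch_sway_log`, the destination `user@host` and the log path are placeholders, not anything taken from this thread:

```shell
# Hypothetical helper: print a remote sway debug log over ssh.
#   $1: ssh destination (placeholder, e.g. user@host)
#   $2: log path on the remote machine, relative to the remote home directory
fetch_sway_log() {
    # The single quotes keep the path as one word for the remote shell.
    ssh "$1" "cat -- '$2'"
}
```

From a second machine, once the display has frozen, this could be used as `fetch_sway_log user@host sway.log > sway.log`.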

@harryausten
Author

sway.log
Using ssh to grab the log did indeed work! I didn't even think of that. Seems as though my system and indeed sway are still functioning, even though my display is frozen.

@harryausten
Author

Here is a minimal log using the default config. All I did was start up sway and was greeted with the blank screen.
default-sway.log

@ephase

ephase commented May 1, 2020

Hi,

I've got the exact same issue on Arch with the same type of hardware (Ryzen 5 3600, MSI B450 chipset, AMD Navi GPU). I tried installing sway-git and wlroots-git from the AUR, but nothing worked. The only solution is to downgrade the sway and json-c packages, but then the system hangs on reboot (kernel panic).

@fluix-dev
Contributor

fluix-dev commented May 2, 2020

I am experiencing this issue as well. I originally noticed it when running waybar after an upgrade; however, it also appears with the default config. (It also occurs after completing a command in, or pressing Esc to exit, rofi or dmenu.)

Using the main config, the crash seems to be triggered by starting swaybar, because sway doesn't freeze on startup once the bar {...} section is removed. Below are another sway log as well as a core dump taken using:

gcore <sway pid>
gdb /usr/bin/sway <core file>
bt full
To get the dump I used SysRq to take control of the keyboard and switch to a tty.
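
The three steps above can also be run non-interactively. A minimal sketch, assuming `gcore` and `gdb` are installed and that the caller is allowed to attach to the process (which may require root); `dump_backtrace` is a hypothetical helper name:

```shell
# Hypothetical helper: dump core of a running process and print a full backtrace.
#   $1: path to the binary (e.g. /usr/bin/sway)
#   $2: PID of the running process
dump_backtrace() {
    binary=$1
    pid=$2
    # By default gcore writes the dump to core.<pid> in the current directory.
    gcore "$pid" || return 1
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex 'bt full' "$binary" "core.$pid"
}
```

For example: `dump_backtrace /usr/bin/sway <sway pid> > backtrace.txt`, run from a tty or an ssh session.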

This looks like it does indeed involve json. @Xyene has taken a brief look and mentioned it may be caused by this busy loop.

Edit: My processor is also an AMD Ryzen 5 3600X (12) @ 3.800GHz.

@harryausten
Author

This looks like it does indeed involve json. @Xyene has taken a brief look and mentioned it may be caused by this busy loop

That loop appears to be trying to execute a random number generating function. Is this problem caused by RDRAND? I know some AMD CPUs have some kind of an issue with the RDRAND instruction. In fact I've sometimes been passing nordrand to my kernel command line to try to reduce the number of warnings.
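
A quick way to see what the running system reports about RDRAND, as a minimal sketch. It only reads the flags listed in /proc/cpuinfo and the kernel command line; it cannot tell whether the instruction actually returns good values. `rdrand_status` is a hypothetical helper name, and the two file paths are parameters so the check can also be run against saved copies:

```shell
# Hypothetical helper: report whether rdrand is listed in the CPU flags
# and whether nordrand is present on the kernel command line.
#   $1: cpuinfo file (defaults to /proc/cpuinfo)
#   $2: kernel command line file (defaults to /proc/cmdline)
rdrand_status() {
    cpuinfo=${1:-/proc/cpuinfo}
    cmdline=${2:-/proc/cmdline}
    # -w matches whole words only, so "nordrand" does not count as "rdrand".
    if grep -qw rdrand "$cpuinfo"; then
        echo "cpuinfo: rdrand listed"
    else
        echo "cpuinfo: rdrand not listed"
    fi
    if grep -qw nordrand "$cmdline"; then
        echo "cmdline: nordrand set"
    else
        echo "cmdline: nordrand not set"
    fi
}
```

Calling `rdrand_status` with no arguments checks the live system.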

@fluix-dev
Contributor

fluix-dev commented May 2, 2020

We believe so. The random number generator on this specific CPU (Ryzen 3600) is indeed occasionally broken.

@Xyene is better at C than I am, so he's taking a look, but right now it looks like a problem outside of sway itself. He or I will update when we can.

Edit: I have also set nordrand, but it seems the json library still has issues with it.

@harryausten
Author

Doing a bit of research, it seems that the RDRAND issue is fixed in newer versions of AMD's AGESA hardware initialisation software, which comes bundled with motherboard BIOS firmware. I think I might try upgrading my BIOS (it is currently running the quite old version that it originally came with) to see if that fixes the issue. It would be nice if it also had the added benefit of stopping programs from complaining all the time about RDRAND.

@Xyene
Member

Xyene commented May 2, 2020

I've submitted a patch upstream in json-c/json-c#589. Since it fixes the symptoms of the issue (and there's not much we can do on sway's end anyway), I'm going to mark this issue as closed 🙂

Edit: if you apply the patch above, you'll probably also need to apply json-c/json-c#590 to prevent an occasional segfault.

@Xyene Xyene closed this as completed May 2, 2020
@harryausten
Author

Hello again. I can confirm that updating the BIOS for my Gigabyte B450i AORUS PRO motherboard from version F40 (released 16/05/2019, containing AGESA 1.0.0.2) to version F50 (released 27/11/2019, containing AGESA 1.0.0.4 B) has fixed the RDRAND instruction (no more warnings from programs/kernel complaining that HW RNG doesn't work) and sway and json-c now function as expected 😄

A big thank you to everyone who helped with this issue 👍

@ephase

ephase commented May 2, 2020

@harryausten do you have any problems with AGESA 1.0.0.4 B? I downgraded several weeks ago because of problems with the zstd package format on Arch Linux and some game instability (with Shadow of the Tomb Raider, for example). I made a post on the Arch Forums here.

@harryausten
Author

@harryausten do you have any problems with AGESA 1.0.0.4 B? I downgraded several weeks ago because of problems with the zstd package format on Arch Linux and some game instability (with Shadow of the Tomb Raider, for example). I made a post on the Arch Forums here.

I have successfully installed .zst package updates from the Arch Linux repositories since updating my BIOS so I assume I don't have the same issue as you?

I can't say I've seen any issues with gaming yet, although I've only played Terraria so far, so not exactly a demanding game... I'll update if I discover any issues 👍

@fluix-dev
Contributor

If you are having any issues, and thus don't want to update your BIOS, using the patches that Xyene mentioned should work. I have tried them both (and am actually using them now) and they fixed the issue.

The only mildly complex part is applying the patches and building json-c, but documentation for this exists.

@ephase

ephase commented May 2, 2020

Thanks for replying. So I think this is a problem with my hardware (CPU / motherboard), and opening an issue on the vendor's website is the thing to do.
