Whole system crash after upgrading SteamOS #20

Closed
StoneMoe opened this issue Dec 18, 2022 · 32 comments
Labels: bug (Something isn't working), enhancement (New feature or request), wontfix (This will not be worked on)
Milestone

Comments

@StoneMoe

StoneMoe commented Dec 18, 2022

Steps to reproduce:

  1. pacman -Sy glibc
  2. Change SteamOS from Stable to Preview Channel
  3. Apply SteamOS Update
  4. SteamOS won't boot anymore
@StoneMoe
Author

So I reinstalled the rootfs partitions via the recovery image, and it can be reproduced again with the following steps:

  1. Reinstall SteamOS on the Steam Deck (keeping the home partition)
  2. Switch to the Preview Channel and reboot into the new system partition
  3. Try to mount rwfus again with rwfus -iI
  4. The whole system crashes on the fly with massive glibc version mismatch errors
  5. Force a reboot, and it won't boot anymore.

@StoneMoe
Author

I think the Jank warning section in the README may miss the real point.

We still need to follow Arch Linux's system maintenance guide: partial upgrades are not supported on a rolling-release distro and will always break things, so installing packages without -u won't help.

We need to rethink how to keep the system from crashing.

@ValShaped
Owner

ValShaped commented Dec 19, 2022

You have a good point. This project doesn't stop you from shooting yourself in the foot, which is exceedingly easy if you're switching between SteamOS branches. I'm not sure how we could safely recover from that, yet. Thanks for reporting!

@StoneMoe
Author

StoneMoe commented Dec 21, 2022

I can confirm that each branch of SteamOS currently uses its own pacman repo settings, and the packages in the repo are synced with the latest version of the current branch. Packages in the repository are not always in sync with the current branch snapshot.

For rwfus:
Find a way to detect major errors, disable the mount service automatically, and show the user a notification about the unmounting after a successful boot.

@sim590

sim590 commented Dec 23, 2022

Is this fixed? I just updated SteamOS after installing rwfus, and now my Steam Deck is stuck at the Steam Deck logo and won't boot.

Although, I want to point out that it was not an update channel switch that caused my Steam Deck to crash, since I was already on Beta when I first installed rwfus. It's only the latest update of that channel that caused my Steam Deck to not boot.

@sim590

sim590 commented Dec 23, 2022

Any idea how to undo this? Can I somehow chroot into the Steam Deck and revert what I did? Maybe using the Steam Deck BIOS or something?

Nvm. I see that you did give some instructions on how to do that above. I'm gonna try that.

@sim590

sim590 commented Dec 24, 2022

After reinstalling SteamOS and updating to the latest (with the SteamOS 3.4 update), I tried to install rwfus and then I got errors about glibc 2.34 not found. The system then stopped working right after I ran the rwfus --install command and I had to cold shutdown. At this point, the Steam Deck wouldn't boot again, so I had to redo the SteamOS reinstall. It seems to me that rwfus is completely breaking SteamOS after the SteamOS 3.4 latest update on the beta branch.

@ValShaped
Owner

ValShaped commented Dec 24, 2022

Reopening this, because two people having similar issues, one without prompting, can't be a coincidence. I'm visiting family right now, so I can't promise a quick resolution, but I'll look into it when I have the chance.

@ValShaped ValShaped reopened this Dec 24, 2022
@ValShaped
Owner

ValShaped commented Dec 24, 2022

@sim590 Did you happen to install any packages which rely on glibc before updating? I'm failing to reproduce this with rwfus dev on SteamOS 3.4.3 Preview

@StoneMoe
Author

FYI: a recent SteamOS update pulled in a new snapshot of Arch Linux, which contains a glibc update (2.34 to 2.36), so an old glibc remaining in the rwfus overlay will conflict with the newer ABI requirements.

see this Stable Channel Release Note
or this Preview Channel Release Note
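
A minimal sketch of the mismatch described here: the overlay's glibc becomes a problem once it is older than the one in the base image. The function names are hypothetical, the comparison relies on `sort -V` from GNU coreutils, and how the two version strings are obtained is left out on purpose.

```sh
# version_lt A B: succeed when version A sorts strictly before version B.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# overlay_glibc_is_stale OVERLAY_VERSION BASE_VERSION (hypothetical helper):
# succeed when the overlay copy is older than the base image copy.
overlay_glibc_is_stale() {
    version_lt "$1" "$2"
}
```

With the versions from this thread, `overlay_glibc_is_stale 2.34 2.36` succeeds, which is the situation that breaks the system once the overlay is mounted.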

@StoneMoe
Author

@sim590 btw you can fix your Deck without reinstalling by:

  1. Boot into recovery image
  2. Remove /home/.steamos/offload/opt/rwfus/service/rwfusd.sh in your home partition
  3. Reboot
  4. Uninstall rwfus and all rwfus files
  5. Reinstall rwfus and your packages based on latest SteamOS
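
Step 2 above could be scripted roughly as follows from the recovery environment. This is only a sketch: `disable_rwfus_service` is a hypothetical helper, the mount point of the home partition is an assumption passed in by the caller, and it renames the script instead of removing it so the step can be undone.

```sh
# disable_rwfus_service HOME_ROOT
# HOME_ROOT is wherever the Deck's home partition is mounted in the recovery
# environment (assumption: the thread does not name that mount point).
disable_rwfus_service() {
    script="$1/.steamos/offload/opt/rwfus/service/rwfusd.sh"
    if [ -f "$script" ]; then
        # Rename rather than delete, so the change is reversible.
        mv -- "$script" "$script.disabled"
        echo "disabled: $script"
    else
        echo "not found: $script" >&2
        return 1
    fi
}
```

It would be called with the mounted home partition as its only argument, followed by the reboot in step 3.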

@StoneMoe
Author

This issue may also be related to Valve's pacman repository, since their repo is not really consistent.

see https://www.reddit.com/r/SteamDeck/comments/zrc2ep/comment/j175s33/?utm_source=share&utm_medium=web2x&context=3

@derram

derram commented Dec 24, 2022

Had to reinstall SteamOS after the recent update because my Deck was stuck on the logo at boot as well.

@sim590

sim590 commented Dec 25, 2022

@StoneMoe

@sim590 btw you can fix your Deck without reinstalling by:

  1. Boot into recovery image
  2. Remove /home/.steamos/offload/opt/rwfus/service/rwfusd.sh in your home partition
  3. Reboot
  4. Uninstall rwfus and all rwfus files
  5. Reinstall rwfus and your packages based on latest SteamOS

Thanks! That worked like a charm. However, would there have been another way to do this in order to keep my installed packages? The idea of using rwfus is to do just that, so I would like a way to do that while waiting for a fix, but I don't really know how. I guess that the part where I did rwfus --remove is where I lost my installed packages. Could there be a way to reinstall glibc in the partition without removing it?

FYI: Recent SteamOS update pulled new snapshot of Arch Linux, which contains a GLIBC update(2.34 to 2.36). so a old glibc remained in rwfus overlay will conflict with newer ABI requirement

see this Stable Channel Release Note or this Preview Channel Release Note

Yeah. That must be what happened. Hope this doesn't happen too often until we find a way to solve this cleanly.

@ValShaped:

@sim590 Did you happen to install any packages which rely on glibc before updating? I'm failing to reproduce this with rwfus dev on SteamOS 3.4.3 Preview

Here is my pacman.log. I added two markers so you can clearly see where I installed rwfus and where I fixed the system and reinstalled rwfus (just now). Just look for lines with many leading #.

@StoneMoe
Author

@sim590 there is really no good way to avoid this kind of conflict, especially with a system-wide dependency like glibc.

Doing a full system package upgrade (pacman -Syu) on the current SteamOS branch before upgrading SteamOS may be a possible way to avoid this, though there is no guarantee :D

@ValShaped
Owner

pacman -Syu will duplicate all the updated packages, and bloat your rwfus storage dramatically. It looks like pacman reinstalled glibc. Try installing the package with the --needed flag; that tells pacman to ignore up-to-date packages and not reinstall.

I'll update the install notes.
Regarding a fix for this: would wrapping pacman be acceptable? I find it kinda gross, but if it keeps people from having to wipe rwfus every update (i.e. the entire reason why rwfus is a thing in the first place) I'd be glad to do it.
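
As a rough idea of what such a wrapper might look like (a sketch, not Rwfus's actual behavior): the hypothetical `guarded_install` below refuses an explicit glibc install and otherwise passes `--needed` to `pacman -S`. The `DRY_RUN` variable is also an invention of this sketch; it prints the command instead of running it.

```sh
# guarded_install PKG...: hypothetical wrapper around `pacman -S`.
# Refuses to install glibc explicitly and adds --needed so packages that
# are already up to date are not reinstalled.
guarded_install() {
    for pkg in "$@"; do
        if [ "$pkg" = "glibc" ]; then
            echo "refusing to install glibc into the overlay" >&2
            return 1
        fi
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "pacman -S --needed $*"
    else
        pacman -S --needed "$@"
    fi
}
```

This only catches glibc when it is named on the command line; it does not inspect what pacman decides to pull in as a dependency.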

@sim590

sim590 commented Dec 29, 2022

@ValShaped I did explicitly run pacman -S glibc, since this is the fix that I found in order to get my locale working. Otherwise, I could not change my locale settings. If I recall correctly, locale-gen would simply not generate the locale I asked for (after uncommenting the appropriate line in the locale.gen file). Others suggested reinstalling glibc and it did indeed fix it for me, so I didn't wonder further and went with the solution.

Actually, this here explains pretty much what I experienced before reinstalling glibc:

https://steamcommunity.com/app/1675200/discussions/0/5135803832902211410/

@ValShaped ValShaped added this to the v1.0.0 milestone Jan 4, 2023
@ValShaped
Owner

ValShaped commented Jan 16, 2023

I don't think I can fix this. It's one case where I think the smart thing to do would be steamos-devmode enable, grab the latest glibc, and then relock steamos-readonly if you want it locked. It'll get fixed in a SteamOS update sometime, and when it does, you won't have to worry about carrying it to the new update.
That goes against Rwfus's core mission, but I originally made Rwfus for the specific case of taking user-mode software (Yakuake, in specific) along for the ride, not necessarily core libraries or operating system components.

@StoneMoe
Author

StoneMoe commented Jan 18, 2023

I think we can try recording the current SteamOS version to /opt/rwfus/os-version.
If the recorded version doesn't match the version in /etc/os-release at boot time, then we know SteamOS has been upgraded, so we can echo 1 > /opt/rwfus/os-upgrading before mounting the overlayfs.
Then we can try to run some basic system commands like cd/ls/ps and check their exit codes to make sure basic system commands are working, then echo 0 > /opt/rwfus/os-upgrading to indicate a successful boot.

But if cat /opt/rwfus/os-upgrading is 1 and /opt/rwfus/os-version also doesn't match the current SteamOS version before the overlayfs is mounted, then we know a critical system failure was caused by the previous rwfus mount and the user hard-reset the Steam Deck, so we can skip the overlayfs mount this time to make sure the system boots safely.
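
The flag-file idea above can be sketched as two small functions. This is an illustration under assumptions, not code from Rwfus: the function names are hypothetical, the state directory and os-release path are parameters, and the version is read from the `VERSION_ID` field, which is assumed to be present.

```sh
# os_version_id OS_RELEASE_FILE: print the VERSION_ID value, quotes removed.
os_version_id() {
    sed -n 's/^VERSION_ID=//p' "$1" | tr -d '"'
}

# should_mount STATE_DIR OS_RELEASE_FILE
# Returns 0 when mounting looks safe, 1 when a previous post-upgrade mount
# raised the flag and never cleared it.
should_mount() {
    state_dir=$1
    current=$(os_version_id "$2")
    recorded=$(cat "$state_dir/os-version" 2>/dev/null || true)
    upgrading=$(cat "$state_dir/os-upgrading" 2>/dev/null || true)

    if [ "$recorded" != "$current" ]; then
        if [ "$upgrading" = "1" ]; then
            # The last attempt after an upgrade never reported success.
            return 1
        fi
        # First boot on a new OS version: raise the flag before mounting.
        echo 1 > "$state_dir/os-upgrading"
    fi
    return 0
}

# mark_boot_ok STATE_DIR OS_RELEASE_FILE: record the version, clear the flag.
mark_boot_ok() {
    os_version_id "$2" > "$1/os-version"
    echo 0 > "$1/os-upgrading"
}
```

A caller would run `should_mount /opt/rwfus /etc/os-release` before mounting, then run the basic command checks (ls, ps and so on), and call `mark_boot_ok` only if they pass. Clearing a stuck flag after the user has repaired the overlay is left out of this sketch.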

@StoneMoe
Author

Installing user-mode software on a rolling release like Arch Linux may also require upgrading core libraries, afaik,
unless the user never upgrades their user-mode software before SteamOS upgrades its libraries.
So it would be great to provide a graceful way to avoid this problem.

@ValShaped
Owner

Bearing in mind that Valve doesn't tend to update packages on non-main without pushing an atomic update, it's unlikely that a package will have a core dependency in the SteamOS repos that's from a newer SteamOS version, unless the user is sandbagging an update, or on the main OS update channel.

Regarding your solution of flag files, I think the presence or absence of those files would be sufficient, however, I'm not sure of the operating environment after a mismatched glibc trashes everything.

@StoneMoe
Author

I'm not sure of the operating environment after a mismatched glibc trashes everything.

The goal is to make sure the user can run basic commands, so at least they can fix their system without a recovery media drive. More integrity tests can be introduced later as this project goes on.

@ValShaped
Owner

Since Rwfus is written entirely in Bash, it's reliant on everything else still functioning, as far as I know. That's what "I'm not sure of the operating environment" refers to; Is Rwfus capable of surviving? How would I know if it did?
Unfortunately, the VM I use to test Rwfus decided to irrecoverably die (I'm not even sure how! it's a VM, but even restoring from a known-good state has failed!) during the migration from SteamOS 3.3.3 to 3.4. I can't do anything too adventurous with Rwfus until I have time to make a new one. Sorry about the lack of progress on this issue, I know it's as frustrating for you as it is for me.

@ValShaped
Owner

I'm adding an explicit check for glibc's presence in the next beta; if it is present, Rwfus will log an error and will not mount the overlays.
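
One way such a check could look, as a sketch only (the actual check in Rwfus may differ): test whether the overlay's upper directory carries its own copy of the C library. The function name is hypothetical, and the layout `usr/lib/libc.so.6` under the upper directory is an assumption based on where Arch's glibc package installs the library.

```sh
# overlay_has_glibc UPPER_DIR (hypothetical helper): succeed if the overlay's
# upper directory contains its own libc.so.6, as a file or as a symlink.
overlay_has_glibc() {
    [ -e "$1/usr/lib/libc.so.6" ] || [ -L "$1/usr/lib/libc.so.6" ]
}
```

A mount script could call this with the upper directory, log an error and skip mounting when it succeeds.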

@ValShaped ValShaped added the bug Something isn't working label Feb 6, 2023
@ValShaped ValShaped added the enhancement New feature or request label Feb 6, 2023
@ValShaped
Owner

As it turns out, the default pacman.conf explicitly ignores glibc updates. Deactivating after an unsuccessful boot should go on the roadmap, but for a variety of reasons I can't support reinstalling glibc.

@L1Z3

L1Z3 commented Feb 7, 2023

Hey, I installed glibc the other day. What steps can I take to undo this so I don't get a system crash the next update? (I would guess a simple sudo pacman -R glibc wouldn't end well, and I'd prefer not to redo my rwfus install since I have quite a few packages installed that took quite a lot of effort to install due to weird dependency issues.)

Edit: I'm just reinstalling rwfus, I don't wanna risk having to reformat my Deck.

@ValShaped
Owner

ValShaped commented Feb 9, 2023

I'm sure there's an easy way to do it, but I'm not confident enough with sed to make it happen. pacman -Qql will list all files owned by a package; you can prepend those paths with /opt/rwfus/mount/upper[/usr/lib/...] and target the files inside the overlay, and pass all those paths to rm
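
The path rewriting does not actually need sed. A sketch under assumptions: the hypothetical `overlay_paths` below reads the file list from stdin (for example from `pacman -Qql glibc`), prepends the overlay's upper directory, and prints only the files that really exist there. It prints candidates rather than deleting anything.

```sh
# overlay_paths UPPER_DIR: read absolute paths on stdin, one per line, and
# print the matching files inside UPPER_DIR. Directories are skipped so
# shared ones such as /usr/lib are never listed for removal.
overlay_paths() {
    while IFS= read -r path; do
        target="$1$path"
        if [ -L "$target" ] || [ -f "$target" ]; then
            printf '%s\n' "$target"
        fi
    done
}
```

For example, `pacman -Qql glibc | overlay_paths /opt/rwfus/mount/upper` would list the overlay copies for review before anything is passed to rm.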

@OpenBagTwo

So I appreciate this, but fwiw, I had installed glibc via rwfus (I'm guessing from when I installed base-devel) and upon upgrading from 3.4.4 to 3.4.6 experienced no crashes.

Ironically, when I pulled in the latest changes to rwfus' dev branch and ran rwfus -u, that almost did break my system, given just how much I've installed to the overlay (for example, I was doing all this work in gaming mode through sway, and I'm grateful that sway and alacritty persisted while I got everything sorted out).

Once I downgraded to the last tag and reran rwfus -u, I confirmed that my version of glibc was 2.36-6.

What I suspect is going on here is that I haven't added any new mirrors--I still get all my packages from the Valve repos. Please correct me if I'm wrong, but I believe this means that system upgrades with rwfus are perfectly safe for me, since Valve isn't including anything in the persistent filesystem that's not sourced from there, so long as I perform a pacman -Syu first

I'm being a little glib (🥁) here, as I probably don't want to actually go through with an update that installs overlays for newer versions of packages that are solely part of the persistent filesystem, but:

  1. Is that reasoning sound?
  2. If so, should the instructions be amended to suggest against adding new mirrors?
  3. And finally, should there be a way to tell rwfus to "do as I say" and enable rwfus even if one has installed glibc into the overlay?

@ValShaped
Owner

ValShaped commented Mar 19, 2023

@OpenBagTwo I'm glad it worked out for you! Unfortunately, the issue isn't with adding software from other repos -- that should be perfectly fine, as long as that software isn't glibc. The issue lies when Valve updates their glibc dependency in the SteamOS base image, and a Rwfus user has it installed in the overlay. When Rwfus activates, glibc is downgraded to whatever version the user has installed. It's a critical flaw in the design of Rwfus, and one I really don't know how to fix well enough for an inexperienced user.

As far as I know, Valve doesn't update glibc in the SteamOS base image very often, but they did just before this issue was opened, as part of the SteamOS 3.4 beta (and for everyone else as part of the SteamOS 3.4 update). Different SteamOS branches use different pacman repositories (currently they're *-rel and *-main, previously *-beta and *-main), so there's no clean way to update glibc except to pull in the glibc version from the branch you intend to switch to before switching.

As for a "do as I say" option, I could add something to the config file for that, sure. I'll work on that when I get home this evening.

Also, if you're running out of space, you can update the Disk_Image_Size config option in /etc/opt/rwfus.conf to whatever size you want (e.g. 16G), then run rwfus -u to increase the maximum size the image will grow to.

@OpenBagTwo

Are you saying that as long as your glibc is at least as new as the one that Valve uses, you'll be okay? I guess the part I'm confused by is the recommendation to avoid pacman -Syu--as long as you're not removing the Valve repo, then wouldn't keeping your system up-to-date be perfectly safe? I mean, obviously there can be a package conflict that involves some component of SteamOS that's not compatible with a newer package version, but that shouldn't happen when moving to newer versions of those packages, right?

@ValShaped
Owner

ValShaped commented Mar 19, 2023

That recommendation is there to keep people from accidentally installing base SteamOS packages in their Rwfus directory. If you already have base SteamOS packages in yours, go ahead and ignore it.

However, I consider that basically equivalent to enabling steamos-devmode and running everything on the bare filesystem, with the added benefit of having to get a physical mouse and keyboard out and enter single-user mode whenever it breaks.

@ValShaped ValShaped mentioned this issue Sep 16, 2023
@ValShaped ValShaped modified the milestones: v1.0.0, 0.4.2 Sep 16, 2023
@ValShaped
Owner

ValShaped commented Sep 17, 2023

SteamOS 3.5 has a /nix directory in the base image, which can be used with SteamOS-Offload's nix.mount to keep a persistent Nix store. If your packages are available on NixOS it might be a good idea to install and use Nix.

nix-installer handles that setup automatically.

As a package manager, Nix is equipped to do dependency management across packages with incompatible dependencies (like, for example, versions of glibc,) which Rwfus has no clean way to do. As such, the workaround instituted in 0.4.2/0.5.0 will be staying.

6 participants