Windows Update Destroys WSL System #1195

Closed
ronindesign opened this issue Oct 12, 2016 · 7 comments

@ronindesign

A brief description

  1. Updated Windows to Windows 10 Insider Preview 14931
  2. Tried to open bash.exe, result same as described in Error 0x80040154 #896
    C:\WINDOWS\system32>bash.exe                                                               
    Error: 0x80040154                                                                          
  3. As per Error 0x80040154 #896, re-checked box to enable WSL in Windows Features.
  4. WSL Re-installs, I provide same username as previously existed: C:\Users\MyWinUser\AppData\Local\lxss\home\nixuser
  5. Retry starting bash:
    C:\WINDOWS\system32>bash.exe                                                               
    bash: exec: zsh: not found                                                                 
  6. This is a direct result of my user's shell being set to /bin/zsh, intentionally (i.e. using chsh -s /usr/bin/zsh). However, since WSL was destroyed, ZSH binary doesn't exist (i.e. needs to be reinstalled).
  7. I can't login to reinstall zsh because bash keeps trying to load my default shell, which no longer exists.
    This effectively locks up bash since I've no idea how to change the default shell for my username to revert to /bin/bash. If ZSH was simply launched via config, e.g. in .bashrc or .profile, I could comment this out and let the shell run to install zsh. Instead, I would assume the only place to edit default shell would be in /etc/passwd, yet still, there I have: nixuser:x:1000:1000:"",,,:/home/nixuser:/bin/bash
    I'm not sure if /etc/passwd is showing the result of the reinstalled WSL resetting the entry for my user when I provided it in the WSL install prompt on first run, or what, but if this is the case, why is it still looking for zsh on init??

Expected results

  1. Run Windows Update, then after:
    • Still have WSL installed, with packages, environment, etc unchanged
    • Still have my WSL / nix username and related configs unchanged
    • Continue without having to worry about what else got destroyed

Actual results (with terminal output if applicable)

  1. I try to start Bash on Windows and get error:
    C:\WINDOWS\system32>bash.exe                                                               
    Error: 0x80040154                                                                          
  2. I find WSL feature unchecked in Windows Features
  3. I have to reinstall WSL
  4. After install and reboot, I try starting bash to get locked out of user account due to logic loop:
    C:\WINDOWS\system32>bash.exe                                                               
    bash: exec: zsh: not found                                                                 
  5. zsh will never be found because I cannot login to install it in the first place.
  6. I can't change the shell back to bash to get access, because I can't get into WSL in the first place.

Your Windows build number

14931.1000

Steps / All commands required to reproduce the error from a brand new installation

  1. Start on pre-14931.1000 build.
  2. Enable WSL feature from Windows Features, install, setup username.
  3. Launch bash.exe
  4. Install ZSH (or probably any other shell than bash): sudo apt-get -y install zsh
  5. In new bash prompt, change default shell for your username: chsh -s /usr/bin/zsh
  6. Exit all bash prompts. Update Windows to 14931.1000 build.
  7. After update finishes, try to run bash and find error:
    C:\WINDOWS\system32>bash.exe                                                               
    Error: 0x80040154                                                                          
  8. Check Windows Features to find that WSL has been removed / uninstalled.

Required packages and commands to install

sudo apt-get -y install zsh
chsh -s /usr/bin/zsh

@benhillis
Member

@ronindesign I understand your frustration. Let me clarify a couple of things for you.

  1. The issue with the Windows Subsystem for Linux optional Windows feature being disabled is a bug. It's not a bug in WSL, but a generic Windows upgrade issue that affects all optional components. I heard from the deployment team that the bug has been fixed and should be making its way to Windows Insider builds soon. They will also be backporting the fix to the Anniversary Update.
  2. We do not currently read the default shell from the /etc/passwd file when you run bash.exe. We do read the $HOME environment variable, but launching bash.exe will always launch /bin/bash as the shell. There are some historical reasons behind this. Until recently we didn't have great support for shells other than /bin/bash. Instead of allowing the user to get into a bad state where bash.exe launched a non-functional shell, we opted to always launch /bin/bash. Now that WSL is maturing we are starting to rethink old assumptions like this. For example, perhaps we should be running the /bin/login process, which handles setting up user configuration (instead of launching /bin/bash directly).

I hope that helps clarify things. Again, sorry for the frustration.

@alexanderwhatley

I got this exact problem as well, but when I reinstalled WSL, it was able to find all of the old software that I installed.

@ronindesign
Author

ronindesign commented Oct 12, 2016

EDIT: I did have exec zsh in C:\Users\MyWinUser\AppData\Local\lxss\home\nixuser\.bashrc. As soon as I removed this, I was able to run bash.exe as usual. See comment below: #1195 (comment)

My current solution is:

  1. After Windows Update, reinstall WSL from Windows Features window in settings, reboot as prompted.
  2. Try to open Bash on Windows prompt or run C:\WINDOWS\System32\bash.exe from command prompt.
  3. Fresh WSL install should prompt for accepting agreement, accept it.
  4. Enter your previously configured / installed username (case sensitive), mine was nixuser for example.
  5. Finish user setup prompts by entering password, etc. When it's done, the prompt should crash or error out (probably trying to load the existing user's default shell), this is OK.
  6. Open regular windows command prompt (might need elevated permissions, might need to cd C:\WINDOWS\System32).
  7. Run bash.exe directly with a command option: bash.exe -c "sudo -i". (If the bash binary is in your PATH, you can probably run bash -c "sudo -i" from any directory in your command prompt.)
  8. Enter your user's password (your WSL user should automatically have been added to sudoers during WSL first-run config).
  9. Your prompt should now launch bash into the root account, where you can re-install ZSH to allow access and normal launch behavior of Bash app. You should simply be able to exit or close prompt now and run Bash prompt directly to login with your default user.

Example:

C:\WINDOWS\system32>bash -c "sudo -i"
[sudo] password for static:
root@MyComputer:~# zsh
The program 'zsh' is currently not installed. You can install it by typing:
apt-get install zsh
root@MyComputer:~# apt-get install -y zsh
Reading package lists... Done
...
Setting up zsh (5.0.2-3ubuntu6) ...
(some warnings about update-alternatives, probably because of existing config stuff lying around...)
root@MyComputer:~# which zsh
/usr/bin/zsh
root@MyComputer:~# exit
logout
C:\Windows\System32>exit

Additional Thoughts

  • Lots of different approaches using bash -c "<command>" to get zsh installed are probably possible, e.g. maybe bash -c "sudo apt-get install -y zsh" directly. Not wasting any more time on this to test.
  • The Windows Update (and apparent uninstall / removal of WSL) doesn't seem to have deleted much (if any) content from C:\Users\MyWinUser\AppData\Local\lxss.
  • I have no clue about the larger implications of what or how the WSL reinstall changed my existing data in C:\Users\MyWinUser\AppData\Local\lxss. I may very well run into any number of bugs or issues as a result of missing packages, configurations, etc. Not necessarily as a direct result of the WSL reinstall, but simply due to Ubuntu clean-up when it sees all these packages have been removed. I'm just not familiar enough with Ubuntu to understand the gravity of having previous packages removed.
  • If this is going to be a recurring event, it might be worth investing time in a simple Ansible playbook for restoring "working" state to my local Bash on Windows WSL environment.

@ronindesign
Author

ronindesign commented Oct 12, 2016

@benhillis firstly, thanks for the quick response, I appreciate it!

Secondly:

  1. Yep, that makes sense. While searching, I found other references to missing / removed applications resulting from Windows Update. Glad fixing is in the works.
  2. You're absolutely correct, I may have glossed over this in my recount. bash.exe definitely launches /bin/bash, this is why bash -c "sudo -i" or bash -c "sudo apt-get install -y zsh" work. It's very possible I have configured bash to autorun ZSH on init, as recommended by some blogs and other issues here.

EDIT: I was looking in the wrong .bashrc:
C:\Users\MyWinUser\AppData\Local\lxss\root\.bashrc
instead of:
C:\Users\MyWinUser\AppData\Local\lxss\home\nixuser\.bashrc

In this file, I have:

# Launch Zsh on session start
if [ -t 1 ]; then                 
    cd ~                          
    exec zsh                      
fi

If I had found this and commented it out, it would have definitely solved my issue, I'm sure. Sorry about my failure here and the confusion.

However, I probably would have been saved a lot of trouble if bash.exe did not close out entirely when it received the error on exec zsh in .bashrc that zsh: not found.

And in fact, I can reproduce this for any nonexistent command (and probably for any command that returns unsuccessfully):

  1. Add exec fakeTestCommand to ~/.bashrc and save.
  2. Try to open Bash on Windows app. It immediately closes.
  3. Try manually to run: C:\Windows\System32>bash.exe, result:
    bash: exec: fakeTestCommand: not found

This is natural behavior for /bin/bash shell, but maybe it's fatal to bash.exe during init processing of start-up scripts (e.g. .bashrc, .profile, /etc/profile.d/...).
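
One documented bash behavior that may be related: when exec cannot run its command, a non-interactive bash exits, unless the execfail shell option is set, in which case exec just returns a failure. The sketch below only demonstrates that documented behavior in child shells started with bash -c. Whether bash.exe follows the same path while reading .bashrc, and whether setting execfail there would have kept the session open, is an assumption that was not tested in this thread.

```shell
# Child shell 1: default behavior. The failed exec ends the child shell,
# so "still running" is never printed.
bash -c 'exec fakeTestCommand; echo "still running"'
echo "child 1 exit status: $?"

# Child shell 2: with execfail set, the failed exec only reports an error
# and the child shell carries on to the next command.
bash -c 'shopt -s execfail; exec fakeTestCommand; echo "still running"'
echo "child 2 exit status: $?"
```

Both children print the "not found" error to stderr; only the second one reaches the echo.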

Maybe this issue should be changed or a new issue created for this behavior?

Had bash.exe not hard faulted during the init process, it would have tried exec zsh, gotten an error back, and left me in the /bin/bash fallback, from where I could have reinstalled zsh manually.

@alexanderwhatley Thanks for the response! For me, dpkg -l doesn't show any of the packages I had installed, e.g. htop. Did you have to do any intermediary steps to restore packages? Re-symlink or even restart / reconfigure something? It's very possible they're still installed at C:\Users\MyWinUser\AppData\Local\lxss\rootfs, but are just not showing up internally to WSL or to the reinstalled package manager?

@rodrymbo

rodrymbo commented Oct 12, 2016

Suggestions:

With Linux, when you run into trouble like this, one solution is to log in as root (if/where allowed, e.g. console) and use the superuser to tweak the user's .bashrc or change the user's passwd or install the zsh package or whatever. One can do that with WSL by using the setdefaultuser option with lxrun. One can then set the default user back to whatever your user is and proceed.

The other suggestion is to try the Linux solution before resorting to looking at files in %LOCALAPPDATA%\lxss. Editing files there almost always results in them disappearing from the WSL environment (because extended attributes don't get set properly). Sure, deleting things that way might help rescue something, but trying other solutions (which includes logging in as root) might also solve the problem.

Keep a backup of one's home directory and other changes one makes, in case some Insider Preview breaks things. That way if you have to delete everything (yes, including %LOCALAPPDATA%\lxss) and start over, there will be minimal losses.
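
A minimal sketch of such a backup, run from inside WSL. The function name backup_home and both example paths are hypothetical; the only assumption is that tar and gzip are available, as they are on a stock Ubuntu install.

```shell
# Hypothetical helper: archive a home directory into a timestamped tarball.
# Keep the destination OUTSIDE the directory being archived.
backup_home() {
    src="${1:-$HOME}"   # directory to back up, e.g. /home/nixuser
    dest_dir="$2"       # existing directory that receives the archive

    [ -d "$src" ] && [ -d "$dest_dir" ] || return 1

    archive="$dest_dir/home-backup-$(date +%Y%m%d-%H%M%S).tar.gz"

    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Print the archive path so the caller can log or copy it.
    printf '%s\n' "$archive"
}
```

Example call, with an assumed destination on the Windows drive: backup_home /home/nixuser /mnt/c/Users/MyWinUser/wsl-backups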

Some teacher of mine a while ago said one should always use fully qualified paths when calling programs in bash scripts. One reason is to speed up the process, so bash doesn't have to look at all the places in the PATH; another is so you get the one you want if there is more than one on the PATH. While one is at it, one can do an existence test on that path, and if the file is not there (for whatever reason) take corrective action or just do nothing... (Murphy was an optimist.)
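
As a sketch of that advice applied to the .bashrc snippet from this thread: use the full path and test for the binary before handing the session over. The function name launch_zsh_if_present is hypothetical, and /usr/bin/zsh is assumed from the which zsh output shown earlier.

```shell
# Hypothetical helper for ~/.bashrc: switch to zsh only when it is safe to do so.
# Assumption: zsh lives at /usr/bin/zsh unless another path is passed in.
launch_zsh_if_present() {
    zsh_bin="${1:-/usr/bin/zsh}"

    # Not attached to a terminal (e.g. bash -c "..."): stay in bash.
    [ -t 1 ] || return 0

    # Binary missing or not executable (e.g. after a reinstall): stay in bash.
    [ -x "$zsh_bin" ] || return 0

    cd ~ || return 0
    exec "$zsh_bin"
}
```

Calling launch_zsh_if_present as the last line of ~/.bashrc would replace the unconditional exec zsh. If zsh is gone, the function returns quietly and the session stays in bash, where the package can be reinstalled.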

I'm not minimizing the issues cited, just suggesting some additional troubleshooting strategies that might help get past them.

@ronindesign
Author

@rodrymbo Thanks for the insight, I'm sure it will be useful to others as well.

Re: lxrun setdefaultuser - I found this but didn't think it would work to set root as default user, since I remember something about Ubuntu not having a root user that could be changed to, and that basically you needed / should use sudo for everything. However, obviously the user exists since sudo -i works... I figured I would have had to create a new user to set as the default, which I was worried might risk further mangling of the fs since I had no clue about the state of WSL after the reinstall. I'll remember to try this if it happens again.

Re: disappearing files in WSL - I've definitely run into issues with files being affected when editing from the outside. For example, I have lxss\home\nixuser\.ssh junctioned to my Win user dir at C:\Users\WinUser\.ssh so that my IDE and other apps can use the same set of keys. I ran into the issue you described when Windows apps try to update .ssh\known_hosts with new entries. Then back inside WSL, when using ssh user@host, I always get prompts to add the host to my known_hosts file, which always fail. Not all Windows apps cause this issue; I believe in my case the prominent one was PuTTY. If I recreate the file and copy the contents back into it from the WSL side, it works fine again, but it's definitely not convenient and does require manual monitoring and/or intervention.

Re: bash scripts - All good points, I unfortunately have little real-world experience with bash scripts. In terms of absolute paths, I assume checking for the existence of /usr/bin/zsh and gracefully continuing (doing nothing) would have been ideal here...

@rodrymbo

rodrymbo commented Oct 14, 2016

The documentation on /setdefaultuser root doesn't say explicitly that it would work, but I'd say trying it would be a good option, especially if one is at the point of nuking the whole directory and starting over anyway. For the most part, logging directly in to root, especially from ssh, is blocked as a security measure, not because root doesn't exist. In particular, in regular Linux, most of the time one can log into root from the physical command-line console (if one knows the password). Since /setdefaultuser means the default user gets to log in without a password, all that's left is to try it. It probably would be good if the docs somewhere suggested that explicitly as a troubleshooting option.

Yes, ssh is quite picky (I mean REALLY picky, and rightly so) about permissions and such in the ~/.ssh directory. Files found in /mnt (DrvFS) have all permission bits set (for example) so when ssh checks to make sure no one but the user can read the private key or write to the 'authorized_keys' or 'known_hosts' files, it will/should reject anything linked from DrvFS. So ~/.ssh is not a good candidate for that sort of thing. Add to it that files written to the VolFS filesystem directly from Windows (almost always) become inaccessible in WSL (at least at present) and we end up with lots of roadblocks to keeping the two environments coordinated.

Copying files from DrvFS into your home directory (for example) from within WSL works because WSL maintains extended attributes. If one is lazy (which is why scripts were invented) one could copy the files and then change any permissions that need changing, via a script. One could put such a script into, say, .bashrc (or the corresponding file for zsh), so it runs each time one logs in. That might or might not work for your situation, but it is an example. There are tests one can do in such a script to see which file is newer (for example) or to merge changes, so there is scope for learning a thing or two (by which I mean I would learn a thing or two trying to make it work).
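
A minimal sketch of such a copy-and-fix-permissions script, meant to be run from inside WSL. The function name sync_ssh_files and the example paths are hypothetical. It only copies one way (Windows side to WSL side) and does not attempt the merge mentioned above.

```shell
# Hypothetical helper: copy files from a Windows-side folder into a WSL-side
# folder, then tighten permissions so ssh will accept them.
sync_ssh_files() {
    src="$1"    # e.g. /mnt/c/Users/WinUser/.ssh (assumed DrvFS path)
    dest="$2"   # e.g. "$HOME/.ssh"

    # Nothing to do if the source folder is absent.
    [ -d "$src" ] || return 0

    mkdir -p "$dest" || return 1
    chmod 700 "$dest"

    for f in "$src"/*; do
        [ -f "$f" ] || continue
        target="$dest/$(basename "$f")"

        # Copy only when the target is missing or older than the source.
        if [ ! -e "$target" ] || [ "$f" -nt "$target" ]; then
            cp "$f" "$target" || return 1
        fi

        # Owner-only read/write, which is what ssh expects for private keys.
        chmod 600 "$target"
    done
}
```

Example call from .bashrc, with assumed paths: sync_ssh_files /mnt/c/Users/WinUser/.ssh "$HOME/.ssh"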

As for bash scripts, this is called "Bash on Windows" after all, so one could expect a bit of bash, even on the way to using zsh. :P But don't the points I made, such as using the full path to executables and testing for existence before accessing a file, apply also to zsh scripts? Anyway, every bash book I've seen includes examples of how to test for existence (I like to copy from the example, rather than try to do it from memory) so it isn't hard to come up with a script segment to put into .bashrc.
