a reboot should occur after a kernel panic #1785
Comments
oh horrible! I'm not totally sure that having my servers flapping due to a kernel bug is the right answer as a default though :/
Devices freezing on kernel panic during early init seems to have one global fix above. These Chromebox reboot tests had an unacceptable rate of failure, so this was (at the time, grr) the only fix. Yeah, the rate of failure is quite horrible. Meanwhile, I found an especially useful parameter. Sven, I can't stop myself from arguing for this becoming a default for the entire OS. Rebooting on any non-oops kernel panic or unfed watchdog timer is really a basic standard in the embedded Linux industry. Kubernetes will trigger a reboot after a kernel panic on oops. Regardless, we should let Rancher OS recover from this unrecoverable state. Forget about a case where infinite flapping could occur. Even if the environment is on-metal or in-datacenter, the bar should still be set to prevent the need to power cycle a hung operating system. </rant> Maybe we could change these kernel configuration values once the kernel hands off to system-docker?
Hmm.. It looks like the kernel parameter panic=10 (set by Grub) will also set
I like the idea that we set reboot on oops after we've gotten the system to a reasonably sane place. For example, if you create a VM with not enough memory, RancherOS now panics - it used to carry on, and you'd have something that looks like it's an OK system, but with some random system service not existing, or not running - and I'd hate to flap on that. mucho 💯 to setting auto-reboot, though I'm thinking perhaps at the point where user-docker is up?
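To make the "enable it later in boot" idea concrete, here is a minimal sketch, assuming the change is applied through the standard Linux sysctl files `kernel.panic` and `kernel.panic_on_oops` under `/proc/sys/kernel`. The function name `enable_panic_reboot` and the idea of calling it once user-docker is up are hypothetical; nothing in this thread shows how RancherOS actually wires this in. The default of 10 seconds mirrors the `panic=10` value mentioned above.

```python
from pathlib import Path


def enable_panic_reboot(timeout_s: int = 10,
                        reboot_on_oops: bool = True,
                        sysctl_root: str = "/proc/sys/kernel") -> None:
    """Turn on reboot-after-panic at runtime (hypothetical helper).

    sysctl_root is a parameter only so the function can be exercised
    against a scratch directory instead of the live /proc tree.
    """
    root = Path(sysctl_root)
    # kernel.panic: seconds to wait before rebooting after a panic.
    # 0 means "never reboot", which is the hang described in this issue.
    (root / "panic").write_text(f"{timeout_s}\n")
    # kernel.panic_on_oops: 1 escalates an oops to a panic, so the
    # timeout above also covers oopses.
    (root / "panic_on_oops").write_text("1\n" if reboot_on_oops else "0\n")
```

Writing to the real `/proc/sys/kernel` requires root. Calling this only after the system has reached a sane state would avoid flapping on early failures such as the low-memory VM case, at the cost of still hanging on panics that happen before that point.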
I can confirm that the problem still exists on v1.3.0-rc1. How to reproduce:
RancherOS dies with a kernel panic and stays stuck in that state until someone manually pushes reset or CTRL+ALT+DEL. Auto reboot would be great for recovering everything without touching bare metal that is located somewhere else. Thank you
Please confirm your kernel parameters by looking for a line similar to:
[ 0.000000] Command line: BOOT_IMAGE=../vmlinuz-4.15.9-rancher printk.devkmsg=on rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait console=tty0 rancher.autologin=tty1 initrd=../initrd-v1.3.0-rc1
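The check being asked for here can be sketched as a small parser: split the kernel command line into parameters and see whether `panic=` is among them. This is only an illustration; the helper names `parse_cmdline` and `panic_timeout` are made up for this example, and the sample string is the command line reported above with the dmesg prefix removed.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as rancher.state.wait map to None. Only the first
    '=' separates name from value, so LABEL=RANCHER_STATE stays intact.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def panic_timeout(cmdline: str):
    """Return the panic= value as an int, or None if absent or malformed."""
    value = parse_cmdline(cmdline).get("panic")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Command line reported in this thread (installed-to-disk system).
REPORTED = (
    "BOOT_IMAGE=../vmlinuz-4.15.9-rancher printk.devkmsg=on "
    "rancher.state.dev=LABEL=RANCHER_STATE rancher.state.wait "
    "console=tty0 rancher.autologin=tty1 initrd=../initrd-v1.3.0-rc1"
)

print(panic_timeout(REPORTED))  # no panic= parameter, so this prints None
```

On a running system the same string is available in `/proc/cmdline`, so `panic_timeout(open("/proc/cmdline").read())` would perform the check directly.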
From d263be4 it looks like it went into
The file
Last question, how did you install Docker OS to disk? Did you upgrade from a previous version or was this a fresh installation? If it was a fresh installation, we need to figure out why this specific parameter was left out, because it is grouped up with the other (and seemingly unique) parameters found in
I did a fresh install: burned the ISO to a USB disk with Rufus 2.18 (chose ISO mode rather than dd), then ran 'sudo ros config syslinux' to add autologin without touching any other parameters. I will try another Rufus burn mode. Maybe Rufus is tampering with the kernel, because Rufus complained about vmlinuz being missing on the first burn. I'll confirm soon.
Confirming: I re-downloaded v1.3.0-rc1 and tested several times on a VirtualBox image: https://pasteboard.co/Hduq5rl.png but once installed on the hard disk, the panic parameter is no longer there: https://pasteboard.co/HduqgPN.png (we're seeing a duplicated autologin because I chose autologin from the RancherOS boot screen). Here is what I get when I choose the default on the boot screen; there is still nothing about panic params:
Maybe you forgot to transfer the params from the ISO to the HDD when installing?
Fixed by fe5d2dd. @kingsd041, can you help me confirm this?
@niusmallnan
Close it.
RancherOS Version: (ros os version)
All versions -- this is hardware and kernel related
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
Baremetal
Background
Some platforms (take HP/ASUS Chromebox for example) are susceptible to pre-init kernel panics when the "Verified Boot" process (UEFI) is replaced with a "Legacy Boot" process.
Issue
Rancher OS does not soft-reset after a kernel panic. A reboot of the system should follow after a kernel panic occurs.