
Upgrade of zfs to new version corrupts dkms kernel module configuration in Fedora #9891

Closed
jochendemuth opened this issue Jan 25, 2020 · 29 comments · Fixed by #10327
Labels
Component: Packaging custom packages

Comments

@jochendemuth

System information

Type                   Version/Name
Distribution Name      Red Hat Fedora
Distribution Version   Workstation 31
Linux Kernel           5.4.13
Architecture           x86_64
ZFS Version            0.8.2 / 0.8.3
SPL Version

Describe the problem you're observing

With every upgrade of zfs through the official repo on Fedora, all zfs devices disappear. Here is the specific sequence of events that leads to this issue:

  • zfs is successfully installed as a dkms module, verified with the 'dkms status' command. Kernel version upgrades automatically compile and install a new zfs kernel module specific to each kernel version.
  • The zfs team makes a new version of zfs available in their repo.
  • Running dnf update downloads the new zfs package (in this case version 0.8.3) and related dependencies.
  • As part of the zfs package upgrade, the older version of zfs (in this case version 0.8.2) is removed. Removing the package, among other things, removes the folder containing the source code of the older zfs version (in this case '/usr/src/zfs-0.8.2') and the existing kernel module.
  • The dkms configuration, which depends on the source folder, is now corrupted. This can be verified via 'dkms status', which in this case returns the following:
    $ dkms status
    Error! Could not locate dkms.conf file.
    File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.
  • dnf continues to install the new version of zfs (in this case 0.8.3) and sets up the new version with dkms (I assume using a 'dkms add zfs/0.8.3' command or similar).
  • The remaining key step of the zfs package installation, compiling the zfs kernel module, fails due to the corrupted dkms configuration.
  • Following a reboot, the zfs-based devices have disappeared, because no kernel module exists. This can be verified with:
    $ zpool status
    The ZFS modules are not loaded.
    Try running '/sbin/modprobe zfs' as root to load them.
    $ modprobe zfs
    modprobe: FATAL: Module zfs not found in directory /lib/modules/5.4.13-201.fc31.x86_64

Describe how to fix the issue

Enhance the removal script of the Fedora package: add a command that removes this zfs version from the dkms configuration, e.g. 'dkms remove zfs/0.8.2' or similar, before the source code folder is removed from the hard drive (otherwise the dkms remove command fails because the dkms configuration is already corrupted).
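The proposal could look roughly like the following helper for the old package's %preun scriptlet. This is a sketch, not the actual zfs spec file: it assumes dkms 2.x option syntax ('dkms remove -m MODULE -v VERSION --all', the same -m/-v style used elsewhere in this thread), and the function name zfs_dkms_preun and the DKMS override variable are hypothetical.

```shell
# Hypothetical helper for the old package's %preun scriptlet (a sketch, not the
# upstream zfs spec). RPM runs the old package's %preun before erasing its files,
# so /usr/src/zfs-<old_version> still exists when dkms is asked to forget it.
zfs_dkms_preun() {
    old_version="$1"              # version being removed, e.g. 0.8.2
    dkms_cmd="${DKMS:-dkms}"      # DKMS is a hypothetical override for testing
    # Unregister every build of this version while its source tree is intact;
    # never fail the RPM transaction if dkms does not know the version.
    "$dkms_cmd" remove -m zfs -v "$old_version" --all || :
}
```

Called as 'zfs_dkms_preun 0.8.2' from the old package's %preun, this should leave no stale /var/lib/dkms/zfs/0.8.2 entry behind for the new version's dkms commands to trip over.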

Describe how to reproduce the problem

$ dnf update

Include any warning/errors/backtraces from the system logs

@eggehad

eggehad commented Jan 27, 2020

Same thing happened to me. My dkms configuration was corrupted during the zfs update, which in turn broke my nvidia drivers, which then took out my plasma desktop. Took hours to recover. Thanks for your help above, I was able to manually work through the issues and get up and running again.

@behlendorf behlendorf added the Component: Packaging custom packages label Jan 27, 2020
@johnnyjacq16

A temporary workaround for this problem is to remove zfs completely with 'dnf remove zfs zfs-dkms' and also remove the zfs directory from /var/lib/dkms/. Doing this removes zfs version 0.8.2; then install zfs again, which installs the new version 0.8.3. However, this approach presents new problems as well.

The issue is that if you have more than one kernel installed, you first need to check that the zfs modules have been built for each installed kernel, not just the running kernel. To do this, run dkms status and compare the kernel versions with the /boot/vmlinuz-[kernel version number] files.

dkms status
zfs, 0.8.3, 5.4.10-200.fc31.x86_64, x86_64: installed
zfs, 0.8.3, 5.4.12-200.fc31.x86_64, x86_64: installed
zfs, 0.8.3, 5.4.13-201.fc31.x86_64, x86_64: installed
ls
/boot/vmlinuz-5.4.10-200.fc31.x86_64  
/boot/vmlinuz-5.4.12-200.fc31.x86_64  
/boot/vmlinuz-5.4.13-201.fc31.x86_64
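That comparison can be scripted. A minimal sketch, assuming the dkms 2.x 'dkms status' line format shown above and kernel images named /boot/vmlinuz-<version>; missing_zfs_builds is a hypothetical helper name.

```shell
# Print every kernel that has a /boot/vmlinuz-<version> image but no
# "zfs, <ver>, <kernel>, <arch>: installed" line in `dkms status` (read on stdin).
missing_zfs_builds() {
    boot_dir="${1:-/boot}"
    status=$(cat)
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue
        kver=${image##*/vmlinuz-}
        case "$kver" in 0-rescue-*) continue ;; esac   # skip rescue images
        if ! printf '%s\n' "$status" | awk -F', ' -v k="$kver" \
            '$1 == "zfs" && $3 == k && /: installed/ { found = 1 }
             END { exit !found }'; then
            echo "$kver"
        fi
    done
}
```

Usage: 'dkms status | missing_zfs_builds' prints nothing when every installed kernel has a zfs build.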

When more than one kernel is installed, loading zfs fails with an "Exec format error", which is caused by a mismatch between the kernel and the kernel version that the zfs and zfs dependency modules were built for.
zfs and zfs dependency modules:
icp.ko.xz spl.ko.xz zavl.ko.xz zcommon.ko.xz zfs.ko.xz zlua.ko.xz znvpair.ko.xz zunicode.ko.xz

The mismatch:

modinfo icp -k 5.4.10-200.fc31.x86_64
filename:       /lib/modules/5.4.10-200.fc31.x86_64/weak-updates/icp.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
srcversion:     FF2D0FC75B0239BC41E38DE
depends:        spl,zcommon
retpoline:      Y
name:           icp
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
parm:           icp_aes_impl:Select aes implementation.
parm:           icp_gcm_impl:Select gcm implementation.

As shown above, the vermagic (5.4.13-201) differs from the kernel version (5.4.10-200). The same holds for the other kernels, except for the kernel the zfs modules were actually built against.

[root@fedoraOS ~]# modinfo icp -k 5.4.12-200.fc31.x86_64
filename:       /lib/modules/5.4.12-200.fc31.x86_64/weak-updates/icp.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
srcversion:     FF2D0FC75B0239BC41E38DE
depends:        spl,zcommon
retpoline:      Y
name:           icp
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
parm:           icp_aes_impl:Select aes implementation.
parm:           icp_gcm_impl:Select gcm implementation.
[root@fedoraOS ~]# 
[root@fedoraOS ~]# 
[root@fedoraOS ~]# modinfo icp -k 5.4.13-200.fc31.x86_64 

The following match:
[root@fedoraOS ~]# modinfo icp -k 5.4.13-201.fc31.x86_64 
filename:       /lib/modules/5.4.13-201.fc31.x86_64/extra/icp.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
srcversion:     FF2D0FC75B0239BC41E38DE
depends:        spl,zcommon
retpoline:      Y
name:           icp
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
parm:           icp_aes_impl:Select aes implementation.
parm:           icp_gcm_impl:Select gcm implementation.
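The vermagic comparison can be automated. A sketch that flags modules whose vermagic names a different kernel than the one they are installed for; it relies on 'modinfo -k KERNEL MODULE' as used in these listings, uses the module list given above, and the helper names are hypothetical.

```shell
# Succeed if the modinfo output on stdin has a vermagic naming kernel $1.
vermagic_matches() {
    awk -v k="$1" '$1 == "vermagic:" { found = 1; ok = ($2 == k) }
                   END { exit !(found && ok) }'
}

# Check the zfs module set for every kernel in /lib/modules and report modules
# whose vermagic names a different kernel.
check_zfs_vermagic() {
    for dir in /lib/modules/*/; do
        kver=$(basename "$dir")
        for m in icp spl zavl zcommon zfs zlua znvpair zunicode; do
            info=$(modinfo -k "$kver" "$m" 2>/dev/null) || continue
            if ! printf '%s\n' "$info" | vermagic_matches "$kver"; then
                echo "$kver: $m was built for another kernel"
            fi
        done
    done
}
```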

Other zfs dependency modules and zfs:

[root@fedoraOS ~]# modinfo zavl -k 5.4.10-200.fc31.x86_64 
filename:       /lib/modules/5.4.10-200.fc31.x86_64/weak-updates/zavl.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    Generic AVL tree implementation
srcversion:     98E85778E754CF75DEF9E8E
depends:        spl
retpoline:      Y
name:           zavl
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
[root@fedoraOS ~]# 

[root@fedoraOS ~]# modinfo zavl -k 5.4.12-200.fc31.x86_64 
filename:       /lib/modules/5.4.12-200.fc31.x86_64/weak-updates/zavl.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    Generic AVL tree implementation
srcversion:     98E85778E754CF75DEF9E8E
depends:        spl
retpoline:      Y
name:           zavl
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 

[root@fedoraOS ~]# modinfo zavl -k 5.4.13-201.fc31.x86_64 
filename:       /lib/modules/5.4.13-201.fc31.x86_64/extra/zavl.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    Generic AVL tree implementation
srcversion:     98E85778E754CF75DEF9E8E
depends:        spl
retpoline:      Y
name:           zavl
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 


[root@fedoraOS ~]# modinfo zfs -k 5.4.10-200.fc31.x86_64 
filename:       /lib/modules/5.4.10-200.fc31.x86_64/weak-updates/zfs.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     C6177AA5049CC30B672B1CA
depends:        zlua,spl,znvpair,zcommon,icp,zunicode,zavl
retpoline:      Y
name:           zfs
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload
......... 

[root@fedoraOS ~]# modinfo zfs -k 5.4.12-200.fc31.x86_64 
filename:       /lib/modules/5.4.12-200.fc31.x86_64/weak-updates/zfs.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     C6177AA5049CC30B672B1CA
depends:        zlua,spl,znvpair,zcommon,icp,zunicode,zavl
retpoline:      Y
name:           zfs
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
........

[root@fedoraOS ~]# modinfo zfs -k 5.4.13-201.fc31.x86_64 
filename:       /lib/modules/5.4.13-201.fc31.x86_64/extra/zfs.ko.xz
version:        0.8.3-1
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     C6177AA5049CC30B672B1CA
depends:        zlua,spl,znvpair,zcommon,icp,zunicode,zavl
retpoline:      Y
name:           zfs
vermagic:       5.4.13-201.fc31.x86_64 SMP mod_unload 
.......

This has something to do with the **weak-modules script of dkms**. For more information, including the fix, see:
https://access.redhat.com/solutions/3536351

@johnnyjacq16

johnnyjacq16 commented Feb 6, 2020

The issue: the weak-updates symlinks for kernel 5.4.14 point to the modules built for kernel 5.4.17.
pwd
/lib/modules/5.4.14-200.fc31.x86_64/weak-updates
ls -l
total 8
lrwxrwxrwx. 1 root root 51 Feb 6 15:27 icp.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/icp.ko.xz
lrwxrwxrwx. 1 root root 51 Feb 6 15:27 spl.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/spl.ko.xz
lrwxrwxrwx. 1 root root 52 Feb 6 15:27 zavl.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/zavl.ko.xz
lrwxrwxrwx. 1 root root 55 Feb 6 15:27 zcommon.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/zcommon.ko.xz
lrwxrwxrwx. 1 root root 51 Feb 6 15:27 zfs.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/zfs.ko.xz
lrwxrwxrwx. 1 root root 52 Feb 6 15:27 zlua.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/zlua.ko.xz
lrwxrwxrwx. 1 root root 55 Feb 6 15:27 znvpair.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/znvpair.ko.xz
lrwxrwxrwx. 1 root root 56 Feb 6 15:27 zunicode.ko.xz -> /lib/modules/5.4.17-200.fc31.x86_64/extra/zunicode.ko.xz
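A sketch for spotting this situation on any kernel: it lists weak-updates symlinks that point into another kernel's /lib/modules tree. It assumes the flat weak-updates layout shown above (no subdirectories); foreign_weak_links is a hypothetical helper name.

```shell
# Print weak-updates symlinks under the modules root (default /lib/modules) whose
# target lies in a different kernel's module tree than the kernel they serve.
foreign_weak_links() {
    root="${1:-/lib/modules}"
    for link in "$root"/*/weak-updates/*.ko*; do
        [ -L "$link" ] || continue
        kver=${link#"$root"/}     # <kernel>/weak-updates/<module>
        kver=${kver%%/*}          # <kernel>
        target=$(readlink "$link")
        case "$target" in
            */modules/"$kver"/*) ;;                       # same kernel: fine
            *) printf '%s -> %s\n' "$link" "$target" ;;   # another kernel's module
        esac
    done
}
```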

@johnnyjacq16

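Temporary fix, run in the same weak-updates directory: point each symlink at the module built for kernel 5.4.14 itself.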
rm -vf *
removed 'icp.ko.xz'
removed 'spl.ko.xz'
removed 'zavl.ko.xz'
removed 'zcommon.ko.xz'
removed 'zfs.ko.xz'
removed 'zlua.ko.xz'
removed 'znvpair.ko.xz'
removed 'zunicode.ko.xz'

ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/icp.ko.xz icp.ko.xz 
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/spl.ko.xz spl.ko.xz           
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/zavl.ko.xz zavl.ko.xz          
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/zcommon.ko.xz zcommon.ko.xz         
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/zfs.ko.xz zfs.ko.xz                   
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/zlua.ko.xz zlua.ko.xz          
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/znvpair.ko.xz znvpair.ko.xz         
ln -s /lib/modules/5.4.14-200.fc31.x86_64/extra/zunicode.ko.xz zunicode.ko.xz 

ls -l
total 8
lrwxrwxrwx. 1 root root 51 Feb 6 23:48 icp.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/icp.ko.xz
lrwxrwxrwx. 1 root root 51 Feb 6 23:48 spl.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/spl.ko.xz
lrwxrwxrwx. 1 root root 52 Feb 6 23:48 zavl.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/zavl.ko.xz
lrwxrwxrwx. 1 root root 55 Feb 6 23:48 zcommon.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/zcommon.ko.xz
lrwxrwxrwx. 1 root root 51 Feb 6 23:48 zfs.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/zfs.ko.xz
lrwxrwxrwx. 1 root root 52 Feb 6 23:49 zlua.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/zlua.ko.xz
lrwxrwxrwx. 1 root root 55 Feb 6 23:49 znvpair.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/znvpair.ko.xz
lrwxrwxrwx. 1 root root 56 Feb 6 23:49 zunicode.ko.xz -> /lib/modules/5.4.14-200.fc31.x86_64/extra/zunicode.ko.xz

modinfo spl
filename: /lib/modules/5.4.14-200.fc31.x86_64/weak-updates/spl.ko.xz
version: 0.8.3-1
license: GPL
author: OpenZFS on Linux
description: Solaris Porting Layer
srcversion: 09B6CA3EE9E5F5E76F82D5C
depends:
retpoline: Y
name: spl
vermagic: 5.4.14-200.fc31.x86_64 SMP mod_unload
parm: spl_taskq_thread_bind:Bind taskq thread to CPU by default (int)
parm: spl_taskq_thread_dynamic:Allow dynamic taskq threads (int)
parm: spl_taskq_thread_priority:Allow non-default priority for taskq threads (int)
parm: spl_taskq_thread_sequential:Create new taskq threads after N sequential tasks (int)
parm: spl_taskq_kick:Write nonzero to kick stuck taskqs to spawn more threads
parm: spl_max_show_tasks:Max number of tasks shown in taskq proc (uint)
parm: spl_kmem_cache_expire:By age (0x1) or low memory (0x2) (uint)
parm: spl_kmem_cache_magazine_size:Default magazine size (2-256), set automatically (0) (uint)
parm: spl_kmem_cache_reclaim:Single reclaim pass (0x1) (uint)
parm: spl_kmem_cache_obj_per_slab:Number of objects per slab (uint)
parm: spl_kmem_cache_obj_per_slab_min:Minimal number of objects per slab (uint)
parm: spl_kmem_cache_max_size:Maximum size of slab in MB (uint)
parm: spl_kmem_cache_slab_limit:Objects less than N bytes use the Linux slab (uint)
parm: spl_kmem_cache_kmem_limit:Objects less than N bytes use the kmalloc (uint)
parm: spl_kmem_cache_kmem_threads:Number of spl_kmem_cache threads (uint)
parm: spl_kmem_alloc_warn:Warning threshold in bytes for a kmem_alloc() (uint)
parm: spl_kmem_alloc_max:Maximum size in bytes for a kmem_alloc() (uint)
parm: spl_hostid:The system hostid. (ulong)
parm: spl_hostid_path:The system hostid file (/etc/hostid) (charp)
parm: spl_panic_halt:Cause kernel panic on assertion failures (uint)
parm: spl_schedule_hrtimeout_slack_us:schedule_hrtimeout_range() delta/slack value in us, default(0)
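The rm/ln sequence above can be written as a loop. A sketch under the same assumption as the manual fix, namely that dkms has already built the modules into /lib/modules/<kernel>/extra; relink_weak_updates is a hypothetical name.

```shell
# Re-point every weak-updates symlink of kernel $1 at the module of the same
# name in that kernel's own extra/ directory, as the manual fix above does.
relink_weak_updates() {
    kver="$1"
    root="${2:-/lib/modules}"
    for link in "$root/$kver/weak-updates"/*.ko*; do
        [ -L "$link" ] || continue
        own="$root/$kver/extra/${link##*/}"
        if [ -e "$own" ]; then
            ln -sfn "$own" "$link"     # replace the symlink in place
        else
            echo "no $own; leaving $link alone" >&2
        fi
    done
}
```

Usage (as root): 'relink_weak_updates 5.4.14-200.fc31.x86_64', then check the vermagic with modinfo as shown above.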

@danstiner

danstiner commented Feb 9, 2020

I've seen this happen for every new zfs version on Fedora. The quick fix for me has been:

sudo rm /var/lib/dkms/zfs/0.8.2/source
sudo dkms autoinstall
sudo reboot

I believe what happens is that the upgrade leaves behind the source symlink to /usr/src/zfs-0.8.2, whose target has been deleted as part of the upgrade. This confuses dkms so that it is unable to do anything. I'm guessing the fix is to delete that old symlink as part of a post-install script, but I'm not that familiar with rpm packaging. Some evidence for this idea:

$ sudo dkms autoinstall
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.

$ ls -lah /var/lib/dkms/zfs/0.8.2/source
lrwxrwxrwx. 1 root root 18 Nov 11 11:46 /var/lib/dkms/zfs/0.8.2/source -> /usr/src/zfs-0.8.2
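This diagnosis can be generalized to every module in the DKMS tree. A sketch that lists each source symlink whose target is gone; it assumes the /var/lib/dkms/<module>/<version>/source layout shown above, dangling_dkms_sources is a hypothetical name, and it only prints, leaving the rm and 'dkms autoinstall' to you.

```shell
# List DKMS "source" symlinks whose target directory no longer exists.
# The DKMS tree defaults to /var/lib/dkms.
dangling_dkms_sources() {
    tree="${1:-/var/lib/dkms}"
    for src in "$tree"/*/*/source; do
        # -L: the path is a symlink; ! -e: following it leads nowhere
        if [ -L "$src" ] && [ ! -e "$src" ]; then
            printf '%s -> %s\n' "$src" "$(readlink "$src")"
        fi
    done
}
```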

@luckied

luckied commented Feb 15, 2020

Shouldn't this issue be opened with Fedora, since it's the dkms script causing the issue, not OpenZFS per se?

@jochendemuth
Author

@luckied No, this is a problem in the deployment logic (the use of dkms), not in dkms itself.
Arguably, dkms could handle the issue more gracefully, but it seems appropriate to update/correct the use of dkms here.

Although the rpm package scripts call dkms to build and deploy the zfs kernel modules at installation time, they don't call dkms (or call it incorrectly) at rpm removal time.

This issue is slightly complicated by the overlap with a different issue: a new zfs kernel module version could be incompatible with an older installed kernel version. So, while zfs-version-x works with kernel-version-a but not kernel-version-a+1, zfs-version-y may work with kernel-version-a+1 but not kernel-version-a. In such a case, removing the older kernel modules would leave the older kernel versions no longer working with zfs.
I am not sure whether that has ever happened or can reasonably be expected to happen, but such a system would require both zfs-version-x and zfs-version-y to be installed at the same time.
For the sake of an expedient resolution of this very real upgrade problem, I would recommend setting this possible complication aside.

@mhjacks

mhjacks commented Feb 24, 2020

I was recently added as a maintainer of dkms in Fedora, and just pushed an update for f31 (other versions are still in the pipeline). I don't think that update will do anything to help this, though. I was just pointed here; I'll read up on this and hopefully comment more later.

@gregory-lee-bartholomew
Contributor

gregory-lee-bartholomew commented Feb 24, 2020

I too have seen this. As others have pointed out, "dkms status" will show incomplete output if there is a problem.

I've worked around the problem simply by deleting the old directory from /var/lib/dkms/zfs (e.g. rm -rf /var/lib/dkms/zfs/0.8.1) and then re-running the dkms install command (e.g. dkms install -m zfs -v 0.8.2 -k 5.3.7-200.fc30.x86_64) manually when necessary.

My guess is that dkms just needs to delete /var/lib/dkms/zfs/<old-version> when it detects that a new version of zfs is being used.

You might want to avoid running "dkms remove" if that command also deletes the previously compiled zfs module from /lib/modules/<old-kernel-version>.

I think it would be best to leave the old version of the compiled module around as a fall-back. The previous module should also be in the old initramfs, but I still think it should be left in /lib/modules/<old-kernel-version> in case someone wants to rebuild their old initramfs for whatever reason.

Uninstalling the old kernel though should, of course, remove the old zfs driver from /lib/modules/<old-kernel-version>.

My two cents.

@chenxiaolong

I ran into this again during the 0.8.4 upgrade on my Fedora 32 box. I've decided to stop using dkms and just modified this repo's scripts a bit to build a proper akmod package: https://gist.github.com/chenxiaolong/d7f8321a5d45e4770d1dd31e7bc6ce66

Fedora's akmod system isn't perfect either, but I've had much better luck with it cleaning things up properly (due to each kernel's zfs modules owned by a separate kmod-zfs-$(uname -r) package).


In case anyone else wants to try this, first clone the repo and apply the patch on top of 0.8.4:

git clone https://github.com/openzfs/zfs
cd zfs
git checkout zfs-0.8.4
curl -L -O https://gist.github.com/chenxiaolong/d7f8321a5d45e4770d1dd31e7bc6ce66/raw/724a53267547ebdb2fc9f2c6bcb2e7bab9aba088/0001-rpm-Add-support-for-building-an-akmods-package.patch
git am 0001-rpm-Add-support-for-building-an-akmods-package.patch

Then build the RPMs:

./autogen.sh
./configure
make rpm-kmod

And install them:

sudo dnf install ./akmod-zfs-0.8.4-1.fc32.x86_64.rpm ./zfs-kmod-common-0.8.4-1.fc32.noarch.rpm

For the non-kernel-module RPMs, I'm just using the packages from the official repo.

@gregory-lee-bartholomew
Contributor

It sounds like a good idea. It looks like there may be a problem getting it to automatically include the updated kernel module in the initramfs on upgrade though because, at least on my fedora 31 system, akmodsposttrans.install is run after dracut.install:

$ ls -1 /usr/lib/kernel/install.d
00-entry-directory.install
10-devicetree.install
20-grubby.install
50-depmod.install
50-dracut.install
51-dracut-rescue.install
90-loaderentry.install
95-akmodsposttrans.install
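kernel-install runs its plugins in lexical order of their file names, which is why the numeric prefixes above matter. A sketch that checks whether an akmods plugin would run before the dracut one; it only inspects a single directory (kernel-install also merges plugins from /etc/kernel/install.d into the same order, which this ignores), and the function names are hypothetical.

```shell
# Print the plugins of one directory (default /usr/lib/kernel/install.d) in the
# lexical order kernel-install would run them.
plugin_order() {
    ls -1 "${1:-/usr/lib/kernel/install.d}" | LC_ALL=C sort
}

# Succeed only if an akmods plugin sorts before the main dracut plugin, i.e. it
# would run before dracut assembles the initramfs.
akmods_before_dracut() {
    plugin_order "$1" | awk '
        /akmods/ && !a              { a = NR }
        /-dracut\.install$/ && !d   { d = NR }
        END { exit !(a && d && a < d) }'
}
```

With the listing above, akmods_before_dracut fails because 95-akmodsposttrans.install sorts after 50-dracut.install.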

@chenxiaolong

Hmm, that's a good point. I hadn't personally run into that because I don't boot from zfs. My initramfs with dkms didn't have the zfs modules either.

@ColMelvin
Contributor

The problem stems from this conditional in %preuninstall:

# If we're here then we're doing an uninstall (not upgrade).
CONFIG_H="/var/lib/dkms/%{module}/%{version}/*/*/%{module}_config.h"
SPEC_META_ALIAS="@PACKAGE@-@VERSION@-@RELEASE@"
DKMS_META_ALIAS=`cat $CONFIG_H 2>/dev/null |
    awk -F'"' '/META_ALIAS/ { print $2; exit 0 }'`
if [ "$SPEC_META_ALIAS" = "$DKMS_META_ALIAS" ]; then

The if statement should be true, but ends up being false. When I run the command manually (before upgrading), I get the following:

[root@test ~]$ cat /var/lib/dkms/zfs/0.8.3/*/*/zfs_config.h | awk -F'"' '/META_ALIAS/ { print $2; exit 0 }'

[root@test ~]$

After a slight modification to show all matches, it becomes clear that a bad match is to blame:

[root@test ~]$ cat /var/lib/dkms/zfs/0.8.3/*/*/zfs_config.h | awk -F'"' '/META_ALIAS/ { print $2 }'

zfs-0.8.3-1

zfs-0.8.3-1
[root@test ~]$
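A small reproduction of why 'exit 0' on the first match yields an empty string. The two input lines are hypothetical stand-ins for the real zfs_config.h (only the empty first match and the 'zfs-0.8.3-1' value come from the output above), and strict_alias shows one way of increasing the regex specificity; it is not necessarily the pattern used in the actual fix.

```shell
# Hypothetical input: a line mentioning META_ALIAS without a quoted value,
# followed by a #define that carries the alias in double quotes.
sample_config() {
    printf '%s\n' \
        '/* Define the project alias string (META_ALIAS). */' \
        '#define ZFS_META_ALIAS "zfs-0.8.3-1"'
}

# Original pattern: the first line matching /META_ALIAS/ wins, even without quotes,
# so $2 is empty and awk exits before reaching the real value.
loose_alias() {
    awk -F'"' '/META_ALIAS/ { print $2; exit 0 }'
}

# Stricter illustration: only accept META_ALIAS followed by spaces and a quote.
strict_alias() {
    awk -F'"' '/META_ALIAS +"/ { print $2; exit 0 }'
}

echo "loose:  [$(sample_config | loose_alias)]"
echo "strict: [$(sample_config | strict_alias)]"
```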

I should have a fix shortly. Unfortunately, this fix changes the uninstall script, so it will not be available until you upgrade from the next release of the RPM. Distributions can make this happen sooner by incrementing the Release field and releasing an update to the RPM spec only.

ColMelvin added a commit to ColMelvin/zfs that referenced this issue May 14, 2020
Due to a mismatch between the text and a regex looking for that text,
the `%preuninstall` script would never run the `dkms remove` command
necessary to avoid corrupting the DKMS data configuration.  Increase
regex specificity to avoid this issue.

Closes: openzfs#9891
behlendorf pushed a commit that referenced this issue May 15, 2020
Due to a mismatch between the text and a regex looking for that text,
the `%preuninstall` script would never run the `dkms remove` command
necessary to avoid corrupting the DKMS data configuration.  Increase
regex specificity to avoid this issue.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com>
Closes: #9891
Closes #10327
as-com pushed a commit to as-com/zfs that referenced this issue Jun 20, 2020
(cherry picked from commit 4d6043f)
tonyhutter added a commit to tonyhutter/zfs that referenced this issue Sep 15, 2020
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 15, 2020
@anthr76

anthr76 commented Oct 16, 2020

This issue continues to happen. Has this been resolved for anyone?

@sparksh

sparksh commented Oct 16, 2020

The problem still occurs. I ran "dnf update" on a Fedora system with kernel-5.8.13, which brought in zfs-0.8.5 only. That appeared to work: I didn't check, but I was able to reboot the system. (I was getting complacent.) A subsequent update brought in kernel-5.8.14, and that's when the zfs modules failed to build. The mess was exactly as described above, and I've seen it on all previous zfs updates.

I recovered by deleting the old zfs artifacts from the dkms tree by hand. Then I ran dkms specifying the new kernel and zfs versions.

I like to delete all the weak-updates symbolic links as well. It appears that weak modules are an attempt to make new zfs modules work with old kernels. I don't understand how this idea could ever be expected to work.

I like the suggestion by Gregory that would allow multiple versions of zfs to be installed and present in the dkms tree as long as the old kernels are around that use them. When an old kernel is removed, the dkms tree should be maintained in parallel. The old version of zfs should be deleted when it has no dependants.

@yougotborked

I just ran into this issue again on Fedora upgrading to 5.8.18. It would be great to get a fix.

@gregory-lee-bartholomew
Contributor

I like to delete all the weak-updates symbolic links as well. It appears that weak modules are an attempt to make new zfs modules work with old kernels. I don't understand how this idea could ever be expected to work.

Indeed, these weak modules seem to be causing a lot of the problems people are reporting on Red Hat based systems. According to the dkms man page, it looks like adding NO_WEAK_MODULES="yes" to dkms.conf ought to fix the problem.
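A quick way to check whether an installed zfs dkms.conf carries that setting. A sketch assuming the file lives in the source tree that /var/lib/dkms/zfs/<version>/source points to (for example /usr/src/zfs-0.8.5/dkms.conf); weak_modules_disabled is a hypothetical helper name.

```shell
# Succeed if the given dkms.conf sets NO_WEAK_MODULES to yes (quoted or not).
weak_modules_disabled() {
    grep -Eq '^[[:space:]]*NO_WEAK_MODULES="?yes"?[[:space:]]*$' "$1"
}
```

Usage: 'weak_modules_disabled /usr/src/zfs-0.8.5/dkms.conf && echo "weak modules disabled"'.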

@behlendorf
Contributor

Adding NO_WEAK_MODULES="yes" does sound like the right thing to do. If someone can verify that this works as intended and open a PR, that would be great.

@gregory-lee-bartholomew
Contributor

Adding NO_WEAK_MODULES="yes" does sound like the right thing to do. If someone can verify that this works as intended and open a PR, that would be great.

I would love to, but I don't currently have a system that is exhibiting the problem and I don't know how to recreate it.

@anthr76

anthr76 commented Dec 13, 2020

I’m currently experiencing this issue and would try this aforementioned fix. I’m currently on mobile. Once I log in I’ll see what I can do to test.

behlendorf pushed a commit that referenced this issue Dec 15, 2020
Fedora does not guarantee a stable kABI, so weak modules should be disabled. See the dkms man page for a more detailed explanation of the weak module feature.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
Closes #9891
Closes #11128
Closes #11242
Closes #11335
behlendorf pushed a commit that referenced this issue Dec 23, 2020
@torriem

torriem commented Jan 15, 2021

This is still a problem! Just happened to me again on Fedora 32 when I updated zfs from the repos today!

@gregory-lee-bartholomew
Contributor

Hi @torriem:

Can you verify that it is the same problem by posting the output from:

$ sudo lsinitrd /boot/<path-to-non-working-initramfs> | grep zfs.ko

If the problem is with weak modules, the wrong kernel version will be listed in the output of the above command. See #11242 (comment) for an example.
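The check can be narrowed to the suspicious entries. A sketch that reads 'lsinitrd' output and prints weak-updates symlinks whose target sits under a different kernel version than the one they are filed under; it assumes the 'ls -l'-style listing format lsinitrd prints, and mismatched_initrd_links is a hypothetical helper name.

```shell
# Read `lsinitrd` output on stdin and print weak-updates symlinks whose target
# sits under a different kernel version than the directory they are filed under.
mismatched_initrd_links() {
    awk '
        / -> / && /\/weak-updates\// {
            path = $(NF - 2); target = $NF
            n = split(path, part, "/")
            kver = ""
            for (i = 1; i < n; i++)
                if (part[i] == "modules") { kver = part[i + 1]; break }
            if (kver != "" && index(target, "/modules/" kver "/") == 0)
                print path " -> " target
        }'
}
```

Usage, for example: 'sudo lsinitrd /boot/initramfs-5.8.15-201.fc32.x86_64.img | mismatched_initrd_links'. Empty output means no weak-updates entry points at another kernel.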

Thanks.

@torriem

torriem commented Jan 18, 2021

Yes, that's correct. That's exactly what I saw. The latest version of the ZFS modules, built for the currently running kernel, got symlinked via the weak-updates folder into the initrds of all the other kernels. I can't show you that now because I had dkms rebuild the current version of zfs for the older kernels I still had installed, and then I rebuilt the initrds with dracut, which is what I have to do every time zfs gets updated.

@torriem

torriem commented Jan 18, 2021

Oh I'm in luck I found it in the terminal history.

lsinitrd /boot/initramfs-5.8.15-201.fc32.x86_64.img | grep zfs.ko shows:

lrwxrwxrwx   1 root     root           65 May 29  2020 usr/lib/modules/5.8.15-201.fc32.x86_64/weak-updates/zfs.ko.xz -> ../../../../../lib/modules/5.9.11-100.fc32.x86_64/extra/zfs.ko.xz
-rw-r--r--   1 root     root       859780 May 29  2020 usr/lib/modules/5.9.11-100.fc32.x86_64/extra/zfs.ko.xz

@gregory-lee-bartholomew
Contributor

Oh I'm in luck I found it in the terminal history.

lsinitrd /boot/initramfs-5.8.15-201.fc32.x86_64.img | grep zfs.ko shows ...

That looks like an old initramfs. The old initramfses are not touched during a kernel upgrade; only newly installed kernels automatically trigger the generation of a new initramfs containing the updated driver. It may also take 2 kernel updates before the problem goes away because, as gogi983 pointed out in #11242 (comment), it's actually the previous kernel's driver that dkms picks up as a "weak module" and tries to incorporate into the kernel currently being installed. I didn't think that would occur, however, because I thought dkms would rebuild the driver for all of the currently installed kernels when a new version of zfs came out.

One other thing that I've always done (but which should not be necessary) is to cancel the installation (by responding with "n" when dnf prompts to continue) when I see both kernel installations and zfs updates being attempted in the same transaction. I've always preferred to split those into two stages -- by first running "dnf update --exclude=kernel*" and then after zfs has updated successfully I run "dnf update" again (this time without excluding the kernel) to get the remaining kernel updates.

Please give the mechanism one more chance to work. It may just need a few tries to get the older, corrupt entries worked out of the system. Also, be sure that /var/lib/dkms/zfs doesn't contain any old links (pointing to no-longer-installed kernels) or any directories for old versions of zfs that are no longer installed.
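A sketch of that check. It assumes dkms keeps per-kernel symlinks named kernel-<kernel>-<arch> next to the version directories in /var/lib/dkms/zfs, and that an original_module directory, if present, is dkms bookkeeping rather than a zfs version; stale_zfs_dkms_entries is a hypothetical name, and it only reports, so nothing is deleted.

```shell
# Report entries under <dkms tree>/zfs that look stale:
#   kernel-<kernel>-<arch> symlinks for kernels with no <modules>/<kernel> directory,
#   and version directories other than the currently installed zfs version.
stale_zfs_dkms_entries() {
    current="$1"                          # installed zfs version, e.g. 0.8.5
    tree="${2:-/var/lib/dkms}/zfs"
    modules="${3:-/lib/modules}"
    for entry in "$tree"/*; do
        name=${entry##*/}
        case "$name" in
            kernel-*)
                kver=${name#kernel-}
                kver=${kver%-*}           # drop the trailing -<arch>
                if [ ! -d "$modules/$kver" ]; then
                    echo "old kernel link: $entry"
                fi
                ;;
            original_module) ;;           # dkms backup area, not a zfs version
            *)
                if [ -d "$entry" ] && [ "$name" != "$current" ]; then
                    echo "old version dir: $entry"
                fi
                ;;
        esac
    done
}
```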

@torriem

torriem commented Jan 18, 2021

No, it definitely touched the old initramfs files; otherwise there'd have been no issue. It wasn't the kernel update that did this, it was the zfs-dkms update. Also, this isn't just a matter of the initramfs: dkms also put the weak-updates symlinks in /lib/modules for each installed kernel, which is probably where dracut got them from when it updated all the initramfs files.

In my experience every update to zfs-dkms causes dracut to regenerate all initramfs for all kernels installed. So the weak-update problem ends up in every initramfs except the one for the running kernel.

So I'm thinking this is still a dkms issue.

Oh and to your suggestion, I always do the same thing. I never update kernel and zfs at the same time. My issues were from the zfs update only, not from a kernel update. There were no older corrupt weak-updates in /lib or in my existing initramfs files either. I had cleaned all that out the last time a zfs-dkms update hosed everything.

@gregory-lee-bartholomew
Contributor

In my experience every update to zfs-dkms causes dracut to regenerate all initramfs for all kernels installed. So the weak-update problem ends up in every initramfs except the one for the running kernel.

That's interesting. My system does not do that:

[/root]# find /boot/ -name initrd -printf "%p was created on %c\n"
/boot/8c76d196c474411a85814e376f2c30c4/5.9.12-200.fc33.x86_64/initrd was created on Thu Dec 10 22:02:29.0200000000 2020
/boot/8c76d196c474411a85814e376f2c30c4/5.10.7-200.fc33.x86_64/initrd was created on Sat Jan 16 00:25:04.6600000000 2021
/boot/8c76d196c474411a85814e376f2c30c4/5.9.13-200.fc33.x86_64/initrd was created on Sat Dec 12 12:57:34.9700000000 2020
/boot/8c76d196c474411a85814e376f2c30c4/5.9.14-200.fc33.x86_64/initrd was created on Thu Dec 17 18:36:34.1200000000 2020
/boot/8c76d196c474411a85814e376f2c30c4/5.9.15-200.fc33.x86_64/initrd was created on Thu Dec 24 01:02:42.4800000000 2020
/boot/8c76d196c474411a85814e376f2c30c4/5.9.16-200.fc33.x86_64/initrd was created on Mon Jan  4 19:20:36.8800000000 2021

So I guess the question is why is your system behaving differently?

@torriem

torriem commented Jan 18, 2021

Strange. Not sure where the issue is but it's definitely still happening. I'm pretty sure the cause is dkms, and it's only manifest when a package like zfs is updated inside of dkms. It does not happen (does not rebuild initramfs) if there's no version update within dkms, if that makes any sense. That's why I have to manually run dracut in my recovery process below.

My current method for dealing with this, after every zfs-dkms update, is to:

  1. verify initramfs files with lsinitrd, look for weak-updates reappearing.
  2. remove all weak-update symlinks in /lib/modules for all kernels
  3. dkms install -m zfs -v current_zfs_ver -k kernel_ver for all installed kernels
  4. use dracut to force a regeneration of initramfs files.
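Steps 2 to 4 can be looped over every installed kernel. A minimal sketch, assuming every directory under /lib/modules belongs to an installed kernel and dkms 2.x option syntax; recover_zfs_all_kernels and run are hypothetical helpers, step 1 (the lsinitrd check) is left out, and by default the commands are only printed so they can be reviewed first.

```shell
# Print a command, or execute it when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# Steps 2-4 above for every kernel under the modules root (default /lib/modules).
recover_zfs_all_kernels() {
    zfs_ver="$1"                      # currently installed zfs version
    modules="${2:-/lib/modules}"
    for dir in "$modules"/*/; do
        kver=$(basename "$dir")
        # Step 2: remove weak-updates symlinks (symlinks only, never real files).
        for link in "$dir"weak-updates/*; do
            if [ -L "$link" ]; then run rm -f "$link"; fi
        done
        # Step 3: build and install the current zfs for this kernel.
        run dkms install -m zfs -v "$zfs_ver" -k "$kver"
        # Step 4: regenerate this kernel's initramfs.
        run dracut --force --kver "$kver"
    done
}
```

Regenerating every initramfs also replaces the older, known-good ones, which is worth keeping in mind before running it with DRY_RUN=0.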

I'll be sure to update here the next time the problem reappears.

I note that your system is fedora 33, and mine is fedora 32. Maybe the problem no longer exists on 33.

@gregory-lee-bartholomew
Contributor

I note that your system is fedora 33, and mine is fedora 32. Maybe the problem no longer exists on 33.

Let's hope so. 🙂

The old initramfses really shouldn't be touched when zfs is updated. Essentially, they are supposed to be known-good recovery options in case anything goes wrong with the latest update. That is, if anything goes wrong with the latest kernel+zfs combination (e.g. zfs isn't compatible with the latest kernel), the old previously-working kernel+initramfs should still be available for use/recovery. If all the initramfses are updated at once, one could easily end up in a situation where the system is completely unbootable with no "fallback" options.

If you can find where in your system the old initramfses are being updated, please let us know. That is not supposed to happen.

jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021