Upgrade of zfs to new version corrupts dkms kernel module configuration in Fedora #9891
Same thing happened to me. My dkms configuration was corrupted during the zfs update, which in turn broke my nvidia drivers, which then took out my Plasma desktop. It took hours to recover. Thanks for your help above — correction: thanks to the help above, I was able to manually work through the issues and get up and running again. |
A temporary workaround for this problem is to remove zfs completely. This issue occurs if you have more than one kernel installed. First, check that the zfs modules have been built for each installed kernel and not just the running kernel. When more than one kernel is installed, the issue is an exec format error on zfs, caused by a mismatch between the zfs module and the zfs dependency module version built for each kernel. The mismatch:
As shown above, the vermagic differs from the kernel version; the same holds for the other kernels, except for the kernel the zfs module was actually built against.
|
I've seen this happen for every new zfs version on Fedora. The quick fix for me has been:
sudo rm /var/lib/dkms/zfs/0.8.2/source
sudo dkms autoinstall
sudo reboot
I believe what happens is that the upgrade leaves behind that stale source symlink:
$ sudo dkms autoinstall
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.
$ ls -lah /var/lib/dkms/zfs/0.8.2/source
lrwxrwxrwx. 1 root root 18 Nov 11 11:46 /var/lib/dkms/zfs/0.8.2/source -> /usr/src/zfs-0.8.2 |
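A dangling symlink like the one above can be detected generically with find's `-xtype` test. A minimal sketch (it builds a throwaway directory so the command can be demonstrated without touching /var/lib/dkms; the paths are invented for illustration):

```shell
# Build a throwaway tree containing a dangling symlink, then detect it.
tmp=$(mktemp -d)
mkdir -p "$tmp/zfs/0.8.2"
ln -s /usr/src/zfs-0.8.2-gone "$tmp/zfs/0.8.2/source"   # target does not exist

# -xtype l matches symlinks whose target is missing (dangling links).
find "$tmp" -xtype l

# On a real system, the equivalent check would be:
#   find /var/lib/dkms -xtype l

rm -rf "$tmp"
```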
Shouldn't this issue be opened against Fedora, since it's the dkms script causing the issue, not OpenZFS per se? |
@luckied No, this is a problem in the deployment logic (the use of dkms), not with dkms itself. Although the rpm package scripts call dkms to build and deploy the zfs kernel modules at installation time, they don't call dkms (or call it incorrectly) at rpm removal time. This issue is slightly complicated by the overlap of a different issue: a new zfs kernel module version could be incompatible with an older installed kernel version. So while zfs-version-x works with kernel-version-a but not kernel-version-a+1, zfs-version-y may work with kernel-version-a+1 but not kernel-version-a. In such a case, removing the older kernel modules would leave older kernel versions no longer working with zfs. |
I was recently added as a maintainer of dkms in Fedora, and just pushed an update for f31 (other versions are still in the pipeline). I don't think that update will do anything to help this, though. I was just pointed here; I'll read up on this and hopefully comment more later. |
I too have seen this. As others have pointed out, "dkms status" will show incomplete output if there is a problem. I've worked around the problem simply by deleting the old directory from /var/lib/dkms/zfs (e.g. rm -rf /var/lib/dkms/zfs/0.8.1) and then re-running the dkms install command manually when necessary (e.g. dkms install -m zfs -v 0.8.2 -k 5.3.7-200.fc30.x86_64). My guess is that dkms just needs to delete /var/lib/dkms/zfs/<old-version> when it detects that a new version of zfs is being used. You might want to avoid running "dkms remove" if that command also deletes the previously compiled zfs module from /lib/modules/<old-kernel-version>. I think it would be best to leave the old compiled module around as a fallback. The previous module should also be in the old initramfs, but I still think it should be left in /lib/modules/<old-kernel-version> in case someone wants to rebuild their old initramfs for whatever reason. Uninstalling the old kernel, though, should of course remove the old zfs driver from /lib/modules/<old-kernel-version>. My two cents. |
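The manual cleanup described above can be sketched as a small helper. The function name and parameters here are made up for illustration, and the dkms install step is shown only as a comment since it requires root and a real source tree:

```shell
# Remove stale version directories from a dkms module tree, keeping only
# the currently installed version. The tree path and version are passed
# as parameters so the logic can be tried safely on a scratch directory.
prune_stale_dkms() {
  tree=$1; current=$2
  for d in "$tree"/*/; do
    v=$(basename "$d")
    if [ "$v" != "$current" ]; then
      echo "removing stale $tree/$v"
      rm -rf "$tree/$v"
    fi
  done
}

# Usage on a real system (as root), then rebuild for the running kernel:
#   prune_stale_dkms /var/lib/dkms/zfs 0.8.2
#   dkms install -m zfs -v 0.8.2 -k "$(uname -r)"
```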
I ran into this again during the 0.8.4 upgrade on my Fedora 32 box. I've decided to stop using dkms and just modified this repo's scripts a bit to build a proper akmod package: https://gist.github.com/chenxiaolong/d7f8321a5d45e4770d1dd31e7bc6ce66 Fedora's akmod system isn't perfect either, but I've had much better luck with it cleaning things up properly (since each kernel's zfs modules are owned by a separate package). In case anyone else wants to try this, first clone the repo and apply the patch on top of 0.8.4:
git clone https://github.com/openzfs/zfs
cd zfs
git checkout zfs-0.8.4
curl -L -O https://gist.github.com/chenxiaolong/d7f8321a5d45e4770d1dd31e7bc6ce66/raw/724a53267547ebdb2fc9f2c6bcb2e7bab9aba088/0001-rpm-Add-support-for-building-an-akmods-package.patch
git am 0001-rpm-Add-support-for-building-an-akmods-package.patch
Then build the RPMs:
./autogen.sh
./configure
make rpm-kmod
And install them:
sudo dnf install ./akmod-zfs-0.8.4-1.fc32.x86_64.rpm ./zfs-kmod-common-0.8.4-1.fc32.noarch.rpm
For the non-kernel-module RPMs, I'm just using the packages from the official repo. |
It sounds like a good idea. It looks like there may be a problem getting it to automatically include the updated kernel module in the initramfs on upgrade, though, because, at least on my Fedora 31 system, akmodsposttrans.install is run after dracut.install: |
Hmm, that's a good point. I hadn't personally run into that because I don't boot from zfs. My initramfs with dkms didn't have the zfs modules either. |
The problem stems from a conditional in the `%preuninstall` script. The command it runs comes back empty:
[root@test ~]$ cat /var/lib/dkms/zfs/0.8.3/*/*/zfs_config.h | awk -F'"' '/META_ALIAS/ { print $2; exit 0 }'
[root@test ~]$
After a slight modification to show all matches, it becomes clear that a bad match is to blame:
[root@test ~]$ cat /var/lib/dkms/zfs/0.8.3/*/*/zfs_config.h | awk -F'"' '/META_ALIAS/ { print $2 }'
zfs-0.8.3-1
zfs-0.8.3-1
[root@test ~]$
I should have a fix shortly. Unfortunately, the fix changes the uninstall script, so it will not take effect until you upgrade from the next release of the RPM. Distributions can make this happen sooner by incrementing the |
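The bad-match behavior can be reproduced with a fabricated header. The contents below are invented to mirror the symptom quoted above, not copied from the real zfs_config.h:

```shell
# Fabricated header: one line mentions META_ALIAS without a quoted value,
# alongside the real #define. (Actual zfs_config.h contents may differ.)
cat > demo_config.h <<'EOF'
/* ZFS_META_ALIAS is the project alias string */
#define ZFS_META_ALIAS "zfs-0.8.3-1"
EOF

# Loose pattern: the comment line matches first; split on '"' it has no
# second field, so `print $2; exit 0` emits an empty line and stops.
awk -F'"' '/META_ALIAS/ { print $2; exit 0 }' demo_config.h

# Anchoring the regex on the #define makes only the real value match.
awk -F'"' '/^#define ZFS_META_ALIAS/ { print $2; exit 0 }' demo_config.h
# -> zfs-0.8.3-1

rm -f demo_config.h
```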
Due to a mismatch between the text and a regex looking for that text, the `%preuninstall` script would never run the `dkms remove` command necessary to avoid corrupting the DKMS data configuration. Increase regex specificity to avoid this issue. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Lindee <chris.lindee+github@gmail.com> Closes #9891 Closes #10327
This issue continues to happen. Has this been resolved for anyone? |
The problem still occurs. I ran "dnf update" on a Fedora system with kernel-5.8.13 that brought in zfs-0.8.5 only. That appeared to work: I didn't check, but I was able to reboot the system. (I was getting complacent.) A subsequent update brought in kernel-5.8.14: that's when the zfs modules failed to build. The mess was exactly as described above, and I've seen it on all previous zfs updates. I recovered by deleting the old zfs artifacts from the dkms tree by hand, then running dkms with the new kernel and zfs versions specified. I like to delete all the weak-updates symbolic links as well. It appears that weak modules are an attempt to make new zfs modules work with old kernels; I don't understand how this idea could ever be expected to work. I like the suggestion by Gregory that would allow multiple versions of zfs to be installed and present in the dkms tree as long as the old kernels that use them are still around. When an old kernel is removed, the dkms tree should be maintained in parallel: the old version of zfs should be deleted once it has no dependents. |
I just ran into this issue again on Fedora upgrading to 5.8.18. It would be great to get a fix. |
Indeed, these weak modules seem to be causing a lot of the problems people are reporting on Red Hat based systems. According to the dkms man page, it looks like adding |
Adding |
I would love to, but I don't currently have a system that is exhibiting the problem and I don't know how to recreate it. |
I’m currently experiencing this issue and would try this aforementioned fix. I’m currently on mobile. Once I log in I’ll see what I can do to test. |
Fedora does not guarantee a stable kABI, so weak modules should be disabled. See the dkms man page for a more detailed explanation of the weak module feature. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregory Bartholomew <gregory.lee.bartholomew@gmail.com> Closes #9891 Closes #11128 Closes #11242 Closes #11335
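For reference, the commit above disables weak modules through the generated dkms.conf. A fragment along these lines illustrates the idea (treat it as a sketch; consult the dkms man page for the authoritative variable semantics):

```shell
# Excerpt of a dkms.conf for zfs. NO_WEAK_MODULES tells dkms not to create
# weak-updates symlinks for other installed kernels, forcing a real rebuild
# per kernel instead. The version below is illustrative.
PACKAGE_NAME="zfs"
PACKAGE_VERSION="2.0.0"
NO_WEAK_MODULES="yes"
```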
This is still a problem! Just happened to me again on Fedora 32 when I updated zfs from the repos today! |
Hi @torriem: Can you verify that it is the same problem by posting the output from:
If the problem is with weak modules, the wrong kernel version will be listed in the output of the above command. See #11242 (comment) for an example. Thanks. |
Yes, that's correct. That's exactly what I saw: the latest version of the ZFS modules for the currently running kernel got symlinked into all the other kernels' initrds via the weak-updates folder. I can't show you that now because I had dkms rebuild the current version of zfs for the older kernels I still had installed, and then I rebuilt the initrds with dracut. That is what I have to do every time zfs gets updated. |
Oh, I'm in luck: I found it in my terminal history. lsinitrd /boot/initramfs-5.8.15-201.fc32.x86_64.img | grep zfs.ko shows: |
That looks like an old initramfs. The old initramfses are not touched during a kernel upgrade; only newly installed kernels automatically trigger the generation of a new initramfs containing the updated driver. There is also a possibility that it will take 2 kernel updates before the problem goes away because, as gogi983 pointed out in #11242 (comment), it's actually the previous kernel's driver that dkms picks up as a "weak module" and tries to incorporate into the kernel currently being installed. I didn't think that would occur, however, because I thought dkms would rebuild the driver for all of the currently-installed kernels when a new version of zfs came out. One other thing that I've always done (but which should not be necessary) is to cancel the installation (by responding "n" when dnf prompts to continue) when I see both kernel installations and zfs updates being attempted in the same transaction. I prefer to split those into two stages: first running "dnf update --exclude=kernel*", and then, after zfs has updated successfully, running "dnf update" again (this time without excluding the kernel) to get the remaining kernel updates. Please give the mechanism one more chance to work; it may just need a few tries to get the older, corrupt entries worked out of the system. Also, be sure that /var/lib/dkms/zfs doesn't contain any old links (pointing to no-longer-installed kernels) or any directories for old versions of zfs that are no longer installed. |
No, it definitely touched the old initramfs files; otherwise there'd have been no issue. It wasn't the kernel update that did this, it was the zfs-dkms update. Also, this isn't just a matter of the initramfs: dkms also put the weak-update symlinks in /lib/modules for each installed kernel, which is probably where dracut got them when it updated all the initramfs files. In my experience, every update to zfs-dkms causes dracut to regenerate the initramfs for every installed kernel, so the weak-update problem ends up in every initramfs except the one for the running kernel. So I'm thinking this is still a dkms issue. And to your suggestion: I always do the same thing. I never update the kernel and zfs at the same time; my issues were from the zfs update only, not from a kernel update. There were no older corrupt weak-updates in /lib or in my existing initramfs files either. I had cleaned all that out the last time a zfs-dkms update hosed everything. |
That's interesting. My system does not do that:
So I guess the question is why is your system behaving differently? |
Strange. Not sure where the issue is, but it's definitely still happening. I'm pretty sure the cause is dkms, and it only manifests when a package like zfs is updated inside of dkms. It does not happen (the initramfs is not rebuilt) if there's no version update within dkms, if that makes any sense. That's why I have to manually run dracut in my recovery process below. My current method for dealing with this after every zfs-dkms update is to:
I'll be sure to update here the next time the problem reappears. I note that your system is Fedora 33 and mine is Fedora 32; maybe the problem no longer exists on 33. |
Let's hope so. 🙂 The old initramfses really shouldn't be touched when zfs is updated. Essentially, they are supposed to be known-good recovery options in case anything goes wrong with the latest update. That is, if anything goes wrong with the latest kernel+zfs combination (e.g. zfs isn't compatible with the latest kernel), the old previously-working kernel+initramfs should still be available for use/recovery. If all the initramfses are updated at once, one could easily end up in a situation where the system is completely unbootable with no "fallback" options. If you can find where in your system the old initramfses are being updated, please let us know. That is not supposed to happen. |
System information
Describe the problem you're observing
With every upgrade of zfs through the official repo on Fedora, all zfs devices disappear. Here is the specific series of events that leads to this issue:
$ dkms status
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.
$ zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
$ modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.4.13-201.fc31.x86_64
Describe how to fix the issue
Enhance the removal script of the Fedora package: add a command that removes this zfs version from the dkms configuration, e.g. 'dkms remove zfs/0.8.2' or similar, before the source code/folder is removed from the hard drive (otherwise the dkms remove command fails because the dkms configuration is already corrupted).
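A scriptlet along the lines proposed might look like the following. This is a hypothetical sketch, not the actual packaging code that was merged; the macro usage and condition are assumptions:

```shell
# Hypothetical %preun scriptlet for a zfs-dkms RPM. %{version} is the
# standard RPM macro for the package version. The status check guards the
# removal so dkms remove only runs when the module is actually registered,
# and `|| :` keeps a dkms failure from aborting the package removal.
%preun
if dkms status -m zfs -v %{version} 2>/dev/null | grep -q '^zfs'; then
    dkms remove -m zfs -v %{version} --all || :
fi
```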
Describe how to reproduce the problem
$ dnf update
Include any warning/errors/backtraces from the system logs