WARNING: COMPLETELY BROKEN WITH KERNEL 5.16.X #13210
Could you say a bit more about the pool(s) you're using, exact kernel version, and the settings on them? I've got Fedora 35 running 2.1.3 and 5.16.8-200.fc35.x86_64, and quickly flinging data around to and from the pool doesn't catch fire for me. |
@Gibson85 we've had reports of ZFS troubles over USB. Can you redo the tests with the disks attached to a SATA bus? I also run ZFS on Fedora 35, both almost always at the latest version. ZFS from git master, indeed. |
It is really hard to say what's going on. All applications are crashing randomly. I have never seen anything like this. Main pool:
External pool:
OS:
The biggest difference between these pools is that the main pool has no encryption, but the external one has. I have an additional pool in my PC, but this is my only full backup. So no, I am definitely not doing anything on it with such a broken system :). |
PS: I see there is a kernel update to 5.16.13-200 today. Maybe the kernel was broken. I'll try again in the evening with the new kernel (if it is supported by ZFS ;)). |
If you've got any backtraces from the crashes, or if anything turns up in dmesg, that'd be useful as well. |
The problem also appears with kernel 5.16.13-200.fc35.x86_64. Copying data to the external USB HDD fails with verification errors. Maybe there is nothing you can do and the kernel is broken. Overnight I scrubbed my main pool. Thank god it has no errors. I also ran a long SMART test on the external HDD. It has no errors either. I am not very familiar with dmesg, but this is everything in red or related to ZFS that I could find:
I tried to upload several crashes, but backtrace generation always failed at some point. There are lots of problems with Fedora, I see :). |
⬆️ there might be a problem with your drive |
@tonyhutter I have now connected the drive internally via SATA, and it shows the same behaviour, with verification errors. So it is not related to USB. |
Does |
@Gibson85 can you please try setting the following module options in an
You can verify they're set properly by checking:
Assuming this resolves the stability issue can you post the output of |
You know what? I booted into Windows and formatted the USB drive with NTFS (I hope you feel a bit ashamed for making me do so :)). I copied a 50 GB file to it without problems. A second USB drive also works normally. When I booted back into Fedora I had a 20 pixel wide bar with graphics errors across the screen. I had seen this once before, after the recent kernel update.
Shows no errors. @behlendorf |
@Gibson85 Could you also report what |
On 3/14/22 12:53, Gibson85 wrote:
[ 31.126692] nvidia-gpu 0000:03:00.3: i2c timeout error e0000000
[ 31.126702] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 31.126708] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 31.126713] ucsi_ccg: probe of 0-0008 failed with error -110
...
[ 430.438571] gnome-terminal-[3854]: segfault at 55cfe9027000 ip 00007facf8ee1abf sp 00007ffd0a1584a0 error 4 in libXi.so.6.1.0[7facf8edf000+b000]
[ 430.438596] Code: 89 9d a0 00 00 00 85 d2 0f 84 da 02 00 00 8d 72 ff 31 c0 48 c1 e6 03 eb 0d 66 90 48 8b 9d a0 00 00 00 48 83 c0 08 66 0f ef c0 <f2> 0f 2a 04 07 f2 0f 11 04 03 8b 4c 07 04 66 0f ef c0 48 8b 95 a0
In your last kernel update, did you also update nvidia drivers? Try to rollback then...
|
In fact nothing. No. After the fresh install I installed ZFS first making the system unstable again. After that I installed the NVIDIA driver. So I don't think it is related to the NVIDIA driver. |
It seems I am the only one having these problems. In a few days Fedora 36 with kernel 5.17 should be released I have heard. Hopefully this solves the issues :(. |
@Gibson85 if you haven't already, I'd just double check those module options got set correctly by reading the One other thing you could try to help narrow this down would be to rollback to an older kernel and verify the 2.1.2 release runs without issue. |
Yeah, I'd also suggest making sure that the settings mentioned are really enabled (as explained above). If they are, I'd suggest the following. Since it seems that you are not using root on ZFS and assuming that your /home isn't on ZFS either, you could export all your pools ( |
I had the same problems.
After several attempts, reverting zfs-dkms and zfs-utils from 2.1.3 to 2.1.2 resolved all my problems. Other info: the CPU is an Intel(R) Celeron(R) CPU J1900 |
This definitely looks like a problem with FPU state handling, which is kinda strange since my change should've only affected AVX/XSAVE capable CPUs. I'll try to have a look later today. @comicchang You're using Linux 5.16, right? Are you using zfs kmod or dkms packages? Any chance you could try @Gibson85 What is the CPU model you encountered this on? |
@AttilaFueloep I'll try archzfs soon later. |
Thanks for the confirmation. Well, using zfs from the archzfs repo or via aur shouldn't make a difference. What I was heading at is the |
If it's of any help, on my Fedora 5.16.13-200.fc35.x86_64/2.1.3 where I seem to be unable to reproduce this, the CPU is a
|
@comicchang Today I removed all unnecessary hardware and with ZFS modules loaded I did the following:
|
The dmesg output looks quite wild today also. Maybe this helps somehow.
|
So I think you are heading in the right direction to fix this issue, but I don't expect a fixed version soon. Can you please say which 5.16.x kernel version is safe and stable to use in the meantime? |
I just upgraded to 2.1.3 and am currently in the process of shuffling block devices around; a full scrub followed by resilvering with each block device swap. So far at least I haven't seen any of these issues.
One of the more interesting Google results I ran across talks about how the Linux kernel (at least in 2015) disables use of XSAVES since it uses a 'compacted' format, while XSAVE and XSAVEOPT use a standard format.
https://lore.kernel.org/lkml/tip-65ac2e9baa7deebe3e9588769d44d85555e05619@git.kernel.org/
Checking my CPU's flags, I see only xsave and xsaveopt.
tr '\040' '\n' < /proc/cpuinfo | grep xsave | sort -u
While the code here searches for XSAVES and uses that first if it has the option, which might be where your system is tripping up?
https://github.com/openzfs/zfs/blob/master/include/os/linux/kernel/linux/simd_x86.h |
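For anyone who wants to check which branch their own CPU would land in, here is a minimal user-space sketch in C. It assumes that the flag names in /proc/cpuinfo (xsaves, xsave, fxsr) correspond to the feature checks in the linked header, that the order tried is XSAVES, then XSAVE, then FXSR, then plain FSAVE, and that the module was built with XSAVES support. has_flag() and fpu_path() are hypothetical helper names, not anything from OpenZFS.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated token in `flags`. */
int has_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	assert(n > 0);
	while ((p = strstr(p, flag)) != NULL) {
		int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		int ends = p[n] == '\0' || p[n] == ' ' ||
		    p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return 1;
		p += n;	/* partial match such as "xsaveopt"; keep scanning */
	}
	return 0;
}

/*
 * Hypothetical helper: name the save/restore path that would be chosen
 * for a given /proc/cpuinfo flags string. The order is an assumption
 * based on the linked header, not a guarantee for every build.
 */
const char *fpu_path(const char *flags)
{
	if (has_flag(flags, "xsaves"))
		return "xsaves";
	if (has_flag(flags, "xsave"))
		return "xsave";
	if (has_flag(flags, "fxsr"))
		return "fxsr";
	return "fsave";
}
```

Pass it the text that follows "flags :" in /proc/cpuinfo. Whole-token matching matters here, because "xsaveopt" and "fxsr_opt" must not count as "xsave" or "fxsr".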
I noticed Gibson85 posted one core of their /proc/cpuinfo earlier. They don't have XSAVE at all, but do have FXSR.
https://github.com/openzfs/zfs/blob/master/include/os/linux/kernel/linux/simd_x86.h#L382
Am I reading the current code correctly?
static inline void
kfpu_end(void)
{
	uint8_t *state = zfs_kfpu_fpregs[smp_processor_id()];
#if defined(HAVE_XSAVES)
	if (static_cpu_has(X86_FEATURE_XSAVES)) {
		kfpu_do_xrstor("xrstors", state, ~0);
		goto out;
	}
#endif
	if (static_cpu_has(X86_FEATURE_XSAVE)) {
		kfpu_do_xrstor("xrstor", state, ~0);
	} else if (static_cpu_has(X86_FEATURE_FXSR)) {
		kfpu_save_fxsr(state);
	} else {
		kfpu_save_fsave(state);
	}
out:
	local_irq_enable();
	preempt_enable();
}
The non-XSAVE paths are trying to SAVE the state again, rather than RESTORE it? E.g. on line 382, shouldn't this be kfpu_restore_fxsr? And on line 384, kfpu_restore_fsave? |
You're reading that correctly, and that is the right fix. Let me open a PR with the needed change. I'm not sure how we missed that when reviewing the change, but thank you for calling it out. |
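To illustrate why saving a second time instead of restoring would corrupt user-space state (and so plausibly explain the random application crashes), here is a toy user-space model in C. It is only a sketch under stated assumptions: the arrays stand in for the live FPU registers and the per-CPU save area, and every model_* name is hypothetical. It does not execute FXSAVE/FXRSTOR or any real kernel code.

```c
#include <assert.h>
#include <string.h>

#define NCPUS       2
#define STATE_BYTES 16

/* Toy stand-in for the live FPU/SSE registers of each CPU. */
static unsigned char fpu_regs[NCPUS][STATE_BYTES];
/* Toy stand-in for the per-CPU save area used by the kernel module. */
static unsigned char save_area[NCPUS][STATE_BYTES];

/* Copy live registers into the save area. */
void model_save(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	memcpy(save_area[cpu], fpu_regs[cpu], STATE_BYTES);
}

/* Copy the save area back into the live registers. */
void model_restore(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	memcpy(fpu_regs[cpu], save_area[cpu], STATE_BYTES);
}

/* Kernel SIMD work (e.g. checksums) clobbers the registers. */
void model_kernel_simd_work(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	memset(fpu_regs[cpu], 0xAA, STATE_BYTES);
}

/*
 * Run one begin/work/end cycle and report whether the user's register
 * contents survive. use_fixed_end == 0 models the reported bug (the
 * end path saves again); use_fixed_end == 1 models the proposed fix
 * (the end path restores).
 */
int user_state_survives(int cpu, int use_fixed_end)
{
	unsigned char user[STATE_BYTES];

	memset(user, 0x11, STATE_BYTES);
	memcpy(fpu_regs[cpu], user, STATE_BYTES);

	model_save(cpu);		/* begin: save user state */
	model_kernel_simd_work(cpu);
	if (use_fixed_end)
		model_restore(cpu);	/* fix: put user state back */
	else
		model_save(cpu);	/* bug: overwrite the saved copy */

	return memcmp(fpu_regs[cpu], user, STATE_BYTES) == 0;
}
```

In this model user_state_survives(0, 0) reports that the user's values are lost, because the second save overwrites the only good copy with the kernel's scratch values, while user_state_survives(0, 1) reports that they come back intact. The real change is far smaller than this model: as suggested above, the two non-XSAVE branches call the restore helpers instead of the save helpers.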
Commit 3b52ccd introduced a flaw where FSR and FSAVE are not restored when using a Linux 5.16 kernel. These instructions are only used when XSAVE is not supported by the processor meaning only some systems will encounter this issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#13210
Commit 3b52ccd introduced a flaw where FSR and FSAVE are not restored when using a Linux 5.16 kernel. These instructions are only used when XSAVE is not supported by the processor meaning only some systems will encounter this issue. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #13210 Closes #13236
@Gibson85 if you stick with the 5.14 kernel for now you should be able to use zfs-2.1.3 without encountering this issue. We'll get this fix applied to 2.1.4 and when it's released you'll be able to update the kernel. |
@Gibson85 Behlendorf also poked the patch which is now in the current git (latest 'unstable') version of ZFS. https://github.com/openzfs/zfs/blob/master/include/os/linux/kernel/linux/simd_x86.h If for some reason you'd rather run with the pre-release version, your OS distribution might have a package such as zfs-dkms-git or similar that will pull in both this fix and any other pending changes automatically. The status of the zfs-2.1.4-staging git tag can be found here: #13235 You'll note some of its integration tests need to be re-run as well, though they also need some shepherding. When that pull request is completed you could also build zfs-2.1.4 from the source tag. (Though 2.1.3 with the simple two-line change applied yourself is going to be the fastest and safest solution.) |
Just out of curiosity, is 2.1.4 going to be released soon to deal with this issue, or is it going to be another three months? |
Thanks for the investigation and fix! And my condolences to other folks running under Hyper-V (which seems to lack XSAVE support), I was only mostly wrong to blame it for the corruption. |
The fix for this is included in zfs-2.1.4, which was just released: |
Hi,
I am totally speechless. After waiting for a ZFS release compatible with the 5.16 kernel, and staying unprotected with open security bugs for months, your newest release 2.1.3 is totally broken. I installed this release yesterday. After that the system became totally unstable. Trying to copy data from my main ZFS pool to a freshly formatted external drive (also ZFS) via USB failed.
Dozens of the following messages appeared. The system froze, the monitor went black and the user was logged out.
I tried this about 10 times and reformatted the external drive with ZFS again each time. But nothing changed.
Almost all applications like VLC, Firefox, Vivaldi and VSCode were crashing immediately. It was not possible to copy data over USB. There were audio problems. Then I installed a slightly newer NVIDIA driver and the system became a bit more stable. Applications started now.
But this morning I tried again to copy data with rsync to the external drive (reformatted again beforehand). And boom, black monitor, logged out again. OK, then I did two Memtest86+ passes without any errors and checked the BIOS settings again. Everything is normal. The PC also works normally with Windows.
OK, maybe the kernel upgrade was broken. So I decided to do a fresh Fedora install. And what should I say: after installing ZFS the system was unstable again. Horribly unstable, to be precise. The first import of my pool worked. But when I tried to open a folder the PC crashed. After a reboot I could access at least some folders without crashing.
Then I installed the NVIDIA driver again. Now the system was much more stable again. I could open all folders now.
But the problems still persist. The System is completely unstable.
PLEASE INVESTIGATE THIS! THERE ARE HORRIBLE BUGS!