PANIC: inode has invalid mode: 0x0 #11474

Closed
gbsf opened this issue Jan 17, 2021 · 43 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@gbsf

gbsf commented Jan 17, 2021

System information

Type Version/Name
Distribution Name Archlinux
Distribution Version rolling
Linux Kernel 5.10.6-arch1-1
Architecture x86_64
ZFS Version 2.0.1-1
SPL Version 2.0.1-1

Describe the problem you're observing

A couple of days ago, a program (running under Wine, if it matters) either created a corrupted file or caused some system failure that made the file mode become 0.
Since those were only log files, I renamed the enclosing dir to stop the program from trying to access them.

Comparing with other log files created by the application, the correct mode for this file should be S_IFREG|S_ISVTX|0777.
It is stored under a directory with mode S_IFDIR|S_ISUID|S_ISGID|S_ISVTX|0777 and everything is owned by my user.

The pool is backed by a single SATA SSD. I already ran a scrub but no errors were reported.

Describe how to reproduce the problem

I couldn't find a way to reproduce the creation of such corrupted files, but any attempt to access the existing ones will hang with a kernel panic.

Include any warning/errors/backtraces from the system logs

PANIC: inode 716430 has invalid mode: 0x0
Showing stack for process 30162
CPU: 0 PID: 30162 Comm: tree Tainted: P           OE     5.10.6-arch1-1 #1
Hardware name: LENOVO 20217/VIQY0Y1, BIOS 74CN47WW(V3.08) 1/30/2015
Call Trace:
 dump_stack+0x6b/0x83
 vcmn_err.cold+0x58/0x80 [spl]
 zfs_panic_recover+0x75/0x90 [zfs]
 zfs_znode_alloc+0x6d8/0x740 [zfs]
 zfs_zget+0x270/0x2b0 [zfs]
 zfs_dirent_lock+0x36c/0x6c0 [zfs]
 zfs_dirlook+0xad/0x2d0 [zfs]
 ? zfs_zaccess+0x127/0x490 [zfs]
 zfs_lookup+0x1e6/0x3d0 [zfs]
 zpl_lookup+0xf2/0x210 [zfs]
 ? step_into+0xd3/0x700
 __lookup_slow+0x85/0x140
 walk_component+0x141/0x1b0
 path_lookupat+0x5b/0x190
 filename_lookup+0xbe/0x1d0
 vfs_statx+0x86/0x140
 __do_sys_newlstat+0x47/0x80
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fcf46b6631a
Code: ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 41 89 f8 48 89 f7 48 89 d6 41 83 f8 01 77 2d b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 21 1b 0d 00 f7
RSP: 002b:00007ffc141c4ec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
RAX: ffffffffffffffda RBX: 0000000000000021 RCX: 00007fcf46b6631a
RDX: 00007ffc141c4f10 RSI: 00007ffc141c4f10 RDI: 000055b340320240
RBP: 00007ffc141c5050 R08: 0000000000000001 R09: 00007fcf46bfd0c0
R10: 00007fcf46bfcfc0 R11: 0000000000000246 R12: 000055b3402fc420
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
@gbsf gbsf added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Jan 17, 2021
@gbsf
Author

gbsf commented Jan 17, 2021

I did some more testing, and the file mode is only lost after a reboot.

@aerusso
Contributor

aerusso commented Jan 24, 2021

@gbsf I have a few questions, some of them very mundane, some of them might conceivably be dangerous to investigate:

  1. Does smartctl tell you the drive is healthy?
  2. Are you using ECC ram? Are you getting any ECC errors? Have you done a ram-test recently?
  3. Can you snapshot the dataset? Can you zfs send the snapshot (without the panic)? Can you zfs receive that to another pool?
  4. Just to be clear, when you stat the file, you get the panic? Can you mv it (same dataset) without the panic? What happens when you cat the file, or append to the file? (You might consider trying to snapshot and restore the dataset with the misbehaving file before doing this).
  5. Did the problem go away by itself after you rebooted? If it resolved, what is the state of the file now?
  6. Has the problem recurred?
  7. If you can cat the file without error, what is cat $file | wc -c, and how does it compare to the size reported by ls?
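For reference, a rough shell sketch of the checks above (the pool, dataset, device, and file names are placeholders; adjust them for your system):

smartctl -a /dev/sda                                   # 1. drive health
zfs snapshot tank/data@debug                           # 3. snapshot the dataset
zfs send tank/data@debug | zfs receive backup/data     # 3. send/receive it to another pool
stat /tank/data/suspect-file                           # 4. does a plain stat trigger the panic?
cat /tank/data/suspect-file | wc -c                    # 7. compare against the size ls -l reports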

@Arvedui

Arvedui commented Jan 25, 2021

I think I might have the same problem. The error message is the same, the stack trace is basically identical, and the problematic file is a log file written by a Windows application running via Wine. I am running Arch Linux too.

@aerusso

  1. yes
  2. no
  3. yes and yes
  4. yes, no and I cannot cat it
  5. no, it only appeared after reboot
  6. not yet

@sdimitro
Contributor

Hey @gbsf, is this issue reproducible? If yes, would it be possible to capture a kernel crash dump from the system?

@Xaenalt

Xaenalt commented Jan 27, 2021

Hey all, I seem to be able to reproduce the issue 100%, but I can't for the life of me figure out how to get it to generate a core dump... I get the panic and everything, but the dump seems to just disappear rather than be written. If someone could help me generate it, I'd be happy to upload it.

@gbsf
Author

gbsf commented Jan 27, 2021

Hey @gbsf, is this issue reproducible? If yes, would it be possible to capture a kernel crash dump from the system?

Yes, it is reproducible. I don't exactly understand what you mean by a kernel crash dump; the panic backtrace is in the issue already. If you mean a memory dump from an affected system, I'm not really set up to capture those. Either way, since the panic is only a symptom of the error, I don't think it'll be very useful. At that point, the inode is already corrupted.

But I managed to create a test case based on the operations Wine performs.
Steps to reproduce:

  1. Run ./issue11474
  2. Reboot
  3. ls test_dir/test_file
  4. Kernel panic, ls is hung forever.

issue11474.c.txt
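The attached C program is the authoritative reproducer; purely for context, here is a rough shell approximation of the kind of operations it performs. The use of setuid/setgid bits here is an assumption, based on the fix discussed later in this thread (a write that clears those bits triggers the bug):

mkdir -p test_dir
touch test_dir/test_file
chmod u+s,g+s,+t test_dir/test_file     # give the file suid/sgid/sticky bits (assumed trigger)
echo "log entry" >> test_dir/test_file  # an unprivileged write makes the kernel clear suid/sgid
sync                                    # then reboot and run: ls -l test_dir/test_file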

@aerusso
Contributor

aerusso commented Jan 28, 2021

@gbsf Hey! Thanks for the reproducer! I'm using Debian 5.10.9, and I'm not able to reproduce any issue with your program.

Does this work on a new pool? I.e.,

truncate -s 512M ~/trash-file.loopdev
losetup loop0 ~/trash-file.loopdev
zpool create trashpool loop0

and then run it on that one?

Other facts: I'm on a virtual machine, using the "cloud image". I can try with 5.10.5 relatively easily, if you're able to reproduce on a clean pool.

@gbsf
Author

gbsf commented Jan 29, 2021

I can't seem to reproduce this issue anymore, even on the affected pool. @Arvedui, @Xaenalt, can you still reproduce this?

@Xaenalt

Xaenalt commented Jan 29, 2021

I still have the file/directory in question around; yeah, I can still reproduce it.

@Xaenalt

Xaenalt commented Jan 29, 2021

This is all the info I have from the crash; I still can't get it to write the vmcore out.

[ 1952.002618] PANIC: inode 950970 has invalid mode: 0xbb22

[ 1952.002620] Showing stack for process 58662
[ 1952.002621] CPU: 2 PID: 58662 Comm: rsync Kdump: loaded Tainted: P           OE     5.10.10-200.fc33.x86_64 #1
[ 1952.002622] Hardware name: Gigabyte Technology Co., Ltd. Z370 AORUS Gaming 7/Z370 AORUS Gaming 7, BIOS F13 03/22/2019
[ 1952.002622] Call Trace:
[ 1952.002627]  dump_stack+0x6b/0x83
[ 1952.002632]  vcmn_err.cold+0x58/0x80 [spl]
[ 1952.002634]  ? kmem_cache_alloc+0xe7/0x1e0
[ 1952.002671]  ? sa_cache_constructor+0x23/0x40 [zfs]
[ 1952.002674]  ? spl_kmem_cache_alloc+0x9b/0x7a0 [spl]
[ 1952.002675]  ? _cond_resched+0x16/0x40
[ 1952.002676]  ? mutex_lock+0xe/0x30
[ 1952.002695]  ? dmu_buf_set_user_ie+0x55/0x80 [zfs]
[ 1952.002724]  zfs_panic_recover+0x5d/0x60 [zfs]
[ 1952.002754]  zfs_znode_alloc+0x658/0x6c0 [zfs]
[ 1952.002786]  zfs_zget+0x19e/0x230 [zfs]
[ 1952.002815]  zfs_dirent_lock+0x336/0x6c0 [zfs]
[ 1952.002845]  zfs_dirlook+0x75/0x280 [zfs]
[ 1952.002874]  ? zfs_zaccess+0x1aa/0x450 [zfs]
[ 1952.002902]  zfs_lookup+0x1b4/0x3a0 [zfs]
[ 1952.002930]  zpl_lookup+0xa1/0x1e0 [zfs]
[ 1952.002931]  ? _cond_resched+0x16/0x40
[ 1952.002933]  __lookup_slow+0x74/0x130
[ 1952.002934]  walk_component+0x11b/0x190
[ 1952.002935]  ? _cond_resched+0x16/0x40
[ 1952.002936]  path_lookupat+0x72/0x1c0
[ 1952.002937]  filename_lookup+0x97/0x180
[ 1952.002940]  ? strncpy_from_user+0x85/0x1a0
[ 1952.002941]  ? getname_flags.part.0+0x45/0x1a0
[ 1952.002942]  vfs_statx+0x64/0x100
[ 1952.002943]  __do_sys_newlstat+0x26/0x40
[ 1952.002944]  ? iterate_dir+0xfe/0x1b0
[ 1952.002946]  ? switch_fpu_return+0x40/0xb0
[ 1952.002947]  ? exit_to_user_mode_prepare+0x64/0x140
[ 1952.002949]  do_syscall_64+0x33/0x40
[ 1952.002951]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1952.002952] RIP: 0033:0x7f0b2e24664a
[ 1952.002953] Code: ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 41 89 f8 48 89 f7 48 89 d6 41 83 f8 01 77 2d b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 f9 27 0d 00 f7
[ 1952.002954] RSP: 002b:00007ffea6a4e908 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[ 1952.002955] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0b2e24664a
[ 1952.002956] RDX: 00007ffea6a4ea40 RSI: 00007ffea6a4ea40 RDI: 00007ffea6a4ead0
[ 1952.002956] RBP: 00007ffea6a4ead0 R08: 0000000000000001 R09: 0000000000000000
[ 1952.002957] R10: 00007ffea6a52d00 R11: 0000000000000246 R12: 00007ffea6a4ead0
[ 1952.002958] R13: 00007ffea6a4ea40 R14: 000055922f3f7b50 R15: 00007ffea6a4ea40

@Xaenalt

Xaenalt commented Jan 29, 2021

Is there a way, just to get things working again, that I could reach in and fix the value for that inode?

Also, I have xattr=sa set, not sure if that's relevant

@Xaenalt

Xaenalt commented Jan 29, 2021

I narrowed it down to a file somewhere in this directory, and ran zdb on its contents to try to get an idea of what the results were.
Not sure what I'm looking at here, but one of the items in there is the culprit, since issuing ls on the directory causes the panic.

zdb-all-objects.gz

@Xaenalt

Xaenalt commented Jan 29, 2021

Ah, I think I found out how to get some info on that inode specifically.
The mode there looks really unusual; might some garbage be getting written into that field?

950970.txt

@Xaenalt

Xaenalt commented Jan 29, 2021

Searching that device, there's a few occurrences of modes that don't look right

  71755 	mode	100444
     11 	mode	100555
  17446 	mode	100600
 186012 	mode	100644
 445315 	mode	100664
      2 	mode	100744
   4750 	mode	100750
  31853 	mode	100755
   2428 	mode	100770
  71658 	mode	100775
     11 	mode	100777
   2640 	mode	120777
      9 	mode	1777777777761304335442
      1 	mode	1777777777770033564065
      4 	mode	40
    211 	mode	40700
    483 	mode	40750
  29764 	mode	40755
    252 	mode	40770
  66333 	mode	40775
      2 	mode	40777
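A valid POSIX mode only occupies 16 bits (a 4-bit file type plus the 12 permission/suid/sgid/sticky bits), which is why the 22-digit octal entries above stand out as garbage. A quick bash check against one of the sane entries:

m=0100644                                                     # regular file, rw-r--r--
printf 'type=%o perm=%o\n' $((m & 0170000)) $((m & 07777))    # type=100000 perm=644
printf 'largest representable mode: %o\n' $((0xffff))         # 177777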

@Xaenalt

Xaenalt commented Jan 29, 2021

Also worth mentioning: this is a mirror device, and scrub does not fix this error.

@Xaenalt

Xaenalt commented Jan 29, 2021

At a glance, I'm wondering if the user.wine.sd field is somehow overflowing into the mode field

@gbsf
Author

gbsf commented Jan 30, 2021

I can't seem to reproduce this issue anymore, even on the affected pool. @Arvedui, @Xaenalt, can you still reproduce this?

It seems my test case is incomplete, because I can still reproduce using Wine. I tried to avoid this because it is way more involved than a simple C program. @aerusso, here are the new steps to reproduce:

  1. Install winehq-staging=6.0.0. See here for Debian instructions.
  2. truncate -s 5G ~/trash-file.loopdev
  3. losetup loop0 ~/trash-file.loopdev
  4. zpool create trashpool loop0
  5. wget https://mtgarena.downloads.wizards.com/Live/Windows64/versions/3358.841497/MTGAInstaller_0.1.3358.841497.msi
  6. WINEPREFIX=/trashpool msiexec /i MTGAInstaller_0.1.3358.841497.msi
  7. WINEPREFIX=/trashpool wine "C:\Program Files\Wizards of the Coast\MTGA\MTGA.exe". It will download about 1GB of files. You can close when it reaches the login screen.
  8. Reboot and reimport pool.
  9. tree -p /trashpool | grep '?' will show all corrupted files if you set zfs_recover=1 and just hang otherwise.

@aerusso
Contributor

aerusso commented Feb 2, 2021

@gbsf Thanks again---I am going to try your reproducer over the weekend on ZFS 2.0.2 and Linux 5.10.12.

@aerusso
Contributor

aerusso commented Feb 6, 2021

@gbsf I have "good" news: your reproducer works on a VM I have. For anyone else interested, even if you get a wine crash calling MTGA.exe, it still produces the corruption (I ran MTGA.exe twice, but I don't know if that was necessary).

@aerusso
Contributor

aerusso commented Feb 6, 2021

3d40b65 refs/bisect/bad
a103ae4 refs/bisect/good-a103ae446ed4aaaac6c83ce242efef313aad2200

(The ZTS patch in between is obviously innocent).

3d40b65 is a big refactoring patch @mattmacy @freqlabs @behlendorf. This is a data corruption bug, as far as I can tell.

I'm double-checking my work right now.

@aerusso
Contributor

aerusso commented Feb 7, 2021

My double check confirms that 3d40b65 introduces this bug (fa7b558 does not reproduce).

aerusso added a commit to aerusso/zfs that referenced this issue Feb 7, 2021
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes openzfs#11474
aerusso added a commit to aerusso/zfs that referenced this issue Feb 7, 2021
@aerusso
Contributor

aerusso commented Feb 7, 2021

I think I figured it out, but I'd recommend waiting until the patch gets some review before testing it.

@ExceptionGit

How can I fix or remove all the bad directories/files after this patch? ZFS 2.0.1 was released a month ago; that is too long an interval for restoring from a backup (root on ZFS).

@Xaenalt

Xaenalt commented Feb 7, 2021

Pending the fix, could overwriting the mode fix the error?

behlendorf pushed a commit that referenced this issue Feb 8, 2021
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between
Linux and BSD.  After a successful write, the suid/sgid bits are reset,
and the mode to be written is stored in newmode.  On Linux, this was
propagated to both the in-memory inode and znode, which is then updated
with sa_update.

3d40b65 accidentally removed the initialization of newmode, which
happened to occur on the same line as the inode update (which has been
moved out of the function).

The uninitialized newmode can be saved to disk, leading to a crash on
stat() of that file, in addition to a merely incorrect file mode.

Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11474 
Closes #11576
@behlendorf
Contributor

I've applied @aerusso's fix to the master branch and have added it to the staging branch for the next point release (zfs-2.0.3).

For anyone who hit this issue, you should be able to fix the panic by temporarily enabling the zfs_recover module option (set /sys/module/zfs/parameters/zfs_recover=1). This will convert the panic into a warning for any affected files/directories/symlinks/etc. Since the mode information is what was lost, the code will assume the inode is a regular file and you should be able to remove it.
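For example, a hedged sketch of that recovery sequence (run as root; the dataset path and directory name are placeholders):

echo 1 > /sys/module/zfs/parameters/zfs_recover   # turn the panic into a warning
rm -rf /tank/data/broken_dir                      # remove the affected entries
echo 0 > /sys/module/zfs/parameters/zfs_recover   # switch recovery mode back off afterwards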

@adamdmoss
Contributor

Without trying to sound alarmist, the (tiny) possibility of a file ending up with a semi-random mode from an uninitialized field (which might leave permissions open rather than just weird) elevates this from a stability or corruption issue to a security issue, IMHO. Just sayin', in case that warrants pushing out the next point release ASAP.

@behlendorf
Contributor

@adamdmoss yes, I'd like to get this tagged sooner rather than later.

@behlendorf behlendorf removed the Status: Triage Needed New issue which needs to be triaged label Feb 8, 2021
@dimez

dimez commented Feb 11, 2021

Hello,
Same problem.
zfs 2.0.2 (userspace and kernelspace)

Showing stack for process 5633
CPU: 1 PID: 5633 Comm: rm Tainted: P           OE     5.4.0-66-generic #74-Ubuntu
Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 3.3 03/28/2020
Call Trace:
dump_stack+0x6d/0x9a
spl_dumpstack+0x29/0x2b [spl]
vcmn_err.cold+0x60/0x94 [spl]
? _cond_resched+0x19/0x30
? kmem_cache_alloc+0x178/0x230
? sa_cache_constructor+0x27/0x50 [zfs]
? sa_lookup_locked+0x66/0xb0 [zfs]
zfs_panic_recover+0x6f/0x90 [zfs]
? zfs_inode_update+0x5d/0xb0 [zfs]
zfs_znode_alloc+0x657/0x6d0 [zfs]
zfs_zget+0x1c3/0x270 [zfs]
zfs_dirent_lock+0x34b/0x680 [zfs]
zfs_dirlook+0x90/0x2b0 [zfs]
? zfs_zaccess+0x224/0x410 [zfs]
zfs_lookup+0x1fd/0x3f0 [zfs]
zpl_lookup+0xc9/0x1e0 [zfs]
? __d_alloc+0x138/0x1f0
__lookup_hash+0x70/0xa0
? __sb_start_write+0x47/0x80
do_unlinkat+0x135/0x2d0
__x64_sys_unlinkat+0x38/0x60
do_syscall_64+0x57/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fe3a168fe6b
Code: 73 01 c3 48 8b 0d 25 80 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f5 7f 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee1fccbb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
RAX: ffffffffffffffda RBX: 000056410e4f1b30 RCX: 00007fe3a168fe6b
RDX: 0000000000000000 RSI: 000056410e4f1c38 RDI: 0000000000000008
RBP: 000056410e4d8440 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffee1fccd90 R14: 00007ffee1fccd90 R15: 0000000000000002

I have raidz2 with one bad disk:

  pool: pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 00:00:34 with 0 errors on Thu Feb 11 21:00:35 2021
config:

	NAME                                                  STATE     READ WRITE CKSUM
	pool                                                  DEGRADED     0     0     0
	  raidz2-0                                            DEGRADED     0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WXH1E33YFPD1-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WXK1E63ACEJX-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WX11E83ET219-part2  UNAVAIL      0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV1_WD-WXS1EC4HLFJ2-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WX91E13TRM21-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WX11E83ET028-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WX11E83ES147-part2  ONLINE       0     0     0
	    ata-WDC_WD1000CHTZ-04JCPV0_WD-WX21E42MM611-part2  ONLINE       0     0     0

errors: No known data errors

P.S. Just upgraded to 2.0.3, problem is present.

@spillner

spillner commented Feb 24, 2021

I just hit the same issue on driver 2.0.1, so it's not just isolated to 2.0.2 and may have been in the code base longer / have other code paths that trigger a similar issue (not sure if that's what dimez means by "P.S. Just upgraded to 2.0.3, problem is present", or he just hasn't repaired the on-disk structure).

My case was also a directory created under Wine (pure coincidence with OP?); there were no apparent issues with the installer process that created the directory, but after a reboot it is impossible to ls, du, or rm -rf that directory (any of the three put the process into an unkillable D+ state that never returns).

I'm on Linux 5.10.6 with zfs-2.0.1 built in-kernel; the affected partition comes from a zpool mirrored across two SATA drives, neither of which is reporting any hardware errors, so I don't think it's a coincidental physical failure. Of note, I've been running that zpool for years under the 0.8.x series and never had any problems, but I just upgraded from 0.8.6 (on an older kernel) to the kernel 5.10.6 / zfs-2.0.1 combination last month.

The dmesg entry for the first hang (subsequent attempts to access the directory also hang, but don't generate new kernel log entries) is

[703847.050679] PANIC: inode 2834363 has invalid mode: 0x0

[703847.050685] Showing stack for process 20511
[703847.050689] CPU: 12 PID: 20511 Comm: du Tainted: P O 5.10.6 #4
[703847.050691] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 5220 09/12/2019
[703847.050692] Call Trace:
[703847.050701] dump_stack+0x57/0x6a
[703847.050706] vcmn_err.cold+0x58/0x80
[703847.050711] ? kmem_cache_alloc+0xed/0x1f0
[703847.050715] ? sa_cache_constructor+0x23/0x40
[703847.050719] ? spl_kmem_cache_alloc+0x92/0x710
[703847.050721] ? spl_kmem_cache_alloc+0x92/0x710
[703847.050726] ? _cond_resched+0x16/0x40
[703847.050729] ? mutex_lock+0xe/0x30
[703847.050732] ? dmu_buf_set_user_ie+0x52/0x70
[703847.050736] zfs_panic_recover+0x6d/0x90
[703847.050741] zfs_znode_alloc+0x655/0x6c0
[703847.050747] zfs_zget+0x1a6/0x250
[703847.050752] zfs_dirent_lock+0x353/0x6a0
[703847.050756] zfs_dirlook+0x89/0x2b0
[703847.050758] ? zfs_zaccess+0x114/0x480
[703847.050762] zfs_lookup+0x1e4/0x3d0
[703847.050765] zpl_lookup+0xc5/0x1e0
[703847.050769] __lookup_slow+0x84/0x140
[703847.050772] walk_component+0x154/0x1d0
[703847.050774] ? path_init+0x1e2/0x3e0
[703847.050777] path_lookupat+0x72/0x1c0
[703847.050780] filename_lookup+0xaa/0x1b0
[703847.050785] ? strncpy_from_user+0x4e/0x140
[703847.050788] ? getname_flags.part.0+0x45/0x1a0
[703847.050791] vfs_statx+0x74/0x130
[703847.050795] __do_sys_newfstatat+0x31/0x70
[703847.050798] ? iterate_dir+0xfe/0x1b0
[703847.050801] ? fsnotify_grab_connector+0x4a/0x80
[703847.050804] ? fsnotify_find_mark+0x14/0x80
[703847.050808] ? exit_to_user_mode_prepare+0x32/0x140
[703847.050812] do_syscall_64+0x33/0x80
[703847.050815] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[703847.050819] RIP: 0033:0x7f0244eb7f2a
[703847.050823] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 09 7f 0d 00 f7
[703847.050825] RSP: 002b:00007fff1be2c4c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[703847.050829] RAX: ffffffffffffffda RBX: 00000000023df7f0 RCX: 00007f0244eb7f2a
[703847.050830] RDX: 00000000023df868 RSI: 00000000023df8f8 RDI: 0000000000000005
[703847.050832] RBP: 00000000023df868 R08: 0000000000000005 R09: 00000000023ee440
[703847.050833] R10: 0000000000000100 R11: 0000000000000246 R12: 00000000023b1a10
[703847.050835] R13: 00000000023a8d20 R14: 000000000000000b R15: 8000000000000000

@Retrodynen

Same here. Also with a directory made by Wine:


(Screenshot from fsearch during reindexing, a generic file search program)

[78476.508749] PANIC: inode 432417 has invalid mode: 0xf224

[78476.508751] Showing stack for process 1058041
[78476.508753] CPU: 10 PID: 1058041 Comm: fsearch Tainted: P           O      5.10.10 #1
[78476.508754] Hardware name: LENOVO 20TK000RUK/20TK000RUK, BIOS N2VET22W (1.07 ) 09/09/2020
[78476.508754] Call Trace:
[78476.508760]  dump_stack+0x6d/0x88
[78476.508764]  spl_panic+0x154/0x1e8 [spl]
[78476.508767]  ? _cond_resched+0x15/0x30
[78476.508769]  ? kmem_cache_alloc+0x3c/0x410
[78476.508786]  ? sa_cache_init+0xa3/0xc0 [zfs]
[78476.508788]  ? spl_kmem_cache_alloc+0x9c/0x790 [spl]
[78476.508801]  ? sa_handle_get+0x11a4/0x11e0 [zfs]
[78476.508814]  zfs_panic_recover+0x5f/0x70 [zfs]
[78476.508821]  zfs_inode_update+0x692/0x6f0 [zfs]
[78476.508829]  zfs_zget+0x1ae/0x260 [zfs]
[78476.508837]  zfs_dirent_lock+0x336/0x6c0 [zfs]
[78476.508844]  zfs_dirlook+0x75/0x280 [zfs]
[78476.508851]  ? zfs_zaccess+0x20d/0x410 [zfs]
[78476.508857]  zfs_lookup+0x1b4/0x3a0 [zfs]
[78476.508863]  zio_do_crypt_abd+0x2691/0x29b0 [zfs]
[78476.508865]  ? step_into+0xaa/0x6f0
[78476.508867]  __lookup_slow+0x74/0x130
[78476.508869]  walk_component+0x110/0x180
[78476.508870]  ? path_init+0x2af/0x390
[78476.508871]  path_lookupat.isra.0+0x6d/0x150
[78476.508873]  filename_lookup+0x9b/0x150
[78476.508875]  ? __check_object_size+0x136/0x150
[78476.508878]  ? strncpy_from_user+0x47/0x190
[78476.508879]  ? getname_flags+0x69/0x1e0
[78476.508881]  vfs_statx+0x62/0xf0
[78476.508882]  __do_sys_newlstat+0x26/0x40
[78476.508884]  ? __x64_sys_getdents64+0xde/0x100
[78476.508885]  ? verify_dirent_name+0x30/0x30
[78476.508887]  do_syscall_64+0x33/0x80
[78476.508889]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[78476.508891] RIP: 0033:0x7fa153c7d816
[78476.508892] Code: b6 0c 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 41 89 f8 48 89 f7 48 89 d6 41 83 f8 01 77 29 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 c3 90 48 8b 15 21 b6 0c 00 f7 d8 64 89 02
[78476.508893] RSP: 002b:00007fa0e2ffbe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[78476.508895] RAX: ffffffffffffffda RBX: 000055737204e780 RCX: 00007fa153c7d816
[78476.508895] RDX: 00007fa0e2ffbe58 RSI: 00007fa0e2ffbe58 RDI: 00007fa0fc03b460
[78476.508896] RBP: 00007fa0e2ffcb10 R08: 0000000000000001 R09: 0000000000000000
[78476.508896] R10: 00007fa0fe733703 R11: 0000000000000246 R12: 00007fa0fe733703
[78476.508897] R13: 00007fa0fe733690 R14: 00007fa0fc002318 R15: 00007fa0e2ffbe58


jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
@reactormonk

Also via wine and lutris here. Any workaround for actually deleting the invalid files?

@reactormonk

#11474 (comment) works.

sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021
@mhosken

mhosken commented Jul 21, 2021

@Xaenalt

Searching that device, there's a few occurrences of modes that don't look right

  71755 	mode	100444
     11 	mode	100555
...

How did you get this report? I would like to run the same on my machine that is exhibiting very similar behaviour (on completely different files, 5.13 kernel, ubuntu 21.04 zfs-2.0.2, zfs-kmod-2.0.3).

@atj

atj commented Jul 29, 2021

@Xaenalt

Searching that device, there's a few occurrences of modes that don't look right

  71755 	mode	100444
     11 	mode	100555
...

How did you get this report? I would like to run the same on my machine that is exhibiting very similar behaviour (on completely different files, 5.13 kernel, ubuntu 21.04 zfs-2.0.2, zfs-kmod-2.0.3).

Here's a ruby one liner that provides roughly the same output for all files in the current working directory:

$ ruby -e 'Dir["*"].each {|f| next if !File.file?(f); s = File.stat(f); puts "%d\tmode\t%s" % [s.ino, s.mode.to_s(8)]}'
7704	mode	100644
301336	mode	100644
146243	mode	100644
7703	mode	100644
148760	mode	100644

@Xaenalt

Xaenalt commented Jul 29, 2021

100444 and 100555 are valid values; a mode like 1777777777761304335442 is what indicated there was a problem.

To generate that report I used zdb -dAdvv $FILESYSTEM | grep 'mode\s\+[0-9]' | sort | uniq -c

@Retrodynen

Just been hit with the same issue again... on ZFS 2.1.0. Is there anything I can provide to assist in getting this fixed?

@aerusso
Contributor

aerusso commented Jul 30, 2021

@Retrodynen To help you with the issue you're experiencing, you'll have to explain it precisely. Better still, if you can give steps that reproduce the problem on a newly created pool, we'll be able to resolve the issue with relative ease.

By the way, you should probably do this in a new bug report because the underlying issue that caused these symptoms here has already been resolved.

I'll add that, if you ran either of the two point releases affected by this issue, be sure you have already manually repaired the broken mode (by first setting ZFS_RECOVER and then manually adjusting the mode). Read here for details.

For safe measure, you may also want to remove all snapshots that contain files with the damaged modes (i.e., corrupt metadata), since those snapshots will also induce the panic in this bug report. There are no plans that I am aware of to automate the repair. Moreover, any such tool would be severely limited by the fact that the (metadata) information was genuinely lost.
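A hedged sketch of that snapshot cleanup, with the pool/dataset and snapshot names as placeholders:

zfs list -t snapshot -r pool/dataset      # find snapshots that still reference the damaged file
zfs destroy pool/dataset@some-snapshot    # remove each affected snapshot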

@Retrodynen

Retrodynen commented Aug 1, 2021 via email

@jawn-smith

I have encountered this error as well, but the zfs_recover=1 suggestion is not helping.

jawn-smith@desktop:~$ zfs version
zfs-2.0.3-8ubuntu7
zfs-kmod-2.0.2-1ubuntu5
jawn-smith@desktop:~$ cat /sys/module/zfs/parameters/zfs_recover
1
jawn-smith@desktop:~$ cat /proc/cmdline
BOOT_IMAGE=/BOOT/ubuntu_ixviql@/vmlinuz-5.11.0-22-generic root=ZFS=rpool/ROOT/ubuntu_ixviql ro quiet splash acpi=force zfs.zfs_recover=1 vt.handoff=1

Then trying to rm -rf the corrupted files results in

353228 Aug 13 14:57:22 desktop kernel: [   71.306007] VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
353229 Aug 13 14:57:22 desktop kernel: [   71.306009] PANIC at zfs_znode.c:339:zfs_znode_sa_init()
353230 Aug 13 14:57:22 desktop kernel: [   71.306011] Showing stack for process 14472
353231 Aug 13 14:57:22 desktop kernel: [   71.306012] CPU: 25 PID: 14472 Comm: rm Tainted: P        W  OE     5.11.0-22-generic #23-Ubuntu
353232 Aug 13 14:57:22 desktop kernel: [   71.306014] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING PLUS (MS-7C37), BIOS A.C0 01/25/2021
353233 Aug 13 14:57:22 desktop kernel: [   71.306015] Call Trace:
353234 Aug 13 14:57:22 desktop kernel: [   71.306017]  show_stack+0x52/0x58
353235 Aug 13 14:57:22 desktop kernel: [   71.306021]  dump_stack+0x70/0x8b
353236 Aug 13 14:57:22 desktop kernel: [   71.306024]  spl_dumpstack+0x29/0x2b [spl]
353237 Aug 13 14:57:22 desktop kernel: [   71.306030]  spl_panic+0xd4/0xfc [spl]
353238 Aug 13 14:57:22 desktop kernel: [   71.306035]  ? slab_pre_alloc_hook.constprop.0+0x96/0xe0
353239 Aug 13 14:57:22 desktop kernel: [   71.306038]  ? spl_kmem_cache_alloc+0x3b/0x100 [spl]
353240 Aug 13 14:57:22 desktop kernel: [   71.306042]  ? _cond_resched+0x1a/0x50
353241 Aug 13 14:57:22 desktop kernel: [   71.306044]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
353242 Aug 13 14:57:22 desktop kernel: [   71.306088]  ? __raw_spin_unlock+0x9/0x10 [zfs]
353243 Aug 13 14:57:22 desktop kernel: [   71.306127]  ? dmu_buf_replace_user+0x65/0x80 [zfs]
353244 Aug 13 14:57:22 desktop kernel: [   71.306175]  ? dmu_buf_set_user+0x13/0x20 [zfs]
353245 Aug 13 14:57:22 desktop kernel: [   71.306217]  ? dmu_buf_set_user_ie+0x15/0x20 [zfs]
353246 Aug 13 14:57:22 desktop kernel: [   71.306257]  zfs_znode_sa_init+0xd9/0xe0 [zfs]
353247 Aug 13 14:57:22 desktop kernel: [   71.306322]  zfs_znode_alloc+0x101/0x560 [zfs]
353248 Aug 13 14:57:22 desktop kernel: [   71.306373]  ? dmu_buf_unlock_parent+0x5d/0x90 [zfs]
353249 Aug 13 14:57:22 desktop kernel: [   71.306414]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
353250 Aug 13 14:57:22 desktop kernel: [   71.306454]  ? dbuf_read_impl.constprop.0+0x316/0x3e0 [zfs]
353251 Aug 13 14:57:22 desktop kernel: [   71.306493]  ? dbuf_rele_and_unlock+0x13b/0x4f0 [zfs]
353252 Aug 13 14:57:22 desktop kernel: [   71.306533]  ? _cond_resched+0x1a/0x50
353253 Aug 13 14:57:22 desktop kernel: [   71.306534]  ? slab_pre_alloc_hook.constprop.0+0x96/0xe0
353254 Aug 13 14:57:22 desktop kernel: [   71.306536]  ? spl_kmem_cache_alloc+0x3b/0x100 [spl]
353255 Aug 13 14:57:22 desktop kernel: [   71.306540]  ? kmem_cache_alloc+0xf1/0x200
353256 Aug 13 14:57:22 desktop kernel: [   71.306542]  ? _cond_resched+0x1a/0x50
353257 Aug 13 14:57:22 desktop kernel: [   71.306543]  ? _cond_resched+0x1a/0x50
353258 Aug 13 14:57:22 desktop kernel: [   71.306544]  ? down_read+0x13/0x90
353259 Aug 13 14:57:22 desktop kernel: [   71.306545]  ? dbuf_read+0x1cc/0x520 [zfs]
353260 Aug 13 14:57:22 desktop kernel: [   71.306585]  ? _cond_resched+0x1a/0x50
353261 Aug 13 14:57:22 desktop kernel: [   71.306586]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
353262 Aug 13 14:57:22 desktop kernel: [   71.306633]  ? __raw_spin_unlock+0x9/0x10 [zfs]
353263 Aug 13 14:57:22 desktop kernel: [   71.306685]  ? dnode_rele_and_unlock+0x69/0xe0 [zfs]
353264 Aug 13 14:57:22 desktop kernel: [   71.306731]  ? dmu_object_info_from_dnode+0x8e/0xa0 [zfs]
353265 Aug 13 14:57:22 desktop kernel: [   71.306772]  zfs_zget+0x237/0x280 [zfs]
353266 Aug 13 14:57:22 desktop kernel: [   71.306820]  zfs_dirent_lock+0x42a/0x570 [zfs]
353267 Aug 13 14:57:22 desktop kernel: [   71.306866]  zfs_dirlook+0x91/0x2a0 [zfs]
353268 Aug 13 14:57:22 desktop kernel: [   71.306910]  zfs_lookup+0x1fb/0x3f0 [zfs]
353269 Aug 13 14:57:22 desktop kernel: [   71.306953]  zpl_lookup+0xcb/0x230 [zfs]
353270 Aug 13 14:57:22 desktop kernel: [   71.306996]  ? __d_alloc+0x138/0x1f0
353271 Aug 13 14:57:22 desktop kernel: [   71.306998]  __lookup_hash+0x70/0xa0
353272 Aug 13 14:57:22 desktop kernel: [   71.306999]  ? _cond_resched+0x1a/0x50
353273 Aug 13 14:57:22 desktop kernel: [   71.307000]  do_unlinkat+0x135/0x2d0
353274 Aug 13 14:57:22 desktop kernel: [   71.307001]  __x64_sys_unlinkat+0x38/0x60
353275 Aug 13 14:57:22 desktop kernel: [   71.307002]  do_syscall_64+0x38/0x90
353276 Aug 13 14:57:22 desktop kernel: [   71.307004]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
353277 Aug 13 14:57:22 desktop kernel: [   71.307005] RIP: 0033:0x7fa83849b76b
353278 Aug 13 14:57:22 desktop kernel: [   71.307006] Code: 73 01 c3 48 8b 0d 05 67 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 66 0d 00 f7 d8 64 89 01 48
353279 Aug 13 14:57:22 desktop kernel: [   71.307007] RSP: 002b:00007fffab2531b8 EFLAGS: 00000256 ORIG_RAX: 0000000000000107
353280 Aug 13 14:57:22 desktop kernel: [   71.307008] RAX: ffffffffffffffda RBX: 0000562b7275fc70 RCX: 00007fa83849b76b
353281 Aug 13 14:57:22 desktop kernel: [   71.307009] RDX: 0000000000000000 RSI: 0000562b7275fd78 RDI: 0000000000000005
353282 Aug 13 14:57:22 desktop kernel: [   71.307010] RBP: 0000562b72756440 R08: 0000000000000003 R09: 0000000000000000
353283 Aug 13 14:57:22 desktop kernel: [   71.307010] R10: 0000000000000000 R11: 0000000000000256 R12: 0000000000000000
353284 Aug 13 14:57:22 desktop kernel: [   71.307011] R13: 00007fffab253390 R14: 00007fffab253390 R15: 0000000000000002

Any suggestions for how to resolve this issue besides what I've already tried?

@aerusso
Contributor

aerusso commented Aug 14, 2021

@GlenPickle You seem to have already identified the Ubuntu bug that is (probably) responsible for your stack trace. That corresponds to OpenZFS #10971.

@riyad

riyad commented Sep 2, 2021

100444 and 100555 are valid values, a mode like 1777777777761304335442 is what indicated there was a problem

To generate that report I used zdb -dAdvv $FILESYSTEM | grep 'mode\s\+[0-9]' | sort | uniq -c

I just tried this on an encrypted pool/dataset and zdb seems to "hide" the necessary info (yes, the keys are loaded):

[...]
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       936    1   128K  25.5K     4K      1K  25.5K  100.00  ZFS plain file
                                               176   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
        dnode maxblkid: 0
                (bonus encrypted)
                (object encrypted)
[...]

Is there a flag for zdb to make it show the info as it does for unencrypted datasets?

@Xaenalt

Xaenalt commented Sep 2, 2021

Uhhh, good question; my system uses the drive's hardware encryption, sorry D:

@spillner

For anyone who hit this issue, you should be able to fix the panic by temporarily enabling the zfs_recover module option (set /sys/module/zfs/parameters/zfs_recover=1). This will convert the panic into a warning for any affected files/directories/symlinks/etc. Since the mode information is what was lost, the code will assume the inode is a regular file and you should be able to remove it.

Is there a utility anywhere that can manually fix the ZPL_MODE field, or alternatively that will unlink a file with an invalid ZPL_MODE? As some other users have noted, even when zfs_recover is set to 1, attempts to stat(), unlink(), or chmod() an affected file will hang indefinitely.

I understand that ultimately this can be done via sa_update() with the SA_ZPL_MODE() macro, but its uses in the kernel module source require a delicate setup that doesn't seem to translate easily to a userspace utility, and I can't find any uses of sa_update() in any of the utilities provided in openzfs.

h0tw1r3 pushed a commit to h0tw1r3/pve-openzfs-overlayfs that referenced this issue Mar 11, 2022
the patch fixes a potential panic on systems running ZFS > 2.0.0 and
is already queued for inclusion in 2.0.3 - see [0] for a related
github issue.

[0] openzfs/zfs#11474

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>