silent corruption gives input/output error but cannot be detected with scrub, experienced on 0.7.5 and 0.8.3 versions #10697

Closed
phiser678 opened this issue Aug 10, 2020 · 23 comments
Labels
Component: Send/Recv "zfs send/recv" feature Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@phiser678

phiser678 commented Aug 10, 2020

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 20.04
Linux Kernel 5.4.0-42-generic
Architecture x86_64
ZFS Version 0.8.3-1ubuntu12.2
SPL Version 0.8.3-1ubuntu12.2

Describe the problem you're observing

I have an input/output error on a directory on a raidz2 zfs filesystem with lz4 compression, but there is no sign of corruption of the disks and it is not detected by scrub.

The error is propagated in the snapshots and the zfs send/recv streams as well. The original is on Ubuntu 18.04 with zfs 0.7.5, which I transferred to a new Ubuntu 20.04 system with zfs 0.8.3. I will be keeping only the updated system, so I want to delete the bad I/O error directory on the new Ubuntu 20.04 system. The new system uses LVM partitions, which could indeed be the problem, but the original Ubuntu 18.04 system uses raw disks without LVM and has the same fault, which was propagated to the new Ubuntu 20.04 system. They both show the same behaviour.

I can still read the contents with zdb and extract the contents of the files correctly. I managed to recover the files, but I cannot delete the directory and free the space!

# ls uav_london-input-output-error
ls: cannot open directory 'uav_london-input-output-error': Input/output error

# rm -r uav_london-input-output-error
rm: cannot remove 'uav_london-input-output-error': Directory not empty

# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0 days 10:14:06 with 0 errors on Sun Aug  9 10:38:08 2020
config:

	NAME                 STATE     READ WRITE CKSUM
	tank                 ONLINE       0     0     0
	  raidz2-0           ONLINE       0     0     0
	    dm-name-0-fish   ONLINE       0     0     0
	    dm-name-1-fish   ONLINE       0     0     0
	    dm-name-2-fish   ONLINE       0     0     0
	    dm-name-3-fish   ONLINE       0     0     0
	    dm-name-4-fish   ONLINE       0     0     0
	    dm-name-5-fish   ONLINE       0     0     0
	    dm-name-6-fish   ONLINE       0     0     0
	    dm-name-7-fish   ONLINE       0     0     0
	    dm-name-8-fish   ONLINE       0     0     0
	    dm-name-9-fish   ONLINE       0     0     0
	    dm-name-10-fish  ONLINE       0     0     0
	    dm-name-11-fish  ONLINE       0     0     0
	    dm-name-12-fish  ONLINE       0     0     0

errors: No known data errors

# zdb -vv -O tank/ipi/shared video_analysis/uav_london-input-output-error

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   1525565    1   128K    512  19.5K     512    512  100.00  ZFS directory
                                               176   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	uid     760
	gid     0
	atime	Sun Apr  5 21:31:52 2020
	mtime	Tue Apr  7 22:31:13 2020
	ctime	Thu Jul 16 17:33:56 2020
	crtime	Sun Apr  5 16:32:12 2020
	gen	10088303
	mode	40755
	size	5
	parent	6877228
	links	2
	pflags	40800000144
	xattr	1525566
	microzap: 512 bytes, 3 entries

		README.txt = 1524605 (type: Regular File)
		UAV_London_20200405_15_30.ts = 1525572 (type: Regular File)
		UAV_London_20200326_21_00.ts = 1525760 (type: Regular File)
Indirect blocks:
               0 L0 0:255a3478c000:3000 200L/200P F=1 B=86455/86455 cksum=7b95e48d8:2e8b12b7231:92e665508f7c:142ff65beab585

		segment [0000000000000000, 0000000000000200) size   512


# zdb -vv -O tank/ipi/shared video_analysis/uav_london-input-output-error/README.txt

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   1524605    1   128K    512    10K     512    512  100.00  ZFS plain file
                                               176   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
	dnode maxblkid: 0
	uid     760
	gid     605
	atime	Sun Apr  5 21:31:53 2020
	mtime	Sun Apr  5 23:26:15 2020
	ctime	Sun Apr  5 23:26:15 2020
	crtime	Sun Apr  5 19:09:58 2020
	gen	10090207
	mode	100644
	size	98
	parent	1525565
	links	1
	pflags	40800000004
	xattr	1524606
Indirect blocks:
               0 L0 0:255a48d4b000:3000 200L/200P F=1 B=86455/86455 cksum=90f29355e:41f88fcd567:f32f74d1294e:25ce7d86beb3e3

		segment [0000000000000000, 0000000000000200) size   512

# zdb -R tank 0:255a48d4b000:200:r|hexdump -C
Found vdev type: raidz
00000000  76 69 64 65 6f 20 63 61  70 74 75 72 65 64 20 64  |video captured d|
00000010  75 72 69 6e 67 20 74 68  65 20 63 6f 76 69 64 2d  |uring the covid-|
00000020  31 39 20 63 72 69 73 69  73 20 6f 76 65 72 20 74  |19 crisis over t|
00000030  68 65 20 63 69 74 79 20  6f 66 20 4c 6f 6e 64 6f  |he city of Londo|
00000040  6e 2e 0a 4c 69 76 65 20  66 65 65 64 20 70 72 6f  |n..Live feed pro|
00000050  64 75 63 65 64 20 62 79  20 52 65 75 74 65 72 73  |duced by Reuters|
00000060  0a 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
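The zdb -R argument above can be derived from the L0 line of the file's indirect block listing. A minimal sketch, assuming the zdb output layout shown in this thread: the DVA is printed as vdev:offset:asize, where the allocated size (0x3000 here) includes raidz2 parity and padding, while the physical size comes from the 200L/200P field. The helper name zdb_r_arg is hypothetical, and the flag string is passed through unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a "vdev:offset:psize:flags" argument for
 * zdb -R from the DVA ("vdev:offset:asize") and the "LSIZE L/PSIZE P"
 * field of a zdb L0 line. All numbers except the vdev id are hex.
 * Returns 0 on success, -1 on a parse error or if out is too small. */
static int zdb_r_arg(const char *dva, const char *sizes, const char *flags,
                     char *out, size_t outlen)
{
    unsigned long long vdev, offset, asize, lsize, psize;

    if (sscanf(dva, "%llu:%llx:%llx", &vdev, &offset, &asize) != 3)
        return -1;
    if (sscanf(sizes, "%llxL/%llxP", &lsize, &psize) != 2)
        return -1;
    (void)asize;  /* raidz-inflated allocation; the command above uses psize */
    (void)lsize;  /* logical size; equal to psize here (no compression) */

    int n = snprintf(out, outlen, "%llu:%llx:%llx:%s",
                     vdev, offset, psize, flags);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For README.txt, calling zdb_r_arg("0:255a48d4b000:3000", "200L/200P", "r", buf, sizeof buf) produces 0:255a48d4b000:200:r, the argument used in the command above.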

Both .ts streams can be read and reconstructed as well. As far as I can tell, this is the only input/output error directory I have detected on a 35TB zpool. I double-checked both systems with md5 checksums, which pointed me to this abnormality. If I had not checked both systems, I would not have known about the error, which is why I put "silent" in the title; it could affect numerous zpools. Both systems have ECC memory.

How do I free up the space? But before I do, can I run some tests to find the cause, or even better, fix the abnormality in case others have it as well?

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

No errors in the system logs.

@bghira

bghira commented Aug 11, 2020

As far as removing the corruption, I am not too sure, other than recreating the filesystem, but there have been several causes of corruption like this.

#7703 #8421 #10019 #10161 and others.

@bghira

bghira commented Aug 11, 2020

@phiser678
Author

phiser678 commented Aug 12, 2020

Thanks for finding the other issues!

Hmmm, interesting: issue #10019 shows almost exactly the same hardware. I'm using a Xeon E3-1226 v3, and it also uses avx2 when I check with cat /proc/spl/kstat/zfs/vdev_raidz_bench, on both the older system with 0.7.5 and the newer one with 0.8.3. I don't have encryption on the datasets.

The other issues involve send/recv, but that is not the case here: the input/output error originated on the original zpool and was simply propagated to the new zpools. zfs send did not detect any fault when sending the dataset, which matches my experience that I can read everything with the zdb commands. I have hole_birth enabled, but as far as I can see both datasets are identical.
There was also no sudden power failure, since everything runs on UPS power.
Is there no feature in zfs, or one we could add, that detects an input/output error and refuses to send the zfs stream when this happens? This way we could avoid the silent corruption!
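Short of such a feature, a userspace audit can at least make the failure visible before a stream is sent. A minimal sketch, not an existing ZFS tool: audit_tree (a hypothetical name) walks a mounted dataset, or a snapshot under .zfs/snapshot, lists every directory, reads every regular file to the end, and reports each path that fails with EIO. In this thread's case, opening the affected directory fails with EIO, so it would be reported; corruption that reads back without an error stays invisible to this check.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a regular file to EOF; return 0 or the errno of the first failure. */
static int read_all(const char *path)
{
    static char buf[1 << 16];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;
    int err = (n < 0) ? errno : 0;
    close(fd);
    return err;
}

/* Hypothetical pre-send audit: list every directory and read every regular
 * file under root (symlinks are not followed), printing each path that
 * fails with EIO. Returns the number of such paths; other errors, such as
 * EACCES, are ignored. */
static int audit_tree(const char *root)
{
    DIR *d = opendir(root);
    if (d == NULL) {
        if (errno != EIO)
            return 0;
        fprintf(stderr, "EIO: %s\n", root);
        return 1;
    }
    int bad = 0;
    char path[4096];
    for (;;) {
        errno = 0;  /* readdir returns NULL both at the end and on error */
        struct dirent *de = readdir(d);
        if (de == NULL) {
            if (errno == EIO) {
                fprintf(stderr, "EIO while listing: %s\n", root);
                bad++;
            }
            break;
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof path, "%s/%s", root, de->d_name) >= (int)sizeof path)
            continue;  /* skip paths too long for this sketch */
        struct stat st;
        if (lstat(path, &st) != 0) {
            if (errno == EIO) {
                fprintf(stderr, "EIO: %s\n", path);
                bad++;
            }
        } else if (S_ISDIR(st.st_mode)) {
            bad += audit_tree(path);
        } else if (S_ISREG(st.st_mode) && read_all(path) == EIO) {
            fprintf(stderr, "EIO: %s\n", path);
            bad++;
        }
    }
    closedir(d);
    return bad;
}
```

A non-zero return from audit_tree on the snapshot's mount path would be the signal to hold back zfs send.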

Is there a way to delete used zfs blocks directly with some command? Although I agree this could be very dangerous.

Indeed, recreating the dataset and all the snapshots would be possible, since it only involves snapshots from a couple of months ago. So is this, as you suggest, the only way to get rid of this input/output error?

Thanks again for any update.

@michaelvvvvv

If it's any help, I got the same issue on an old Intel i5-5200U using zfs 0.8.3-1ubuntu12.3 on my laptop.

The only solution to fix my problem was to delete all the snapshots newer than a certain date and to run zpool scrub and zpool clear a couple of times. No warning in dmesg. What frightens me most is that zfs happily continues taking snapshots without any warnings.

@shuther

shuther commented Jan 6, 2021

I am getting the same problem; not sure who can help.
The issue appears in FreeNAS (FreeBSD 12) using OpenZFS 2.0; the problem was confirmed using SystemRescue (https://github.com/nchevsky/systemrescue-zfs), which uses a more recent version of zfs. This sounds more like a zfs issue.
The impacted files are read-only (I never changed them; they should have been part of the pool for at least one year). The files (about 17,000) are all part of the same dataset (mainly .jpg files, though not all, while the dataset contains mostly music).

I have many files that I can't read (cat, less, ... report: Input/output error); ls still lists the files, however. I replaced a disk recently; I'm not sure if I should try a resilver or if that was the reason for the problem. I checked old snapshots, and the problem is also there.
I tried to see the content of the file using zdb and it seems successful

cp returns: Bad Address, and dmesg reports the error (nothing for cat, ....):
vm_fault: pager read error

Pool is healthy, scrub did not report any issue.
zfs send also returns Input/output error, which is an easy way to spot the problem.

I opened a new issue, as it also relates to the new OpenZFS 2.0: #11443 (comment)

@ghost

ghost commented Jan 11, 2021

@shuther what did you do with systemrescue to confirm the issue? I'm not clear on whether you have reproduced the same issue using systemrescue or have used it to confirm that a different version of ZFS is unaffected.

@shuther

shuther commented Jan 12, 2021

I can confirm that cat reports also input/output error in systemrescue. And nothing shows up in dmesg. My gut feeling is that the data on disk is good (I am not very familiar with zdb but I was able to extract a piece of the file - if needed I could confirm it matches what I have in a backup), but the problem is connected with the metadata in a way that scrub is not reporting an error, but something else fails.

@phiser678
Author

phiser678 commented Jan 14, 2021

At least you could detect the corruption with zfs send/recv, but for me it was silent too! I managed to copy the corrupted files from the person who still had them in his local directory. I cannot delete the corrupted files on any of the copied zpools, so I just renamed them to a hidden directory. So this is really bad for zfs and its "rock solid" reputation!
On one storage system I changed our strategy: one zfs send/recv copy on another system, and back to an rsync backup copy on a third zpool!
Of course rsync takes hours compared to zfs send/recv, but if you have these kinds of errors, what else can we do?

@IvanVolosyuk
Contributor

If a pool passes zfs scrub but gives an I/O error on read without any dmesg traces, I wonder if it would be nice to identify the code path of that I/O error with a stack trace. I would imagine there are not too many places in the code that produce this error, and somebody could instrument them to dump a stack trace in dmesg for debugging this issue. This could greatly speed up the investigation of this scary bug. IMHO
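One way to do that, sketched here in userspace C under stated assumptions, is to funnel every EIO return through a single macro that records where it fired. TRACE_EIO, eio_trace_count and example_check are hypothetical names, not existing ZFS code; inside the kernel module the same idea would log with printk and could call dump_stack() to get the full stack.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter, so a caller (or a test) can see how often EIO fired. */
static unsigned eio_trace_count;

/* Log the function, file and line that produced EIO, then evaluate to -EIO
 * (kernel-style negative errno). A kernel build would use printk here and
 * could add dump_stack() for a full stack trace. */
#define TRACE_EIO()                                              \
    (eio_trace_count++,                                          \
     fprintf(stderr, "EIO returned from %s (%s:%d)\n",           \
             __func__, __FILE__, __LINE__),                      \
     -EIO)

/* Hypothetical call site: every former "return -EIO;" goes through the macro. */
static int example_check(int data_ok)
{
    if (!data_ok)
        return TRACE_EIO();
    return 0;
}
```

Because __func__, __FILE__ and __LINE__ expand at each call site, the log line names the exact return that produced the error.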

@Ringdingcoder

This should be relatively easy to investigate, since you can recreate it across send/receive. Can you make a clone and remove everything except the problematic directory from it, and then send this? And make the stream available somewhere?

Or if this makes the problem go away, at least you can then start tracking down what it is that makes it go away.

@phiser678
Author

phiser678 commented Jan 17, 2021

That is possible. I will have a look, but it will take some time to make it. The dataset where the corrupted data is, is 9TB, so removing files from there in a clone will probably take some time. Next week I will post an update, if we can trap it in a clone stream. Thank you for your help!

@phiser678
Author

phiser678 commented Jan 18, 2021

I managed to narrow down the possible bug in the zfs dataset. It seems the dataset is very picky about which system it runs on.
Here are the tests I did:

root@backup:~# modinfo zfs|grep version
version:        0.8.3-1ubuntu12.4
srcversion:     4528C7B15E59A789E01F814
root@backup:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv tank/bad
root@backup:~# ls /tank/bad/
video_analysis
root@backup:~# ls /tank/bad/video_analysis/uav_london-input-output-error/
ls: cannot open directory '/tank/bad/video_analysis/uav_london-input-output-error/': Input/output error
root@backup:~# zfs destroy -r tank/bad
root@backup:~# ssh ipifs zfs send tank/bad@readme|zfs recv tank/bad
root@backup:~# ls /tank/bad/video_analysis/uav_london-input-output-error/
ls: cannot open directory '/tank/bad/video_analysis/uav_london-input-output-error/': Input/output error

On another, newer system, which does work:

root@testpc1:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv tank/bad 
root@testpc1:~# ls -l /tank/bad/
total 1
drwxrwx--- 3 1081 605 3 Jan 18 19:24 video_analysis
root@testpc1:~# ls -l /tank/bad/video_analysis/
total 1
drwxr-xr-x 2 760 root 3 Jan 18 20:26 uav_london-input-output-error
root@testpc1:~# ls -l /tank/bad/video_analysis/uav_london-input-output-error/
total 1
-rw-r--r-- 1 760 605 98 Apr  5  2020 README.txt
root@testpc1:~# modinfo zfs|grep version
version:        0.8.3-1ubuntu12.5
srcversion:     4528C7B15E59A789E01F814
root@testpc1:~# cat /tank/bad/video_analysis/uav_london-input-output-error/README.txt 
video captured during the covid-19 crisis over the city of London.
Live feed produced by Reuters

However, I tested another machine, with the same(!) zfs version as the one that is working:

root@ridzo:~# ssh root@ipifs zfs send -c tank/bad@readme|zfs recv pool/bad
root@ridzo:~# ls -l /pool/bad/
total 1
drwxrwx---+ 3 1081 605 3 Jan 18 19:24 video_analysis
root@ridzo:~# ls -l /pool/bad/video_analysis
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
total 1
drwxr-xr-x 2 760 root 3 Jan 18 20:26 uav_london-input-output-error
root@ridzo:~# ls -l /pool/bad/video_analysis/uav_london-input-output-error
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
ls: cannot open directory '/pool/bad/video_analysis/uav_london-input-output-error': Input/output error
root@ridzo:~# modinfo zfs|grep version
version:        0.8.3-1ubuntu12.5
srcversion:     4528C7B15E59A789E01F814

I also tested an older zfs system, which also worked:

root@testpc3:~# ssh ipifs zfs send -c tank/bad@readme|zfs recv -v pool/bad
receiving full stream of tank/bad@readme into pool/bad@readme
received 5.09M stream in 1 seconds (5.09M/sec)
root@testpc3:~# ls -l /pool/bad/video_analysis/uav_london-input-output-error/README.txt 
-rw-r--r-- 1 760 605 98 Apr  5  2020 /pool/bad/video_analysis/uav_london-input-output-error/README.txt
root@testpc3:~# modinfo zfs|grep version
version:        0.8.1-1ubuntu14.3
srcversion:     F3C94D5226BB5E654A00EF1
root@testpc3:~# modinfo spl|grep version
version:        0.8.1-1ubuntu14.3
srcversion:     9B21F4F344A05823B8DB47A

It's very random where it works and where not.
I included the dataset as a stream here:

$ gzip -dc zfs-bad-input-output.zfs.gz|zfs recv "yourpool"/bad

I ran strace on the bad system while doing an ls on the bad directory; it shows:

getxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access", NULL, 0) = -1 EIO (Input/output error)

The good system says:

getxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access", NULL, 0) = -1 EOPNOTSUPP (Operation not supported)

It seems getxattr gives different results. I tried getfattr on both systems and both produce empty output, but the bad system shows something extra in the strace:

listxattr("/tank/bad/video_analysis/uav_london-input-output-error", "system.posix_acl_access\0system.p"..., 256) = 49

Digging a bit deeper, good system:

# zfs get acltype tank
NAME  PROPERTY  VALUE     SOURCE
tank  acltype   off       default

bad system:

# zfs get acltype tank
NAME  PROPERTY  VALUE     SOURCE
tank  acltype   posixacl  local

We do use ACLs on our systems and need them, so I cannot just get rid of them. I normally set:

zfs set acltype=posixacl tank
zfs set xattr=sa tank

But I can confirm that the systems where I don't get input/output errors are the ones where acltype is not set.
Any clues on how to fix this?

PS: resetting the ACLs does not work either:

# setfacl -b -R /tank/bad/video_analysis
setfacl: /tank/bad/video_analysis/uav_london-input-output-error: Input/output error
setfacl: /tank/bad/video_analysis/uav_london-input-output-error: Input/output error

@Ringdingcoder

Ringdingcoder commented Jan 19, 2021

Cool, it is easily reproducible with your stream (receive into a pool/dataset with default settings => works / receive into a pool/dataset with acltype=posixacl => error).

With

filename:       /lib/modules/5.8.18-300.fc33.x86_64/extra/zfs/zfs/zfs.ko
version:        0.8.5-1

@shuther

shuther commented Jan 19, 2021

On my dataset, the issue does "not" seem to be connected to this ACL problem, because I use the same settings across the pool and only one dataset is affected by this corruption. The ACLs look the same for the good and the bad files.
I will try to run strace today to see if I can get anything (so we know whether it is a similar or a different problem).

root@freenas:~ # zfs get xattr,aclmode,acltype voluz/media/music
NAME               PROPERTY  VALUE        SOURCE
voluz/media/music  xattr     off          inherited from voluz
voluz/media/music  aclmode   passthrough  inherited from voluz
voluz/media/music  acltype   nfsv4        default

@phiser678
Author

phiser678 commented Jan 19, 2021

@shuther FreeBSD and Linux use different ACL systems. Indeed, we previously had nfsv4 ACLs on our datasets from FreeBSD, but when I transferred them to Linux they were incompatible and I had to recreate the (POSIX) ACLs for the Linux system.
Note that for this bug, the fault only shows up when acltype is on. So the stream I added above will not do much on FreeBSD systems. You should be very careful with systemrescue-zfs, since it is Linux-based and uses a different ACL type. You should look for a FreeBSD rescue system instead.

My idea was to temporarily disable acltype, so that I can delete (or first copy) the error directory, and then turn acltype back on. Would this operation delete all the ACLs of the complete dataset?
Next is to try this on systemrescuecd-zfs, which indeed has the 2.0 branch.

@phiser678
Author

phiser678 commented Jan 19, 2021

I can confirm the bug persists in the 2.0 branch also!

[root@sysrescue ~]# ls -l /pool/bad/
total 1
drwxrwx---+ 3 1081 605 3 Jan 18 18:24 video_analysis
[root@sysrescue ~]# ls -l /pool/bad/video_analysis/
ls: /pool/bad/video_analysis/uav_london-input-output-error: Input/output error
total 1
drwxr-xr-x 2 760 root 3 Jan 18 19:26 uav_london-input-output-error
[root@sysrescue ~]# ls -l /pool/bad/video_analysis/uav_london-input-output-error/
ls: /pool/bad/video_analysis/uav_london-input-output-error/: Input/output error
ls: cannot open directory '/pool/bad/video_analysis/uav_london-input-output-error/': Input/output error
[root@sysrescue ~]# modinfo zfs|grep version
version:        2.0.0-1
srcversion:     3A54AFFBC84534A6E7FF55C

Now let's check whether we lose existing ACLs when disabling posixacl:

root@ridzo:/pool/bad# echo ok >error
root@ridzo:/pool/bad# setfacl -m u:phiser678:rwx error 
root@ridzo:/pool/bad# ls -l
total 1
-rw-rwx---+ 1 root root 3 Jan 19 10:58 error
drwxrwx---+ 3 1081  605 3 Jan 18 19:24 video_analysis
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
user:phiser678:rwx
root@ridzo:/pool/bad# zfs set acltype=noacl pool/bad
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
root@ridzo:/pool/bad# cp -a video_analysis video_analysis-ok
root@ridzo:/pool/bad# zfs set acltype=posixacl pool/bad
root@ridzo:/pool/bad# getfacl error|grep user
user::rw-
user:phiser678:rwx
root@ridzo:/pool/bad# ls -l video_analysis-ok/uav_london-input-output-error/README.txt 
-rw-r--r-- 1 760 605 98 Apr  5  2020 video_analysis-ok/uav_london-input-output-error/README.txt

Now I try to delete the input-output error:

root@ridzo:/pool/bad# zfs set acltype=noacl pool/bad
root@ridzo:/pool/bad# rm -rf video_analysis
root@ridzo:/pool/bad# mv  video_analysis-ok  video_analysis
root@ridzo:/pool/bad# zfs set acltype=posixacl pool/bad
root@ridzo:/pool/bad# ls -l
total 1
-rw-rwx---+ 1 root root 3 Jan 19 10:58 error
drwxrwx---  3 1081  605 3 Jan 18 19:24 video_analysis
root@ridzo:/pool/bad# ls -l video_analysis/uav_london-input-output-error/README.txt 
-rw-r--r-- 1 760 605 98 Apr  5  2020 video_analysis/uav_london-input-output-error/README.txt

Works! All files recovered and ACL's retained. No more input-output errors.
Back to rock-solid ZFS again! :-)

@phiser678
Author

This should be checked and sent upstream. In case you missed the small stream in the lengthy report:
zfs-bad-input-output.zfs.gz

@maxximino
Contributor

So, from the stream, I see that /video_analysis/uav_london-input-output-error/<xattrdir>/system.posix_acl_access exists, but it has size zero.
Speculation only, from reading the code in a web browser:
Looking at the code for ACLs, zpl_get_acl returns -EIO for an ACL of size zero, but zpl_set_acl will save an ACL of size zero if called with acl=NULL.
On the kernel side, it looks like if userspace sets system.posix_acl_access to an empty value, we receive a NULL acl pointer (which is correctly not dereferenced, but which writes something to disk that our read path doesn't accept).

All this with the caveat that I didn't manage to reproduce it from userspace, though I didn't try very hard.

(apologies, but I will not be sending a pull request with a proposed fix)

@maxximino
Contributor

maxximino commented Jan 19, 2021

Looks like /video_analysis/uav_london-input-output-error/<xattrdir>/system.posix_acl_access exists, but has zero size.

# zdb -vvvvvv -ddddd brokenpool/brokenfs 1525568
Dataset brokenpool/brokenfs [ZPL], ID 389, cr_txg 11, 2.13M, 2074 objects, rootbp DVA[0]=<0:8048e00:200> DVA[1]=<0:18052a00:200> [L0 DMU objset] fletcher4 lz4 unencrypted LE contiguous unique double size=1000L/200P birth=614L/614P fill=2074 cksum=161558d1d1:6c21d6ed532:12cf12503a5c9:260b771f9953a4

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   1525568    1   128K    512      0     512    512    0.00  ZFS plain file (K=inherit) (Z=inherit=uncompressed)
                                               168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED 
        dnode maxblkid: 0
        path    /video_analysis/uav_london-input-output-error/<xattrdir>/system.posix_acl_access
        uid     760
        gid     605
        atime   Sun Apr  5 16:32:12 2020
        mtime   Sun Apr  5 19:48:15 2020
        ctime   Sun Apr  5 19:48:15 2020
        crtime  Sun Apr  5 16:32:12 2020
        gen     10088303
        mode    100644
        size    0
        parent  1525566
        links   1
        pflags  40800000005
Indirect blocks:

Speculation from reading the code in a web browser:
zpl_set_acl, if given a NULL pointer in the acl parameter, will happily write a zero-sized xattr.

if (acl) {
    size = posix_acl_xattr_size(acl->a_count);
    value = kmem_alloc(size, KM_SLEEP);
    error = zpl_acl_to_xattr(acl, value, size);
    if (error < 0) {
        kmem_free(value, size);
        return (error);
    }
}
error = zpl_xattr_set(ip, name, value, size, 0);

The kernel also seems willing to pass such a NULL pointer if userspace writes an empty value:
https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/posix_acl.c#L860-L896
However, zpl_get_acl will return -EIO if it reads a zero-sized xattr:

} else {
    acl = ERR_PTR(-EIO);
}
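The read-side point can be made concrete with a small model. This is a simplified sketch, not the ZFS source: it assumes the getter turns a positive xattr size into a parsed ACL and a missing xattr (ENODATA) into "no ACL", and sends everything else to the final else branch quoted above, so a stored size of 0 ends up as -EIO. The enum and function names are hypothetical.

```c
#include <errno.h>

/* Outcomes of the ACL getter in this simplified model. */
enum acl_result { ACL_NONE, ACL_PARSED, ACL_EIO };

/* size mimics the xattr size query: > 0 is the stored length, 0 is the
 * zero-length object zdb shows, negative values are -errno. */
static enum acl_result model_get_acl(long size)
{
    if (size > 0)
        return ACL_PARSED;  /* the value would be decoded into an ACL */
    if (size == -ENODATA)
        return ACL_NONE;    /* assumed: a missing xattr means "no ACL" */
    return ACL_EIO;         /* the final else branch; size 0 lands here */
}
```

In this model the zero-length object is the only on-disk state that turns a readable directory into an EIO, which is consistent with the EOPNOTSUPP result seen on systems with acltype=off, where the ACL is not read at all.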

I couldn't manage to reproduce it from userspace with purposely wrong calls to setxattr or libacl, but I only tried for a few minutes.

(apologies in advance, I will not be sending a pull request with a proposed fix)

@shuther

shuther commented Jan 19, 2021

So I can confirm I am facing a different issue. Using SystemRescue+ZFS, I can partially read a file (using cat; it is a picture), and I hit the input/output error in the middle of the file (so it is not ACL related).
Screenshot 2021-01-19 at 15 00 02

Still nothing reported in dmesg or zpool status.

I also tried strace, using strace file xxx:
Screenshot 2021-01-19 at 14 59 16

@phiser678
Author

Thanks for pointing this out @maxximino ! The ACL is set like this and normally is inherited from the shared folder:

setfacl -m group:ipi:rwx -d -m group:ipi:rwx /shared

This is a 9TB dataset, and this zero-sized ACL only occurred on the uav_london directory. It looks like this is a very rare case, then?
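Whether the zero-length ACL object is really confined to that one directory can be checked by probing every entry of the dataset the way strace showed. The sketch below is a hypothetical scanner (scan_acl_eio and acl_read_fails are illustrative names, not an existing tool): it recursively visits a tree without following symlinks, calls lgetxattr for system.posix_acl_access on each entry, and prints every path that returns EIO. It has to run where acltype=posixacl is in effect, since with ACLs off the call returns EOPNOTSUPP and nothing would be detected.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Return 1 if reading the POSIX access ACL of path fails with EIO. */
static int acl_read_fails(const char *path)
{
    /* lgetxattr: inspect the entry itself, never a symlink target. */
    ssize_t rc = lgetxattr(path, "system.posix_acl_access", NULL, 0);
    return rc < 0 && errno == EIO;
}

/* Hypothetical scanner: print every entry under root (root included) whose
 * access ACL cannot be read, and return how many were found. */
static int scan_acl_eio(const char *root)
{
    int found = 0;
    if (acl_read_fails(root)) {
        printf("%s\n", root);
        found++;
    }
    struct stat st;
    if (lstat(root, &st) != 0 || !S_ISDIR(st.st_mode))
        return found;
    DIR *d = opendir(root);
    if (d == NULL)
        return found;  /* e.g. the affected directory itself fails to open */
    struct dirent *de;
    char path[4096];
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Skip names that do not fit rather than probe a truncated path. */
        if (snprintf(path, sizeof path, "%s/%s", root, de->d_name) >= (int)sizeof path)
            continue;
        found += scan_acl_eio(path);
    }
    closedir(d);
    return found;
}
```

Each entry is probed from its parent's listing before any attempt to open it, so a directory that itself fails with EIO is still reported.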

@behlendorf behlendorf added Component: Send/Recv "zfs send/recv" feature Type: Defect Incorrect behavior (e.g. crash, hang) labels Jan 19, 2021
@maxximino
Contributor

As usual, the catch is just one layer below where I stopped looking.
Sending NULL as the value is the expected way to remove the xattr:

/* Remove a specific name xattr when value is set to NULL. */
if (value == NULL) {
    if (xzp)
        error = -zfs_remove(dxzp, (char *)name, cr, 0);
    goto out;
}

... now I no longer have a clue which code path around the ACLs could mistakenly trigger this.
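The corrected reading can be expressed as a small model of the write path. This is a hypothetical in-memory sketch (struct xattr_slot and the model_* functions are illustrative names, not ZFS code) that follows only the two fragments quoted in this thread, assuming value starts out NULL and size 0 in the setter: a NULL ACL leads to removal, so that path cannot leave the zero-length object zdb found.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot xattr store standing in for one inode. */
struct xattr_slot {
    bool exists;
    size_t size;
};

/* Mirrors the removal branch quoted above: a NULL value deletes the
 * xattr; any non-NULL value, even of size 0, is stored. */
static void model_xattr_set(struct xattr_slot *x, const void *value, size_t size)
{
    if (value == NULL) {
        x->exists = false;
        x->size = 0;
        return;
    }
    x->exists = true;
    x->size = size;
}

/* Mirrors the quoted zpl_set_acl fragment: value stays NULL (and size 0)
 * when acl is NULL; otherwise an encoded buffer is passed down. */
static void model_set_acl(struct xattr_slot *x, const void *acl, size_t encoded_size)
{
    const void *value = NULL;
    size_t size = 0;
    if (acl != NULL) {
        value = acl;  /* stands in for the zpl_acl_to_xattr() output */
        size = encoded_size;
    }
    model_xattr_set(x, value, size);
}
```

In this model the zero-length state is reachable only through a direct store of a non-NULL, zero-length value; whether any real code path does that is exactly the open question.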

@stale

stale bot commented Jan 20, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jan 20, 2022
@stale stale bot closed this as completed Apr 21, 2022
8 participants