
SPLError: 5906:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC #709

Closed
MarkRidley123 opened this issue Apr 30, 2012 · 49 comments

@MarkRidley123

Hi,

I am testing rc8 inside ESX 5 on some fast Dell hardware. I've given it 15GB of memory.

I then rebooted the server and now the pool will not mount. This is in the log:

What do I need to do? Thanks.

[ 957.090546] BUG: unable to handle kernel NULL pointer dereference at 0000000000000490
[ 957.090925] IP: [] zfs_preumount+0x10/0x30 [zfs]
[ 957.091220] PGD 3ba908067 PUD 3aa869067 PMD 0
[ 957.091543] Oops: 0000 [#11] SMP
[ 957.091805] CPU 1
[ 957.091892] Modules linked in: lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) shpchp ppdev parport_pc parport i2c_piix4 i2c_core vmw_balloon vmxnet3 microcode sunrpc btrfs zlib_deflate libcrc32c vmw_pvscsi [last unloaded: scsi_wait_scan]
[ 957.094416]
[ 957.094543] Pid: 2097, comm: mount.zfs Tainted: P D O 3.3.2-1.fc16.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 957.095116] RIP: 0010:[] [] zfs_preumount+0x10/0x30 [zfs]
[ 957.095528] RSP: 0018:ffff8803ba23dd08 EFLAGS: 00010296
[ 957.095728] RAX: 0000000000000000 RBX: ffff8803e9624800 RCX: 000000000000fba2
[ 957.095963] RDX: 000000000000fba1 RSI: ffff8803e8207000 RDI: 0000000000000000
[ 957.096198] RBP: ffff8803ba23dd08 R08: 00000000000165f0 R09: ffffea000fa08000
[ 957.096434] R10: ffffffffa016499b R11: 0000000000000000 R12: ffffffffa02cd4c0
[ 957.096670] R13: ffff8803ba23dd88 R14: fffffffffffffff0 R15: 0000000002000000
[ 957.096921] FS: 00007f3c0c238c00(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000
[ 957.097228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 957.097437] CR2: 0000000000000490 CR3: 00000003b29dd000 CR4: 00000000000006e0
[ 957.097676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 957.097916] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 957.098152] Process mount.zfs (pid: 2097, threadinfo ffff8803ba23c000, task ffff8803e8bf2e60)
[ 957.098461] Stack:
[ 957.098600] ffff8803ba23dd28 ffffffffa02a37e6 00000a00ba23ddb8 ffff8803e9624800
[ 957.099089] ffff8803ba23dd48 ffffffff8118389c 00000000fffffff0 ffffffffa02a3830
[ 957.099578] ffff8803ba23dd78 ffffffff8118409b ffff8803ab732a00 ffffffffa02cd4c0
[ 957.100073] Call Trace:
[ 957.100258] [] zpl_kill_sb+0x16/0x30 [zfs]
[ 957.100476] [] deactivate_locked_super+0x3c/0xa0
[ 957.100733] [] ? zpl_mount+0x30/0x30 [zfs]
[ 957.100944] [] mount_nodev+0xab/0xb0
[ 957.101181] [] zpl_mount+0x25/0x30 [zfs]
[ 957.101388] [] mount_fs+0x43/0x1b0
[ 957.101586] [] vfs_kern_mount+0x6a/0xf0
[ 957.101791] [] do_kern_mount+0x54/0x110
[ 957.101996] [] do_mount+0x1a4/0x830
[ 957.102198] [] ? copy_mount_options+0x3a/0x170
[ 957.102417] [] sys_mount+0x90/0xe0
[ 957.102616] [] system_call_fastpath+0x16/0x1b
[ 957.102836] Code: 05 00 00 48 c7 c6 eb f1 2a a0 48 c7 c7 c0 ff 2b a0 e8 b5 9e ed ff e9 b8 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b bf f0 02 00 00 <48> 83 bf 90 04 00 00 00 74 05 e8 f1 cc fe ff 5d c3 66 66 66 66
[ 957.106671] RIP [] zfs_preumount+0x10/0x30 [zfs]
[ 957.106983] RSP
[ 957.107150] CR2: 0000000000000490

@MarkRidley123
Author

Sorry, I gave some bad information there... after more testing:

The problem is if I zfs umount -a
and then zfs mount -a

It needs to be rebooted before it will mount again.

This is reproducible every time.

@behlendorf
Contributor

Argh, I really need to make a new tag. This was fixed post-rc8, so just grab the latest source from master, or if you're running Ubuntu the daily PPA will have the fix.
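
A rough sketch of both options; the repository URLs and the PPA/package names below are assumptions based on the packaging of that era, so adjust as needed:

# build from the latest source (spl first, then zfs)
git clone https://github.com/zfsonlinux/spl.git
git clone https://github.com/zfsonlinux/zfs.git

# or, on Ubuntu, pull from the daily PPA
add-apt-repository ppa:zfs-native/daily
apt-get update && apt-get install ubuntu-zfs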

@MarkRidley123
Author

Hi Brian,
Thanks. That has fixed it.
Is there a way to speed up directory traversal?
I have about 200k files in zfs now and when I rsync to it, it is only checking about 100 files per second. XFS rattles through so fast you can't see the numbers change.
The server has 15GB of RAM. I have limited arc_max to 10GB in zfs.conf and arc_min to 5GB.
Are there any other settings I can try?
Thanks,
Mark
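
(For reference, a minimal sketch of the module options file implied above; the byte values are assumptions matching the 10GB/5GB limits:)

# /etc/modprobe.d/zfs.conf -- ARC limits are given in bytes
options zfs zfs_arc_max=10737418240 zfs_arc_min=5368709120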


@behlendorf
Contributor

That's quite a bit slower than I'd expect. Your files don't happen to have a lot of xattrs, do they?

@MarkRidley123
Author

Hi Brian,

Funny you should say that. They all have xattrs... that's how I am storing Windows ACLs.

Any way of speeding it up?

Thanks.


@behlendorf
Contributor

Xattrs are known to be quite slow with zfs because of how they are stored on disk. This is basically true for all the various implementations; it's not specific to Linux.

However, you're in luck: because Lustre makes extensive use of xattrs, we've done some work to optimize their performance in the Linux zfs port. If you set the xattr property on your dataset to sa instead of on, the xattrs will be stored in the dnode, which is much faster. There are two downsides to doing this, however.

  1. This is still a Linux-only feature; you won't be able to access the xattrs from other ZFS implementations. That's why this optimization is still off by default.

  2. You won't see any performance improvement until the files are recreated, since this changes how they are stored on disk.

So you may want to enable the feature on a new dataset and then rsync the files into it. Then do a little testing and see how much the traversal time improves. I'd be very interested in your before and after numbers.

zfs set xattr=sa tank/fish
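
A minimal sketch of that workflow, with the pool/dataset name and source path being assumptions:

# create a fresh dataset with SA-based xattrs and copy the data into it
zfs create -o xattr=sa tank/fish
rsync -aAX /source/dir/ /tank/fish/
# then compare traversal time against the old dataset
time find /tank/fish -type f > /dev/null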

@MarkRidley123
Author

Thanks, Brian.

I'll recreate the filesystem and let you know the results...great product by the way.


@MarkRidley123
Author

Hi Brian,

All was going really well...until.

I had loaded up 1.4TB of 1.7 million files and then it stopped with:

May 3 20:52:36 bk568 kernel: ZFS: Invalid mode: 0xd276
May 3 20:52:36 bk568 kernel: VERIFY(0) failed
May 3 20:52:36 bk568 kernel: SPLError: 7703:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC
May 3 20:52:36 bk568 kernel: SPL: Showing stack for process 7703
May 3 20:52:36 bk568 kernel: Pid: 7703, comm: smbd Tainted: P ---------------- 2.6.32-220.13.1.el6.x86_64 #1
May 3 20:52:36 bk568 kernel: Call Trace:
May 3 20:52:36 bk568 kernel: [] ? spl_debug_dumpstack+0x27/0x40 [spl]
May 3 20:52:36 bk568 kernel: [] ? spl_debug_bug+0x81/0xd0 [spl]
May 3 20:52:36 bk568 kernel: [] ? zfs_znode_alloc+0x321/0x500 [zfs]
May 3 20:52:36 bk568 kernel: [] ? dmu_object_info_from_dnode+0x115/0x190 [zfs]
May 3 20:52:36 bk568 kernel: [] ? zfs_zget+0x151/0x1e0 [zfs]
May 3 20:52:36 bk568 kernel: [] ? zfs_dirent_lock+0x47c/0x540 [zfs]
May 3 20:52:36 bk568 kernel: [] ? zfs_dirlook+0x8b/0x270 [zfs]
May 3 20:52:36 bk568 kernel: [] ? tsd_exit+0x5f/0x1c0 [spl]
May 3 20:52:36 bk568 kernel: [] ? zfs_lookup+0x2ff/0x350 [zfs]
May 3 20:52:36 bk568 kernel: [] ? zpl_lookup+0x57/0xc0 [zfs]
May 3 20:52:36 bk568 kernel: [] ? do_lookup+0x16c/0x1e0
May 3 20:52:36 bk568 kernel: [] ? __link_path_walk+0x734/0x1030
May 3 20:52:36 bk568 kernel: [] ? path_walk+0x6a/0xe0
May 3 20:52:36 bk568 kernel: [] ? do_path_lookup+0x5b/0xa0
May 3 20:52:36 bk568 kernel: [] ? user_path_at+0x57/0xa0
May 3 20:52:36 bk568 kernel: [] ? locks_free_lock+0x43/0x60
May 3 20:52:36 bk568 kernel: [] ? fcntl_setlk+0x75/0x320
May 3 20:52:36 bk568 kernel: [] ? vfs_fstatat+0x46/0x80
May 3 20:52:36 bk568 kernel: [] ? vfs_stat+0x1b/0x20
May 3 20:52:36 bk568 kernel: [] ? sys_newstat+0x24/0x50
May 3 20:52:36 bk568 kernel: [] ? audit_syscall_entry+0x272/0x2a0
May 3 20:52:36 bk568 kernel: [] ? system_call_fastpath+0x16/0x1b
May 3 20:52:36 bk568 kernel: SPL: Dumping log to /tmp/spl-log.1336074756.7703

Spl.log has this at the end:
space_map.c:space_map_sync() object 149, txg 330199, pass 5, A, count 1, space 1f800
space_map.c:space_map_sync() object 147, txg 330199, pass 5, A, count 1, space 1f800
space_map.c:space_map_sync() object 152, txg 330199, pass 5, A, count 1, space 17000
space_map.c:space_map_sync() object 151, txg 330199, pass 6, A, count 1, space d000
space_map.c:space_map_sync() object 149, txg 330199, pass 6, A, count 1, space d000
space_map.c:space_map_sync() object 147, txg 330199, pass 6, A, count 1, space d000
txg.c:txg_sync_thread() txg=330200 quiesce_txg=330202 sync_txg=244627

Any ideas?

Thanks.

Mark


@MarkRidley123
Author

Hi Brian,

The pool cannot be read from now.

I had to pull the power out as it was stuck going down, unmounting filesystems.

When it comes back up the pool is there, but:

[root@bk568 D]# ls

Message from syslogd@bk568 at May 3 22:06:09 ...
kernel:VERIFY(0) failed

Message from syslogd@bk568 at May 3 22:06:09 ...
kernel:SPLError: 2407:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC

Is this fixable?

Thanks.


@maxximino
Contributor

Have you tried loading the zfs module with the parameter zfs_recover=1, then importing and scrubbing the pool?
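
A sketch of that sequence, assuming the pool has been exported and the modules unloaded first (the pool name here is a placeholder):

modprobe zfs zfs_recover=1
zpool import pool
zpool scrub pool
zpool status -v pool   # watch scrub progress and any reported errors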

@MarkRidley123
Author

I'll give that a go, but do you know what has caused this and if it will repeatedly happen?

Would it be setting xattr=sa?

I have found the pool loads and other filesystems mount and can be read; it is just one filesystem that causes the kernel panic.

Thanks for your help and for the work you do on ZFS. I would really like to use this in production rather than btrfs with only compression. ZFS dedup will save me loads of space.

Any logs or help you need, just ask.


@behlendorf
Contributor

The initial failure is interesting: basically you have triggered an assertion in the code because you tried to create a file with the mode set to (S_IFREG | S_IFDIR | S_IFIFO). That's just nonsense and I'm surprised the VFS allowed it. The assertion is there because I was fairly sure the VFS wouldn't permit this. However, since it does pass it through, we'll have to do something with it... either return an errno or silently reset the mode bits to something reasonable.

For now you'll want to locate this file (and others like it) in your source filesystem and inspect it for other damage. Then set its permissions to something sane, which will let you avoid this issue on the next rsync until we figure out the right thing to do here.

Next you'll want to set the zil_replay_disable=1 module option to prevent ZFS from attempting to replay this create from the ZIL when the pool is imported. That should get things back in shape for you.

@behlendorf behlendorf reopened this May 3, 2012
@MarkRidley123
Author

Thanks, Brian.

Is there a 'find' command I can use to locate the files/directories that have this set?

I cannot get into the filesystem to find them anyway, so what steps do I need to go through to recover this?

From what I understand from your email, I'll need to export it and then import with the zil_replay_disable option.

By the way, where do I set that option?

Many thanks.

Mark


@MarkRidley123
Author

By the way... the files were coming from rsync on an XFS system (CentOS 6.2), over SSH to rsync on ZFS (CentOS 6.2).


@MarkRidley123
Author

Hi Brian,

Sorry - I set options zfs zil_replay_disable=1 in /etc/modprobe/zfs.conf, exported the pool, and imported the pool.
But it still crashes when I cd into that filesystem.

Any ideas?

Thanks.


@maxximino
Contributor

You should also "rmmod zfs" after exporting the pool and then reinsert the module, or reboot.

@behlendorf
Contributor

I'd start by running find /mnt/xfs -type p, which should identify any FIFO files. There probably aren't too many of them, and by running stat <file> on these candidate files I'd hope you'd be able to find it.

Alternately, you could try this quick patch, behlendorf/zfs@0a6b2b4, which should convert the assertion into just an error code returned to rsync.

To use the zil_replay_disable option, just add it to your /etc/modprobe.d/zfs.conf file and reboot. When the modules are reloaded the log will be discarded rather than replayed. Don't forget to remove this option afterwards.
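
Putting those steps together, a rough sketch (the source mount point and config path are assumptions):

# on the rsync source, list FIFOs and inspect their modes
find /mnt/xfs -type p -exec stat -c '%a %F %n' {} +
# discard (rather than replay) the log on the next import, then reboot
echo "options zfs zil_replay_disable=1" >> /etc/modprobe.d/zfs.conf
# remember to remove that line again once the pool mounts cleanly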

@MarkRidley123
Author

Hi,

I have set the zil_replay_disable=1 option, exported, rebooted, imported.
But when I go into the filesystem and 'ls' I still get:

ZFS: Invalid mode: 0xd276
VERIFY(0) failed
SPLError: 2686:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC
SPL: Showing stack for process 2686
Pid: 2686, comm: ls Tainted: P ---------------- 2.6.32-220.13.1.el6.x86_64 #1
Call Trace:
[] ? spl_debug_dumpstack+0x27/0x40 [spl]
[] ? spl_debug_bug+0x81/0xd0 [spl]
[] ? zfs_znode_alloc+0x321/0x500 [zfs]
[] ? dmu_object_info_from_dnode+0x115/0x190 [zfs]
[] ? zfs_zget+0x151/0x1e0 [zfs]
[] ? zfs_dirent_lock+0x47c/0x540 [zfs]
[] ? zfs_dirlook+0x8b/0x270 [zfs]
[] ? zfs_lookup+0x2ff/0x350 [zfs]
[] ? zpl_lookup+0x57/0xc0 [zfs]
[] ? do_lookup+0x16c/0x1e0
[] ? __link_path_walk+0x734/0x1030
[] ? path_walk+0x6a/0xe0
[] ? do_path_lookup+0x5b/0xa0
[] ? user_path_at+0x57/0xa0
[] ? cp_new_stat+0xe4/0x100
[] ? vfs_fstatat+0x46/0x80
[] ? vfs_lstat+0x1e/0x20

If instead I cd into a subdirectory of that filesystem, it works fine, but I cannot ls the base of the filesystem.

I tried a rm * to delete everything, but that caused a kernel panic too.

Any ideas?

I haven't applied the patch as I didn't think it would fix an already broken filesystem.

I left a zpool scrub working overnight and it said there were no errors.

Thanks.


@MarkRidley123
Author

Some more information:

If I cd into the filesystem and do:

for i in *
do
  echo $i
done

That shows everything, but 'ls' crashes the kernel.


@MarkRidley123
Author

I edited zfs_znode.c to add your changes in, but rc = zfs_inode_set_ops(zsb, ip); does not compile since rc is not defined, so I changed it to int rc = zfs_inode_set_ops(zsb, ip);

This compiled, but complained about the declarations being different or something.
I ignored the warning, went into the filesystem, and the whole kernel crashed and started writing complete rubbish on the console.


@MarkRidley123
Author

CC [M] /tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.o
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c: In function 'zfs_znode_alloc':
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: 'rc' undeclared (first use in this function)
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: (Each undeclared identifier is reported only once
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: for each function it appears in.)


@MarkRidley123
Author

I managed to get the filesystem to read.

The rsync with extended attributes is still very slow, I think slower than before.

But it looks like it is cached, because if I kill the rsync it resumes very quickly... however at 100 files per second it is just too slow.


@MarkRidley123
Author

Hi Brian,
I have done some more testing and found the slow part of SA xattrs.
It is when the xattr is longer than a certain length.
If the xattr is simply "hello", the SA xattr is very quick.
If the xattr is a couple of hundred characters, then the speed is very slow.
Any ideas?
Thanks,
Mark


@behlendorf
Contributor

@MarkRidley123 Glad to hear you were able to get the pool readable. As for your performance observations, I actually don't think that's too surprising. The way the SA xattrs work is that very small xattrs (<100 bytes) will be saved in the 512-byte dnode on disk. That means reading them is basically free, since you've already paid the cost to read the dnode into memory. Slightly larger xattrs in the 100-byte to 64KB range will be stored in what zfs calls a spill block, and that will need to be read off disk, resulting in an extra I/O operation. Xattrs larger than 64KB will get stored in the legacy xattr directory format; they may be of arbitrary size but will be even slower due to the additional disk I/O in the lookup process.

So for your case 100/s may seem slow, but I suspect that's largely being limited by your disk IOPS. You can check this by watching the output of iostat -x. I assume you only have a few drives in your pool?
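
Two quick checks along those lines (the dataset path and sample file are assumptions):

# watch per-device IOPS and utilization while the rsync runs
iostat -x 5
# see roughly how many bytes of xattr data a typical file carries
getfattr -d -m - /tank/fish/somefile | wc -c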

@chrisrd
Contributor

chrisrd commented May 11, 2012

@behlendorf Re my experience in openzfs/spl#112, there are definitely no bad modes in the source fs, and the "Invalid mode" message appeared at different times in different directories. It seems the underlying problem may be due to ZFS corrupting something on the way to the disk, rather than VFS simply allowing bad modes to be set/retrieved?

Of course ZFS itself should still be able to cope with bad modes as you've indicated.

@chrisrd
Contributor

chrisrd commented May 13, 2012

Here we go, an easy reproducer, using either zpool export/import or a snapshot.

Export/import:

zfs destroy -r pool/test &&
zfs create -o xattr=sa pool/test &&
mkdir /pool/test/dir &&
setfattr -n user.foo -v foo /pool/test/dir &&
touch /pool/test/dir/file &&
setfattr -n user.foo -v foo /pool/test/dir/file &&
echo &&
find /pool/test/ -type s &&
echo "Initial: OK" &&
zpool export pool &&
zpool import -d /dev/mapper pool &&
find /pool/test/ -type s &&
echo "Re-import: OK"

Initial: OK

Message from syslogd@b5 at May 13 17:46:40 ...
 kernel:[  325.185106] SPLError: 11477:0:(zfs_znode.c:330:zfs_inode_set_ops()) ASSERTION(0) failed

Snapshot (after rebooting from the export/import test):

zfs destroy -r pool/test &&
zfs create -o xattr=sa pool/test &&
mkdir /pool/test/dir &&
setfattr -n user.foo -v foo /pool/test/dir &&
touch /pool/test/dir/file &&
setfattr -n user.foo -v foo /pool/test/dir/file &&
echo &&
find /pool/test/ -type s &&
echo "Initial: OK" &&
zfs snapshot pool/test@snap &&
mount -t zfs pool/test@snap /snap && 
find /snap/ -type s &&
echo "Snap: OK"

Initial: OK

Message from syslogd@b5 at May 13 17:54:39 ...
 kernel:[  266.736645] SPLError: 10553:0:(zfs_znode.c:330:zfs_inode_set_ops()) ASSERTION(0) failed

@chrisrd
Contributor

chrisrd commented May 13, 2012

And further...

no xattr on dir, xattr on file => no fail
xattr on dir, no xattr on file => fail
xattr on dir, no file => no fail

OK, given it's now 6:30 Sunday evening, I think I'm done!

@behlendorf
Contributor

Thanks for the simple test case, that should make it easier to run to ground. It also sounds like we're somehow properly creating the file/dir and xattrs in memory and then not flushing them to disk. That would account for why they remain good as long as they are in the cache. However, once they are dropped from the cache (on import/export, or when accessed via a snapshot) they need to be refetched from disk with zfs_zget(), and then we notice the damage.

I'm assuming you can't reproduce this with xattr=dir, correct? That may make some sense since the mode bits share the SA space with the xattr.

@chrisrd
Contributor

chrisrd commented Jun 14, 2012

Correct, can't reproduce with xattr=dir. (Apologies for the delayed response, I didn't notice the question when I first saw your response.)

@chrisrd
Contributor

chrisrd commented Jun 14, 2012

Confirmed still happening with openzfs/spl@e0093fe, 74497b7 on linux-3.3.8:

zfs destroy -r pool/test &&
zfs create -o xattr=sa pool/test &&
mkdir /pool/test/dir &&
setfattr -n user.foo -v foo /pool/test/dir &&
touch /pool/test/dir/file &&
setfattr -n user.foo -v foo /pool/test/dir/file &&
echo &&
find /pool/test/ -type s &&
echo "Initial: OK" &&
zfs snapshot pool/test@snap &&
mount -t zfs pool/test@snap /snap && 
find /snap/ -type s &&
echo "Snap: OK"

Initial: OK

Message from syslogd@b5 at Jun 14 17:01:42 ...
 kernel:[ 1108.780996] SPLError: 13008:0:(zfs_znode.c:330:zfs_inode_set_ops()) ASSERTION(0) failed

Message from syslogd@b5 at Jun 14 17:01:42 ...
 kernel:[ 1108.781076] SPLError: 13008:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC

And extracted from kern.log;

kernel: [ 1108.780952] ZFS: Invalid mode: 0x3
kernel: [ 1108.780996] SPLError: 13008:0:(zfs_znode.c:330:zfs_inode_set_ops()) ASSERTION(0) failed
kernel: [ 1108.781076] SPLError: 13008:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC
kernel: [ 1108.781127] SPL: Showing stack for process 13008
kernel: [ 1108.781170] Pid: 13008, comm: find Tainted: G           O 3.3.8-otn-00022-gfcc8626 #1
kernel: [ 1108.781248] Call Trace:
kernel: [ 1108.781291]  [<ffffffffa04124d7>] spl_debug_dumpstack+0x27/0x40 [spl]
kernel: [ 1108.781344]  [<ffffffffa0413872>] spl_debug_bug+0x82/0xe0 [spl]
kernel: [ 1108.781423]  [<ffffffffa057eb99>] zfs_znode_alloc+0x409/0x6c0 [zfs]
kernel: [ 1108.781497]  [<ffffffffa0581d8f>] zfs_zget+0x1ef/0x370 [zfs]
kernel: [ 1108.781567]  [<ffffffffa055af7a>] zfs_dirent_lock+0x54a/0x790 [zfs]
kernel: [ 1108.781640]  [<ffffffffa055b256>] zfs_dirlook+0x96/0x3a0 [zfs]
kernel: [ 1108.781710]  [<ffffffffa0557593>] ? zfs_zaccess+0xa3/0x270 [zfs]
kernel: [ 1108.781782]  [<ffffffffa05791a1>] zfs_lookup+0x2e1/0x330 [zfs]
kernel: [ 1108.781853]  [<ffffffffa0598b18>] zpl_lookup+0x58/0x120 [zfs]
kernel: [ 1108.781903]  [<ffffffff813ede3b>] ? _raw_spin_unlock+0x2b/0x40
kernel: [ 1108.781953]  [<ffffffff811569e5>] d_alloc_and_lookup+0x45/0x90
kernel: [ 1108.782002]  [<ffffffff81164435>] ? d_lookup+0x35/0x60
kernel: [ 1108.782048]  [<ffffffff811596c2>] do_lookup+0x2d2/0x3d0
kernel: [ 1108.782094]  [<ffffffff8115e4b0>] ? sys_ioctl+0x80/0x80
kernel: [ 1108.782140]  [<ffffffff8115a294>] path_lookupat+0x134/0x740
kernel: [ 1108.782189]  [<ffffffff8108068d>] ? sched_clock_cpu+0xbd/0x110
kernel: [ 1108.782239]  [<ffffffff8113ebad>] ? kmem_cache_alloc+0x2d/0x130
kernel: [ 1108.782287]  [<ffffffff81156e9b>] ? getname_flags+0x3b/0x280
kernel: [ 1108.782334]  [<ffffffff8115a8d1>] do_path_lookup+0x31/0xc0
kernel: [ 1108.782382]  [<ffffffff8115bc69>] user_path_at_empty+0x59/0xa0
kernel: [ 1108.782431]  [<ffffffff81096ddc>] ? lock_release_holdtime.part.22+0x1c/0x190
kernel: [ 1108.782484]  [<ffffffff8115bcc1>] user_path_at+0x11/0x20
kernel: [ 1108.782530]  [<ffffffff81150c4a>] vfs_fstatat+0x3a/0x70
kernel: [ 1108.782576]  [<ffffffff8116a934>] ? mntput+0x24/0x40
kernel: [ 1108.782623]  [<ffffffff8114dac3>] ? fput+0x173/0x220
kernel: [ 1108.782668]  [<ffffffff81150e7a>] sys_newfstatat+0x1a/0x40
kernel: [ 1108.782719]  [<ffffffff81208c39>] ? lockdep_sys_exit_thunk+0x35/0x67
kernel: [ 1108.782771]  [<ffffffff813f64e9>] system_call_fastpath+0x16/0x1b
kernel: [ 1108.783122] SPL: Dumping log to /tmp/spl-log.1339657302.13008

@behlendorf
Contributor

It was determined this was caused by using a private SA handle. However, we still haven't produced an updated patch to address this.

behlendorf added a commit to behlendorf/zfs that referenced this issue Aug 25, 2012
This reverts commit ec2626a which
caused consistency problems between the shared and private handles.
Reverting this change should resolve issues openzfs#709 and openzfs#727.  It
will also reintroduce an arc_anon memory leak which is addressed
by the next commit.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#709
Issue openzfs#727
@dringdahl

@behlendorf

The bug causing corruption of the mode bits may still exist in rc14. I apologize that I do not have the exact log messages, as the SPL PANIC message was not logged to disk; the system was unresponsive and the message scrolled off the console before I could copy it by hand. However, this pool was created using rc14 and xattr=sa to begin with and later changed to xattr=dir. I was using rsync to copy data over the network from another pool attached to another system when the SPL PANIC occurred. The messages did include an error about an invalid mode.

I have since destroyed the pool and recreated it, again with rc14, but with xattr left at the default. If I experience any further issues, I will post back here.

Again, I apologize that I do not have the exact messages.

@behlendorf
Contributor

@dringdahl Please keep us updated but this issue should be very much dead.

@dringdahl

@behlendorf
If you could provide a patch to have zfs return the error up to rsync rather than lock zfs up, I am willing to turn xattr=sa back on. Without a patch, I have had problems with corruption of the pool due to zfs locking up. I have a quarter billion inodes in use on the zfs pool (with just a single fs) I am copying from, so this is not a quick task. This last crash caused the pool to be unrecoverable, so I lost a week's worth of copying. I am having to make this copy because the source pool has a corrupt file that I cannot remove. Any attempt to do so causes this same SPL PANIC. But in that case, the pool was created with an earlier release candidate of zfs on linux and had xattr=sa set from the start, so it very likely preceded the fix that has already been implemented.

I am only about a day into the copy this go around and have had xattr=on from the beginning on the destination pool. If it would help, I am willing to start over with xattr=sa assuming you can provide a patch to have the error returned to rsync rather than zfs locking up.

Please note that both source and destination pools have the following set:
checksum=sha256
compression=gzip-9
dedup=sha256

Please let me know what you would like me to do.

@behlendorf
Contributor

@dringdahl Unfortunately, the reason this is fatal in the existing code is that there is currently no safe way to handle it. However, you can try the following patch in your local tree, which, instead of panicking, will just treat the inode as a regular file.

diff --git a/module/zfs/zfs_znode.c b/module/zfs/zfs_znode.c
index 9bf26a7..2b78501 100644
--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -335,8 +335,15 @@ zfs_inode_set_ops(zfs_sb_t *zsb, struct inode *ip)
                break;

        default:
-               printk("ZFS: Invalid mode: 0x%x\n", ip->i_mode);
-               VERIFY(0);
+               /*
+                * FIXME: Unknown type but treat it as a regular file.
+                */
+               printk("ZFS: Invalid mode: 0x%x assuming file\n", ip->i_mode);
+               ip->i_mode |= S_IFREG;
+               ip->i_op = &zpl_inode_operations;
+               ip->i_fop = &zpl_file_operations;
+               ip->i_mapping->a_ops = &zpl_address_space_operations;
+               break;
        }
 }
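
A sketch of rebuilding and reloading the modules after applying a patch like this to a source tree; the exact steps depend on how you installed, so treat the commands below as assumptions:

sh autogen.sh                      # only needed for a git checkout
./configure && make && make install
zfs umount -a && zpool export pool   # "pool" is a placeholder for your pool name
rmmod zfs zunicode zavl zcommon znvpair spl   # module list as shown by lsmod
modprobe zfs
zpool import pool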

@dringdahl

@behlendorf
I was thinking it could return an error without locking all future operations on the pool. This would cause the rsync to either skip that file or to stop altogether. This would be preferable to me so I can then deal with the problem file and continue the copy later.

I would try to modify the code to do so but I am not familiar enough with it to be certain I would not introduce a memory leak or some other problem.

I'm also happy to have additional verbose debugging to help isolate where the issue is.

@behlendorf
Contributor

@dringdahl That would be my preference as well, but unfortunately this particular call path was never written to handle an error at this point. That's a long-standing bit of technical debt we inherited which needs to be addressed at some point. But it means that today there's no particularly easy way to handle this in a non-fatal way.

@dringdahl

@behlendorf
I am restarting my copy using a freshly created pool:
zpool create -O checksum=sha256 -O compression=gzip-9 -O dedup=sha256 -O sync=disabled -O xattr=sa -m /home/backups backups zfs0-backups
I am using 0.6.1 without applying the patch you gave above. I am running CentOS 6.3 with kernel 2.6.32-279.14.1.el6.x86_64. I will let you know if I experience any SPL PANICs.

@stewartadam

I tried skimming through the issue but I am unsure if there is a known solution or workaround. I am seeing this error when the pool experiences heavy activity.

My setup is Fedora 20 running kernel 3.12.5 and zfs/spl 0.6.2 (git as of 20131221). My xattr setting is "on" and not "sa". Is there anything I can do to mitigate the crash?

[174367.944841] ZFS: Invalid mode: 0x8c
[174367.944844] VERIFY(0) failed
[174367.944847] SPLError: 27068:0:(zfs_znode.c:340:zfs_inode_set_ops()) SPL PANIC
[174367.944848] SPL: Showing stack for process 27068
[174367.944850] CPU: 1 PID: 27068 Comm: chown Tainted: PF       W  O 3.12.5-303.fc20.x86_64 #1
[174367.944851] Hardware name: Supermicro X10SAE/X10SAE, BIOS 1.1 08/13/2013
[174367.944852]  ffff8803f052c000 ffff8801910c38e0 ffffffff81662d91 0000000000000000
[174367.944855]  ffff8801910c38f0 ffffffffa053c507 ffff8801910c3918 ffffffffa053d8ff
[174367.944856]  ffffffffa0551721 000000000000000a ffff88004b9acfd0 ffff8801910c3ad8
[174367.944858] Call Trace:
[174367.944864]  [<ffffffff81662d91>] dump_stack+0x45/0x56
[174367.944878]  [<ffffffffa053c507>] spl_debug_dumpstack+0x27/0x40 [spl]
[174367.944882]  [<ffffffffa053d8ff>] spl_debug_bug+0x7f/0xe0 [spl]
[174367.944907]  [<ffffffffa08a4b9c>] zfs_znode_alloc+0x43c/0x550 [zfs]
[174367.944924]  [<ffffffffa08a6e90>] zfs_zget+0x150/0x1f0 [zfs]
[174367.944933]  [<ffffffffa0000001>] ? video_get_max_state+0x1/0x2a [video]
[174367.944949]  [<ffffffffa0885d24>] zfs_dirent_lock+0x434/0x570 [zfs]
[174367.944964]  [<ffffffffa0885ee1>] zfs_dirlook+0x81/0x370 [zfs]
[174367.944974]  [<ffffffffa0820d1e>] ? dmu_object_size_from_db+0x5e/0x80 [zfs]
[174367.944990]  [<ffffffffa089b885>] zfs_lookup+0x2d5/0x320 [zfs]
[174367.945005]  [<ffffffffa08b59fb>] zpl_lookup+0x6b/0xd0 [zfs]
[174367.945019]  [<ffffffff811b7ffd>] lookup_real+0x1d/0x50
[174367.945021]  [<ffffffff811b8923>] __lookup_hash+0x33/0x40
[174367.945023]  [<ffffffff8166175a>] lookup_slow+0x42/0xa7
[174367.945025]  [<ffffffff811baf28>] path_lookupat+0x718/0x7a0
[174367.945028]  [<ffffffff811948e5>] ? kmem_cache_alloc+0x35/0x200
[174367.945030]  [<ffffffff811b990f>] ? getname_flags+0x4f/0x190
[174367.945031]  [<ffffffff811bafdb>] filename_lookup+0x2b/0xc0
[174367.945034]  [<ffffffff811bec94>] user_path_at_empty+0x54/0x90
[174367.945038]  [<ffffffff81166600>] ? list_lru_add+0x90/0xe0
[174367.945041]  [<ffffffff811cc46e>] ? mntput_no_expire+0x3e/0x120
[174367.945042]  [<ffffffff811bece1>] user_path_at+0x11/0x20
[174367.945045]  [<ffffffff811ad3f8>] SyS_fchownat+0x68/0x120
[174367.945047]  [<ffffffff81671de9>] system_call_fastpath+0x16/0x1b

@stewartadam

I'm happy with my current performance; I'd just like to work around the kernel oops. Would xattr=sa also do that? From what I read above, my understanding is that xattr=sa triggers the bug more often...

@chrisrd
Contributor

chrisrd commented Jan 1, 2014

@firewing1 Brian's patch above should let you avoid the ASSERT, but I'd enhance the patch to include the inode number to make it easier to find the affected file[s], e.g.:

diff --git a/module/zfs/zfs_znode.c b/module/zfs/zfs_znode.c
index abf6222..86b6ce2 100644
--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -336,8 +336,15 @@ zfs_inode_set_ops(zfs_sb_t *zsb, struct inode *ip)
        break;

    default:
-       printk("ZFS: Invalid mode: 0x%x\n", ip->i_mode);
-       VERIFY(0);
+       /*
+        * FIXME: Unknown type but treat it as a regular file.
+        */
+       printk("ZFS: Inode %ld has invalid mode 0x%x, assuming file\n", ip->i_ino, ip->i_mode);
+       ip->i_mode |= S_IFREG;
+       ip->i_op = &zpl_inode_operations;
+       ip->i_fop = &zpl_file_operations;
+       ip->i_mapping->a_ops = &zpl_address_space_operations;
+       break;
    }
 }

With that patch installed you'd then try to find all the affected files, e.g. ls -lR your_filesystem > /dev/null, then inspect your kern.log for the inode numbers and find the specific files, e.g. find your_filesystem -inum the_inode_number, and decide what to do, e.g. if it seems it was actually a standard file you could possibly just leave it, otherwise delete it and recreate it. Even for standard files I'd be tempted to delete and recreate if possible, as the corruption may extend further than just the file mode.
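
With that in place, a sketch of the procedure (the filesystem path and log location are assumptions):

# force every inode to be read so any bad modes get logged
ls -lR /your_filesystem > /dev/null
# pull the affected inode numbers out of the kernel log
grep -i 'invalid mode' /var/log/kern.log
# map an inode number back to a path, then decide whether to delete/recreate
find /your_filesystem -inum THE_INODE_NUMBER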

You say you see the problem under heavy activity. Can you recreate it without the heavy activity, and is it consistently on specific files? E.g. with nothing else running on the filesystem, do a ls -lR your_filesystem > /dev/null. If the problem is consistent on specific files and unrelated to load, and it's surviving a reboot (or a remount) it means you unfortunately have corrupt data on disk. If the problem isn't consistently on specific files and/or is only seen under load then it sounds like you're hitting a whole new bug.

The original bug was caused by xattr=sa. It doesn't seem likely that setting xattr=sa will help avoid the problem. Is it possible this filesystem ever had xattr=sa turned on between the original problem being introduced and the fix being applied? The original problem was committed 2012-03-02 and the fix was committed 2012-08-23, but of course it depends on when you might have picked up the code.

@stewartadam

@chrisrd thanks for the patch - I've applied it and listed all files on the filesystem, a few have shown and fortunately I can easily replace them.

I will try and do some more testing to see if I can trigger the issue reliably, but so far it hasn't happened during regular use (mainly serving up media files via Plex and acting as a user /home). I encountered the errors while copying several GB of data and (in parallel) issuing a chown -R on large folders but I think the chown is the culprit, not the large data transfers. I've copied ~8TB of data to the server so far and it never showed any problems during the file transfers except for when I was also doing chown.

This is a brand new pool as of mid-December, so no chance of being bitten by the old code.

@chrisrd
Contributor

chrisrd commented Jan 2, 2014

@firewing1 You may be right about the chown: see #1978. ...except that only affects xattr=sa file systems and it sounds like you're not using that - can you confirm this either way?

If the fs has ever had xattr=sa set you may have run into the same issue, especially with your suspicions about chown. Does your code include @5d862cb0d9a4b6dcc97a88fa0d5a7a717566e5ab? If not, that's very likely the problem. However, you've said you're running git as of 20131221 and that commit is dated 20131219.

And, if the fs has never had xattr=sa then it seems we're looking at a new problem :-(

@stewartadam

@chrisrd Correct, I have xattr=on and have never enabled xattr=sa.

That commit has been included in my builds, so this is likely a new bug. I've reported it as issue #2021.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
The zettacache is supposed to limit the number of pending changes to a
percentage of system memory.  However, the logic is reversed, so
typically all PendingChanges::update()'s are dropped if there isn't
already a PendingChange.  This applies to accesses of blocks that are
already in the cache, where we should create an AtimeUpdate, but instead
we ignore it.

This commit addresses the issue by using the helper method, which has
the correct logic.

Caused-by: openzfs#479
7 participants
@stewartadam @behlendorf @maxximino @chrisrd @MarkRidley123 @dringdahl and others