PANIC / zfs receive hanging #11648
Comments
Possibly related? #9741
I'm getting the same panic with zfs-2.0.0-rc5 on Linux 3.10.0.
That is very old now; try with a current release.
@beren12 That is the newest package that seems to be available for CentOS 7. I've built 2.0.4 from source now. We'll see how that works.
@beren12 It's still happening with 2.0.4.
Same here on Ubuntu 20.04.2 LTS ...
This occurred while sending about 25TB across three filesystems, with about 125 snapshots each, from one pool to another on the same host ...
Interestingly, the zfs send continues while the zfs receive freezes (so where all that data is going is a bit of a mystery).
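A quick way to see where a frozen receive is actually stuck, assuming root and that the receive process is still visible (a hedged diagnostic sketch, not from the original report; the pgrep pattern is illustrative):

```sh
# Hedged sketch: inspect the kernel stack of a hung `zfs recv`.
pid=$(pgrep -f 'zfs recv' | head -n 1)
cat /proc/"$pid"/stack    # shows where the task is blocked in the kernel
```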
[EDIT: the following was requested]
I don't see any pool layout details for any of the affected users. Can you share the output of
Here is mine:
Here's mine:
I rebuilt ZFS from git master (583e320) and I'm still having the problem.
I've upgraded a server from our ZFS farm to Ubuntu 21.04 and see the same panic and subsequent hang ...
We see this panic daily during zfs receives from elsewhere in our ZFS farm. The zfs send / receive incantation we're using is:

zfs send -I tank0/DAT1@auto-20210524112633 tank1/DAT1@auto-20210528050201 | ssh backup1 zfs recv -F tank1/DAT1

Note: these are backup servers, so we have primarycache=metadata throughout.
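For anyone comparing setups, the cache policy mentioned above can be read back per dataset; a small sketch, using the dataset name from the incantation quoted above:

```sh
# Read back the ARC cache policy on the receiving dataset
# (dataset name taken from the quoted incantation).
zfs get primarycache,secondarycache tank1/DAT1
```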
Provided root cause information and reproducer in #9741
In zfs_znode_alloc we always hash inodes. If the znode is unlinked, we do not need to hash it. This fixes the problem where zfs_suspend_fs is doing zrele (iput) in an async fashion, and zfs_resume_fs unlinked drain processing will try to hash an inode that could still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
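A minimal sketch of the idea in that commit, assuming the surrounding context of zfs_znode_alloc() in module/os/linux/zfs/zfs_znode.c (a condensed illustration, not the verbatim upstream diff; `links` stands for the inode link count the function has already fetched from the znode's attributes):

```c
/*
 * Sketch: only hash the inode while the znode is still linked.
 * An unlinked znode may still sit in the VFS inode hash, because
 * zfs_suspend_fs() drops its reference asynchronously via
 * zrele()/iput(); hashing it again during zfs_resume_fs()'s
 * unlinked-drain processing is what panicked in
 * insert_inode_locked().
 */
if (links > 0)
	VERIFY3S(insert_inode_locked(ip), ==, 0);
```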
I'm still getting the kernel panics even after the fix.
Can you post stack traces of your panic?
Yes:
Are you certain you're running with the fix? The insert_inode_locked() call is at line 618 of zfs_znode.c, but this trace says line 612. Or maybe you have back-ported the fix and that's why the line numbers differ?
I built ZFS from commit afa7b34. The modification time of the installed module looks like it was from that build, but I didn't verify beyond that. What should I do to ensure I'm actually running the correct version?
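One way to check which module the kernel actually loaded, assuming a stock ZFS-on-Linux build (a hedged sketch; these are the usual interfaces, not taken from this report):

```sh
# Version string reported by the currently loaded zfs module
cat /sys/module/zfs/version

# Where the on-disk module lives and what it reports
modinfo zfs | grep -E '^(filename|version|srcversion)'
```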
System information
Describe the problem you're observing
zfs receive randomly hangs when receiving file system snapshots from another system. Once it hangs, the receive stops writing to disk until the SSH process that was used to send the snapshot is killed.
Below is the syntax of the receive command we're using:
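A receive pipeline of the shape described would look roughly like this (hedged illustration only; all host and dataset names are placeholders, not the reporter's values):

```sh
# Illustration only -- placeholder names, not the reporter's command.
ssh source-host zfs send -I tank/fs@snap-old tank/fs@snap-new | zfs recv -F tank/fs
```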
Describe how to reproduce the problem
Problem occurs randomly. Not sure how to trigger it.
Additional details: the ZFS pool doing the receiving, "tank", contains both file systems receiving snapshots (replication) and active, in-use file systems.
Include any warning/errors/backtraces from the system logs
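For anyone hitting this and wanting to attach the requested traces, a hedged sketch of the usual collection steps on Linux (assumes root, and that sysrq is enabled):

```sh
# Recent hung-task messages from the kernel ring buffer
dmesg | grep -iB 2 -A 30 'blocked for more than'

# Dump kernel stacks of all blocked (D-state) tasks into dmesg
echo w > /proc/sysrq-trigger
dmesg | tail -n 200
```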