umount/zfs_iput_taskq deadlock #1988
Comments
I just updated a machine from 0.6.0 to 0.6.2 last night and now I'm seeing the same thing. Tried to "zfs umount -a" and it hung with zfs_iput_taskq pegging a CPU. The stack trace for the zfs umount is identical to the above. The stack trace for zfs_iput_taskq just says ffffff. I hadn't even really done much I/O to the pool.
The umount stack indicates it's at:
Whilst the zfs_iput_taskq stack indicates it's at:
...where ZFS_ENTER() is defined as:
I.e. the umount is holding a write lock on z_teardown_lock and waiting for the iput taskq to drain, but zfs_iput_taskq is trying to grab a read lock on z_teardown_lock, and that's as far as we get. @dechamps added that ZFS_ENTER() in @119a394a. @nedbass, @lutorm, it looks like that commit is for performance only, so I think you should try reverting it and see if that helps.
...except @119a394 isn't in the tagged version of 0.6.2. @lutorm, are you using the tagged version or something more recent which includes that commit?
I'm using the vanilla 0.6.2 built from source.
@lutorm OK, in that case it seems you're hitting something different. Have you tried multiple times to get the stack from zfs_iput_taskq, and is it always just showing ffff? (I don't actually know what the ffff means.) Also, how long have you left it in that condition? Is it possible zfs_iput_taskq is making progress but just taking a very long time to do something, and holding up the umount in the meantime?
It's only happened once, so I haven't had any more opportunities to look at
It's unsafe to drain the iput taskq while holding the z_teardown_lock as a writer. This is because when the last reference on an inode is dropped it may still have pages which need to be written to disk. This will be done through zpl_writepages which will acquire the z_teardown_lock as a reader in ZFS_ENTER. Therefore, if we're holding the lock as a writer in zfs_sb_teardown the unmount will deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1988
@chrisrd @lutorm You may still be right. A deadlock identical to the one you've described in master is still possible on 0.6.2 because
The most straightforward way to fix this is going to be to pull the
Looks good to me
It's unsafe to drain the iput taskq while holding the z_teardown_lock as a writer. This is because when the last reference on an inode is dropped it may still have pages which need to be written to disk. This will be done through zpl_writepages which will acquire the z_teardown_lock as a reader in ZFS_ENTER. Therefore, if we're holding the lock as a writer in zfs_sb_teardown the unmount will deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes openzfs#1988
We need to hold the z_teardown_lock as a writer whilst draining the iput taskq so the taskq isn't being filled up again behind us. These changes address the issues raised in Issue openzfs#1988 and revert commit @fd23720
We need to hold the z_teardown_lock as a writer whilst draining the iput taskq so the taskq isn't being filled up again behind us. These changes revert commit @fd23720 and address the deadlock raised in issue openzfs#1988 by removing ZFS_ENTER from zfs_putpage and zpl_writepages. Also remove a redundant zil_commit from zpl_writepages.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes openzfs#3281
An unmount process hung and appears possibly deadlocked with zfs_iput_taskq. FWIW this happened on a test filesystem after reproducing #1978 with current master (c2d439d).