
umount/zfs_iput_taskq deadlock #1988

Closed · nedbass opened this issue Dec 18, 2013 · 8 comments

nedbass (Contributor) commented Dec 18, 2013

An unmount process hung and appears to be deadlocked with zfs_iput_taskq. FWIW this happened on a test filesystem after reproducing #1978 with current master (c2d439d).

$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:02:57  >
dmesg | grep seconds
INFO: task zfs_iput_taskq/:31738 blocked for more than 120 seconds.
INFO: task umount:15989 blocked for more than 120 seconds.
$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:03:22  >
sudo cat /proc/31738/stack
[<ffffffffa04bb755>] cv_wait_common+0x105/0x1c0 [spl]
[<ffffffffa04bb865>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa07b26db>] rrw_enter_read+0x7b/0x2d0 [zfs]
[<ffffffffa08484ae>] zpl_writepages+0x4e/0x1a0 [zfs]
[<ffffffff8112e121>] do_writepages+0x21/0x40
[<ffffffff811ac80d>] writeback_single_inode+0xdd/0x290
[<ffffffff811acae5>] write_inode_now+0x95/0x100
[<ffffffff8119c788>] generic_detach_inode+0x88/0x1f0
[<ffffffff8119d85d>] generic_drop_inode+0x1d/0x80
[<ffffffff8119c6f2>] iput+0x62/0x70
[<ffffffffa08058c7>] zfs_purgedir+0x1e7/0x290 [zfs]
[<ffffffffa0805a9e>] zfs_unlinked_drain+0x12e/0x190 [zfs]
[<ffffffffa04b4568>] taskq_thread+0x218/0x4b0 [spl]
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:03:24  >
sudo cat /proc/15989/stack
[<ffffffffa04b3df5>] taskq_wait_all+0x65/0xa0 [spl]
[<ffffffffa04b3e83>] taskq_wait+0x53/0xf0 [spl]
[<ffffffffa081dd21>] zfs_sb_teardown+0x61/0x580 [zfs]
[<ffffffffa081e289>] zfs_umount+0x29/0x160 [zfs]
[<ffffffffa084a3a2>] zpl_put_super+0x12/0x70 [zfs]
[<ffffffff811833ab>] generic_shutdown_super+0x5b/0xe0
[<ffffffff81183496>] kill_anon_super+0x16/0x60
[<ffffffffa084a04e>] zpl_kill_sb+0x1e/0x30 [zfs]
[<ffffffff81183c37>] deactivate_super+0x57/0x80
[<ffffffff811a1c8f>] mntput_no_expire+0xbf/0x110
[<ffffffff811a26fb>] sys_umount+0x7b/0x3a0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
lutorm commented Dec 28, 2013

I just updated a machine from 0.6.0 to 0.6.2 last night and now I'm seeing the same thing. I tried "zfs umount -a" and it hung, with zfs_iput_taskq pegging a CPU. The stack trace for the zfs umount is identical to the above; the stack trace for zfs_iput_taskq just says ffffff. I hadn't even really done much I/O to the pool.

chrisrd (Contributor) commented Dec 30, 2013

The umount stack indicates it's at:

int     
zfs_sb_teardown(zfs_sb_t *zsb, boolean_t unmounting)
{       
        znode_t *zp;

        rrw_enter(&zsb->z_teardown_lock, RW_WRITER, FTAG);

        ...
        /*      
         * If someone has not already unmounted this file system,
         * drain the iput_taskq to ensure all active references to the
         * zfs_sb_t have been handled only then can it be safely destroyed.
         */
        if (zsb->z_os)
                taskq_wait(dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os)));

Whilst the zfs_iput_taskq stack indicates it's at:

static int
zpl_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
        znode_t         *zp = ITOZ(mapping->host);
        zfs_sb_t        *zsb = ITOZSB(mapping->host);
        enum writeback_sync_modes sync_mode;
        int result;

        ZFS_ENTER(zsb);

...where ZFS_ENTER() is defined as:

/* Called on entry to each ZFS vnode and vfs operation  */
#define ZFS_ENTER(zsb) \
        { \
                rrw_enter_read(&(zsb)->z_teardown_lock, FTAG); \
                if ((zsb)->z_unmounted) { \
                        ZFS_EXIT(zsb); \
                        return (EIO); \
                } \
        }

I.e. the umount is holding a write lock on z_teardown_lock and waiting for the iput_taskq, but the zfs_iput_taskq is trying to grab a read lock on z_teardown_lock, and that's as far as we get.

@dechamps added that ZFS_ENTER() in @119a394a.

@nedbass, @lutorm, it looks like that commit is for performance only; I think you should try reverting it and see if that helps.
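
To make the lock ordering concrete, here is a minimal userspace analogy (hypothetical pthread code, not ZFS source; the names iput_worker and teardown_lock are invented for illustration). A thread that takes the rwlock as writer and then waits for a worker which needs the read lock hangs in exactly this way:

#include <pthread.h>
#include <stdio.h>

/* Stand-in for z_teardown_lock. */
static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for the zfs_iput_taskq worker running zpl_writepages(). */
static void *iput_worker(void *arg)
{
        (void)arg;
        /* ZFS_ENTER analogue: blocks forever, a writer already holds the lock. */
        pthread_rwlock_rdlock(&teardown_lock);
        puts("worker: got read lock (never printed)");
        pthread_rwlock_unlock(&teardown_lock);
        return NULL;
}

int main(void)
{
        pthread_t tq;

        /* zfs_sb_teardown() analogue: take the lock as writer ... */
        pthread_rwlock_wrlock(&teardown_lock);

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* ... then "taskq_wait()" while still holding it: deadlock. */
        pthread_join(tq, NULL);

        pthread_rwlock_unlock(&teardown_lock);
        return 0;
}

Built with cc -pthread, this never exits: the main thread is stuck in the join while the worker is stuck waiting for the read lock, the same shape as the umount and zfs_iput_taskq stacks above.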

chrisrd (Contributor) commented Dec 30, 2013

...except @119a394 isn't in the tagged version of 0.6.2. @lutorm, are you using the tagged version or something more recent which includes that commit?

lutorm commented Dec 30, 2013

I'm using the vanilla 0.6.2 built from source.


chrisrd (Contributor) commented Dec 30, 2013

@lutorm OK, in that case it seems you're hitting something different. Have you tried multiple times to get the stack from zfs_iput_taskq, and is it always just showing ffff? (I don't actually know what the ffff means.) Also, how long have you left it in that condition - is it possible zfs_iput_taskq is making progress but just taking a very long time to do something, and holding up the umount in the meantime?

lutorm commented Dec 30, 2013

It's only happened once, so I haven't had any more opportunities to look at the stack trace. At that time, I left it a long time. As I recall, it was about an hour, without any progress.

On Sun, Dec 29, 2013 at 9:15 PM, chrisrd notifications@github.com wrote:

@lutorm https://github.com/lutorm OK, in that case it seems you're
hitting something different. Have you tried multiple times to get the stack
from zfs_iput_taskq, and is it always just showing ffff? (I don't actually
know what the ffff means.) Also, how long have you left it in that
condition - is it possible zfs_iput_taskq is making progress but just
taking a very long time to do something, and holding up the umount in the
meantime?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1988#issuecomment-31335691
.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jan 8, 2014
It's unsafe to drain the iput taskq while holding the z_teardown_lock
as a writer.  This is because when the last reference on an inode is
dropped it may still have pages which need to be written to disk.
This will be done through zpl_writepages which will acquire the
z_teardown_lock as a reader in ZFS_ENTER.  Therefore, if we're
holding the lock as a writer in zfs_sb_teardown the unmount will
deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1988
behlendorf (Contributor) commented
@chrisrd @lutorm You may still be right. A deadlock identical to the one you've described in master is still possible on 0.6.2 because zfs_putpage() also calls ZFS_ENTER(). This code has been there forever, so it's clearly an unlikely thing to have happen, but I don't see why it couldn't.

zfs_putpage (calls ZFS_ENTER)
zpl_putpage
write_cache_pages
zpl_writepages
...

The most straightforward way to fix this is going to be to pull the taskq_wait() out from under the write lock in zfs_sb_teardown. The Linux VFS will already have locked the super block since we're in an unmount, so it's safe to do this sooner. I've opened pull request #2032 with the change. @chrisrd @nedbass, it would be great if you guys could review it.
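
As a sketch of the shape of that change, using the same hypothetical userspace analogy as earlier (not the actual contents of pull request #2032): waiting for the worker before taking the write lock lets both threads complete.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *iput_worker(void *arg)
{
        (void)arg;
        /* No writer is held yet, so the read lock succeeds. */
        pthread_rwlock_rdlock(&teardown_lock);
        puts("worker: finished its writeback-style work");
        pthread_rwlock_unlock(&teardown_lock);
        return NULL;
}

int main(void)
{
        pthread_t tq;

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* "taskq_wait()" first, while no lock is held ... */
        pthread_join(tq, NULL);

        /* ... then take the write lock for the rest of the teardown. */
        pthread_rwlock_wrlock(&teardown_lock);
        puts("umount: teardown proceeds");
        pthread_rwlock_unlock(&teardown_lock);
        return 0;
}

Nothing waits for the worker while holding the lock the worker needs, so the program runs to completion.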

chrisrd (Contributor) commented Jan 8, 2014

Looks good to me

ryao pushed a commit to ryao/zfs that referenced this issue Apr 9, 2014
It's unsafe to drain the iput taskq while holding the z_teardown_lock
as a writer.  This is because when the last reference on an inode is
dropped it may still have pages which need to be written to disk.
This will be done through zpl_writepages which will acquire the
z_teardown_lock as a reader in ZFS_ENTER.  Therefore, if we're
holding the lock as a writer in zfs_sb_teardown the unmount will
deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes openzfs#1988
chrisrd added a commit to chrisrd/zfs that referenced this issue Apr 10, 2015
We need to hold the z_teardown_lock as a writer whilst draining the iput
taskq so the taskq isn't being filled up again behind us.

These changes address the issues raised in Issue openzfs#1988 and revert commit
@fd23720
chrisrd added a commit to chrisrd/zfs that referenced this issue Apr 13, 2015
We need to hold the z_teardown_lock as a writer whilst draining the iput
taskq so the taskq isn't being filled up again behind us.

These changes revert commit @fd23720 and address the deadlock raised in
issue openzfs#1988 by removing ZFS_ENTER from zfs_putpage and zpl_writepages.

Also remove a redundant zil_commit from zpl_writepages.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>

Closes openzfs#3281
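
Expressed once more in the hypothetical userspace analogy used above (again not ZFS source; names invented), the idea behind this later change is that the writeback-style worker no longer takes the read lock at all, so the write lock can safely stay held across the drain and the taskq cannot be refilled behind the teardown.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writeback-style worker that never touches teardown_lock, mirroring
 * the removal of ZFS_ENTER from the page writeback path. */
static void *iput_worker(void *arg)
{
        (void)arg;
        puts("worker: writeback done without taking the lock");
        return NULL;
}

int main(void)
{
        pthread_t tq;

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* Hold the write lock across the drain; the worker no longer
         * needs the read lock, so this cannot deadlock. */
        pthread_rwlock_wrlock(&teardown_lock);
        pthread_join(tq, NULL);            /* taskq_wait() analogue */
        pthread_rwlock_unlock(&teardown_lock);

        puts("umount: teardown proceeds");
        return 0;
}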