
umount/zfs_iput_taskq deadlock #1988

Closed · nedbass opened this issue Dec 18, 2013 · 8 comments

nedbass (Contributor) commented Dec 18, 2013

An unmount process hung and appears to be deadlocked with zfs_iput_taskq. FWIW this happened on a test filesystem after reproducing #1978 with current master (c2d439d).

$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:02:57  >
dmesg | grep seconds
INFO: task zfs_iput_taskq/:31738 blocked for more than 120 seconds.
INFO: task umount:15989 blocked for more than 120 seconds.
$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:03:22  >
sudo cat /proc/31738/stack
[<ffffffffa04bb755>] cv_wait_common+0x105/0x1c0 [spl]
[<ffffffffa04bb865>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa07b26db>] rrw_enter_read+0x7b/0x2d0 [zfs]
[<ffffffffa08484ae>] zpl_writepages+0x4e/0x1a0 [zfs]
[<ffffffff8112e121>] do_writepages+0x21/0x40
[<ffffffff811ac80d>] writeback_single_inode+0xdd/0x290
[<ffffffff811acae5>] write_inode_now+0x95/0x100
[<ffffffff8119c788>] generic_detach_inode+0x88/0x1f0
[<ffffffff8119d85d>] generic_drop_inode+0x1d/0x80
[<ffffffff8119c6f2>] iput+0x62/0x70
[<ffffffffa08058c7>] zfs_purgedir+0x1e7/0x290 [zfs]
[<ffffffffa0805a9e>] zfs_unlinked_drain+0x12e/0x190 [zfs]
[<ffffffffa04b4568>] taskq_thread+0x218/0x4b0 [spl]
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
$  bass6@hoja ~ /dev/pts/1 Wed Dec 18 12:03:24  >
sudo cat /proc/15989/stack
[<ffffffffa04b3df5>] taskq_wait_all+0x65/0xa0 [spl]
[<ffffffffa04b3e83>] taskq_wait+0x53/0xf0 [spl]
[<ffffffffa081dd21>] zfs_sb_teardown+0x61/0x580 [zfs]
[<ffffffffa081e289>] zfs_umount+0x29/0x160 [zfs]
[<ffffffffa084a3a2>] zpl_put_super+0x12/0x70 [zfs]
[<ffffffff811833ab>] generic_shutdown_super+0x5b/0xe0
[<ffffffff81183496>] kill_anon_super+0x16/0x60
[<ffffffffa084a04e>] zpl_kill_sb+0x1e/0x30 [zfs]
[<ffffffff81183c37>] deactivate_super+0x57/0x80
[<ffffffff811a1c8f>] mntput_no_expire+0xbf/0x110
[<ffffffff811a26fb>] sys_umount+0x7b/0x3a0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
lutorm commented Dec 28, 2013

I just updated a machine from 0.6.0 to 0.6.2 last night and now I'm seeing the same thing. I tried "zfs umount -a" and it hung, with zfs_iput_taskq pegging a CPU. The stack trace for the zfs umount is identical to the above; the stack trace for zfs_iput_taskq just says ffffff. I hadn't even really done much I/O to the pool.

chrisrd (Contributor) commented Dec 30, 2013

The umount stack indicates it's at:

int     
zfs_sb_teardown(zfs_sb_t *zsb, boolean_t unmounting)
{       
        znode_t *zp;

        rrw_enter(&zsb->z_teardown_lock, RW_WRITER, FTAG);

        ...
        /*      
         * If someone has not already unmounted this file system,
         * drain the iput_taskq to ensure all active references to the
         * zfs_sb_t have been handled only then can it be safely destroyed.
         */
        if (zsb->z_os)
                taskq_wait(dsl_pool_iput_taskq(dmu_objset_pool(zsb->z_os)));

Whilst the zfs_iput_taskq stack indicates it's at:

static int
zpl_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
        znode_t         *zp = ITOZ(mapping->host);
        zfs_sb_t        *zsb = ITOZSB(mapping->host);
        enum writeback_sync_modes sync_mode;
        int result;

        ZFS_ENTER(zsb);

...where ZFS_ENTER() is defined as:

/* Called on entry to each ZFS vnode and vfs operation  */
#define ZFS_ENTER(zsb) \
        { \
                rrw_enter_read(&(zsb)->z_teardown_lock, FTAG); \
                if ((zsb)->z_unmounted) { \
                        ZFS_EXIT(zsb); \
                        return (EIO); \
                } \
        }

I.e. the umount is holding a write lock on z_teardown_lock and waiting for the iput_taskq, but the zfs_iput_taskq is trying to grab a read lock on z_teardown_lock, and that's as far as we get.

@dechamps added that ZFS_ENTER() in @119a394a.

@nedbass, @lutorm, it looks like that commit is for performance only; I think you should try reverting it and see if that helps.
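
To make the lock ordering concrete, here is a minimal userspace analogy (hypothetical pthread code, not ZFS source; the names iput_worker and teardown_lock are invented for illustration). A thread that takes the rwlock as writer and then waits for a worker which needs the read lock hangs in exactly this way:

#include <pthread.h>
#include <stdio.h>

/* Stand-in for z_teardown_lock. */
static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for the zfs_iput_taskq worker running zpl_writepages(). */
static void *iput_worker(void *arg)
{
        (void)arg;
        /* ZFS_ENTER analogue: blocks forever, a writer already holds the lock. */
        pthread_rwlock_rdlock(&teardown_lock);
        puts("worker: got read lock (never printed)");
        pthread_rwlock_unlock(&teardown_lock);
        return NULL;
}

int main(void)
{
        pthread_t tq;

        /* zfs_sb_teardown() analogue: take the lock as writer ... */
        pthread_rwlock_wrlock(&teardown_lock);

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* ... then "taskq_wait()" while still holding it: deadlock. */
        pthread_join(tq, NULL);

        pthread_rwlock_unlock(&teardown_lock);
        return 0;
}

Built with cc -pthread, this never exits: the main thread is stuck in the join while the worker is stuck waiting for the read lock, the same shape as the umount and zfs_iput_taskq stacks above.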

chrisrd (Contributor) commented Dec 30, 2013

...except @119a394 isn't in the tagged version of 0.6.2. @lutorm, are you using the tagged version or something more recent which includes that commit?

lutorm commented Dec 30, 2013

I'm using the vanilla 0.6.2 built from source.


chrisrd (Contributor) commented Dec 30, 2013

@lutorm OK, in that case it seems you're hitting something different. Have you tried multiple times to get the stack from zfs_iput_taskq, and is it always just showing ffff? (I don't actually know what the ffff means.) Also, how long have you left it in that condition - is it possible zfs_iput_taskq is making progress but just taking a very long time to do something, and holding up the umount in the meantime?

lutorm commented Dec 30, 2013

It's only happened once, so I haven't had any more opportunities to look at the stack trace. At that time, I left it a long time. As I recall, it was about an hour, without any progress.

On Sun, Dec 29, 2013 at 9:15 PM, chrisrd notifications@github.com wrote:

@lutorm https://github.com/lutorm OK, in that case it seems you're
hitting something different. Have you tried multiple times to get the stack
from zfs_iput_taskq, and is it always just showing ffff? (I don't actually
know what the ffff means.) Also, how long have you left it in that
condition - is it possible zfs_iput_taskq is making progress but just
taking a very long time to do something, and holding up the umount in the
meantime?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1988#issuecomment-31335691
.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jan 8, 2014
It's unsafe to drain the iput taskq while holding the z_teardown_lock
as a writer.  This is because when the last reference on an inode is
dropped it may still have pages which need to be written to disk.
This will be done through zpl_writepages which will acquire the
z_teardown_lock as a reader in ZFS_ENTER.  Therefore, if we're
holding the lock as a writer in zfs_sb_teardown the unmount will
deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1988
behlendorf (Contributor) commented
@chrisrd @lutorm You may still be right. A deadlock identical to the one you've described in master is still possible on 0.6.2 because zfs_putpage() also calls ZFS_ENTER(). This code has been there forever, so it's clearly an unlikely thing to have happen, but I don't see why it couldn't.

zfs_putpage (calls ZFS_ENTER)
zpl_putpage
write_cache_pages
zpl_writepages
...

The most straightforward way to fix this is going to be to pull the taskq_wait() out from under the write lock in zfs_sb_teardown. The Linux VFS will already have locked the super block since we're in an unmount, so it's safe to do this sooner. I've opened pull request #2032 with the change. @chrisrd @nedbass, it would be great if you guys could review it.
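
As a sketch of the shape of that change, using the same hypothetical userspace analogy as earlier (not the actual contents of pull request #2032): waiting for the worker before taking the write lock lets both threads complete.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *iput_worker(void *arg)
{
        (void)arg;
        /* No writer is held yet, so the read lock succeeds. */
        pthread_rwlock_rdlock(&teardown_lock);
        puts("worker: finished its writeback-style work");
        pthread_rwlock_unlock(&teardown_lock);
        return NULL;
}

int main(void)
{
        pthread_t tq;

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* "taskq_wait()" first, while no lock is held ... */
        pthread_join(tq, NULL);

        /* ... then take the write lock for the rest of the teardown. */
        pthread_rwlock_wrlock(&teardown_lock);
        puts("umount: teardown proceeds");
        pthread_rwlock_unlock(&teardown_lock);
        return 0;
}

Nothing waits for the worker while holding the lock the worker needs, so the program runs to completion.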

chrisrd (Contributor) commented Jan 8, 2014

Looks good to me

ryao pushed a commit to ryao/zfs that referenced this issue Apr 9, 2014
It's unsafe to drain the iput taskq while holding the z_teardown_lock
as a writer.  This is because when the last reference on an inode is
dropped it may still have pages which need to be written to disk.
This will be done through zpl_writepages which will acquire the
z_teardown_lock as a reader in ZFS_ENTER.  Therefore, if we're
holding the lock as a writer in zfs_sb_teardown the unmount will
deadlock.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes openzfs#1988
chrisrd added a commit to chrisrd/zfs that referenced this issue Apr 10, 2015
We need to hold the z_teardown_lock as a writer whilst draining the iput
taskq so the taskq isn't being filled up again behind us.

These changes address the issues raised in Issue openzfs#1988 and revert commit
@fd23720
chrisrd added a commit to chrisrd/zfs that referenced this issue Apr 13, 2015
We need to hold the z_teardown_lock as a writer whilst draining the iput
taskq so the taskq isn't being filled up again behind us.

These changes revert commit @fd23720 and address the deadlock raised in
issue openzfs#1988 by removing ZFS_ENTER from zfs_putpage and zpl_writepages.

Also remove a redundant zil_commit from zpl_writepages.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>

Closes openzfs#3281
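
Expressed once more in the hypothetical userspace analogy used above (again not ZFS source; names invented), the idea behind this later change is that the writeback-style worker no longer takes the read lock at all, so the write lock can safely stay held across the drain and the taskq cannot be refilled behind the teardown.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t teardown_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writeback-style worker that never touches teardown_lock, mirroring
 * the removal of ZFS_ENTER from the page writeback path. */
static void *iput_worker(void *arg)
{
        (void)arg;
        puts("worker: writeback done without taking the lock");
        return NULL;
}

int main(void)
{
        pthread_t tq;

        pthread_create(&tq, NULL, iput_worker, NULL);

        /* Hold the write lock across the drain; the worker no longer
         * needs the read lock, so this cannot deadlock. */
        pthread_rwlock_wrlock(&teardown_lock);
        pthread_join(tq, NULL);            /* taskq_wait() analogue */
        pthread_rwlock_unlock(&teardown_lock);

        puts("umount: teardown proceeds");
        return 0;
}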