Dedup + LXD leads to permanently hung tasks #7659
Comments
Now we are two, meaning it is in fact exactly as you described. Doing the exact same thing on xenial+hwe instead leads to extreme slowdown and from there to the system becoming unusable; SSD vs. HDD makes no difference either. On bionic it is a nearly immediate lockup. The situation on a customized Proxmox-based and a VSIDO-based setup is comparable. Once I am able to narrow down the specific conditions, I will add a detailed comment too. @fwaggle, did you by chance experiment with different swap settings/types (zram/zcache/zswap) and/or with limiting memory reservation, caching metadata only, and similar tunables?
@claudiusraphaelpaeth I'm afraid I didn't do any checking on that; I didn't think to look at swap since, once I got it to die quickly, the servers aren't even touching swap. I'm not sure where to start with tweaking ZFS tunables either. I didn't save the … I'd definitely welcome any suggestions for things I can try, though; since I can reproduce this outside of a production environment I can test just about anything. Unfortunately, so far the only useful guidance has been "turn off dedup", which is certainly good advice (I can't reproduce the problem at all unless dedup is on, so I'm convinced it goes away entirely) but not applicable for this exact use case. :(
I got the same issue with Ubuntu 18.04. It is not clear whether it is an issue with the ZFS/SPL versions in the Ubuntu Linux kernel or whether it is an upstream issue.
See also … It suggests setting … there.
Update: this workaround works.
You may find #7693 interesting.
I've opened #7693 with a proposed fix for this issue. @simos @claudiusraphaelpaeth @fwaggle, if at all possible it would be very helpful if you could cherry-pick the fix and verify it resolves the deadlock you're seeing. @fwaggle thanks for the detailed bug report, it had everything needed to get to the root cause.
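For anyone wanting to try this, a minimal sketch of cherry-picking the fix onto a 0.7.9 source tree and rebuilding the module might look like the following. It assumes a local clone of the zfsonlinux/zfs repository and a matching SPL already built/installed; the local branch names are made up for illustration, and exact build steps may differ per distribution:

```sh
# Fetch the proposed fix from PR #7693 into a local clone (branch names are arbitrary).
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git checkout -b test-7693 zfs-0.7.9        # start from the 0.7.9 tag used in the reports below
git fetch origin pull/7693/head:pr-7693    # GitHub's read-only ref for the pull request
git cherry-pick pr-7693                    # apply the proposed fix on top of 0.7.9

# Rebuild and reinstall the out-of-tree module (standard ZFS on Linux build steps).
sh autogen.sh
./configure
make -s -j"$(nproc)"
sudo make install
sudo depmod -a
# Reload the zfs module (export pools / stop services using ZFS first), then re-run the test.
```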
@behlendorf Thanks! I built ZFS 0.7.9 on Ubuntu Bionic (4.15.0-24-generic), verified I could reproduce the issue with the installed module (it hung after one container when doing five in parallel), then applied the patch from #7693, reinstalled the module, and created 5x20 and 6x20 Alpine containers in parallel a couple of times without any issues. Not super-rigorous testing, but I'm convinced.
I can confirm this bug is fixed by #7693 in my situation too: previously I used an old kernel (4.13) to avoid it.
@DeadlyMercury we'll get it in our next point release. Canonical will have to decide if they want to apply it to the version they're shipping.
I have installed … I use … I did not manage to get ZFS to crash/deadlock.
I continued testing and I managed to get a crash with ZFS (master: commit 2e5dc44).
@simos thanks for the additional testing and for reporting your results. The hung tasks you're now seeing don't appear to be related to ZFS, based on the stack traces you posted. So the original issue does appear to be fixed and you've uncovered a new one, which looks to be related to cgroups.
@behlendorf Thanks. I did a preliminary report at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1781601 and will keep an eye on reproducing it during further LXD stress-tests.
This is to remind myself to check whether my issue is solved by the mentioned commit.
Commit 93b43af inadvertently introduced the following scenario which can result in a deadlock. This issue was most easily reproduced by LXD containers using a ZFS storage backend but should be reproducible under any workload which is frequently mounting and unmounting.

-- THREAD A --
spa_sync()
  spa_sync_upgrades()
    rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG);  <- Waiting on B

-- THREAD B --
mount_fs()
  zpl_mount()
    zpl_mount_impl()
      dmu_objset_hold()
        dmu_objset_hold_flags()
          dsl_pool_hold()
            dsl_pool_config_enter()
              rrw_enter(&dp->dp_config_rwlock, RW_READER, tag);
      sget()
        sget_userns()
          grab_super()
            down_write(&s->s_umount);  <- Waiting on C

-- THREAD C --
cleanup_mnt()
  deactivate_super()
    down_write(&s->s_umount);
    deactivate_locked_super()
      zpl_kill_sb()
        kill_anon_super()
          generic_shutdown_super()
            sync_filesystem()
              zpl_sync_fs()
                zfs_sync()
                  zil_commit()
                    txg_wait_synced()  <- Waiting on A

Reviewed by: Alek Pinchuk <apinchuk@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7598
Closes openzfs#7659
Closes openzfs#7691
Closes openzfs#7693
System information
The issue is reproducible every time an LXC container is started while /var/lib/lxc/rootfs is on a dedup-enabled dataset. This could be related: lxc-3.0.2 is needed for LXD to run on 4.18.
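For reference, a dataset layout like the one described can be created and checked roughly as follows; the pool name `tank` and dataset name are assumptions, not taken from the report:

```sh
# Names are assumptions: adjust the pool ("tank") and dataset to the actual layout.
zfs create -o dedup=on -o mountpoint=/var/lib/lxc tank/lxc

# Check which datasets actually have dedup enabled.
zfs get -r dedup tank
```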
@fling- can you open a new issue with the offending stack traces so we can determine whether this is a similar problem? It's not clear to me why dedup would matter here aside from the performance implications.
I got this recently: …

This is on:
datastore4 log # zfs --version
datastore4 log # uname -a
System information
When using ZFS-backed LXC/LXD, concurrent container creation leads to hung tasks.
We run into this very occasionally in production, and I managed to distill it down to a test scenario that reproduces it reliably.
With multiple concurrent LXD operations ongoing, ZFS will end up in a stuck state and block forever.
We can reproduce this on JonF's 0.7.9 builds as well. I can't seem to work out a scenario not involving LXC.
I spoke to some folks on Freenode about this issue, and the general consensus seems to be "turn dedup off", which we're investigating, but as it's so trivially reproducible for me I figured I'd file a report with all the data anyway.
Describe how to reproduce the problem
Test Environment:
VirtualBox VM: 8GB RAM, 4 vCPUs.
Host is an i7-7700K with 16GB non-ECC RAM, running Windows 10. Both vdisks are on a 7200rpm SATA drive.
10GB vDisk for Root Partition - sda
100GB vDisk for ZFS pool - sdb
Install Ubuntu, install LXD 3.1 from snap, initialize LXD with a ZFS storage pool on sdb, then turn dedup on for the pool. No other configuration changes are made.
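A rough sketch of that setup, assuming Ubuntu 18.04, the snap-packaged LXD, and a ZFS pool named `lxd` created by `lxd init`; the exact prompts and pool name may differ:

```sh
# Install LXD from the snap and initialize it with a ZFS pool on the second disk.
sudo snap install lxd
sudo lxd init                  # interactive: choose the "zfs" backend with /dev/sdb as the device

# Find the pool LXD created and enable deduplication on it
# (the pool name below is an assumption; check the output of `zpool list`).
zpool list
sudo zfs set dedup=on lxd
zfs get dedup lxd              # confirm the property took effect
```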
Run three or four concurrent jobs repeatedly running lxc launch; quite quickly all disk activity will stop, and two minutes later you'll get hung-task messages on the console. I run four of these and it locks ZFS up very quickly:
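A minimal sketch of the kind of loop described above, not the reporter's exact script; the `ubuntu:18.04` image alias and naming scheme are assumptions:

```sh
#!/bin/sh
# Hypothetical repro loop: keep launching (and tearing down) containers to
# generate constant ZFS mount/unmount activity. Run 3-4 copies concurrently.
i=0
while true; do
    i=$((i + 1))
    name="stress-$$-$i"              # unique name per shell instance
    lxc launch ubuntu:18.04 "$name"  # each launch clones and mounts a ZFS dataset
    lxc delete --force "$name"       # tear the container down again to keep churn high
done
```

Started as, for example, `for n in 1 2 3 4; do ./repro.sh & done` to get four concurrent jobs.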
After only a couple of successful container creations, everything will stop and the system will go completely idle. Any further attempts to interact with ZFS will block forever and a reboot is required.
The system never recovers; I left it overnight and it stayed stuck.
Include any warning/errors/backtraces from the system logs
perf top output:
Stuck processes after the event:
zpool status -D:
iostat -x for the zpool device: