.zfs/snapshot automount broken #3030
I've got the same issue, logging the same kernel message. I'm running a current Arch this time with Kernel 3.18.2-2 and - according to the zfs and spl package info - zfs master@d958324f and spl master@03a78353. I would guess it is not related to #2841 (comment) as I cannot access any snapshot. I would also guess it is not directly related to #1768 (comment) as it also fails on the first access of the directory directly after boot. The symptom looks like this from userspace:
|
In this case it seems to be a duplicate of #2841, as the snapshots did not disappear (I can clone or send them); it just seems to be a mount problem. |
Same issue here. I confirmed the latest spl/zfs code works fine on a 3.14 kernel but produces the "ZFS: Unable to automount" error on 3.18 kernels. The error comes from the call_usermodehelper in zfs_ctldir.c ~line 850. A manual mount.zfs works fine for me mounting any snapshot--just the automount under .zfs/snapshots breaks. |
@mgmartin Thanks for the added detail. It sounds like this is an issue with 3.18 which will need to be investigated. |
I gave 3.19.0-rc7 a try and this issue disappeared. |
Happens with 3.19.0-rc7 too. |
I get the same with 3.18.6, while it works with 3.17.8. So definitely 3.18. (zfs/spl 0.6.3-1.2) |
I was able to reproduce this as well on 3.18 but I haven't had a chance to investigate. It would be great if someone has the time to dig in to why the mounts are failing. |
Doing a git bisect led me to kernel commit bafc9b754f752ea798c39f9b099a228fd56604e0. I hope it helps. Full bisect log is:
Bisecting: a merge base must be tested
:040000 040000 8cc719dd130833db305075c8c6a717db54796dbe 07a430172c871cd7de5dc5f31cd3bb23baed30a0 M fs |
I'm seeing the same problem, running kernel 3.18.11 . |
FWIW, I applied the reverse patch of 3ccb354d641..c143c2333c4 from mainline (this contains the changeset which @janlam7 found with his bisection) to a Fedora 3.18.7 kernel, and it started working again. EDIT: Unfortunately, this patch does not apply cleanly on 3.19 anymore. |
Between 3.18 and >=3.19 the problem behaves differently, too:
|
@Ringdingcoder thanks for the hint, I applied c143c2333c4..3ccb354d641 on top of vanilla 3.18.11 and it works again. It's only a workaround (unless upstream does the same - doubtful) but it works. |
Commit torvalds/linux@bafc9b7 caused snapshots automounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement. In addition, the MNT_SHRINKABLE flag is now set after the automount succeeds by traversing down into the snapshot. This is much cleaner than being forced to do it afterwards in a getattr.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3030
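The behavior change described in this commit message can be modeled in userspace. The following is a minimal sketch, not the patch itself: `struct reval_dentry` and `reval_snapdir_revalidate()` are hypothetical stand-ins for the kernel's `struct dentry` and the snapdir revalidate callback, and it assumes the usual VFS convention that returning 0 from a revalidate callback asks for the cached entry to be dropped.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct dentry. */
struct reval_dentry {
	const void *inode;	/* NULL marks a negative dentry (name not found) */
};

/*
 * Model of a d_revalidate-style callback: return 1 to keep the cached
 * entry, 0 to have it invalidated and looked up again.
 *
 * Only negative dentries are invalidated, so a newly created snapshot is
 * never masked by a stale "not found" entry, while a positive dentry
 * (which may have a snapshot mounted on it) is left alone.
 */
int
reval_snapdir_revalidate(const struct reval_dentry *dentry)
{
	return (dentry->inode != NULL);
}
```

In this model, invalidating every dentry would correspond to always returning 0, which is the behavior the commit message says led to the immediate unmount on 3.18.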
Thanks Brian! The mounts are working now in a 3.18 kernel I'm testing. However, I can cause a NPE and subsequent kernel panic by:
Sorry, no debug line numbers--still trying to figure out how to get more debug info. Hopefully, easy enough for you to reproduce.
[ 155.476904] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 155.477010] IP: [<ffffffffa0481c21>] zfsctl_mount_snapshot+0x51/0x450 [zfs]
[ 155.477010] PGD b8462067 PUD b841d067 PMD 0
[ 155.477010] Oops: 0000 [#1] SMP
[ 155.477010] Modules linked in: loop netconsole joydev mousedev hid_generic usbhid hid mac_hid cfg80211 rfkill crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel processor fbcon evdev aes_x86_64 psmouse lrw gf128mul serio_raw ppdev glue_helper pvpanic bitblit button ablk_helper cryptd microcode parport_pc parport pcspkr cirrus syscopyarea fbcon_rotate sysfillrect fbcon_ccw fbcon_ud fbcon_cw softcursor tileblit sysimgblt ttm drm_kms_helper drm i2c_piix4 fb fbdev intel_agp intel_gtt i2c_core sch_fq_codel zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) xfs libcrc32c sd_mod virtio_balloon sr_mod cdrom virtio_scsi virtio_net ata_generic pata_acpi atkbd ata_piix libps2 libata ehci_pci scsi_mod uhci_hcd crc32c_intel ehci_hcd floppy usbcore virtio_pci virtio_ring usb_common virtio i8042 serio
[ 155.477010] CPU: 2 PID: 600 Comm: mount Tainted: P W O 3.18.10-mgm #2
[ 155.477010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140617_173321-var-lib-archbuild-testing-x86_64-tobias 04/01/2014
[ 155.477010] task: ffff8800379c1a40 ti: ffff8800bb8fc000 task.ti: ffff8800bb8fc000
[ 155.477010] RIP: 0010:[<ffffffffa0481c21>] [<ffffffffa0481c21>] zfsctl_mount_snapshot+0x51/0x450 [zfs]
[ 155.477010] RSP: 0018:ffff8800bb8ffba8 EFLAGS: 00010246
[ 155.477010] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800379c1a40
[ 155.477010] RDX: ffff88013946a2a0 RSI: ffffffffa04c5fa0 RDI: ffff8800bb8ffd50
[ 155.477010] RBP: ffff8800bb4aef00 R08: 0000000000000000 R09: ffff88013fc9a820
[ 155.477010] R10: ffffffffa03678df R11: ffffea0002d71500 R12: 0000000000000000
[ 155.477010] R13: 0000000000000000 R14: ffff8800bb8ffd50 R15: ffff8800bb4aef00
[ 155.477010] FS: 00007fda2a4e1780(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
[ 155.477010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155.477010] CR2: 0000000000000028 CR3: 00000000b844a000 CR4: 00000000001006e0
[ 155.477010] Stack:
[ 155.477010] 000000000251a855 ffffffff81177997 0000000000000000 0000000000000000
[ 155.477010] ffff8800bb4aef00 0000000000000000 ffffffffa04dfc34 ffffffffa04dfc3c
[ 155.477010] ffff8800379e8000 ffffffff8117892e 0000000000000000 ffffffff8117a13d
[ 155.477010] Call Trace:
[ 155.477010] [<ffffffff81177997>] ? __d_instantiate+0x27/0xf0
[ 155.477010] [<ffffffff8117892e>] ? d_rehash+0x3e/0x50
[ 155.477010] [<ffffffff8117a13d>] ? d_splice_alias+0xcd/0x1c0
[ 155.477010] [<ffffffffa04b296b>] ? zpl_snapdir_automount+0xb/0x20 [zfs]
[ 155.477010] [<ffffffff8116ca30>] ? follow_managed+0x150/0x3a0
[ 155.477010] [<ffffffff8116ed6e>] ? lookup_slow+0x6e/0xb0
[ 155.477010] [<ffffffff8116fe43>] ? path_lookupat+0x713/0x850
[ 155.477010] [<ffffffff8117039f>] ? filename_lookup.isra.28+0x1f/0x70
[ 155.477010] [<ffffffff81172c86>] ? user_path_at_empty+0x56/0xc0
[ 155.477010] [<ffffffff8104889f>] ? kvm_clock_read+0x1f/0x30
[ 155.477010] [<ffffffff81015475>] ? sched_clock+0x5/0x10
[ 155.477010] [<ffffffff81167d8a>] ? vfs_fstatat+0x5a/0xc0
[ 155.477010] [<ffffffff81168430>] ? SyS_newlstat+0x20/0x50
[ 155.477010] [<ffffffff8101a265>] ? syscall_trace_enter_phase1+0x115/0x180
[ 155.477010] [<ffffffff8147bc65>] ? int_check_syscall_exit_work+0x34/0x3d
[ 155.477010] [<ffffffff8147ba09>] ? system_call_fastpath+0x12/0x17
[ 155.477010] Code: 00 00 48 89 84 24 88 00 00 00 31 c0 48 c7 44 24 30 34 fc 4d a0 48 c7 44 24 38 3c fc 4d a0 48 c7 44 24 28 00 00 00 00 4c 8b 6d 30 <49> 8b 45 28 48 c7 44 24 40 00 00 00 00 48 c7 44 24 48 00 00 00
[ 155.477010] RIP [<ffffffffa0481c21>] zfsctl_mount_snapshot+0x51/0x450 [zfs]
[ 155.477010] RSP <ffff8800bb8ffba8>
[ 155.477010] CR2: 0000000000000028
[ 155.495833] ---[ end trace e31e58e8003080e4 ]---
[ 155.497570] Kernel panic - not syncing: Fatal exception
[ 155.498557] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 155.498557] drm_kms_helper: panic occurred, switching back to text console
[ 155.498557] Rebooting in 60 seconds..
|
Thanks for the quick test. I thought there might be some rough edges still to address. I didn't get a chance to do much testing. When I get some time I'll run down those panics. |
@behlendorf I tracked the issue down to path->dentry->d_inode being NULL. The fault occurs when ITOZSB(ip) is called with a NULL ip. A quick fix was to add a check in zpl_snapdir_automount() (zpl_ctldir.c) so that zfsctl_mount_snapshot() is not called if the inode is NULL:
+	if (path->dentry->d_inode == NULL)
+		return (NULL);
	error = -zfsctl_mount_snapshot(path, 0);
Doing this spits out a "Too many levels of symbolic links" error, but avoids the fault. I'm not very familiar with the code, so there's probably a cleaner way. |
@mgmartin returning EISDIR from zfsctl_mount_snapshot() should work as a stop gap for now. I'm working through a slightly cleaner fix now. |
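The stopgap discussed in the two comments above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `guard_*` types and `guard_mount_snapshot()` are hypothetical stand-ins, not the real `zfsctl_mount_snapshot()`, and the only behavior taken from the thread is "do not touch the inode when it is NULL, return EISDIR instead".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures involved. */
struct guard_inode { int unused; };
struct guard_dentry { struct guard_inode *d_inode; };
struct guard_path { struct guard_dentry *dentry; };

/*
 * Model of the guard: a negative dentry has no inode, so nothing derived
 * from the inode may be dereferenced. Returns 0 when the mount could
 * proceed, EISDIR as the stopgap error for a missing inode, and EINVAL
 * for a malformed path argument (the EINVAL case is this sketch's own
 * addition).
 */
int
guard_mount_snapshot(const struct guard_path *path)
{
	if (path == NULL || path->dentry == NULL)
		return (EINVAL);
	if (path->dentry->d_inode == NULL)
		return (EISDIR);
	/* The real function would go on to set up the mount here. */
	return (0);
}
```

Placing the check inside the mount function rather than in the caller means every caller gets the same protection, which may be why it was suggested over the caller-side check.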
Ok so the new kernel 3.18.14 came out and I tried to apply patch made from
The latter of those two is interesting, and I gather that the bug it fixes could be responsible for the issue reported here, and possibly also for #3257. I will try to reproduce this issue on kernel 3.18.14, also with patch #3344, and see how it works. Will report back later. |
FWIW, the above two commits belong to merge torvalds/linux@8f502d5. The other stable Linux version (aside from 3.18.14) where this has been merged is 4.0.2. |
So, it is not enough to simply upgrade to version 3.18.14 (from 3.18.13) and give up on patch
I will now try #3344 |
Ok so #3344 kind of works for me:
However, when I replaced
This is on vanilla kernel 3.18.14 , zfs version 0.6.4.1 with only #3344 applied and nothing else. I can easily reproduce this "Oops". Furthermore, on computer restart (after two "Oopses" like the above) I got the following
If I do not have any "Oopses" when running the computer (also after |
I tried troubleshooting this "Oops" and found it to be the result of the shell trying to access a non-existent subdirectory of the snapshot directory. I am using grml-zsh which, among other things, tries to find either a .svn or .git subdirectory of the current and parent directories when building the command prompt. It will probe for these two inside the snapshot subdirectory I changed into and, if not found, it will next probe inside the parent directory (ie. Minimal test to reproduce this is It is the same under kernel 4.0.4 (didn't try 4.0.2) with #3344 applied. This is actually progress compared to the behaviour without this patch, since for all "normal" cases automounting works and there is no kernel panic. The case described above is a bit of a corner case. |
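The kind of probing described above can be sketched in a few lines of C. This is not the minimal test that was elided from the comment, and it is an assumption that the prompt code ends up issuing a path lookup such as lstat(); `probe_vcs_dir()` is a hypothetical helper that only shows what "look for .git or .svn inside a directory" amounts to at the system-call level.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if dir contains a ".git" or ".svn" entry, 0 otherwise.
 * Each probe is a lookup of a name that usually does not exist, which is
 * the access pattern the comment above associates with the oops.
 */
int
probe_vcs_dir(const char *dir)
{
	static const char *const names[] = { ".git", ".svn" };
	char path[4096];
	struct stat st;
	size_t i;

	for (i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
		int n = snprintf(path, sizeof (path), "%s/%s", dir, names[i]);

		if (n < 0 || (size_t)n >= sizeof (path))
			continue;	/* path too long, skip this name */
		if (lstat(path, &st) == 0)
			return (1);
	}
	return (0);
}
```

Calling `probe_vcs_dir()` on a directory under `.zfs/snapshot` would perform lookups of names that do not exist there, which is the situation the comment describes.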
@Bronek the panic from accessing non-existent snapshots was seen and diagnosed earlier in these threads--see my comment and quick fix above posted on Apr 25 |
@mgmartin I was probably confused. The system continues to operate, which is not what I'd call a "kernel panic". It just kills the user process and registers a kernel bug (and then again a leaked dentry on shutdown). I thought that was different from your experience. |
@Bronek no problem. I typically set my kernels to panic on oops and often interchange the two terms. |
I just installed a 4.0.5 kernel and the snapshots seem to automount again without problems. |
Re-factor the .zfs/snapshot auto-mounting code to take into account changes made to the upstream kernels, and to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements.
* All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot without needing access to the parent dataset.
* Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need for complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures.
* References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding.
* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows the kernel to unmount idle auto-mounted snapshots if needed, removing the need for the zfsctl_unmount_snapshots() function.
* Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended.
* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invalidating all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3589 Issue openzfs#3344 Issue openzfs#3295 Issue openzfs#3257 Issue openzfs#3243 Issue openzfs#3030 Issue openzfs#2841
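The tracking scheme in this commit message (entries findable by name or by objset id, with references keeping an entry valid while in use) can be illustrated with a simplified userspace model. Everything here is an assumption made for illustration: the `track_*` names are hypothetical, a small fixed array with linear search stands in for the two global trees, and there is no locking, so this is single-threaded only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	TRACK_MAX_SNAPS	8

/* Hypothetical, simplified snapshot entry; not the patch's actual layout. */
struct track_entry {
	const char *name;	/* full snapshot name, e.g. "pool/fs@snap" */
	uint64_t objsetid;	/* objset id of the snapshot */
	int refcount;		/* 0 means the slot is free */
};

static struct track_entry track_table[TRACK_MAX_SNAPS];

/* Add an entry at mount time; the table itself holds one reference. */
struct track_entry *
track_add(const char *name, uint64_t objsetid)
{
	for (size_t i = 0; i < TRACK_MAX_SNAPS; i++) {
		if (track_table[i].refcount == 0) {
			track_table[i].name = name;
			track_table[i].objsetid = objsetid;
			track_table[i].refcount = 1;
			return (&track_table[i]);
		}
	}
	return (NULL);	/* table full */
}

/* Look up by snapshot name and take a reference for the caller. */
struct track_entry *
track_find_by_name(const char *name)
{
	for (size_t i = 0; i < TRACK_MAX_SNAPS; i++) {
		struct track_entry *se = &track_table[i];

		if (se->refcount > 0 && strcmp(se->name, name) == 0) {
			se->refcount++;
			return (se);
		}
	}
	return (NULL);
}

/* Look up by objset id and take a reference for the caller. */
struct track_entry *
track_find_by_objsetid(uint64_t objsetid)
{
	for (size_t i = 0; i < TRACK_MAX_SNAPS; i++) {
		struct track_entry *se = &track_table[i];

		if (se->refcount > 0 && se->objsetid == objsetid) {
			se->refcount++;
			return (se);
		}
	}
	return (NULL);
}

/* Drop a reference; the slot becomes free when the count reaches 0. */
void
track_rele(struct track_entry *se)
{
	if (se != NULL && se->refcount > 0)
		se->refcount--;
}
```

In this model, unmounting corresponds to dropping the table's own reference with `track_rele()`; an entry that is still held by a lookup stays valid until that holder releases it as well.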
I've posted a complete fix in issue #3718 for this issue. It's been tested with 4.1 kernels but additional testing on a wider range of kernels would be very helpful. |
NICE! Working well for me after applying the diffs to the latest spl/zfs git on a 4.1.6 kernel. No issues looking for non-existent snapshots, and no issues mounting and looking through ~53 snapshots. Thanks! |
#3718 works for me, too, thanks @behlendorf ! |
Here is my setup:
kernel-3.18.3 vanilla
zfs master@b0cf0676c0beb5dcb149774a3264580a18304ac1
spl master@54cccfc2e30fa84463c056e8ad04b2be9448999e
When I try to access a filesystem snapshot, it does not work and I get a kernel message like
ZFS: Unable to automount rpool/backup/test@D16 at /rpool/backup/test/.zfs/snapshot/D16: 512
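The trailing 512 in that message can be read as follows, assuming it is the raw status returned by the kernel's usermode-helper call mentioned later in the thread and that this status uses the same encoding as a wait(2) status on Linux (exit code in bits 8 to 15). Under that assumption the mount helper exited with code 2. `decode_exit_code()` is a hypothetical helper written for this illustration; it is equivalent to `WEXITSTATUS()` on Linux.

```c
/*
 * Extract the exit code from a wait(2)-style status word as encoded on
 * Linux, where the exit code occupies bits 8..15.
 */
int
decode_exit_code(int raw_status)
{
	return ((raw_status >> 8) & 0xff);
}
```

For example, `decode_exit_code(512)` yields 2, since 512 is 0x200.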