Change rangelock handling in FreeBSD's zfs_getpages() #16643

markjdb · 2024-10-13T16:22:25Z

Unconditionally hold the rangelock in zfs_getpages().

Motivation and Context

This is reportedly required for direct I/O support on FreeBSD.

Description

This change modifies zfs_getpages() to unconditionally acquire the rangelock. To avoid a deadlock, we may need to drop a page busy lock and allocate a new page later on.

How Has This Been Tested?

Smoke testing on FreeBSD.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

mappedread_sf() may allocate pages; if it fails to populate a page and can't free it, it needs to ensure that the page is placed into a page queue, otherwise the page can't be reclaimed until the vnode is destroyed. I think this is quite unlikely to happen in practice; it was noticed by code inspection. Signed-off-by: Mark Johnston <markj@FreeBSD.org>

As a deadlock avoidance measure, zfs_getpages() would only try to acquire a rangelock, falling back to a single-page read if this was not possible. However, this is incompatible with direct I/O. Instead, release the busy lock before trying to acquire the rangelock in blocking mode. This means that it's possible for the page to be replaced, so we have to re-lookup. Signed-off-by: Mark Johnston <markj@FreeBSD.org>
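The ordering described in this commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: `model_page`, `model_object`, `lock_then_relookup()` and the `while_sleeping` hook are hypothetical names, not FreeBSD VM or ZFS interfaces, and plain flags stand in for the real busy lock and rangelock. The hook marks the point where the real code could sleep, which is where another thread might replace the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not FreeBSD VM or ZFS types. */
struct model_page {
	bool busy;			/* models the exclusive busy lock */
};

struct model_object {
	struct model_page *resident;	/* page currently at the faulting index */
	bool rangelock_held;		/* models the file's rangelock */
};

/* Runs at the point where the real code could sleep on the rangelock. */
typedef void (*sleep_hook_t)(struct model_object *);

/* Test helper: pretend another thread replaced the page while we slept. */
static struct model_page replacement_page;

static void
replace_resident_page(struct model_object *obj)
{
	obj->resident = &replacement_page;
}

/*
 * Drop the busy lock, take the rangelock, then look the page up again.
 * Returns the page that is busied on exit, which may differ from "m".
 */
static struct model_page *
lock_then_relookup(struct model_object *obj, struct model_page *m,
    sleep_hook_t while_sleeping)
{
	assert(m->busy);
	m->busy = false;		/* never wait for the rangelock while busy */

	if (while_sleeping != NULL)
		while_sleeping(obj);	/* other threads may run here */
	assert(!obj->rangelock_held);
	obj->rangelock_held = true;	/* the blocking acquire has completed */

	m = obj->resident;		/* re-lookup: the page may be a new one */
	assert(!m->busy);
	m->busy = true;
	return (m);
}
```

Calling `lock_then_relookup()` with `replace_resident_page` as the hook returns the replacement page rather than the one originally passed in, which is why the caller must use the returned pointer and not the stale one.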

amotin · 2024-10-14T16:35:43Z

module/os/freebsd/zfs/zfs_vnops_os.c

+			vm_page_xunbusy(ma[0]);
+
+			lr = zfs_rangelock_enter(&zp->z_rangelock,
+			    rounddown(start, blksz), len, RL_READER);
+
+			zfs_vmobject_wlock(object);
+			ma[0] = vm_page_grab(object, OFF_TO_IDX(start),
+			    VM_ALLOC_NORMAL | VM_ALLOC_WAITOK | VM_ALLOC_ZERO);
+			zfs_vmobject_wunlock(object);


I'm sorry for my possible ignorance in VM, but is count expected to be 1 here, or why else do we re-grab only ma[0] page here, not the rest?

You're right, we should handle the full array. In practice we always have count == 1 here because ZFS does not implement VOP_BMAP.
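To make the "full array" point concrete, here is a minimal sketch of the offset arithmetic only. The `model_*` helpers are hypothetical stand-ins that mirror, as far as I understand them, the behavior of `rounddown()` and `OFF_TO_IDX()`; the 4 KiB page size is an assumption, and nothing here touches the VM layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	MODEL_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define	MODEL_PAGE_SIZE		(1ULL << MODEL_PAGE_SHIFT)

/* Round x down to a multiple of y, as for the rangelock start offset. */
static uint64_t
model_rounddown(uint64_t x, uint64_t y)
{
	return ((x / y) * y);
}

/* Convert a byte offset in the object to a page index. */
static uint64_t
model_off_to_idx(uint64_t off)
{
	return (off >> MODEL_PAGE_SHIFT);
}

/*
 * Compute the object index of each of "count" consecutive pages starting
 * at byte offset "start". Every one of these would need to be looked up
 * again after the busy locks are dropped, not only the first.
 */
static void
model_page_indices(uint64_t start, size_t count, uint64_t *idx)
{
	for (size_t i = 0; i < count; i++)
		idx[i] = model_off_to_idx(start + i * MODEL_PAGE_SIZE);
}
```

With `count == 1` the loop degenerates to the single `ma[0]` lookup shown in the diff, which matches the observation that the single-page case is the one seen in practice.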

behlendorf · 2024-10-14T16:55:17Z

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assert.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

markjdb · 2024-10-14T17:23:01Z

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.

markjdb added 2 commits October 13, 2024 16:13

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 14, 2024

behlendorf requested review from behlendorf and bwatkinson October 14, 2024 02:30

amotin reviewed Oct 14, 2024
