Change rangelock handling in FreeBSD's zfs_getpages() #16643

Open
markjdb wants to merge 2 commits into master

markjdb
Contributor

@markjdb markjdb commented Oct 13, 2024

Unconditionally hold the rangelock in zfs_getpages().

Motivation and Context

This is reportedly required for direct I/O support on FreeBSD.

Description

This change modifies zfs_getpages() to unconditionally acquire the rangelock. To avoid a deadlock, we may need to drop a page busy lock and allocate a new page later on.

How Has This Been Tested?

Smoke testing on FreeBSD.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

mappedread_sf() may allocate pages; if it fails to populate a page and
can't free it, it needs to ensure that the page is placed into a page
queue, otherwise the page can't be reclaimed until the vnode is destroyed.

I think this is quite unlikely to happen in practice; it was noticed by
code inspection.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>

As a deadlock avoidance measure, zfs_getpages() would only try to
acquire a rangelock, falling back to a single-page read if this was not
possible.  However, this is incompatible with direct I/O.

Instead, release the busy lock before trying to acquire the rangelock in
blocking mode.  This means that it's possible for the page to be
replaced, so we have to re-lookup.

Signed-off-by: Mark Johnston <markj@FreeBSD.org>
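
The locking change described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is single-threaded, the "rangelock" is reduced to a writer flag and a reader count, and every name in it (struct page, struct object, range_tryenter_reader, range_enter_and_regrab, writer_replaces_page) is hypothetical and not a ZFS or FreeBSD VM interface. It contrasts the old try-only scheme with the new one, where the busy lock is dropped before waiting and the page is looked up again afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are ZFS or FreeBSD APIs. */
struct page {
	bool busy;		/* models the page's exclusive busy lock */
};

struct object {
	struct page *slot;	/* page currently installed at one index */
	bool range_writer;	/* models a writer holding the rangelock */
	int range_readers;	/* models readers holding the rangelock */
};

/* Old scheme: only *try* the rangelock while still holding the busy page. */
static bool
range_tryenter_reader(struct object *obj)
{
	if (obj->range_writer)
		return (false);	/* caller falls back to a single-page read */
	obj->range_readers++;
	return (true);
}

/*
 * New scheme: unbusy the page, wait for the rangelock, then look the page
 * up again. The other_threads callback stands in for whatever runs while
 * the caller sleeps; it must eventually clear range_writer.
 */
static struct page *
range_enter_and_regrab(struct object *obj, struct page *held,
    void (*other_threads)(struct object *))
{
	assert(held->busy);
	held->busy = false;		/* drop the busy lock first */

	while (obj->range_writer)	/* models the blocking acquisition */
		other_threads(obj);
	obj->range_readers++;

	struct page *m = obj->slot;	/* re-lookup: may differ from held */
	assert(!m->busy);
	m->busy = true;			/* busy whichever page is there now */
	return (m);
}

/* Example interloper: a writer that installs a new page, then unlocks. */
static struct page spare_page;

static void
writer_replaces_page(struct object *obj)
{
	obj->slot = &spare_page;
	obj->range_writer = false;
}
```

In this model, a caller that starts with a busied page while a writer holds the range gets false from range_tryenter_reader(), whereas range_enter_and_regrab() with writer_replaces_page as the callback returns the replacement page, busied, and leaves the original page unbusied. The caller must therefore use the returned pointer rather than the one it passed in.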
@behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label on Oct 14, 2024
Comment on lines +3959 to +3967
vm_page_xunbusy(ma[0]);

lr = zfs_rangelock_enter(&zp->z_rangelock,
    rounddown(start, blksz), len, RL_READER);

zfs_vmobject_wlock(object);
ma[0] = vm_page_grab(object, OFF_TO_IDX(start),
    VM_ALLOC_NORMAL | VM_ALLOC_WAITOK | VM_ALLOC_ZERO);
zfs_vmobject_wunlock(object);
Member

I'm sorry for my possible ignorance of the VM, but is count expected to be 1 here? If not, why do we re-grab only the ma[0] page and not the rest?

Contributor Author

You're right, we should handle the full array. In practice we always have count == 1 here because ZFS does not implement VOP_BMAP.
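
Handling the full array, as agreed above, might look roughly like the following user-space sketch. It is not the PR's code: struct page, struct object, NSLOTS, unbusy_all() and regrab_all() are hypothetical stand-ins, and the model assumes a page is always present at each index, whereas a real grab could also allocate one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	NSLOTS	8	/* hypothetical object size, in pages */

/* Hypothetical user-space model; none of these names are ZFS or FreeBSD APIs. */
struct page {
	bool busy;			/* models the exclusive busy lock */
};

struct object {
	struct page *slots[NSLOTS];	/* page installed at each index */
};

/* Drop the busy lock on every page in ma[0..count-1], not only ma[0]. */
static void
unbusy_all(struct page **ma, int count)
{
	for (int i = 0; i < count; i++) {
		assert(ma[i]->busy);
		ma[i]->busy = false;
	}
}

/*
 * After the range is locked, look every page up again by index and
 * refresh the caller's array, since any entry may have been replaced
 * while the pages were unbusied.
 */
static void
regrab_all(struct object *obj, size_t first_idx, struct page **ma, int count)
{
	for (int i = 0; i < count; i++) {
		assert(first_idx + (size_t)i < NSLOTS);
		struct page *m = obj->slots[first_idx + i];
		assert(m != NULL && !m->busy);
		m->busy = true;
		ma[i] = m;
	}
}
```

With count == 1 this reduces to re-grabbing ma[0] alone, which matches the behaviour the reply describes as the only case seen in practice.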

@behlendorf
Contributor

I'm not familiar with the FreeBSD VM layer, but it looks like the mmap_mixed test case managed to trip an assertion.

From the "vm2: serial console" log, https://github.com/openzfs/zfs/actions/runs/11315921698/job/31476777580?pr=16643

  ZTS run /usr/local/share/zfs/zfs-tests/tests/functional/mmap/mmap_mixed
panic: VERIFY(vm_page_none_valid(m)) failed
  
  cpuid = 1
  time = 1728878112
  KDB: stack backtrace:
  #0 0xffffffff80b9002d at kdb_backtrace+0x5d
  #1 0xffffffff80b43132 at vpanic+0x132
  #2 0xffffffff82841b6a at spl_panic+0x3a
  #3 0xffffffff8284845b at dmu_read_pages+0x7fb
  #4 0xffffffff82865527 at zfs_freebsd_getpages+0x2e7
  #5 0xffffffff80eddf81 at vnode_pager_getpages+0x41
  #6 0xffffffff80ed45c2 at vm_pager_get_pages+0x22
  #7 0xffffffff80eb18df at vm_fault+0x5ef
  #8 0xffffffff80eb11db at vm_fault_trap+0x6b
  #9 0xffffffff8100ca39 at trap_pfault+0x1d9
  #10 0xffffffff8100bfb2 at trap+0x442
  #11 0xffffffff80fe3828 at calltrap+0x8

@markjdb
Contributor Author

markjdb commented Oct 14, 2024

Thanks, I think I see what's going on there. I didn't see that panic when I ran the ZTS locally, but if I understand the problem correctly, it's due to a race.
