Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock #10177

Merged: 1 commit, Apr 8, 2020

Commits on Apr 3, 2020

  1. Use vn_io_fault_uiomove on FreeBSD to avoid potential deadlock on page fault.
    
    The comment describing vn_io_fault_uiomove:
    /*
     * Helper function to perform the requested uiomove operation using
     * the held pages for io->uio_iov[0].iov_base buffer instead of
     * copyin/copyout.  Access to the pages with uiomove_fromphys()
     * instead of iov_base prevents page faults that could occur due to
     * pmap_collect() invalidating the mapping created by
     * vm_fault_quick_hold_pages(), or pageout daemon, page laundry or
     * object cleanup revoking the write access from page mappings.
     *
     * Filesystems specified MNTK_NO_IOPF shall use vn_io_fault_uiomove()
     * instead of plain uiomove().
     */
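
The idea in the comment above, copying through an array of held pages rather than through the user virtual address, can be illustrated with a small userland model. This is only a sketch and not the kernel implementation: `MODEL_PAGE_SIZE` and `model_copy_to_held_pages()` are hypothetical names invented for this example, and plain memory buffers stand in for the held pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tiny page size so the example is easy to trace by hand. */
#define MODEL_PAGE_SIZE 16u

/*
 * Userland model of a copy through held pages (hypothetical helper).
 * Writes n bytes from src into the buffer described by pages[], starting
 * page_off bytes from the start of pages[0].  The destination is always
 * addressed page by page, never through a single user virtual address,
 * so a revoked mapping could not cause a fault during the copy.
 * Returns the number of bytes copied, which is short if pages run out.
 */
static size_t
model_copy_to_held_pages(unsigned char *pages[], size_t npages,
    size_t page_off, const unsigned char *src, size_t n)
{
	size_t copied = 0;
	size_t idx = page_off / MODEL_PAGE_SIZE;	/* page holding the offset */
	size_t off = page_off % MODEL_PAGE_SIZE;	/* offset inside that page */

	assert(pages != NULL && src != NULL);

	while (copied < n && idx < npages) {
		size_t room = MODEL_PAGE_SIZE - off;
		size_t len = (n - copied < room) ? n - copied : room;

		memcpy(pages[idx] + off, src + copied, len);
		copied += len;
		idx++;
		off = 0;	/* later pages are filled from their start */
	}
	return (copied);
}
```

With two pages and a 10-byte copy starting at offset 12, the first 4 bytes land at the end of the first page and the remaining 6 at the start of the second.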
    
    This is used by vn_io_fault, which has the following motivation:
    /*
     * The vn_io_fault() is a wrapper around vn_read() and vn_write() to
     * prevent the following deadlock:
     *
     * Assume that the thread A reads from the vnode vp1 into userspace
     * buffer buf1 backed by the pages of vnode vp2.  If a page in buf1 is
     * currently not resident, then system ends up with the call chain
     *   vn_read() -> VOP_READ(vp1) -> uiomove() -> [Page Fault] ->
     *     vm_fault(buf1) -> vnode_pager_getpages(vp2) -> VOP_GETPAGES(vp2)
     * which establishes lock order vp1->vn_lock, then vp2->vn_lock.
     * If, at the same time, thread B reads from vnode vp2 into buffer buf2
     * backed by the pages of vnode vp1, and some page in buf2 is not
     * resident, we get a reversed order vp2->vn_lock, then vp1->vn_lock.
     *
     * To prevent the lock order reversal and deadlock, vn_io_fault() does
     * not allow page faults to happen during VOP_READ() or VOP_WRITE().
     * Instead, it first tries to do the whole range i/o with pagefaults
     * disabled. If all pages in the i/o buffer are resident and mapped,
     * VOP will succeed (ignoring the genuine filesystem errors).
     * Otherwise, we get back EFAULT, and vn_io_fault() falls back to do
     * i/o in chunks, with all pages in the chunk prefaulted and held
     * using vm_fault_quick_hold_pages().
     *
     * Filesystems using this deadlock avoidance scheme should use the
     * array of the held pages from uio, saved in the curthread->td_ma,
     * instead of doing uiomove().  A helper function
     * vn_io_fault_uiomove() converts uiomove request into
     * uiomove_fromphys() over td_ma array.
     *
     * Since vnode locks do not cover the whole i/o anymore, rangelocks
     * make the current i/o request atomic with respect to other i/os and
     * truncations.
     */
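
The two-phase scheme described in the comment above (try the whole range with page faults disabled, then fall back to chunks whose pages are prefaulted and held) can be sketched as a userland model. Everything here is an assumption made for illustration: `struct model_buf`, `model_vop()`, `model_hold_pages()` and `model_io_fault()` are hypothetical names, a boolean array stands in for page residency, and the fallback restarts from the first page instead of resuming after a partial transfer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NPAGES 8	/* hypothetical size of the user buffer, in pages */

struct model_buf {
	bool resident[MODEL_NPAGES];	/* is each page of the buffer resident? */
	int prefaults;			/* pages faulted in before a VOP ran */
};

/*
 * Model of a VOP running with page faults disabled: touching a
 * non-resident page fails with EFAULT instead of faulting it in.
 */
static int
model_vop(const struct model_buf *b, size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++) {
		if (!b->resident[i])
			return (EFAULT);
	}
	return (0);
}

/* Model of prefaulting and holding one chunk before the VOP is called. */
static void
model_hold_pages(struct model_buf *b, size_t first, size_t count)
{
	for (size_t i = first; i < first + count; i++) {
		if (!b->resident[i]) {
			b->resident[i] = true;
			b->prefaults++;
		}
	}
}

/*
 * Model of the fallback: one attempt over the whole range, then
 * chunked i/o with each chunk's pages held up front.
 */
static int
model_io_fault(struct model_buf *b, size_t npages, size_t chunk,
    int *chunks_used)
{
	assert(chunk > 0 && npages <= MODEL_NPAGES);

	*chunks_used = 0;
	if (model_vop(b, 0, npages) == 0)
		return (0);	/* fast path: everything was resident */

	for (size_t first = 0; first < npages; first += chunk) {
		size_t count = (npages - first < chunk) ? npages - first : chunk;
		int error;

		model_hold_pages(b, first, count);
		error = model_vop(b, first, count);
		if (error != 0)
			return (error);
		(*chunks_used)++;
	}
	return (0);
}
```

In this model a fully resident buffer completes on the fast path with no chunks, while a buffer with missing pages is completed chunk by chunk, and every fault is taken in `model_hold_pages()` rather than inside the modelled VOP.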
    
    Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
    mattmacy committed Apr 3, 2020
    dadb4cf