Direct IO support
Direct IO via the O_DIRECT flag was originally introduced by IRIX in
XFS for database workloads. Its purpose was to allow the database to
bypass the page and buffer caches, avoiding unnecessary IO operations
(e.g. readahead) and preventing contention for system memory between
the database and kernel caches.

On Illumos, the library function directio(3C) allows user space to
provide a hint to the file system that Direct IO is useful, but the
file system is free to ignore it. The semantics are also entirely a
file system decision. File systems that do not implement it return
ENOTTY.
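Because Direct IO is only a hint on Illumos and an optional capability on Linux, a portable application might request it on a best-effort basis. The sketch below is hypothetical: `request_direct_io` is an invented helper, not a ZFS or libc API. It assumes Illumos's `directio(fd, DIRECTIO_ON)` from `<sys/fcntl.h>`, and on Linux it adds O_DIRECT with `fcntl(F_SETFL)`, which fails with EINVAL when the file system offers no Direct IO support.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(__sun)
#include <sys/types.h>
#include <sys/fcntl.h>	/* directio(), DIRECTIO_ON on Illumos */
#endif

/*
 * Hypothetical helper: ask for Direct IO on an already-open descriptor.
 * Returns 0 if the request was accepted, or -1 with errno set; callers
 * should simply keep using buffered IO on failure.
 */
static int
request_direct_io(int fd)
{
	assert(fd >= 0);
#if defined(__sun)
	/* Advisory only; unsupported file systems fail with ENOTTY. */
	return (directio(fd, DIRECTIO_ON));
#elif defined(O_DIRECT)
	/* Linux: add O_DIRECT to the open file's status flags. */
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFL, flags | O_DIRECT));
#else
	errno = ENOTSUP;
	return (-1);
#endif
}
```

Treating every failure as "stay buffered" keeps the caller correct on file systems that ignore or reject the request.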

Since the semantics were never defined in any standard, O_DIRECT is
implemented such that it conforms to the behavior described in the
Linux open(2) man page as follows.

    1. Minimize cache effects of the I/O.

    By design the ARC is already scan-resistant, which helps mitigate
    the need for special O_DIRECT handling.  Data which is only
    accessed once will be the first to be evicted from the cache.
    This behavior is consistent with Illumos and FreeBSD.

    Future performance work may wish to investigate the benefits of
    immediately evicting data from the cache which has been read or
    written with the O_DIRECT flag.  Functionally this behavior is
    very similar to applying the 'primarycache=metadata' property
    per open file.

    2. O_DIRECT _MAY_ impose restrictions on IO alignment and length.

    No additional alignment or length restrictions are imposed.

    3. O_DIRECT _MAY_ perform unbuffered IO operations directly
       between user memory and block device.

    No unbuffered IO operations are currently supported.  In order
    to support features such as transparent compression, encryption,
    and checksumming, a copy must be made to transform the data.

    4. O_DIRECT _MAY_ imply O_DSYNC (XFS).

    O_DIRECT does not imply O_DSYNC for ZFS.  Callers must provide
    O_DSYNC to request synchronous semantics.

    5. O_DIRECT _MAY_ disable file locking that serializes IO
       operations.  Applications should avoid mixing O_DIRECT
       and normal IO or mmap(2) IO to the same file.  This is
       particularly true for overlapping regions.

    All I/O in ZFS is locked for correctness and this locking is not
    disabled by O_DIRECT.  However, concurrently mixing O_DIRECT,
    mmap(2), and normal I/O on the same file is not recommended.
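Items 2 and 4 above translate into concrete calling conventions for applications. The following is a minimal, hypothetical sketch assuming Linux with glibc. The helper names `open_direct_dsync` and `probe_unaligned_direct_write` are invented for illustration. The first requests synchronous semantics explicitly, since O_DIRECT alone does not imply O_DSYNC on ZFS. The second checks whether an O_DIRECT write with an unaligned buffer, length, and offset is accepted.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ZFS): open `path` for writing with
 * O_DIRECT plus O_DSYNC. If the file system rejects O_DIRECT (EINVAL),
 * retry with O_DSYNC only so durability is still requested.
 * *got_direct reports which of the two opens succeeded.
 */
static int
open_direct_dsync(const char *path, int *got_direct)
{
	const int flags = O_WRONLY | O_CREAT | O_DSYNC;
	int fd;

	assert(path != NULL && got_direct != NULL);
	fd = open(path, flags | O_DIRECT, 0644);
	if (fd != -1) {
		*got_direct = 1;
		return (fd);
	}
	if (errno != EINVAL)
		return (-1);
	*got_direct = 0;
	return (open(path, flags, 0644));
}

/*
 * Hypothetical probe for item 2: an O_DIRECT write whose buffer address,
 * length, and file offset are all deliberately unaligned.
 * Returns 1 if accepted, 0 if rejected with EINVAL, and -1 if O_DIRECT
 * is unavailable or another error occurs.
 */
static int
probe_unaligned_direct_write(const char *path)
{
	const size_t len = 1000;	/* not a multiple of 512 */
	const off_t off = 7;		/* not sector aligned */
	void *mem = NULL;
	ssize_t n;
	int fd, ret;

	assert(path != NULL);
	if (posix_memalign(&mem, 4096, 8192) != 0)
		return (-1);
	memset(mem, 'z', 8192);

	fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd == -1) {
		free(mem);
		return (-1);
	}
	/* (char *)mem + 1 is guaranteed not to be 512-byte aligned. */
	n = pwrite(fd, (char *)mem + 1, len, off);
	if (n == (ssize_t)len)
		ret = 1;
	else if (n == -1 && errno == EINVAL)
		ret = 0;
	else
		ret = -1;

	close(fd);
	free(mem);
	return (ret);
}
```

Given that this change imposes no alignment restrictions, the probe would be expected to return 1 on ZFS. File systems that enforce logical-block alignment typically return 0, so portable code should still keep buffers, lengths, and offsets aligned.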

This change is implemented by layering the aops->direct_IO operations
on the existing AIO operations.  Code already existed in ZFS on Linux
for bypassing the page cache when O_DIRECT is specified.
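The layering is small enough to model in user space. In the hypothetical sketch below, `struct io_req`, `existing_read`, `existing_write`, and `direct_io` are invented stand-ins for the kernel's kiocb/iov_iter and for the existing zpl_iter_read()/zpl_iter_write() paths. The point is only that the direct_IO hook adds no new data path and merely routes each request by direction.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-ins; not ZFS or kernel definitions. */
enum io_dir { IO_READ, IO_WRITE };

struct io_req {
	enum io_dir dir;	/* role of iov_iter_rw(iter) */
	off_t pos;		/* role of kiocb->ki_pos */
	char *buf;		/* role of the user buffer in the iov_iter */
	size_t len;
};

static char file_data[64];	/* pretend file contents */

static void
check_bounds(const struct io_req *req)
{
	assert(req->pos >= 0);
	assert((size_t)req->pos + req->len <= sizeof (file_data));
}

/* Stand-in for the existing write path (zpl_iter_write). */
static ssize_t
existing_write(const struct io_req *req)
{
	check_bounds(req);
	memcpy(file_data + req->pos, req->buf, req->len);
	return ((ssize_t)req->len);
}

/* Stand-in for the existing read path (zpl_iter_read). */
static ssize_t
existing_read(const struct io_req *req)
{
	check_bounds(req);
	memcpy(req->buf, file_data + req->pos, req->len);
	return ((ssize_t)req->len);
}

/* Same shape as zpl_direct_IO_impl(): route by direction, nothing more. */
static ssize_t
direct_io(const struct io_req *req)
{
	if (req->dir == IO_WRITE)
		return (existing_write(req));
	return (existing_read(req));
}
```

On the kernels this change targets, open(2) and fcntl(2) reject O_DIRECT with EINVAL when the file's address_space_operations lack a direct_IO method. Registering one, even a thin forwarder like this, is what makes the flag usable.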

References:
  * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html
  * https://blogs.oracle.com/roch/entry/zfs_and_directio
  * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
  * https://illumos.org/man/3c/directio

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#224
behlendorf committed Aug 21, 2018
1 parent 149ce88 commit 5c6223e
Showing 16 changed files with 616 additions and 1 deletion.
130 changes: 130 additions & 0 deletions config/kernel-vfs-direct_IO.m4
@@ -0,0 +1,130 @@
dnl #
dnl # Linux 4.6.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER], [
	AC_MSG_CHECKING([whether aops->direct_IO() uses iov_iter])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(struct kiocb *kiocb,
		    struct iov_iter *iter) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER, 1,
		    [aops->direct_IO() uses iov_iter without rw])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Linux 4.1.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_OFFSET], [
	AC_MSG_CHECKING(
	    [whether aops->direct_IO() uses iov_iter with offset])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(struct kiocb *kiocb,
		    struct iov_iter *iter, loff_t offset) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER_OFFSET, 1,
		    [aops->direct_IO() uses iov_iter with offset])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Linux 3.16.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_RW_OFFSET], [
	AC_MSG_CHECKING(
	    [whether aops->direct_IO() uses iov_iter with rw and offset])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
		    struct iov_iter *iter, loff_t offset) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET, 1,
		    [aops->direct_IO() uses iov_iter with rw and offset])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Ancient Linux API (predates git)
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_IOVEC], [
	AC_MSG_CHECKING([whether aops->direct_IO() uses iovec])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
		    const struct iovec *iov, loff_t offset,
		    unsigned long nr_segs) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_IOVEC, 1,
		    [aops->direct_IO() uses iovec])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO], [
	zfs_ac_direct_io="no"
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_OFFSET
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_RW_OFFSET
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_IOVEC
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		AC_MSG_ERROR([no; unknown direct IO interface])
	fi
])
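The ZFS_AC_KERNEL_VFS_DIRECT_IO macro above tries each candidate prototype in turn, stops at the first one that compiles, and aborts configure if none does. Below is a hypothetical C model of that selection logic. The probe table and `select_direct_io_api` are invented for illustration, and a string comparison stands in for ZFS_LINUX_TRY_COMPILE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per m4 probe: the HAVE_* define it emits and the
 * direct_IO() prototype its test program assumes. */
struct direct_io_probe {
	const char *define;
	const char *prototype;
};

/* Same order as ZFS_AC_KERNEL_VFS_DIRECT_IO: newest interface first. */
static const struct direct_io_probe direct_io_probes[] = {
	{ "HAVE_VFS_DIRECT_IO_ITER",
	    "(struct kiocb *, struct iov_iter *)" },
	{ "HAVE_VFS_DIRECT_IO_ITER_OFFSET",
	    "(struct kiocb *, struct iov_iter *, loff_t)" },
	{ "HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET",
	    "(int, struct kiocb *, struct iov_iter *, loff_t)" },
	{ "HAVE_VFS_DIRECT_IO_IOVEC",
	    "(int, struct kiocb *, const struct iovec *, loff_t, "
	    "unsigned long)" },
};

/*
 * Return the define for the first probe whose prototype matches the
 * (simulated) kernel, or NULL, which corresponds to AC_MSG_ERROR.
 */
static const char *
select_direct_io_api(const char *kernel_prototype)
{
	size_t n = sizeof (direct_io_probes) / sizeof (direct_io_probes[0]);

	assert(kernel_prototype != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(direct_io_probes[i].prototype,
		    kernel_prototype) == 0)
			return (direct_io_probes[i].define);
	}
	return (NULL);
}
```

Exactly one HAVE_* symbol is defined on success, which lets the #if/#elif chain in zpl_file.c pick a single zpl_direct_IO() signature.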
1 change: 1 addition & 0 deletions config/kernel.m4
@@ -144,6 +144,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_LSEEK_EXECUTE
ZFS_AC_KERNEL_VFS_ITERATE
ZFS_AC_KERNEL_VFS_RW_ITERATE
ZFS_AC_KERNEL_VFS_DIRECT_IO
ZFS_AC_KERNEL_GENERIC_WRITE_CHECKS
ZFS_AC_KERNEL_KMAP_ATOMIC_ARGS
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
1 change: 1 addition & 0 deletions configure.ac
@@ -280,6 +280,7 @@ AC_CONFIG_FILES([
tests/zfs-tests/tests/functional/hkdf/Makefile
tests/zfs-tests/tests/functional/inheritance/Makefile
tests/zfs-tests/tests/functional/inuse/Makefile
tests/zfs-tests/tests/functional/io/Makefile
tests/zfs-tests/tests/functional/kstat/Makefile
tests/zfs-tests/tests/functional/large_files/Makefile
tests/zfs-tests/tests/functional/largest_pool/Makefile
53 changes: 53 additions & 0 deletions module/zfs/zpl_file.c
@@ -438,6 +438,58 @@ zpl_aio_write(struct kiocb *kiocb, const struct iovec *iovp,
}
#endif /* HAVE_VFS_RW_ITERATE */

#if defined(HAVE_VFS_RW_ITERATE)
static ssize_t
zpl_direct_IO_impl(struct kiocb *kiocb, struct iov_iter *iter)
{
	if (iov_iter_rw(iter) == WRITE)
		return (zpl_iter_write(kiocb, iter));
	else
		return (zpl_iter_read(kiocb, iter));
}
#if defined(HAVE_VFS_DIRECT_IO_ITER)
static ssize_t
zpl_direct_IO(struct kiocb *kiocb, struct iov_iter *iter)
{
	return (zpl_direct_IO_impl(kiocb, iter));
}
#elif defined(HAVE_VFS_DIRECT_IO_ITER_OFFSET)
static ssize_t
zpl_direct_IO(struct kiocb *kiocb, struct iov_iter *iter, loff_t pos)
{
	ASSERT3S(pos, ==, kiocb->ki_pos);
	return (zpl_direct_IO_impl(kiocb, iter));
}
#elif defined(HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET)
static ssize_t
zpl_direct_IO(int rw, struct kiocb *kiocb, struct iov_iter *iter, loff_t pos)
{
	ASSERT3S(rw, ==, iov_iter_rw(iter));
	ASSERT3S(pos, ==, kiocb->ki_pos);
	return (zpl_direct_IO_impl(kiocb, iter));
}
#else
#error "Unknown direct IO interface"
#endif

#else

#if defined(HAVE_VFS_DIRECT_IO_IOVEC)
static ssize_t
zpl_direct_IO(int rw, struct kiocb *kiocb, const struct iovec *iovp,
    loff_t pos, unsigned long nr_segs)
{
	if (rw == WRITE)
		return (zpl_aio_write(kiocb, iovp, nr_segs, pos));
	else
		return (zpl_aio_read(kiocb, iovp, nr_segs, pos));
}
#else
#error "Unknown direct IO interface"
#endif

#endif /* HAVE_VFS_RW_ITERATE */

static loff_t
zpl_llseek(struct file *filp, loff_t offset, int whence)
{
@@ -929,6 +981,7 @@ const struct address_space_operations zpl_address_space_operations = {
	.readpage = zpl_readpage,
	.writepage = zpl_writepage,
	.writepages = zpl_writepages,
	.direct_IO = zpl_direct_IO,
};

const struct file_operations zpl_file_operations = {
3 changes: 2 additions & 1 deletion scripts/commitcheck.sh
@@ -16,10 +16,11 @@ function test_url()
 }
 
 # test commit body for length
+# lines containing urls are exempt from the length limit.
 function test_commit_bodylength()
 {
 	length="72"
-	body=$(git log -n 1 --pretty=%b "$REF" | grep -E -m 1 ".{$((length + 1))}")
+	body=$(git log -n 1 --pretty=%b "$REF" | grep -Ev "http(s)*://" | grep -E -m 1 ".{$((length + 1))}")
 	if [ -n "$body" ]; then
 		echo "error: commit message body contains line over ${length} characters"
 		return 1
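The updated test_commit_bodylength drops lines that mention a URL before searching for a line longer than 72 characters. Below is a minimal C sketch of the same rule. `body_has_long_line` and `line_contains` are hypothetical helpers. The sketch counts bytes, so it assumes ASCII text where grep counts characters in the current locale. It also matches only `http://` and `https://`, a slightly narrower pattern than the script's `http(s)*://`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the `len` bytes starting at `line` contain `needle`. */
static bool
line_contains(const char *line, size_t len, const char *needle)
{
	size_t n = strlen(needle);

	for (size_t i = 0; i + n <= len; i++) {
		if (memcmp(line + i, needle, n) == 0)
			return (true);
	}
	return (false);
}

/*
 * Hypothetical model of test_commit_bodylength: report whether any
 * line longer than `limit` bytes remains after dropping URL lines.
 */
static bool
body_has_long_line(const char *body, size_t limit)
{
	const char *line = body;

	assert(body != NULL);
	for (;;) {
		const char *nl = strchr(line, '\n');
		size_t len = (nl != NULL) ? (size_t)(nl - line) : strlen(line);
		bool has_url = line_contains(line, len, "http://") ||
		    line_contains(line, len, "https://");

		if (!has_url && len > limit)
			return (true);
		if (nl == NULL)
			return (false);
		line = nl + 1;
	}
}
```

With a limit of 72, a 73-character line of plain text is reported, while a longer reference line such as the ext4 wiki URL in this commit message is exempt.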
4 changes: 4 additions & 0 deletions tests/runfiles/linux.run
@@ -558,6 +558,10 @@ tests = ['inherit_001_pos']
pre =
tags = ['functional', 'inheritance']

[tests/functional/io]
tests = ['sync', 'psync', 'libaio', 'posixaio', 'mmap']
tags = ['functional', 'io']

[tests/functional/inuse]
tests = ['inuse_001_pos', 'inuse_003_pos', 'inuse_004_pos',
    'inuse_005_pos', 'inuse_006_pos', 'inuse_007_pos', 'inuse_008_pos',
1 change: 1 addition & 0 deletions tests/zfs-tests/tests/functional/Makefile.am
@@ -27,6 +27,7 @@ SUBDIRS = \
hkdf \
inheritance \
inuse \
io \
kstat \
large_files \
largest_pool \
12 changes: 12 additions & 0 deletions tests/zfs-tests/tests/functional/io/Makefile.am
@@ -0,0 +1,12 @@
pkgdatadir = $(datadir)/@PACKAGE@/zfs-tests/tests/functional/io
dist_pkgdata_SCRIPTS = \
setup.ksh \
cleanup.ksh \
sync.ksh \
psync.ksh \
libaio.ksh \
posixaio.ksh \
mmap.ksh

dist_pkgdata_DATA = \
io.cfg
31 changes: 31 additions & 0 deletions tests/zfs-tests/tests/functional/io/cleanup.ksh
@@ -0,0 +1,31 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#

#
# Copyright (c) 2018 by Lawrence Livermore National Security, LLC.
#

. $STF_SUITE/include/libtest.shlib

verify_runnable "global"

default_cleanup
25 changes: 25 additions & 0 deletions tests/zfs-tests/tests/functional/io/io.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#
# CDDL HEADER START
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
# CDDL HEADER END
#

#
# Copyright (c) 2018 by Lawrence Livermore National Security, LLC.
#

FIO_COMMON_ARGS="--numjobs=8 --bs=128k --size=16M --fallocate=none --group_reporting --verify=sha1 --minimal"

FIO_READ_ARGS="--name=rw --rw=read $FIO_COMMON_ARGS"
FIO_WRITE_ARGS="--name=rw --rw=write $FIO_COMMON_ARGS"
FIO_RANDREAD_ARGS="--name=rw --rw=randread $FIO_COMMON_ARGS"
FIO_RANDWRITE_ARGS="--name=rw --rw=randwrite $FIO_COMMON_ARGS"
65 changes: 65 additions & 0 deletions tests/zfs-tests/tests/functional/io/libaio.ksh
@@ -0,0 +1,65 @@
#! /bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#

#
# Copyright (c) 2018 by Lawrence Livermore National Security, LLC.
#

. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/io/io.cfg

#
# DESCRIPTION:
# Verify Linux native asynchronous IO.
#
# STRATEGY:
# 1. Use fio(1) in verify mode to perform write, read,
# random read, and random write workloads.
# 2. Repeat the test with additional fio(1) options.
#

verify_runnable "global"

function cleanup
{
	log_must rm -f "$mntpnt"/rw*
}

log_assert "Verify Linux native asynchronous IO"

log_onexit cleanup

ioengine="--ioengine=libaio"
mntpnt=$(get_prop mountpoint $TESTPOOL/$TESTFS)
dir="--directory=$mntpnt"

set -A fio_arg -- "--sync=0" "--sync=1" "--direct=0" "--direct=1"

for arg in "${fio_arg[@]}"; do
	log_must fio $dir $ioengine $arg $FIO_WRITE_ARGS
	log_must fio $dir $ioengine $arg $FIO_READ_ARGS
	log_must fio $dir $ioengine $arg $FIO_RANDWRITE_ARGS
	log_must fio $dir $ioengine $arg $FIO_RANDREAD_ARGS
	log_must rm -f "$mntpnt"/rw*
done

log_pass "Verified Linux native asynchronous IO"