forked from openzfs/zfs
Commit
Direct IO via the O_DIRECT flag was originally introduced in XFS by IRIX for database workloads. Its purpose was to allow the database to bypass the page and buffer caches to prevent unnecessary IO operations (e.g. readahead) while preventing contention for system memory between the database and kernel caches.

On Illumos, there is a library function called directio(3C) that allows user space to provide a hint to the file system that Direct IO is useful, but the file system is free to ignore it. The semantics are also entirely a file system decision. Those that do not implement it return ENOTTY.

Since the semantics were never defined in any standard, O_DIRECT is implemented such that it conforms to the behavior described in the Linux open(2) man page as follows.

1. Minimize cache effects of the I/O.

   By design the ARC is already scan-resistant, which helps mitigate the need for special O_DIRECT handling. Data which is only accessed once will be the first to be evicted from the cache. This behavior is consistent with Illumos and FreeBSD.

   Future performance work may wish to investigate the benefits of immediately evicting data from the cache which has been read or written with the O_DIRECT flag. Functionally this behavior is very similar to applying the 'primarycache=metadata' property per open file.

2. O_DIRECT _MAY_ impose restrictions on IO alignment and length.

   No additional alignment or length restrictions are imposed.

3. O_DIRECT _MAY_ perform unbuffered IO operations directly between user memory and block device.

   No unbuffered IO operations are currently supported. In order to support features such as transparent compression, encryption, and checksumming, a copy must be made to transform the data.

4. O_DIRECT _MAY_ imply O_DSYNC (XFS).

   O_DIRECT does not imply O_DSYNC for ZFS. Callers must provide O_DSYNC to request synchronous semantics.

5. O_DIRECT _MAY_ disable file locking that serializes IO operations. Applications should avoid mixing O_DIRECT and normal IO or mmap(2) IO to the same file. This is particularly true for overlapping regions.

   All I/O in ZFS is locked for correctness and this locking is not disabled by O_DIRECT. However, concurrently mixing O_DIRECT, mmap(2), and normal I/O on the same file is not recommended.

This change is implemented by layering the aops->direct_IO operations on the existing AIO operations. Code already existed in ZFS on Linux for bypassing the page cache when O_DIRECT is specified.

References:
  * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html
  * https://blogs.oracle.com/roch/entry/zfs_and_directio
  * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
  * https://illumos.org/man/3c/directio

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#224
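As a rough illustration of point 4 in the commit message (O_DIRECT does not imply O_DSYNC on ZFS), the shell sketch below writes a file with GNU dd's `oflag=direct` and optionally adds `dsync`. The helper name `write_zeros` and its `wz_*` variables are hypothetical and not part of this commit. Because some file systems and dd builds reject the direct flag, the sketch falls back to a plain buffered write and reports which path it took.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): write COUNT 32k blocks of
# zeros to FILE with O_DIRECT, adding O_DSYNC when SYNC is "yes".
# Prints which path was taken: "direct" or "buffered".
write_zeros() {
	wz_file=$1
	wz_count=$2
	wz_sync=${3:-no}

	wz_flags="direct"
	if [ "$wz_sync" = "yes" ]; then
		# O_DIRECT alone is not synchronous on ZFS; request O_DSYNC too.
		wz_flags="direct,dsync"
	fi

	if dd if=/dev/zero of="$wz_file" bs=32k count="$wz_count" \
	    oflag="$wz_flags" 2>/dev/null; then
		echo "direct"
	elif dd if=/dev/zero of="$wz_file" bs=32k count="$wz_count" \
	    2>/dev/null; then
		# The file system or dd build rejected the flags; plain write.
		echo "buffered"
	else
		return 1
	fi
}

tmpfile=$(mktemp)
write_zeros "$tmpfile" 4 yes
rm -f "$tmpfile"
```

Passing `no` (or omitting the third argument) requests O_DIRECT without synchronous semantics, which is the case the commit message says callers must not mistake for a durable write.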
1 parent 2711b1d, commit 61bd6cc
Showing 18 changed files with 640 additions and 1 deletion.
@@ -0,0 +1,130 @@
dnl #
dnl # Linux 4.6.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER], [
	AC_MSG_CHECKING([whether aops->direct_IO() uses iov_iter])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(struct kiocb *kiocb,
		    struct iov_iter *iter) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER, 1,
		    [aops->direct_IO() uses iov_iter without rw])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Linux 4.1.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_OFFSET], [
	AC_MSG_CHECKING(
	    [whether aops->direct_IO() uses iov_iter with offset])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(struct kiocb *kiocb,
		    struct iov_iter *iter, loff_t offset) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER_OFFSET, 1,
		    [aops->direct_IO() uses iov_iter with offset])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Linux 3.16.x API change
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_RW_OFFSET], [
	AC_MSG_CHECKING(
	    [whether aops->direct_IO() uses iov_iter with rw and offset])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
		    struct iov_iter *iter, loff_t offset) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_ITER_RW_OFFSET, 1,
		    [aops->direct_IO() uses iov_iter with rw and offset])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

dnl #
dnl # Ancient Linux API (predates git)
dnl #
AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO_IOVEC], [
	AC_MSG_CHECKING([whether aops->direct_IO() uses iovec])
	ZFS_LINUX_TRY_COMPILE([
		#include <linux/fs.h>
		ssize_t test_direct_IO(int rw, struct kiocb *kiocb,
		    const struct iovec *iov, loff_t offset,
		    unsigned long nr_segs) { return 0; }
		static const struct address_space_operations
		    aops __attribute__ ((unused)) = {
			.direct_IO = test_direct_IO,
		};
	],[
	],[
		AC_MSG_RESULT([yes])
		AC_DEFINE(HAVE_VFS_DIRECT_IO_IOVEC, 1,
		    [aops->direct_IO() uses iovec])
		zfs_ac_direct_io="yes"
	],[
		AC_MSG_RESULT([no])
	])
])

AC_DEFUN([ZFS_AC_KERNEL_VFS_DIRECT_IO], [
	zfs_ac_direct_io="no"
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_OFFSET
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_ITER_RW_OFFSET
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		ZFS_AC_KERNEL_VFS_DIRECT_IO_IOVEC
	fi
	if test "$zfs_ac_direct_io" = "no"; then
		AC_MSG_ERROR([no; unknown direct IO interface])
	fi
])
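
The ZFS_AC_KERNEL_VFS_DIRECT_IO macro above tries each prototype from newest to oldest, stops at the first one that compiles, and aborts configure if none match. The plain-shell sketch below models only that control flow. The `probe_*` functions and the `SIMULATED_API` variable are made-up stand-ins for the real compile tests, so this is an illustration of the first-match pattern rather than anything the build actually runs.

```shell
#!/bin/sh
# Stand-ins for the four compile tests: each one "succeeds" only for a single
# simulated kernel API. SIMULATED_API is a hypothetical variable for this
# sketch and is not part of the ZFS build system.
probe_iter()           { [ "$SIMULATED_API" = "iter" ]; }
probe_iter_offset()    { [ "$SIMULATED_API" = "iter_offset" ]; }
probe_iter_rw_offset() { [ "$SIMULATED_API" = "iter_rw_offset" ]; }
probe_iovec()          { [ "$SIMULATED_API" = "iovec" ]; }

# Try the probes newest-first; the first success wins and later probes are
# skipped, mirroring the repeated 'test "$zfs_ac_direct_io" = "no"' guards.
detect_direct_io() {
	found="no"
	for probe in probe_iter probe_iter_offset \
	    probe_iter_rw_offset probe_iovec; do
		if [ "$found" = "no" ] && "$probe"; then
			found="$probe"
		fi
	done
	if [ "$found" = "no" ]; then
		echo "no; unknown direct IO interface" >&2
		return 1
	fi
	echo "$found"
}

SIMULATED_API="iter_rw_offset"
detect_direct_io
```

Ordering the probes newest-first means a current kernel is identified after a single compile test, and the final check turns an unrecognized interface into a hard configure error instead of a silently missing feature.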
@@ -27,6 +27,7 @@ SUBDIRS = \
	hkdf \
	inheritance \
	inuse \
	io \
	kstat \
	large_files \
	largest_pool \
@@ -0,0 +1,12 @@
pkgdatadir = $(datadir)/@PACKAGE@/zfs-tests/tests/functional/io
dist_pkgdata_SCRIPTS = \
	setup.ksh \
	cleanup.ksh \
	sync.ksh \
	psync.ksh \
	libaio.ksh \
	posixaio.ksh \
	mmap.ksh

dist_pkgdata_DATA = \
	io.cfg
@@ -0,0 +1,31 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#

#
# Copyright (c) 2018 by Lawrence Livermore National Security, LLC.
#

. $STF_SUITE/include/libtest.shlib

verify_runnable "global"

default_cleanup
@@ -0,0 +1,25 @@
#
# CDDL HEADER START
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
# CDDL HEADER END
#

#
# Copyright (c) 2018 by Lawrence Livermore National Security, LLC.
#

FIO_COMMON_ARGS="--numjobs=1 --bs=32k --size=32M --fallocate=none --group_reporting --verify=sha1 --minimal"

FIO_READ_ARGS="--name=rw --rw=read $FIO_COMMON_ARGS"
FIO_WRITE_ARGS="--name=rw --rw=write $FIO_COMMON_ARGS"
FIO_RANDREAD_ARGS="--name=rw --rw=randread $FIO_COMMON_ARGS"
FIO_RANDWRITE_ARGS="--name=rw --rw=randwrite $FIO_COMMON_ARGS"
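
The test scripts that consume these variables (sync.ksh, psync.ksh, libaio.ksh, posixaio.ksh, mmap.ksh) are not shown in this capture, so the following is only a guess at how the argument strings might be combined with fio's `--ioengine`, `--direct`, and `--filename` options. The helper `fio_cmd` and the example path are hypothetical. The sketch prints the command line instead of running it, so it does not require fio to be installed.

```shell
#!/bin/sh
# Argument strings copied from io.cfg above.
FIO_COMMON_ARGS="--numjobs=1 --bs=32k --size=32M --fallocate=none --group_reporting --verify=sha1 --minimal"
FIO_WRITE_ARGS="--name=rw --rw=write $FIO_COMMON_ARGS"

# Hypothetical helper: print the fio invocation for one combination of
# I/O engine and direct setting (1 = O_DIRECT, 0 = buffered).
fio_cmd() {
	fc_engine=$1
	fc_direct=$2
	fc_file=$3
	echo "fio --ioengine=$fc_engine --direct=$fc_direct --filename=$fc_file $FIO_WRITE_ARGS"
}

# Dry run: show the buffered and O_DIRECT variants for the sync engine.
# The path is a placeholder, not one used by the test suite.
for direct in 0 1; do
	fio_cmd sync "$direct" /tmp/example-io-file
done
```

Running every engine with both `--direct=0` and `--direct=1` would exercise the buffered and O_DIRECT paths against the same verified workload, which fits the commit's statement that mixing is locked for correctness but does not confirm what the real scripts do.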