- Feature Name:
fs2
- Start Date: 2015-04-04
- RFC PR: rust-lang#1044
- Rust Issue: rust-lang/rust#24796
Expand the scope of the std::fs
module by enhancing existing functionality,
exposing lower-level representations, and adding a few new functions.
The current std::fs
module serves many of the basic needs of interacting with
a filesystem, but is missing a lot of useful functionality. For example, none of
these operations are possible in stable Rust today:
- Inspecting a file's modification/access times
- Reading low-level information like that contained in
libc::stat
- Inspecting the unix permission bits on a file
- Blanket setting the unix permission bits on a file
- Leveraging
DirEntry
for the extra metadata it might contain - Reading the metadata of a symlink (not what it points at)
- Resolving all symlink in a path
There is some more functionality listed in the RFC issue, but this RFC
will not attempt to solve the entirety of that issue at this time. This RFC
strives to expose APIs for much of the functionality listed above that is on the
track to becoming #[stable]
soon.
There are a few areas of the std::fs
API surface which are not considered
goals for this RFC. It will be left for future RFCs to add new APIs for these
areas:
- Enhancing
copy
to copy directories recursively or configuring how copying happens. - Enhancing or stabilizing
walk
and its functionality. - Temporary files or directories
First, a vision for how lowering APIs in general will be presented, and then a number of specific APIs will each be proposed. Many of the proposed APIs are independent from one another and this RFC may not be implemented all-in-one-go but instead piecemeal over time, allowing the designs to evolve slightly in the meantime.
One of the principles of IO reform was to:
Provide hooks for integrating with low-level and/or platform-specific APIs.
The original RFC went into some amount of detail for how this would look, in
particular by use of the os
module. Part of the goal of this RFC is to flesh
out that vision in more detail.
Ultimately, the organization of os
is planned to look something like the
following:
os
unix applicable to all cfg(unix) platforms; high- and low-level APIs
io extensions to std::io
fs extensions to std::fs
net extensions to std::net
env extensions to std::env
process extensions to std::process
...
linux applicable to linux only
io, fs, net, env, process, ...
macos ...
windows ...
APIs whose behavior is platform-specific are provided only within the std::os
hierarchy, making it easy to audit for usage of such APIs. Organizing the
platform modules internally in the same way as std
makes it easy to find
relevant extensions when working with std
.
It is emphatically not the goal of the std::os::*
modules to provide
bindings to all system APIs for each platform; this work is left to external
crates. The goals are rather to:
-
Facilitate interop between abstract types like
File
thatstd
provides and the underlying system. This is done via "lowering": extension traits likeAsRawFd
allow you to extract low-level, platform-specific representations out ofstd
types likeFile
andTcpStream
. -
Provide high-level but platform-specific APIs that feel like those in the rest of
std
. Just as with the rest ofstd
, the goal here is not to include all possible functionality, but rather the most commonly-used or fundamental.
Lowering makes it possible for external crates to provide APIs that work
"seamlessly" with std
abstractions. For example, a crate for Linux might
provide an epoll
facility that can work directly with std::fs::File
and
std::net::TcpStream
values, completely hiding the internal use of file
descriptors. Eventually, such a crate could even be merged into std::os::unix
,
with minimal disruption -- there is little distinction between std
and other
crates in this regard.
Concretely, lowering has two ingredients:
-
Introducing one or more "raw" types that are generally direct aliases for C types (more on this in the next section).
-
Providing an extension trait that makes it possible to extract a raw type from a
std
type. In some cases, it's possible to go the other way around as well. The conversion can be by reference or by value, where the latter is used mainly to avoid the destructor associated with astd
type (e.g. to extract a file descriptor from aFile
and eliminate theFile
object, without closing the file).
While we do not seek to exhaustively bind types or APIs from the underlying
system, it is a goal to provide lowering operations for every high-level type
to a system-level data type, whenever applicable. This RFC proposes several such
lowerings that are currently missing from std::fs
.
Each of the primitives in the standard library will expose the ability to be lowered into its component abstraction, facilitating the need to define these abstractions and organize them in the platform-specific modules. This RFC proposes the following guidelines for doing so:
- Each platform will have a
raw
module inside ofstd::os
which houses all of its platform specific definitions. - Only type definitions will be contained in
raw
modules, no function bindings, methods, or trait implementations. - Cross-platform types (e.g. those shared on all
unix
platforms) will be located in the respective cross-platform module. Types which only differ in the width of an integer type are considered to be cross-platform. - Platform-specific types will exist only in the
raw
module for that platform. A platform-specific type may have different field names, components, or just not exist on other platforms.
Differences in integer widths are not considered to be enough of a platform
difference to define in each separate platform's module, meaning that it will be
possible to write code that uses os::unix
but doesn't compile on all Unix
platforms. It is believed that most consumers of these types will continue to
store the same type (e.g. not assume it's an i32
) throughout the application
or immediately cast it to a known type.
To reiterate, it is not planned for each raw
module to provide exhaustive
bindings to each platform. Only those abstractions which the standard library is
lowering into will be defined in each raw
module.
Currently the Metadata
structure exposes very few pieces of information about
a file. Some of this is because the information is not available across all
platforms, but some of it is also because the standard library does not have the
appropriate abstraction to return at this time (e.g. time stamps). The raw
contents of Metadata
(a stat
on Unix), however, should be accessible via
lowering no matter what.
The following trait hierarchy and new structures will be added to the standard library.
mod os::windows::fs {
pub trait MetadataExt {
fn file_attributes(&self) -> u32; // `dwFileAttributes` field
fn creation_time(&self) -> u64; // `ftCreationTime` field
fn last_access_time(&self) -> u64; // `ftLastAccessTime` field
fn last_write_time(&self) -> u64; // `ftLastWriteTime` field
fn file_size(&self) -> u64; // `nFileSizeHigh`/`nFileSizeLow` fields
}
impl MetadataExt for fs::Metadata { ... }
}
mod os::unix::fs {
pub trait MetadataExt {
fn as_raw(&self) -> &Metadata;
}
impl MetadataExt for fs::Metadata { ... }
pub struct Metadata(raw::stat);
impl Metadata {
// Accessors for fields available in `raw::stat` for *all* unix platforms
fn dev(&self) -> raw::dev_t; // st_dev field
fn ino(&self) -> raw::ino_t; // st_ino field
fn mode(&self) -> raw::mode_t; // st_mode field
fn nlink(&self) -> raw::nlink_t; // st_nlink field
fn uid(&self) -> raw::uid_t; // st_uid field
fn gid(&self) -> raw::gid_t; // st_gid field
fn rdev(&self) -> raw::dev_t; // st_rdev field
fn size(&self) -> raw::off_t; // st_size field
fn blksize(&self) -> raw::blksize_t; // st_blksize field
fn blocks(&self) -> raw::blkcnt_t; // st_blocks field
fn atime(&self) -> (i64, i32); // st_atime field, (sec, nsec)
fn mtime(&self) -> (i64, i32); // st_mtime field, (sec, nsec)
fn ctime(&self) -> (i64, i32); // st_ctime field, (sec, nsec)
}
}
// st_flags, st_gen, st_lspare, st_birthtim, st_qspare
mod os::{linux, macos, freebsd, ...}::fs {
pub mod raw {
pub type dev_t = ...;
pub type ino_t = ...;
// ...
pub struct stat {
// ... same public fields as libc::stat
}
}
pub trait MetadataExt {
fn as_raw_stat(&self) -> &raw::stat;
}
impl MetadataExt for os::unix::fs::RawMetadata { ... }
impl MetadataExt for fs::Metadata { ... }
}
The goal of this hierarchy is to expose all of the information in the OS-level metadata in as cross-platform of a method as possible while adhering to the design principles of the standard library.
The interesting part about working in a "cross platform" manner here is that the
makeup of libc::stat
on unix platforms can vary quite a bit between platforms.
For example some platforms have a st_birthtim
field while others do not.
To enable as much ergonomic usage as possible, the os::unix
module will expose
the intersection of metadata available in libc::stat
across all unix
platforms. The information is still exposed in a raw fashion (in terms of the
values returned), but methods are required as the raw structure is not exposed.
The unix platforms then leverage the more fine-grained modules in std::os
(e.g. linux
and macos
) to return the raw libc::stat
structure. This will
allow full access to the information in libc::stat
in all platforms with clear
opt-in to when you're using platform-specific information.
One of the major goals of the os::unix::fs
design is to enable as much
functionality as possible when programming against "unix in general" while still
allowing applications to choose to only program against macos, for example.
At this time there is no suitable type in the standard library to represent the return type of these two functions. The type would either have to be some form of time stamp or moment in time, both of which are difficult abstractions to add lightly.
Consequently, both of these functions will be deprecated in favor of
requiring platform-specific code to access the modification/access time of
files. This information is all available via the MetadataExt
traits listed
above.
Eventually, once a std
type for cross-platform timestamps is available, these
methods will be re-instated as returning that type.
Note: this section only describes behavior on unix.
Currently there is no stable method of inspecting the permission bits on a file,
and it is unclear whether the current unstable methods of doing so,
PermissionsExt::mode
, should be stabilized. The main question around this
piece of functionality is whether to provide a higher level abstractiong (e.g.
similar to the bitflags
crate) for the permission bits on unix.
This RFC proposes considering the methods for stabilization as-is and not pursuing a higher level abstraction of the unix permission bits. To facilitate in their inspection and manipulation, however, the following constants will be added:
mod os::unix::fs {
pub const USER_READ: raw::mode_t;
pub const USER_WRITE: raw::mode_t;
pub const USER_EXECUTE: raw::mode_t;
pub const USER_RWX: raw::mode_t;
pub const OTHER_READ: raw::mode_t;
pub const OTHER_WRITE: raw::mode_t;
pub const OTHER_EXECUTE: raw::mode_t;
pub const OTHER_RWX: raw::mode_t;
pub const GROUP_READ: raw::mode_t;
pub const GROUP_WRITE: raw::mode_t;
pub const GROUP_EXECUTE: raw::mode_t;
pub const GROUP_RWX: raw::mode_t;
pub const ALL_READ: raw::mode_t;
pub const ALL_WRITE: raw::mode_t;
pub const ALL_EXECUTE: raw::mode_t;
pub const ALL_RWX: raw::mode_t;
pub const SETUID: raw::mode_t;
pub const SETGID: raw::mode_t;
pub const STICKY_BIT: raw::mode_t;
}
Finally, the set_permissions
function of the std::fs
module is also proposed
to be marked #[stable]
soon as a method of blanket setting permissions for a
file.
Currently there is no method to construct an instance of Permissions
on any
platform. This RFC proposes adding the following APIs:
mod os::unix::fs {
pub trait PermissionsExt {
fn from_mode(mode: raw::mode_t) -> Self;
}
impl PermissionsExt for Permissions { ... }
}
This RFC does not propose yet adding a cross-platform way to construct a
Permissions
structure due to the radical differences between how unix and
windows handle permissions.
Currently the standard library does not expose an API which allows setting the
permission bits on unix or security attributes on Windows. This RFC proposes
adding the following API to std::fs
:
pub struct DirBuilder { ... }
impl DirBuilder {
/// Creates a new set of options with default mode/security settings for all
/// platforms and also non-recursive.
pub fn new() -> Self;
/// Indicate that directories create should be created recursively, creating
/// all parent directories if they do not exist with the same security and
/// permissions settings.
pub fn recursive(&mut self, recursive: bool) -> &mut Self;
/// Create the specified directory with the options configured in this
/// builder.
pub fn create<P: AsRef<Path>>(&self, path: P) -> io::Result<()>;
}
mod os::unix::fs {
pub trait DirBuilderExt {
fn mode(&mut self, mode: raw::mode_t) -> &mut Self;
}
impl DirBuilderExt for DirBuilder { ... }
}
mod os::windows::fs {
// once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added
pub trait DirBuilderExt {
fn security_attributes(&mut self, ...) -> &mut Self;
}
impl DirBuilderExt for DirBuilder { ... }
}
This sort of builder is also extendable to other flavors of functions in the future, such as C++'s template parameter:
/// Use the specified directory as a "template" for permissions and security
/// settings of the new directories to be created.
///
/// On unix this will issue a `stat` of the specified directory and new
/// directories will be created with the same permission bits. On Windows
/// this will trigger the use of the `CreateDirectoryEx` function.
pub fn template<P: AsRef<Path>>(&mut self, path: P) -> &mut Self;
At this time, however, it it not proposed to add this method to
DirBuilder
.
Currently there is no enumeration or newtype representing a list of "file types" on the local filesystem. This is partly done because the need is not so high right now. Some situations, however, imply that it is more efficient to learn the file type at once instead of testing for each individual file type itself.
For example some platforms' DirEntry
type can know the FileType
without an
extra syscall. If code were to test a DirEntry
separately for whether it's a
file or a directory, it may issue more syscalls necessary than if it instead
learned the type and then tested that if it was a file or directory.
The full set of file types, however, is not always known nor portable across platforms, so this RFC proposes the following hierarchy:
#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct FileType(..);
impl FileType {
pub fn is_dir(&self) -> bool;
pub fn is_file(&self) -> bool;
pub fn is_symlink(&self) -> bool;
}
Extension traits can be added in the future for testing for other more flavorful kinds of files on various platforms (such as unix sockets on unix platforms).
Currently the fs::Metadata
structure exposes stable is_file
and is_dir
accessors. The struct will also grow a file_type
accessor for this newtype
struct being added. It is proposed that Metadata
will retain the
is_{file,dir}
convenience methods, but no other "file type testers" will be
added.
Currently the std::fs
module provides a soft_link
and read_link
function,
but there is no method of doing other symlink related tasks such as:
- Testing whether a file is a symlink
- Reading the metadata of a symlink, not what it points to
The following APIs will be added to std::fs
:
/// Returns the metadata of the file pointed to by `p`, and this function,
/// unlike `metadata` will **not** follow symlinks.
pub fn symlink_metadata<P: AsRef<Path>>(p: P) -> io::Result<Metadata>;
There's a long-standing issue that the unix function realpath
is
not bound, and this RFC proposes adding the following API to the fs
module:
/// Canonicalizes the given file name to an absolute path with all `..`, `.`,
/// and symlink components resolved.
///
/// On unix this function corresponds to the return value of the `realpath`
/// function, and on Windows this corresponds to the `GetFullPathName` function.
///
/// Note that relative paths given to this function will use the current working
/// directory as a base, and the current working directory is not managed in a
/// thread-local fashion, so this function may need to be synchronized with
/// other calls to `env::change_dir`.
pub fn canonicalize<P: AsRef<Path>>(p: P) -> io::Result<PathBuf>;
Currently the PathExt
trait is unstable, yet it is quite convenient! The main
motivation for its #[unstable]
tag is that it is unclear how much
functionality should be on PathExt
versus the std::fs
module itself.
Currently a small subset of functionality is offered, but it is unclear what the
guiding principle for the contents of this trait are.
This RFC proposes a few guiding principles for this trait:
-
Only read-only operations in
std::fs
will be exposed onPathExt
. All operations which require modifications to the filesystem will require calling methods throughstd::fs
itself. -
Some inspection methods on
Metadata
will be exposed onPathExt
, but only those where it logically makes sense forPath
to be theself
receiver. For examplePathExt::len
will not exist (size of the file), butPathExt::is_dir
will exist.
Concretely, the PathExt
trait will be expanded to:
pub trait PathExt {
fn exists(&self) -> bool;
fn is_dir(&self) -> bool;
fn is_file(&self) -> bool;
fn metadata(&self) -> io::Result<Metadata>;
fn symlink_metadata(&self) -> io::Result<Metadata>;
fn canonicalize(&self) -> io::Result<PathBuf>;
fn read_link(&self) -> io::Result<PathBuf>;
fn read_dir(&self) -> io::Result<ReadDir>;
}
impl PathExt for Path { ... }
Currently the DirEntry
API is quite minimalistic, exposing very few of the
underlying attributes. Platforms like Windows actually contain an entire
Metadata
inside of a DirEntry
, enabling much more efficient walking of
directories in some situations.
The following APIs will be added to DirEntry
:
impl DirEntry {
/// This function will return the filesystem metadata for this directory
/// entry. This is equivalent to calling `fs::symlink_metadata` on the
/// path returned.
///
/// On Windows this function will always return `Ok` and will not issue a
/// system call, but on unix this will always issue a call to `stat` to
/// return metadata.
pub fn metadata(&self) -> io::Result<Metadata>;
/// Return what file type this `DirEntry` contains.
///
/// On some platforms this may not require reading the metadata of the
/// underlying file from the filesystem, but on other platforms it may be
/// required to do so.
pub fn file_type(&self) -> io::Result<FileType>;
/// Returns the file name for this directory entry.
pub fn file_name(&self) -> OsString;
}
mod os::unix::fs {
pub trait DirEntryExt {
fn ino(&self) -> raw::ino_t; // read the d_ino field
}
impl DirEntryExt for fs::DirEntry { ... }
}
-
This is quite a bit of surface area being added to the
std::fs
API, and it may perhaps be best to scale it back and add it in a more incremental fashion instead of all at once. Most of it, however, is fairly straightforward, so it seems prudent to schedule many of these features for the 1.1 release. -
Exposing raw information such as
libc::stat
orWIN32_FILE_ATTRIBUTE_DATA
possibly can hamstring altering the implementation in the future. At this point, however, it seems unlikely that the exposed pieces of information will be changing much.
-
Instead of exposing accessor methods in
MetadataExt
on Windows, the rawWIN32_FILE_ATTRIBUTE_DATA
could be returned. We may change, however, to usingBY_HANDLE_FILE_INFORMATION
one day which would make the return value from this function more difficult to implement. -
A
std::os::MetadataExt
trait could be added to access truly common information such as modification/access times across all platforms. The return value would likely be au64
"something" and would be clearly documented as being a lossy abstraction and also only having a platform-specific meaning. -
The
PathExt
trait could perhaps be implemented onDirEntry
, but it doesn't necessarily seem appropriate for all the methods and using inherent methods also seems more logical.
- What is the ultimate role of crates like
liblibc
, and how do we draw the line between them andstd::os
definitions?