Skip to content

Latest commit

 

History

History
558 lines (446 loc) · 22.3 KB

1044-io-fs-2.1.md

File metadata and controls

558 lines (446 loc) · 22.3 KB

Summary

Expand the scope of the std::fs module by enhancing existing functionality, exposing lower-level representations, and adding a few new functions.

Motivation

The current std::fs module serves many of the basic needs of interacting with a filesystem, but is missing a lot of useful functionality. For example, none of these operations are possible in stable Rust today:

  • Inspecting a file's modification/access times
  • Reading low-level information like that contained in libc::stat
  • Inspecting the unix permission bits on a file
  • Blanket setting the unix permission bits on a file
  • Leveraging DirEntry for the extra metadata it might contain
  • Reading the metadata of a symlink (not what it points at)
  • Resolving all symlink in a path

There is some more functionality listed in the RFC issue, but this RFC will not attempt to solve the entirety of that issue at this time. This RFC strives to expose APIs for much of the functionality listed above that is on the track to becoming #[stable] soon.

Non-goals of this RFC

There are a few areas of the std::fs API surface which are not considered goals for this RFC. It will be left for future RFCs to add new APIs for these areas:

  • Enhancing copy to copy directories recursively or configuring how copying happens.
  • Enhancing or stabilizing walk and its functionality.
  • Temporary files or directories

Detailed design

First, a vision for how lowering APIs in general will be presented, and then a number of specific APIs will each be proposed. Many of the proposed APIs are independent from one another and this RFC may not be implemented all-in-one-go but instead piecemeal over time, allowing the designs to evolve slightly in the meantime.

Lowering APIs

The vision for the os module

One of the principles of IO reform was to:

Provide hooks for integrating with low-level and/or platform-specific APIs.

The original RFC went into some amount of detail for how this would look, in particular by use of the os module. Part of the goal of this RFC is to flesh out that vision in more detail.

Ultimately, the organization of os is planned to look something like the following:

os
  unix          applicable to all cfg(unix) platforms; high- and low-level APIs
    io            extensions to std::io
    fs            extensions to std::fs
    net           extensions to std::net
    env           extensions to std::env
    process       extensions to std::process
    ...
  linux         applicable to linux only
    io, fs, net, env, process, ...
  macos         ...
  windows       ...

APIs whose behavior is platform-specific are provided only within the std::os hierarchy, making it easy to audit for usage of such APIs. Organizing the platform modules internally in the same way as std makes it easy to find relevant extensions when working with std.

It is emphatically not the goal of the std::os::* modules to provide bindings to all system APIs for each platform; this work is left to external crates. The goals are rather to:

  1. Facilitate interop between abstract types like File that std provides and the underlying system. This is done via "lowering": extension traits like AsRawFd allow you to extract low-level, platform-specific representations out of std types like File and TcpStream.

  2. Provide high-level but platform-specific APIs that feel like those in the rest of std. Just as with the rest of std, the goal here is not to include all possible functionality, but rather the most commonly-used or fundamental.

Lowering makes it possible for external crates to provide APIs that work "seamlessly" with std abstractions. For example, a crate for Linux might provide an epoll facility that can work directly with std::fs::File and std::net::TcpStream values, completely hiding the internal use of file descriptors. Eventually, such a crate could even be merged into std::os::unix, with minimal disruption -- there is little distinction between std and other crates in this regard.

Concretely, lowering has two ingredients:

  1. Introducing one or more "raw" types that are generally direct aliases for C types (more on this in the next section).

  2. Providing an extension trait that makes it possible to extract a raw type from a std type. In some cases, it's possible to go the other way around as well. The conversion can be by reference or by value, where the latter is used mainly to avoid the destructor associated with a std type (e.g. to extract a file descriptor from a File and eliminate the File object, without closing the file).

While we do not seek to exhaustively bind types or APIs from the underlying system, it is a goal to provide lowering operations for every high-level type to a system-level data type, whenever applicable. This RFC proposes several such lowerings that are currently missing from std::fs.

std::os::platform::raw

Each of the primitives in the standard library will expose the ability to be lowered into its component abstraction, facilitating the need to define these abstractions and organize them in the platform-specific modules. This RFC proposes the following guidelines for doing so:

  • Each platform will have a raw module inside of std::os which houses all of its platform specific definitions.
  • Only type definitions will be contained in raw modules, no function bindings, methods, or trait implementations.
  • Cross-platform types (e.g. those shared on all unix platforms) will be located in the respective cross-platform module. Types which only differ in the width of an integer type are considered to be cross-platform.
  • Platform-specific types will exist only in the raw module for that platform. A platform-specific type may have different field names, components, or just not exist on other platforms.

Differences in integer widths are not considered to be enough of a platform difference to define in each separate platform's module, meaning that it will be possible to write code that uses os::unix but doesn't compile on all Unix platforms. It is believed that most consumers of these types will continue to store the same type (e.g. not assume it's an i32) throughout the application or immediately cast it to a known type.

To reiterate, it is not planned for each raw module to provide exhaustive bindings to each platform. Only those abstractions which the standard library is lowering into will be defined in each raw module.

Lowering Metadata (all platforms)

Currently the Metadata structure exposes very few pieces of information about a file. Some of this is because the information is not available across all platforms, but some of it is also because the standard library does not have the appropriate abstraction to return at this time (e.g. time stamps). The raw contents of Metadata (a stat on Unix), however, should be accessible via lowering no matter what.

The following trait hierarchy and new structures will be added to the standard library.

mod os::windows::fs {
    pub trait MetadataExt {
        fn file_attributes(&self) -> u32; // `dwFileAttributes` field
        fn creation_time(&self) -> u64; // `ftCreationTime` field
        fn last_access_time(&self) -> u64; // `ftLastAccessTime` field
        fn last_write_time(&self) -> u64; // `ftLastWriteTime` field
        fn file_size(&self) -> u64; // `nFileSizeHigh`/`nFileSizeLow` fields
    }
    impl MetadataExt for fs::Metadata { ... }
}

mod os::unix::fs {
    pub trait MetadataExt {
        fn as_raw(&self) -> &Metadata;
    }
    impl MetadataExt for fs::Metadata { ... }

    pub struct Metadata(raw::stat);
    impl Metadata {
        // Accessors for fields available in `raw::stat` for *all* unix platforms
        fn dev(&self) -> raw::dev_t; // st_dev field
        fn ino(&self) -> raw::ino_t; // st_ino field
        fn mode(&self) -> raw::mode_t; // st_mode field
        fn nlink(&self) -> raw::nlink_t; // st_nlink field
        fn uid(&self) -> raw::uid_t; // st_uid field
        fn gid(&self) -> raw::gid_t; // st_gid field
        fn rdev(&self) -> raw::dev_t; // st_rdev field
        fn size(&self) -> raw::off_t; // st_size field
        fn blksize(&self) -> raw::blksize_t; // st_blksize field
        fn blocks(&self) -> raw::blkcnt_t; // st_blocks field
        fn atime(&self) -> (i64, i32); // st_atime field, (sec, nsec)
        fn mtime(&self) -> (i64, i32); // st_mtime field, (sec, nsec)
        fn ctime(&self) -> (i64, i32); // st_ctime field, (sec, nsec)
    }
}

// st_flags, st_gen, st_lspare, st_birthtim, st_qspare
mod os::{linux, macos, freebsd, ...}::fs {
    pub mod raw {
        pub type dev_t = ...;
        pub type ino_t = ...;
        // ...
        pub struct stat {
            // ... same public fields as libc::stat
        }
    }
    pub trait MetadataExt {
        fn as_raw_stat(&self) -> &raw::stat;
    }
    impl MetadataExt for os::unix::fs::RawMetadata { ... }
    impl MetadataExt for fs::Metadata { ... }
}

The goal of this hierarchy is to expose all of the information in the OS-level metadata in as cross-platform of a method as possible while adhering to the design principles of the standard library.

The interesting part about working in a "cross platform" manner here is that the makeup of libc::stat on unix platforms can vary quite a bit between platforms. For example some platforms have a st_birthtim field while others do not. To enable as much ergonomic usage as possible, the os::unix module will expose the intersection of metadata available in libc::stat across all unix platforms. The information is still exposed in a raw fashion (in terms of the values returned), but methods are required as the raw structure is not exposed. The unix platforms then leverage the more fine-grained modules in std::os (e.g. linux and macos) to return the raw libc::stat structure. This will allow full access to the information in libc::stat in all platforms with clear opt-in to when you're using platform-specific information.

One of the major goals of the os::unix::fs design is to enable as much functionality as possible when programming against "unix in general" while still allowing applications to choose to only program against macos, for example.

Fate of Metadata::{accesed, modified}

At this time there is no suitable type in the standard library to represent the return type of these two functions. The type would either have to be some form of time stamp or moment in time, both of which are difficult abstractions to add lightly.

Consequently, both of these functions will be deprecated in favor of requiring platform-specific code to access the modification/access time of files. This information is all available via the MetadataExt traits listed above.

Eventually, once a std type for cross-platform timestamps is available, these methods will be re-instated as returning that type.

Lowering and setting Permissions (Unix)

Note: this section only describes behavior on unix.

Currently there is no stable method of inspecting the permission bits on a file, and it is unclear whether the current unstable methods of doing so, PermissionsExt::mode, should be stabilized. The main question around this piece of functionality is whether to provide a higher level abstractiong (e.g. similar to the bitflags crate) for the permission bits on unix.

This RFC proposes considering the methods for stabilization as-is and not pursuing a higher level abstraction of the unix permission bits. To facilitate in their inspection and manipulation, however, the following constants will be added:

mod os::unix::fs {
    pub const USER_READ: raw::mode_t;
    pub const USER_WRITE: raw::mode_t;
    pub const USER_EXECUTE: raw::mode_t;
    pub const USER_RWX: raw::mode_t;
    pub const OTHER_READ: raw::mode_t;
    pub const OTHER_WRITE: raw::mode_t;
    pub const OTHER_EXECUTE: raw::mode_t;
    pub const OTHER_RWX: raw::mode_t;
    pub const GROUP_READ: raw::mode_t;
    pub const GROUP_WRITE: raw::mode_t;
    pub const GROUP_EXECUTE: raw::mode_t;
    pub const GROUP_RWX: raw::mode_t;
    pub const ALL_READ: raw::mode_t;
    pub const ALL_WRITE: raw::mode_t;
    pub const ALL_EXECUTE: raw::mode_t;
    pub const ALL_RWX: raw::mode_t;
    pub const SETUID: raw::mode_t;
    pub const SETGID: raw::mode_t;
    pub const STICKY_BIT: raw::mode_t;
}

Finally, the set_permissions function of the std::fs module is also proposed to be marked #[stable] soon as a method of blanket setting permissions for a file.

Constructing Permissions

Currently there is no method to construct an instance of Permissions on any platform. This RFC proposes adding the following APIs:

mod os::unix::fs {
    pub trait PermissionsExt {
        fn from_mode(mode: raw::mode_t) -> Self;
    }
    impl PermissionsExt for Permissions { ... }
}

This RFC does not propose yet adding a cross-platform way to construct a Permissions structure due to the radical differences between how unix and windows handle permissions.

Creating directories with permissions

Currently the standard library does not expose an API which allows setting the permission bits on unix or security attributes on Windows. This RFC proposes adding the following API to std::fs:

pub struct DirBuilder { ... }

impl DirBuilder {
    /// Creates a new set of options with default mode/security settings for all
    /// platforms and also non-recursive.
    pub fn new() -> Self;

    /// Indicate that directories create should be created recursively, creating
    /// all parent directories if they do not exist with the same security and
    /// permissions settings.
    pub fn recursive(&mut self, recursive: bool) -> &mut Self;

    /// Create the specified directory with the options configured in this
    /// builder.
    pub fn create<P: AsRef<Path>>(&self, path: P) -> io::Result<()>;
}

mod os::unix::fs {
    pub trait DirBuilderExt {
        fn mode(&mut self, mode: raw::mode_t) -> &mut Self;
    }
    impl DirBuilderExt for DirBuilder { ... }
}

mod os::windows::fs {
    // once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added
    pub trait DirBuilderExt {
        fn security_attributes(&mut self, ...) -> &mut Self;
    }
    impl DirBuilderExt for DirBuilder { ... }
}

This sort of builder is also extendable to other flavors of functions in the future, such as C++'s template parameter:

/// Use the specified directory as a "template" for permissions and security
/// settings of the new directories to be created.
///
/// On unix this will issue a `stat` of the specified directory and new
/// directories will be created with the same permission bits. On Windows
/// this will trigger the use of the `CreateDirectoryEx` function.
pub fn template<P: AsRef<Path>>(&mut self, path: P) -> &mut Self;

At this time, however, it it not proposed to add this method to DirBuilder.

Adding FileType

Currently there is no enumeration or newtype representing a list of "file types" on the local filesystem. This is partly done because the need is not so high right now. Some situations, however, imply that it is more efficient to learn the file type at once instead of testing for each individual file type itself.

For example some platforms' DirEntry type can know the FileType without an extra syscall. If code were to test a DirEntry separately for whether it's a file or a directory, it may issue more syscalls necessary than if it instead learned the type and then tested that if it was a file or directory.

The full set of file types, however, is not always known nor portable across platforms, so this RFC proposes the following hierarchy:

#[derive(Copy, Clone, PartialEq, Eq, Hash)]
pub struct FileType(..);

impl FileType {
    pub fn is_dir(&self) -> bool;
    pub fn is_file(&self) -> bool;
    pub fn is_symlink(&self) -> bool;
}

Extension traits can be added in the future for testing for other more flavorful kinds of files on various platforms (such as unix sockets on unix platforms).

Dealing with is_{file,dir} and file_type methods

Currently the fs::Metadata structure exposes stable is_file and is_dir accessors. The struct will also grow a file_type accessor for this newtype struct being added. It is proposed that Metadata will retain the is_{file,dir} convenience methods, but no other "file type testers" will be added.

Enhancing symlink support

Currently the std::fs module provides a soft_link and read_link function, but there is no method of doing other symlink related tasks such as:

  • Testing whether a file is a symlink
  • Reading the metadata of a symlink, not what it points to

The following APIs will be added to std::fs:

/// Returns the metadata of the file pointed to by `p`, and this function,
/// unlike `metadata` will **not** follow symlinks.
pub fn symlink_metadata<P: AsRef<Path>>(p: P) -> io::Result<Metadata>;

Binding realpath

There's a long-standing issue that the unix function realpath is not bound, and this RFC proposes adding the following API to the fs module:

/// Canonicalizes the given file name to an absolute path with all `..`, `.`,
/// and symlink components resolved.
///
/// On unix this function corresponds to the return value of the `realpath`
/// function, and on Windows this corresponds to the `GetFullPathName` function.
///
/// Note that relative paths given to this function will use the current working
/// directory as a base, and the current working directory is not managed in a
/// thread-local fashion, so this function may need to be synchronized with
/// other calls to `env::change_dir`.
pub fn canonicalize<P: AsRef<Path>>(p: P) -> io::Result<PathBuf>;

Tweaking PathExt

Currently the PathExt trait is unstable, yet it is quite convenient! The main motivation for its #[unstable] tag is that it is unclear how much functionality should be on PathExt versus the std::fs module itself. Currently a small subset of functionality is offered, but it is unclear what the guiding principle for the contents of this trait are.

This RFC proposes a few guiding principles for this trait:

  • Only read-only operations in std::fs will be exposed on PathExt. All operations which require modifications to the filesystem will require calling methods through std::fs itself.

  • Some inspection methods on Metadata will be exposed on PathExt, but only those where it logically makes sense for Path to be the self receiver. For example PathExt::len will not exist (size of the file), but PathExt::is_dir will exist.

Concretely, the PathExt trait will be expanded to:

pub trait PathExt {
    fn exists(&self) -> bool;
    fn is_dir(&self) -> bool;
    fn is_file(&self) -> bool;
    fn metadata(&self) -> io::Result<Metadata>;
    fn symlink_metadata(&self) -> io::Result<Metadata>;
    fn canonicalize(&self) -> io::Result<PathBuf>;
    fn read_link(&self) -> io::Result<PathBuf>;
    fn read_dir(&self) -> io::Result<ReadDir>;
}

impl PathExt for Path { ... }

Expanding DirEntry

Currently the DirEntry API is quite minimalistic, exposing very few of the underlying attributes. Platforms like Windows actually contain an entire Metadata inside of a DirEntry, enabling much more efficient walking of directories in some situations.

The following APIs will be added to DirEntry:

impl DirEntry {
    /// This function will return the filesystem metadata for this directory
    /// entry. This is equivalent to calling `fs::symlink_metadata` on the
    /// path returned.
    ///
    /// On Windows this function will always return `Ok` and will not issue a
    /// system call, but on unix this will always issue a call to `stat` to
    /// return metadata.
    pub fn metadata(&self) -> io::Result<Metadata>;

    /// Return what file type this `DirEntry` contains.
    ///
    /// On some platforms this may not require reading the metadata of the
    /// underlying file from the filesystem, but on other platforms it may be
    /// required to do so.
    pub fn file_type(&self) -> io::Result<FileType>;

    /// Returns the file name for this directory entry.
    pub fn file_name(&self) -> OsString;
}

mod os::unix::fs {
    pub trait DirEntryExt {
        fn ino(&self) -> raw::ino_t; // read the d_ino field
    }
    impl DirEntryExt for fs::DirEntry { ... }
}

Drawbacks

  • This is quite a bit of surface area being added to the std::fs API, and it may perhaps be best to scale it back and add it in a more incremental fashion instead of all at once. Most of it, however, is fairly straightforward, so it seems prudent to schedule many of these features for the 1.1 release.

  • Exposing raw information such as libc::stat or WIN32_FILE_ATTRIBUTE_DATA possibly can hamstring altering the implementation in the future. At this point, however, it seems unlikely that the exposed pieces of information will be changing much.

Alternatives

  • Instead of exposing accessor methods in MetadataExt on Windows, the raw WIN32_FILE_ATTRIBUTE_DATA could be returned. We may change, however, to using BY_HANDLE_FILE_INFORMATION one day which would make the return value from this function more difficult to implement.

  • A std::os::MetadataExt trait could be added to access truly common information such as modification/access times across all platforms. The return value would likely be a u64 "something" and would be clearly documented as being a lossy abstraction and also only having a platform-specific meaning.

  • The PathExt trait could perhaps be implemented on DirEntry, but it doesn't necessarily seem appropriate for all the methods and using inherent methods also seems more logical.

Unresolved questions

  • What is the ultimate role of crates like liblibc, and how do we draw the line between them and std::os definitions?