Files and FileSystems support #241

Open
fzhinkin opened this issue Nov 24, 2023 · 5 comments

Comments

@fzhinkin
Collaborator

The ability to work with files and filesystems is one of the crucial features that a programming language could provide through its standard library. Even if an application is not built around reading and writing data from storage devices, it still may need access to a filesystem to, for example, read configuration files or write logs.

A constant stream of GitHub issues and feedback on kotlinx-io suggests demand for a fully-featured multiplatform files and filesystems API.

The API exposed through the kotlinx.io.files package was created to quickly provide partial file support; it is neither well designed nor does it cover basic user needs. I'm proposing to review it and to extend or redesign it if necessary.

For the purposes of this proposal, file and filesystem-related features could be split into two coarse categories: basic and extended.
Basic features include the minimal necessary features required for working with files and filesystems, something one may expect from any FS-related API. Extended features include everything else.

Basic features

Below is the list of features to be considered as basic file and filesystem features.

Paths

  • concatenation/joining;
  • segmentation (get a parent path, get file name, file extension);
  • resolution (convert relative path to absolute);
  • relativization (make a path relative to a base path).
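To make the four path operations above concrete, here is a minimal sketch over plain POSIX-style strings. It assumes '/' is the only separator and treats ".." lexically, so symbolic links are not consulted. The helper names (`joinPath`, `fileName`, `parentPath`, `extension`, `resolveAbsolute`, `relativize`) are hypothetical and not part of kotlinx-io.

```kotlin
// Hypothetical POSIX-only helpers; '/' is assumed to be the only separator.

/** Joins parts with '/'; an absolute part (starting with '/') replaces everything before it. */
fun joinPath(base: String, vararg parts: String): String =
    parts.fold(base) { acc, part ->
        when {
            part.startsWith("/") -> part
            acc.isEmpty() -> part
            else -> acc.trimEnd('/') + "/" + part
        }
    }

/** Last non-empty segment, or null for "" and "/". */
fun fileName(path: String): String? = path.split('/').lastOrNull { it.isNotEmpty() }

/** Parent path, or null when there is none (root, empty path, single relative segment). */
fun parentPath(path: String): String? {
    val trimmed = path.trimEnd('/')
    val idx = trimmed.lastIndexOf('/')
    return when {
        trimmed.isEmpty() -> null
        idx < 0 -> null
        idx == 0 -> "/"
        else -> trimmed.substring(0, idx)
    }
}

/** Text after the last '.', ignoring a leading dot as in ".bashrc". */
fun extension(path: String): String? {
    val name = fileName(path) ?: return null
    val dot = name.lastIndexOf('.')
    return if (dot > 0) name.substring(dot + 1) else null
}

/** Resolves [path] against absolute [base], normalizing "." and ".." lexically. */
fun resolveAbsolute(base: String, path: String): String {
    require(base.startsWith("/")) { "base must be absolute: $base" }
    val combined = if (path.startsWith("/")) path else "$base/$path"
    val out = ArrayDeque<String>()
    for (seg in combined.split('/')) {
        when (seg) {
            "", "." -> {}                  // skip empty and current-dir segments
            ".." -> out.removeLastOrNull() // ".." at the root stays at the root
            else -> out.addLast(seg)
        }
    }
    return "/" + out.joinToString("/")
}

/** Makes normalized absolute [target] relative to normalized absolute [base]. */
fun relativize(base: String, target: String): String {
    val b = base.split('/').filter { it.isNotEmpty() }
    val t = target.split('/').filter { it.isNotEmpty() }
    var common = 0
    while (common < b.size && common < t.size && b[common] == t[common]) common++
    val ups = List(b.size - common) { ".." }
    return (ups + t.drop(common)).joinToString("/").ifEmpty { "." }
}
```

For example, `resolveAbsolute("/home/user", "../other/./b.txt")` yields `/home/other/b.txt`, and `relativize("/home/user/docs", "/home/other/b.txt")` yields `../../other/b.txt`.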

FileSystem features

  • create/delete files;
  • create/delete directories;
  • list the content of a directory;
  • file/directory renaming/moving (both atomic and non-atomic);
  • file copying;
  • directory tree traversal;
  • file/directory metadata querying and update (atime/mtime/size/etc., touch);
  • basic permissions support (query and update);
  • symbolic links support (create, resolve);
  • query and change the current working dir;
  • temporary file/directory creation.
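As a baseline, several of these FileSystem features already exist on the JVM in java.nio.file. These include temporary directory creation, creating directories and files, copying, atomic moves, size queries, tree traversal and deletion. The sketch below is JVM-only and uses only `java.nio.file.Files` calls. It is not a proposed kotlinx-io API, and the function name `nioBasicsDemo` is hypothetical. Permissions, symbolic links and the working directory are left out.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

/** Exercises several basic FileSystem features with java.nio.file and returns the created tree. */
fun nioBasicsDemo(): List<String> {
    // Temporary directory creation.
    val root: Path = Files.createTempDirectory("fs-demo")
    // Create nested directories and a file.
    val docs = Files.createDirectories(root.resolve("docs/drafts"))
    val draft = Files.write(docs.resolve("note.txt"), "hello".toByteArray())
    // Copy, then move atomically; ATOMIC_MOVE throws AtomicMoveNotSupportedException
    // on filesystems that cannot perform it.
    val copy = Files.copy(draft, root.resolve("docs/note-copy.txt"))
    Files.move(copy, root.resolve("note-final.txt"), StandardCopyOption.ATOMIC_MOVE)
    // Metadata query.
    check(Files.size(draft) == 5L)
    // Directory tree traversal; the returned stream must be closed.
    val tree = Files.walk(root).use { stream ->
        stream.iterator().asSequence()
            .map { root.relativize(it).toString().replace('\\', '/') }
            .filter { it.isNotEmpty() }
            .sorted()
            .toList()
    }
    // Delete the whole tree, children before parents (reverse lexical order).
    Files.walk(root).use { stream ->
        stream.iterator().asSequence().toList().sortedDescending().forEach { Files.delete(it) }
    }
    return tree
}
```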

Files

  • read (including reading from an arbitrary offset/position);
  • write (including writing at an arbitrary offset/position);
  • truncate/resize.
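On the JVM, the three file operations above (positional read, positional write and truncation) map onto `java.io.RandomAccessFile`. The sketch below assumes a JVM target and a writable temp directory, and the name `randomAccessDemo` is hypothetical. Note that `seek` moves a cursor shared by all reads and writes on the handle.

```kotlin
import java.io.File
import java.io.RandomAccessFile

/** Positional write, positional read and truncation with RandomAccessFile (JVM only). */
fun randomAccessDemo(): Pair<String, Long> {
    val file = File.createTempFile("ra-demo", ".bin").apply { deleteOnExit() }
    RandomAccessFile(file, "rw").use { raf ->
        raf.write("Hello, world!".toByteArray()) // sequential write from offset 0
        raf.seek(7)                               // jump to an arbitrary offset
        raf.write("Kotlin".toByteArray())         // overwrite in place: "Hello, Kotlin"
        val buf = ByteArray(6)
        raf.seek(7)
        raf.readFully(buf)                        // read exactly 6 bytes at offset 7
        raf.setLength(5)                          // truncate to "Hello"
        return String(buf, Charsets.UTF_8) to raf.length()
    }
}
```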

FileSystems

  • default system filesystem;
  • filesystem aimed at testing.
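A filesystem aimed at testing is often just an in-memory map. The sketch below is hypothetical: the class `InMemoryFileSystem` and its methods are illustrative only. It assumes absolute '/'-separated paths, is not thread-safe, and enforces only a few POSIX-like rules (the parent must exist, and a directory must be empty before deletion).

```kotlin
/** Hypothetical in-memory filesystem for tests: directories as a set, files as path -> bytes. */
class InMemoryFileSystem {
    private val dirs = mutableSetOf("/")
    private val files = mutableMapOf<String, ByteArray>()

    private fun parentOf(path: String): String = path.substringBeforeLast('/').ifEmpty { "/" }

    fun createDirectory(path: String) {
        require(parentOf(path) in dirs) { "parent does not exist: $path" }
        require(path !in dirs && path !in files) { "already exists: $path" }
        dirs += path
    }

    fun write(path: String, bytes: ByteArray) {
        require(parentOf(path) in dirs) { "parent does not exist: $path" }
        require(path !in dirs) { "is a directory: $path" }
        files[path] = bytes.copyOf() // defensive copy so callers cannot mutate stored content
    }

    fun read(path: String): ByteArray =
        files[path]?.copyOf() ?: throw NoSuchElementException("no such file: $path")

    /** Direct children of [dir], sorted. */
    fun list(dir: String): List<String> {
        require(dir in dirs) { "not a directory: $dir" }
        val prefix = if (dir == "/") "/" else "$dir/"
        return (dirs + files.keys)
            .filter { it != dir && it.startsWith(prefix) && '/' !in it.removePrefix(prefix) }
            .sorted()
    }

    fun delete(path: String) {
        when {
            path in files -> files.remove(path)
            path in dirs -> {
                require(list(path).isEmpty()) { "directory not empty: $path" }
                dirs -= path
            }
            else -> throw NoSuchElementException("no such file: $path")
        }
    }
}
```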

Most of the features listed above are present in the vast majority of modern filesystems. However, implementation details of some of them (like file metadata or permissions) vary significantly, not only between operating systems but also between filesystems on the same OS. How these features will be supported should be decided during the design phase.

Extended features

Extended features include mainly OS- or FS-specific features, niche functionality, or something that is hard to implement reliably.
The list is non-exhaustive.

Paths

  • the ability to manipulate paths belonging to a filesystem other than the current system's (e.g., explicitly processing Windows paths on a Unix host; see the Dart path package docs for an example).
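Processing foreign-style paths mostly comes down to making the separator and absoluteness rules explicit instead of taking them from the host OS. The sketch below is a hypothetical `PathStyle` enum, loosely in the spirit of Dart's path contexts. Its Windows rules are simplified: drive-letter and UNC paths count as absolute, while drive-relative forms like "C:foo" and rooted "\foo" do not.

```kotlin
/** Hypothetical path styles that can be used on any host OS. */
enum class PathStyle(val separator: Char, val acceptedSeparators: Set<Char>) {
    POSIX('/', setOf('/')),
    WINDOWS('\\', setOf('\\', '/'));

    /** Non-empty name segments of [path] (a Windows drive like "C:" stays a segment). */
    fun segments(path: String): List<String> =
        path.split(*acceptedSeparators.toCharArray()).filter { it.isNotEmpty() }

    /** "/x" on POSIX; "C:\x", "C:/x" or UNC "\\server\share" on Windows (simplified). */
    fun isAbsolute(path: String): Boolean = when (this) {
        POSIX -> path.startsWith('/')
        WINDOWS -> (path.length >= 3 && path[0].isLetter() && path[1] == ':' &&
            path[2] in acceptedSeparators) || path.startsWith("\\\\")
    }

    /** Joins [parts] with this style's separator; an absolute part restarts the path. */
    fun join(vararg parts: String): String =
        parts.fold("") { acc, part ->
            when {
                acc.isEmpty() || isAbsolute(part) -> part
                acc.last() in acceptedSeparators -> acc + part
                else -> acc + separator + part
            }
        }
}
```

With this, `PathStyle.WINDOWS.join("C:\\Users", "me")` produces `C:\Users\me` regardless of the host OS.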

FileSystem features

  • special files support (pipes, fifo, locks, shmem, etc.);
  • watching FS updates;
  • extended permissions support;
  • hard links support;
  • globs support (find all paths matching a glob like `**/*.txt`);
  • partitions/mount points/volumes support (query name, root dir, size, capacity, etc.).
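For glob support, the JVM already ships a matcher that any JVM-backed implementation could reuse: `FileSystem.getPathMatcher("glob:...")`. The sketch below only demonstrates that existing API. The helper `matchGlob` is hypothetical, and the example candidates assume '/'-separated relative paths.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Paths

/** Returns the candidates matching [glob] under the JVM's built-in glob syntax. */
fun matchGlob(glob: String, candidates: List<String>): List<String> {
    val matcher = FileSystems.getDefault().getPathMatcher("glob:$glob")
    return candidates.filter { matcher.matches(Paths.get(it)) }
}
```

Under the JVM's semantics, `**/*.txt` requires at least one directory separator. It therefore matches `docs/a.txt` and `docs/sub/b.txt` but not a top-level `notes.txt`. This is the kind of detail a multiplatform glob API would need to pin down.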

FileSystems

  • archive file systems: Zip, Tar, etc.

Files

  • sendfile/splice support;
  • memory mapped files support.
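For reference, memory-mapped files are exposed on the JVM through `FileChannel.map`. The sketch below is JVM-only, and the name `mmapDemo` is hypothetical. It patches one byte through a read-write mapping and reads the file back through ordinary I/O. The JVM releases a mapping only when the buffer is garbage-collected, so the temp file is cleaned up on exit rather than deleted immediately.

```kotlin
import java.nio.channels.FileChannel
import java.nio.file.Files
import java.nio.file.StandardOpenOption

/** Maps a file read-write, changes the first byte through the mapping, and returns the content. */
fun mmapDemo(): String {
    val path = Files.createTempFile("mmap-demo", ".txt")
    path.toFile().deleteOnExit()
    Files.write(path, "hello".toByteArray())
    FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE).use { ch ->
        val buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size())
        buf.put(0, 'j'.code.toByte()) // absolute put: write through the mapping
        buf.force()                   // flush mapped changes to the storage device
    }
    return String(Files.readAllBytes(path))
}
```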

Nice to have features

Features that fit neither of the two previous categories but would be nice to have:

  • allow wrapping java.nio.file.FileSystem on JVM to reuse existing third-party filesystem implementations, like S3-FS.
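Wrapping a `java.nio.file.FileSystem` is essentially an adapter. The sketch below uses a hypothetical minimal interface, `SimpleFileSystem`, which is not a kotlinx-io API, and a hypothetical `NioBackedFileSystem` that delegates to any NIO filesystem. The demo plugs in the JDK's built-in zip filesystem (jdk.zipfs), which also touches the archive-filesystem item. It assumes a standard JDK that includes that provider.

```kotlin
import java.net.URI
import java.nio.file.FileSystem
import java.nio.file.FileSystems
import java.nio.file.Files

/** Hypothetical minimal filesystem interface, for illustration only. */
interface SimpleFileSystem {
    fun write(path: String, bytes: ByteArray)
    fun read(path: String): ByteArray
    fun list(dir: String): List<String>
}

/** Adapter delegating every call to a java.nio.file.FileSystem. */
class NioBackedFileSystem(private val fs: FileSystem) : SimpleFileSystem {
    override fun write(path: String, bytes: ByteArray) {
        Files.write(fs.getPath(path), bytes)
    }

    override fun read(path: String): ByteArray = Files.readAllBytes(fs.getPath(path))

    override fun list(dir: String): List<String> =
        Files.list(fs.getPath(dir)).use { s -> s.iterator().asSequence().map { it.toString() }.sorted().toList() }
}

/** Writes into a new zip archive through the adapter and lists the archive root. */
fun zipDemo(): List<String> {
    // The archive must not exist yet: "create" only creates missing files.
    val zip = Files.createTempDirectory("zipfs-demo").resolve("archive.zip")
    val uri = URI.create("jar:" + zip.toUri())
    FileSystems.newFileSystem(uri, mapOf("create" to "true")).use { zipFs ->
        val adapted: SimpleFileSystem = NioBackedFileSystem(zipFs)
        adapted.write("/hello.txt", "hi".toByteArray())
        check(String(adapted.read("/hello.txt")) == "hi")
        return adapted.list("/") // closing zipFs afterwards writes the archive to disk
    }
}
```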

The plan

The overall plan is to review existing files and filesystem API, redesign it if needed, and then concentrate on supporting all the basic features.
The list of basic features is subject to change, further subdivision, and prioritization.

There are no particular plans regarding features considered extended, but the proposed design should be flexible enough to allow their support in the future.

@willflier

Do you plan to support non-UTF-8 encodings for file paths? This is crucial for some non-English systems, such as those using the GBK encoding on Windows. Currently, Okio does not support this (in its Native part), which makes it very inconvenient to use.
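The concern can be shown with a small JVM check. A file name stored as GBK bytes does not survive being decoded as UTF-8 and re-encoded, because malformed sequences become U+FFFD replacement characters. Any API that funnels raw path bytes through a UTF-8 `String` therefore loses them. The helper `roundTripsAsUtf8` is hypothetical, and the sketch assumes the JDK provides the GBK charset (standard JDKs do).

```kotlin
import java.nio.charset.Charset

/** True if [rawName] survives a UTF-8 decode/encode round trip unchanged. */
fun roundTripsAsUtf8(rawName: ByteArray): Boolean {
    val decoded = String(rawName, Charsets.UTF_8) // malformed input becomes U+FFFD
    return decoded.toByteArray(Charsets.UTF_8).contentEquals(rawName)
}

// A file name as it might be stored by a system using the GBK encoding.
val gbkName: ByteArray = "中文.txt".toByteArray(Charset.forName("GBK"))
```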

@fzhinkin
Collaborator Author

fzhinkin commented Dec 1, 2023

@willflier that's a great question, thanks! Speaking of support for different encodings in general, there were no plans to start working on it in the near future. However, I will definitely check what could be done about filename/path encoding on Windows.

@fzhinkin
Collaborator Author

Yet another option to keep in mind is support for the family of openat-based filesystem operations (openat, linkat, renameat, mkdirat, etc.).

@zhanghai

zhanghai commented Mar 31, 2024

Hi, as mentioned in #163 (comment) I have also been working on a Kotlin file system API, and now I have a proposed API surface at https://github.com/zhanghai/MaterialFiles/tree/filesystem/app/src/main/java/me/zhanghai/kotlin/filesystem. Since this issue is also about a better and more powerful file system API, I hope my design and the rationale behind it might be helpful:

The proposed API surface is informed by my experience with, and reading of:

  • Java NIO File API and implementing an Android file manager based on it, including a custom default FileSystemProvider based on Linux syscalls (due to missing Java 8 desugaring a few years ago)
  • SFTP/SMB/WebDAV/DocumentsProvider/libarchive protocols/APIs and implementing FileSystemProviders for them.
  • Okio APIs
  • Windows FS APIs (fileapi.h, ntifs.h)
  • Node.js/libuv FS APIs
  • Web FileSystemHandle APIs

It appears similar to the Java NIO File API, but with a number of (opinionated) choices:

  • Async:
  • Path:
    • Path is an independent data class like in Okio, instead of one type per file system provider like in Java.
    • Path still has the scheme and URI concepts like in Java, to be extensible and allow different types of paths.
    • Path is based on byte strings instead of strings, so that non-Unicode paths can be represented and handled correctly.
    • Path does not internally store the name separator (it holds a list of name segments), and it is up to the file system to convert it to an actual underlying representation (and potentially caching that). (This may also be changed if we find a performance issue since it's an impl detail.)
    • Path has a root URI and the URI representation of any path is created by replacing the path component of the root URI with the name segments of that path.
    • Path is simplified conceptually to always have a single root path because paths need to be convertible to URIs and (absolute) URIs must have absolute paths anyway.
    • Path intentionally doesn't support platform-dependent relative paths like "C:foo".
  • FileSystemProvider
    • File system provider is now merely a provider for creating new FileSystem instances and isn't responsible for file operations. Different implementations can have their own way to allow different options when creating file systems, e.g. how to retrieve credentials given a particular SFTP path.
    • FileSystemRegistry now stores all the file systems.
  • FileSystem:
    • All path-based file system operations happen on FileSystem instead of FileSystemProvider.
    • File systems are identified by their root URI.
    • There is only one root directory and it is always the path created from the root URI.
    • All file system operations are provided as extension functions on Path as well.
  • FileMetadata(View)
    • This class replaces BasicFileAttributes(View).
    • There is no longer a concept of attribute view names and the file system always returns one file metadata instance for a file. It may implement a more specific interface like PosixFileMetadata to offer more information specific to a platform.
    • FileMetadataView is Closeable so that it may hold on to a file descriptor. This is more efficient for remote file systems like SFTP and SMB, and also allows a potential API to open a FileMetadataView from an existing FileHandle/FileDescriptor.
  • FileContent:
    • This class replaces FileChannel and SeekableByteChannel as well as Okio's FileHandle. It is not named FileHandle because it's meant to represent only the content, not a generic file descriptor on POSIX or file handle on Windows.
    • It doesn't have a concept of position, and always allows random access, similar to Okio. This simplifies locking and aligns much better with remote file systems like SMB/SFTP etc.
    • It provides openSource() and openSink() utilities to help callers who want sequential access.
  • DirectoryStream:
    • This class returns instances of DirectoryEntry objects upon read().
    • DirectoryEntry contains name instead of Path instances so that the class may work with an existing FileHandle/FileDescriptor in the future (possible with fdopendir/NtQueryDirectoryFile). Helpers like FileSystem.readDirectory(): List<Path> can simplify cases where only the paths are needed.
    • Additional options like READ_METADATA can be passed into FileSystem.openDirectoryStream() so that DirectoryEntry will contain a non-null metadata field. This is designed for remote file system protocols like SFTP and SMB where metadata can be queried when listing a directory for better performance.
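The position-free FileContent idea described in the bullets above can be sketched as follows. Every read and write carries its own offset, and sequential access is layered on top by a reader that owns its cursor. All names here (`RandomAccessContent`, `InMemoryContent`, `SequentialReader`) are hypothetical and are not the actual filesystem-kt signatures. The in-memory implementation is not synchronized.

```kotlin
/** Hypothetical positionless random-access content. */
interface RandomAccessContent {
    fun size(): Long
    /** Reads up to dst.size bytes at [position]; returns the byte count, or -1 at end of content. */
    fun read(position: Long, dst: ByteArray): Int
    fun write(position: Long, src: ByteArray)
}

/** In-memory implementation; there is no shared cursor, every call passes its own position. */
class InMemoryContent(initial: ByteArray = ByteArray(0)) : RandomAccessContent {
    private var data = initial.copyOf()

    override fun size(): Long = data.size.toLong()

    override fun read(position: Long, dst: ByteArray): Int {
        if (position >= data.size) return -1
        val n = minOf(dst.size.toLong(), data.size - position).toInt()
        data.copyInto(dst, 0, position.toInt(), position.toInt() + n)
        return n
    }

    override fun write(position: Long, src: ByteArray) {
        val end = (position + src.size).toInt()
        if (end > data.size) data = data.copyOf(end) // grow, zero-filling any gap
        src.copyInto(data, position.toInt())
    }
}

/** Sequential access on top: the cursor lives in the reader, not in the content. */
class SequentialReader(private val content: RandomAccessContent, private var position: Long = 0L) {
    fun readRemaining(chunkSize: Int = 4): ByteArray {
        var result = ByteArray(0)
        val buf = ByteArray(chunkSize)
        while (true) {
            val n = content.read(position, buf)
            if (n <= 0) break
            result += buf.copyOf(n)
            position += n
        }
        return result
    }
}
```

Because readers do not share a cursor, two `SequentialReader`s over the same content can start at different offsets without interfering. This mirrors how SMB/SFTP requests carry an explicit offset.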

Things like KDoc, watching, directory tree walking, cross-provider copy/move, a progress listener option for copy/move, and a 100% compliant multiplatform URI parser that supports byte strings are not there yet, but they shouldn't significantly affect the design either.

@fzhinkin fzhinkin added this to the kotlinx-io stabilization milestone May 6, 2024
@zhanghai

zhanghai commented May 21, 2024

I made some updates to the code above and finally got it published as a separate personal project https://github.com/zhanghai/filesystem-kt . You can check out its API reference here, e.g. Path.

Notable changes include:

  • Relative paths are now represented by a separate RelativePath class to reduce confusion, e.g. in the resolve and relativize APIs when mixing and matching absolute and relative paths.

    Paths are now always absolute, and FileSystem only works with these absolute Paths. To be fair, few file systems other than the local one support a current working directory, the JVM never had an actual runtime-modifiable CWD anyway, and a mutable CWD doesn't work well with multi-threading, so I believe requiring absolute Paths for file system operations is reasonable. RelativePaths can still be used for creating/manipulating paths.

  • FileSystemProvider is removed and people should use FileSystemRegistry directly.

    This gives developers back the freedom to decide when and how file systems are created and destroyed, since automatically creating file system instances right before a file operation may not be ideal or even possible.

  • A JVM implementation with PlatformFileSystem is added, while a non-JVM variant without a PlatformFileSystem is also available.

  • KDoc was added for some of the classes (notably Path), while the rest of the docs is still WIP.

  • Watching file changes, walking directory tree and cross-provider copy/move are still TBD.
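The benefit of the separate RelativePath type described above is that mixing the two kinds becomes a type error rather than a runtime surprise. In this sketch, `resolve` accepts only a relative path and `relativize` always returns one. The types `AbsolutePath`/`RelativePath` and the factories `absolutePath`/`relativePath` are hypothetical stand-ins modeled on that description, not the filesystem-kt API. ".." is handled lexically.

```kotlin
/** Hypothetical absolute path stored as name segments (no separator kept internally). */
data class AbsolutePath(val segments: List<String>) {
    override fun toString() = "/" + segments.joinToString("/")

    /** Only a RelativePath can be resolved, so resolving an absolute path does not compile. */
    fun resolve(other: RelativePath): AbsolutePath {
        val out = segments.toMutableList()
        for (seg in other.segments) {
            when (seg) {
                "." -> {}
                ".." -> out.removeLastOrNull() // lexical: ".." at the root stays at the root
                else -> out.add(seg)
            }
        }
        return AbsolutePath(out)
    }

    /** Relativizing two absolute paths always yields a RelativePath. */
    fun relativize(other: AbsolutePath): RelativePath {
        var common = 0
        while (common < segments.size && common < other.segments.size &&
            segments[common] == other.segments[common]) common++
        return RelativePath(List(segments.size - common) { ".." } + other.segments.drop(common))
    }
}

/** Hypothetical relative path; usable only for building and manipulating paths. */
data class RelativePath(val segments: List<String>) {
    override fun toString() = segments.joinToString("/").ifEmpty { "." }
}

fun absolutePath(s: String) = AbsolutePath(s.split('/').filter { it.isNotEmpty() })
fun relativePath(s: String) = RelativePath(s.split('/').filter { it.isNotEmpty() })
```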

I should note again that this isn't an official Google effort even though I'm an employee; it's just a personal project based on my experience working on zhanghai/MaterialFiles, released in the hope that it may help with the API design here.
