vfs abstractions and tarfs #1037

wedsonaf · 2023-09-29T21:54:09Z

No description provided.

rust/kernel/fs.rs

rust/kernel/types.rs

y86-dev

I will leave more comments tomorrow.

y86-dev · 2023-09-29T22:25:46Z

rust/kernel/types.rs

+/// Derive [`ReadableFromBytes`] for the struct defined.
+///
+/// # Examples
+///
+/// ```
+/// kernel::derive_readable_from_bytes! {
+///     #[repr(C)]
+///     struct Inode {
+///         a: u16,
+///         b: u16,
+///         c: u32,
+///     }
+/// }
+/// ```
+#[macro_export]
+macro_rules! derive_readable_from_bytes {
+    ($($(#[$outer:meta])* $outerv:vis struct $name:ident {
+        $($(#[$m:meta])* $v:vis $id:ident : $t:ty),* $(,)?
+    })*)=> {
+        $(
+            $(#[$outer])*
+            $outerv struct $name {
+                $(
+                    $(#[$m])*
+                    $v $id: $t,
+                )*
+            }
+            unsafe impl $crate::types::ReadableFromBytes for $name {}
+            const _: () = {
+                const fn is_readable_from_bytes<T: $crate::types::ReadableFromBytes>() {}
+                $(is_readable_from_bytes::<$t>();)*
+            };
+        )*
+    };
+}


for the RFC this is fine, but I am wondering if we want to have a proper derive macro when we upstream it.

We definitely want a derive macro.

This is what I mentioned to you in Kangrejos: we should generalise your code that derives Zeroable to also derive ReadableFromBytes (and WritableToBytes maybe).

not sure if we want to merge them, since bool is Zeroable, but not ReadableFromBytes.

I don't mean to merge them. I meant to generalise the implementation; your derive implementation for Zeroable basically implements Zeroable for a struct as long as all fields are also Zeroable. So we could implement functionality that implements X for a struct as long as all fields are also X; then we instantiate this for Zeroable and ReadableFromBytes.

I see, not sure how that is going to work with derive macros, but the backend could definitely use some generalization.

Sure, the top-level function is not reusable, but I think something like this would work:

[proc_macro_derive(Zeroable)] pub fn derive_zeroable(input: TokenStream) -> TokenStream { mimic_autotrait::derive("Zeroable", input) } [proc_macro_derive(FromBytes)] pub fn derive_zeroable(input: TokenStream) -> TokenStream { mimic_autotrait::derive("FromBytes", input) }

y86-dev · 2023-09-29T22:47:27Z

rust/kernel/mem_cache.rs

+    pub(crate) fn try_new<T>(
+        name: &'static CStr,
+        init: Option<unsafe extern "C" fn(*mut core::ffi::c_void)>,
+    ) -> Result<Self> {


this function should have some docs. also what exactly am I allowed to input as init? since it is an unsafe function, almost anything can happen, so maybe this should be an unsafe function as well?

I'll add docs.

As for making this function unsafe, I'm not sure it's needed: although init is unsafe, we still require an unsafe block within it to do anything that is unsafe.

My usual test for marking a function unsafe is: does it allow callers to get UB without unsafe blocks? I don't think it's the case here. (Though I'd be happy to be shown otherwise.)

I don't think we want the callee to reason about who's caller. It should only reason about its safety requirement, which is much more than what an unsafe keyword on fn signature can convey.

So I think this function should be unsafe since it's try_new's caller's responsibility to ensure that the safety requirement matches.

What are the safety requirements you guys have in mind?

I don't have any.

what exactly should init do? where does it come from?

It's called when an entry in the cache is allocated. It's meant to initialise the entry.

So it is an initializer for type T? I think using an unsafe extern fn pointer doesnt really fit then.

Why is it optional? What default would the C side know to use for custom Rust types?

Why not use fn() -> T or if you need in-place/pinned init fn() -> impl [Pin]Init<T>?

It's a bit more complicated. It's called when entries are allocated into the cache and only once, so the sequence is: a "slab" is allocated, entries in it are "initialised-once" via this this call (e.g., see https://elixir.bootlin.com/linux/latest/source/fs/inode.c#L416), then when the entry is actually allocated for use, another per-use init is performed (this is the traditional init, but assumes that init-once was called before). Eventually, when the entry is dropped, some form of destructor is called (but leaves the init-once state consistent) and the entry goes back to free state; when it's allocated again (if ever), init-once doesn't have to run again, just the regular init runs, and so on.

The idea is to avoid repeating work for fields that remain consistent. We don't really use this concept in Rust, but the C side for inodes does, so we need to accommodate for that.

Having said all of this, this is not a public API for Rust modules. The only reason I wrote this was to automatically destroy it when it goes out of scope during pin-init and I wasn't sure if scope guards would work as expected. I am also not interested in developing this abstraction any further than I need to for this. Perhaps this will change when it's actually needed by some module.

I see, then I would just hardcode that usage and not take the Option<fn> parameter at all. That way you do not need any safety requirements on that function.

rust/kernel/mem_cache.rs

rust/kernel/fs.rs

y86-dev · 2023-09-29T23:03:27Z

rust/kernel/fs.rs

+        /// Maximum size of a file.
+        ///
+        /// The maximum allowed value is [`super::MAX_LFS_FILESIZE`].
+        pub maxbytes: i64,


what happens when the maximum is exceeded?

Are you asking about when a file has more than maxbytes bytes or when maxbytes is greater than MAX_LFS_FILESIZE?

For the former, I'm not sure, we need to check the C code -- if it can break safety, I will add a check when initialising an inode. The latter is currently not possible because MAX_LFS_FILESIZE happens to be i64::MAX -- I tried to add a check and got an error saying that it's always false, so I removed it.

I meant the latter, is that architecture dependent? If yes, we should add a (static?) check for that. maybe also add it regardless, since it doesn't hurt (of course only if it does not produce a warning)?

The number is architecture-dependent: https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L1022

I'll add a check.

y86-dev · 2023-09-29T23:09:19Z

rust/kernel/fs.rs

+    /// An inode that is locked and hasn't been initialised yet.
+    #[repr(transparent)]
+    pub struct NewINode<T: Type + ?Sized>(ARef<INode<T>>);


no type invariant?

What kind of invariant do you have in mind for this?

I tend to write invariants when they help with safety comments.

below you write safety comments like:

// SAFETY: This is a new inode, so it's safe to manipulate it mutably. // SAFETY: inode is a new inode, so it is valid for write. // SAFETY: The new inode failed to be turned into an initialised inode, so it's safe // (and in fact required) to call `iget_failed` on it.

and I do not particularly like them (but haven't given any thought as to what would fit better), maybe a type invariant might help here (e.g. the ARef is unique).

ojeda · 2023-09-29T23:20:59Z

Nits in commit messages:

"This is a modifid version" typo
"This allows file system" -> "This allows file systems" (x2)
"associated to inodes": "to" or "with"?
"xattr_handler" -> "xattr_handler"
"for simple structs that whose fields" -> "for simple structs whose fields"
"the rust version" -> "the Rust version"
"by rust file systems" -> "by Rust file systems"

rust/kernel/fs.rs

fbq · 2023-09-29T23:54:24Z

rust/kernel/lib.rs

+impl<T: Module> InPlaceModule for T {
+    type Init = impl init::PinInit<Self, error::Error>;
+
+    fn init(module: &'static ThisModule) -> Self::Init {


How about implementing this as a default implementation in Module other than the new InPlaceModule?

I.e.

pub trait Module: Sized + Send + Sync { fn pin_init(module: &'static ThisModule) -> impl init::PinInit<Self, error::Error> { ... } }

Of course, this requires RPITIT, but IIUC, it's going to be stabilized soon: rust-lang/rust#115822

Ah, I see, in the sample, impls may be overwritten by a particular module, so it makes sense to have a different trait. But still you can use RPITIT ;-)

We probably can Module be pin-init by default?

Ah, I see, in the sample, impls may be overwritten by a particular module, so it makes sense to have a different trait. But still you can use RPITIT ;-)

RPITIT is indeed more ergonomic. Switched to it.

We probably can Module be pin-init by default?

Perhaps, but not on this patch. Let's discuss this in the meeting.

rust/kernel/time.rs

rust/kernel/types.rs

y86-dev

I think the commit messages could also use some more explanation

y86-dev · 2023-09-30T07:46:07Z

rust/kernel/fs.rs

+        fn read_dir(
+            inode: &INode<Self>,
+            pos: i64,
+            report: impl FnMut(&[u8], i64, u64, DirEntryType) -> bool,
+        ) -> Result<i64>;


why does an iterator not work here?

An iterator is convenient to use, but not so much to write.

Consider the read_dir implementation of tarfs, in particular this part: https://github.com/wedsonaf/linux/blob/451b47e1ededf9fac0c5eae1b76c18dd94f84043/fs/tarfs/tar.rs#L223

We're looping over the data blocks (from the block device) that make up the inode contents, then we're looping over the directory entries within each block, then we're reporting them. It seems much simpler to write these in straight-line code.

If Rust had generators, perhaps an iterator would be the best solution.

rust/kernel/fs.rs

y86-dev · 2023-09-30T07:51:30Z

rust/kernel/folio.rs

+impl Folio {
+    /// Returns the byte position of this folio in its file.
+    pub fn pos(&self) -> i64 {
+        // SAFETY: The folio is valid because the shared reference implies a non-zero refcount.


just make this part of the type invariant.

rust/kernel/folio.rs

y86-dev · 2023-09-30T07:58:21Z

rust/kernel/folio.rs

+/// A locked [`Folio`].
+pub struct LockedFolio<'a>(&'a Folio);


What does it mean for a folio to be locked?

Here's a full description: https://elixir.bootlin.com/linux/latest/source/include/linux/pagemap.h#L934

Since this series concerns read-only file systems, we are only concerned with reading a folio: on success we have to mark it up to date, then unlock it. (The caller of read_page will wait for it to be unlocked, then check if it's up to date; this means the unlock may be asynchronous, but since none of the FS we are implementing require this, we don't support it. Eventually we may need to have to express the fact that we're holding a reference to a locked folio.)

I would suggest to link/mention that lock function from the C side.

y86-dev · 2023-09-30T08:06:44Z

rust/kernel/fs.rs

+        /// Returns the number of bytes written to `outbuf`. If it is too small, returns the number
+        /// of bytes needs to hold the attribute.


I am a bit skeptical about returning Ok when outbuf is too small. What about Result<Either<usize, usize>> (not sure if we have Either already in the kernel, but Alice also needs it for binder)?

This is the way the C API works: a negative number indicates an error, non-negative numbers indicate the length how much is written or the total size, whichever is greater.

Either is already merged upstream. But what would this accomplish?

The implementation of read_xattr for tarfs ends as follows:

if !outbuf.is_empty() { outbuf[0] = b'y'; } Ok(1)

it was just a little weird to me

y86-dev · 2023-09-30T08:11:09Z

rust/kernel/fs.rs

+        /// Named pipe (first-in, first-out) type.
+        Fifo,
+
+        /// Character device type.
+        Chr(u32, u32),
+
        /// Directory type.
        Dir,

+        /// Block device type.
+        Blk(u32, u32),
+
        /// Regular file type.
        Reg,
+
+        /// Symbolic link type.
+        Lnk,
+
+        /// Named unix-domain socket type.
+        Sock,


Are we sure that we want to keep these short names?

These names come from the C definitions, for example, S_IFSOCK, S_IFREG, S_IFLNK, etc. I just removed the S_IF prefix and changed the capitalisation.

The reason I did it this way was to keep a simple relationship between the two.

rust/kernel/fs.rs

nbdd0121

Here's some comments for the non-FS generic parts. I'll try to review the FS parts in the upcoming days.

nbdd0121 · 2023-09-30T12:41:59Z

rust/kernel/lib.rs

+impl<T: Module> InPlaceModule for T {
+    type Init = impl init::PinInit<Self, error::Error>;
+
+    fn init(module: &'static ThisModule) -> Self::Init {


We probably can Module be pin-init by default?

nbdd0121 · 2023-09-30T12:45:37Z

rust/kernel/lib.rs

+        let ptr = $ptr as *const _ as *const u8;
+        let offset = ::core::mem::offset_of!($type, $($f)*) as isize;
+        ptr.wrapping_offset(-offset) as *const $type


Suggested change

let ptr = $ptr as *const _ as *const u8;

let offset = ::core::mem::offset_of!($type, $($f)*) as isize;

ptr.wrapping_offset(-offset) as *const $type

let ptr = $ptr as *const _ as *const u8;

let offset = ::core::mem::offset_of!($type, $($f)*);

ptr.sub(offset) as *const $type

I have a version of this patch in my series: Darksonn@56f551f

I think we should try to use identical copies of this patch.

Yeah, and we could merge it this cycle too, independently. Would you like to send it on its own, Wedson?

Sure, I'll send a single one for us to merge.

I added a build_assert, so it looks like this now:

macro_rules! container_of { ($ptr:expr, $type:ty, $($f:tt)*) => {{ let ptr = $ptr as *const _ as *const u8; let offset = ::core::mem::offset_of!($type, $($f)*); $crate::build_assert!(offset <= isize::MAX as usize); ptr.wrapping_offset(-(offset as isize)) as *const $type }} }

I like it because it is safe. Dereferencing the pointer continues to be unsafe.

WDYT?

Why not wrapping_sub then?

That works for me too, I'll make the change.

rust/kernel/time.rs

nbdd0121 · 2023-09-30T12:56:31Z

rust/kernel/types.rs

@@ -395,3 +395,48 @@ pub enum Either<L, R> {
    /// Constructs an instance of [`Either`] containing a value of type `R`.
    Right(R),
 }
+
+/// A type that can be represented in little-endian bytes.
+pub trait LittleEndian {


Instead of LittleEndian trait, this should be a Num trait, with to_le which forwards to the to_le inherent impl. to_native function could be named from_le and forward to inherent impl.

Num trait makes it trivially to add be methods later (and Copy can also be a supertrait).

I don't see what the difficult is of adding BigEndian later.

I'd rather keep it as LittleEndian.

LittleEndian makes little sense to me, as it describes neither an action or a property.

It describes the property of being encodable in little-endian format.

rust/kernel/types.rs

nbdd0121 · 2023-09-30T12:57:56Z

rust/kernel/types.rs

+
+impl<T: LittleEndian + Copy> LE<T> {
+    /// Returns the native-endian value.
+    pub fn value(&self) -> T {


What's the rationale for one direction to be explicit and one direction implicit? I think either both direction should be explicit or both direction explicit.

I wanted both of them to be implicit, but I get the following error:

RUSTC L rust/kernel.o error[E0210]: type parameter `T` must be covered by another type when it appears before the first local type (`LE<T>`) --> rust/kernel/types.rs:438:6 | 438 | impl<T: LittleEndian + Copy> core::convert::From<LE<T>> for T { | ^ type parameter `T` must be covered by another type when it appears before the first local type (`LE<T>`) | = note: implementing a foreign trait is only possible if at least one of the types for which it is implemented is local, and no uncovered type parameters appear before that first local type = note: in this case, 'before' refers to the following order: `impl<..> ForeignTrait<T1, ..., Tn> for T0`, where `T0` is the first and `Tn` is the last

So I added the value function.

rust/kernel/types.rs

scripts/Makefile.build

Darksonn · 2023-09-30T15:49:39Z

rust/kernel/lib.rs

+        let ptr = $ptr as *const _ as *const u8;
+        let offset = ::core::mem::offset_of!($type, $($f)*) as isize;
+        ptr.wrapping_offset(-offset) as *const $type


I have a version of this patch in my series: Darksonn@56f551f

I think we should try to use identical copies of this patch.

y86-dev

I only looked at the first few patches and will need more time to look at the filesystem stuff again.
I noticed some comments that you have not yet replied to, do you still plan to do that or did they maybe fly under your radar?

rust/kernel/types.rs

y86-dev · 2023-10-14T15:55:33Z

rust/kernel/types.rs

+        // SAFETY: The memory is valid for read because we have a reference to it. We have just
+        // checked the minimum alignment as well, and the length of the slice is calculated from
+        // the length of `Self`.
+        Some(unsafe { core::slice::from_raw_parts(ptr.cast(), data.len() / size_of::<Self>()) })


Move all of the safe operations to the outside of the unsafe block.

y86-dev · 2023-10-14T15:56:57Z

rust/kernel/types.rs

+/// }
+/// ```
+#[macro_export]
+macro_rules! derive_readable_from_bytes {


do you want me to write a proper derive macro for when you want to send this to the list? I do not think that we should upstream a declarative derive macro, since these are incompatible with each other.

Compatibility is not really a concern. We can change anything anytime.

My plan is to send only the VFS-related patches as an RFC. Depending on how that goes we can prioritize improving the dependencies.

I do not want others to try to upstream these kinds of derive macros. I think we should not have this even if it is only temporary.

An RFC is never upstreamed.

If you want to work on a derive macro for this, go for it.

y86-dev · 2023-10-14T15:57:52Z

rust/kernel/mem_cache.rs

+    pub(crate) fn try_new<T>(
+        name: &'static CStr,
+        init: Option<unsafe extern "C" fn(*mut core::ffi::c_void)>,
+    ) -> Result<Self> {


what exactly should init do? where does it come from?

rust/kernel/fs.rs

y86-dev · 2023-10-14T16:02:33Z

rust/kernel/fs.rs

+        // SAFETY: If an instance of `Self` has been successfully created, a call to
+        // `register_filesystem` has necessarily succeeded. So it's ok to call
+        // `unregister_filesystem` on the previously registered fs.


Just make this a type invariant

samples/rust/rust_rofs.rs

rust/kernel/fs.rs

As it is currently declared, the xattr_handler structs are const but the array containing their pointers is not. This patch makes it so that fs modules can place them in .rodata, which makes it harder for accidental/malicious modifications at runtime. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

This allows modules to be initialised in-place in pinned memory, which enables the usage of pinned types (e.g., mutexes, spinlocks, driver registrations, etc.) in modules without any extra allocations. Drivers that don't need this may continue to implement `Module` without any changes. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

This is a modified version of rust_minimal that is initialised in-place. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

This is the Rust version of the macro with the same name in C. It produces a raw pointer to an outer type from a pointer to an inner field. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

We'll need it, for example, when calling `register_filesystem` to initialise a file system registration. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

This is just the basics of `kmem_cache` to be used in file systems for inodes. All dead-code annotations will be removed in the next commit. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

The `allocator_api` feature is needed by drivers that allocate memory. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow basic registration and unregistration of Rust file system types. Unregistration happens automatically when a registration variable is dropped (e.g., when it goes out of scope). File systems registered this way are visible in `/proc/filesystems` but cannot be mounted yet because `init_fs_context` fails. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Simplify the declaration of modules that only expose a file system type. They can now do it using the `module_fs` macro. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Introduce a basic sample that for now only registers the file system and doesn't really provide any functionality beyond having it listed in `/proc/filesystems`. New functionality will be added to the sample in subsequent patches as their abstractions are introduced. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to initialise superblocks, which allows them to be mounted (though they are still empty). Some scaffolding code is added to create an empty directory as the root. It is replaced by proper inode creation in a subsequent patch in this series. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle typed and ref-counted inodes. This is in preparation for creating new inodes (for example, to create the root inode of a new superblock), which comes in the next patch in the series. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to specify their root directory. Also allow them to create (and do cache lookups of) directory inodes. (More types of inodes are added in subsequent patches in the series.) The `NewINode` type ensures that a new inode is properly initialised before it is marked so. It also facilitates error paths by automatically marking inodes as failed if they're not properly initialised. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to report the contents of their directory inodes. The reported entries cannot be opened yet. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create inodes that are children of a directory inode when they're looked up by name. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to handle ref-counted folios. Provide the minimum needed to implement `read_folio` (part of `struct address_space_operations`) in read-only file systems and to read uncached blocks. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to create regular file inodes backed by the page cache. The contents of such files are read into folios via `read_folio`. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose xattrs associated with inodes. `overlayfs` uses an xattr to indicate that a directory is opaque (i.e., that lower layers should not be looked up). The planned file systems need to support opaque directories, so they must be able to implement this. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to expose their stats. `overlayfs` requires that this be implemented by all file systems that are part of an overlay. The planned file systems need to be overlayed with overlayfs, so they must be able to implement this. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file system modules to create inodes that are symlinks, pipes, sockets, char devices and block devices (in addition to the already-supported directories and regular files). Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to associate [typed] data to super blocks when they're created. Since we only have a pointer-sized field in which to store the state, it must implement the `ForeignOwnable` trait. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Introduce the abstractions that will be used by modules to handle buffer heads, which will be used to access cached blocks from block devices. All dead-code annotations are removed in the next commit in the series. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems that are backed by block devices (in addition to in-memory ones). Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file systems to attach extra [typed] data to each inode. If no data is needed, use the regular inode kmem_cache, otherwise we create a new one. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

Allow Rust file system modules to use these constants if needed. Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>

It is a file system based on tar files and an index appended to them (to facilitate finding fs entries without having to traverse the whole tar file). Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>