Roadmap #98

Closed
joshlf opened this issue Oct 23, 2022 · 8 comments

Comments

@joshlf
Member

joshlf commented Oct 23, 2022

Overview

This issue describes zerocopy's high-level roadmap both in terms of goals and in terms of concrete steps to achieve those goals.

A slogan often associated with Rust is "Fast, Reliable, Productive. Pick Three." Zerocopy's mission is to make that slogan true by making it so that 100% safe Rust code is just as fast and ergonomic as unsafe Rust code.

In order to live up to that mission, we need to do the following things:

  • Hold ourselves to a high standard for soundness, including in the face of future compiler changes
  • Frame zerocopy in a way that is legible to various user bases, including:
    • Users who don't conceive of themselves as users of unsafe
    • Users who are especially security-conscious
    • Users who care about the crates.io ecosystem
  • Identify gaps which prevent users from choosing zerocopy, and close those gaps
  • Identify features which can improve the ergonomics or performance of code which uses zerocopy, and implement those features

Motivation

A user story

Imagine you are a systems programmer. Any sort of systems software will do, but we need a specific example, so let's say you're writing a networking stack. You care about your software's performance, you care about your software's correctness, and you care about your team's productivity. In order to achieve maximum performance, you want your code to do as few things as possible, and that means avoiding any situation where your data must be converted between representations in the course of processing it. For example, if you are parsing a network packet, you want to operate on the packet in-place: so-called "zero-copy" parsing (hey, that's the name of the crate!).

Your first impulse might be to use unsafe code. Perhaps you write a parsing routine like:

struct UdpHeader {
    src_port: u16,
    dst_port: u16,
    length: u16,
    checksum: u16,
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (header, body) = bytes.split_at(size_of::<UdpHeader>());
    let header = unsafe { &*header.as_ptr().cast::<UdpHeader>() };
    Some(UdpPacket { header, body })
}

One of your goals is performance, and this code is fast! But you also care about your code's correctness, and you know that unsafe is notoriously difficult to get right (in fact, this implementation is unsound in two ways - can you spot them?). So you decide to be more careful. You spend the day poring over the Rustonomicon and the language reference. You find and fix some bugs in your code, and you even write a pseudo-proof of correctness in a "SAFETY" comment so that others can check your work.

#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (header, body) = bytes.split_at(size_of::<UdpHeader>());

    // SAFETY: We've validated that `bytes` is at least as long as `UdpHeader`. We know
    // that `UdpHeader` has no alignment requirement because all of its fields are `u8`
    // arrays, which don't have any alignment requirement, and it's `#[repr(C)]` so its
    // alignment is equal to the maximum of the alignments of its fields. Thus, the reference
    // we create here satisfies the layout properties of a `&UdpHeader`.
    //
    // We also know that any sequence of bytes of length `size_of::<UdpHeader>()` is a
    // valid instance of `UdpHeader` because that is true of all of its fields. That means that,
    // regardless of the contents of `bytes`, those contents represent a valid `UdpHeader`,
    // and so this conversion is unconditionally sound.
    //
    // Finally, we know that the created reference has the correct lifetime because of Rust's
    // lifetime elision rules. In particular, the type signature of this function guarantees that
    // the argument and return types have the same lifetime. Thus, the returned `UdpPacket`
    // cannot outlive the bytes it was parsed from.
    let header = unsafe { &*header.as_ptr().cast::<UdpHeader>() };
    Some(UdpPacket { header, body })
}
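The size and alignment facts that this SAFETY comment relies on can also be checked by the compiler instead of only being argued in prose. A minimal sketch, assuming the header is modeled as four `[u8; 2]` fields (one per 16-bit UDP field, 8 bytes in total); the `const` assertions are an illustrative addition that make the build fail if a later edit changes the size or raises the alignment:

```rust
use std::mem::{align_of, size_of};

// Same layout as above: four 16-bit UDP fields, each stored as two raw bytes.
#[allow(dead_code)]
#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

// Compile-time checks for the claims in the SAFETY comment. If someone later
// changes a field to `u16` (alignment 2) or adds a field, these fail to build.
const _: () = assert!(align_of::<UdpHeader>() == 1);
const _: () = assert!(size_of::<UdpHeader>() == 8);

fn main() {
    println!(
        "UdpHeader: size {}, align {}",
        size_of::<UdpHeader>(),
        align_of::<UdpHeader>()
    );
}
```

This does not replace the SAFETY argument (validity of every byte pattern and the lifetime reasoning still have to be argued by hand), but it turns two of its premises into checked facts.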

One of your goals is correctness, and this code is much more likely to be correct than the previous version! But you also care about your productivity, and you just spent an entire day writing a few lines of code. And what happens when you need to change the code? How much work will it take to convince yourself that a change is still correct? What if other, less experienced developers want to work on this section of code? Will they feel comfortable following your logic and feel confident in their ability to make changes without introducing bugs? So you decide to commit to never using unsafe. You modify your code to get rid of it and make whatever changes you need to get it to compile:

#[repr(C)]
struct UdpHeader {
    src_port: u16,
    dst_port: u16,
    length: u16,
    checksum: u16,
}

struct UdpPacket<'a> {
    header: UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    if bytes.len() < size_of::<UdpHeader>() {
        return None;
    }

    let (src_port_bytes, rest) = bytes.split_at(size_of::<u16>());
    let (dst_port_bytes, rest) = rest.split_at(size_of::<u16>());
    let (length_bytes, rest) = rest.split_at(size_of::<u16>());
    let (checksum_bytes, rest) = rest.split_at(size_of::<u16>());

    let mut src_port = [0; 2];
    let mut dst_port = [0; 2];
    let mut length = [0; 2];
    let mut checksum = [0; 2];

    src_port.copy_from_slice(src_port_bytes);
    dst_port.copy_from_slice(dst_port_bytes);
    length.copy_from_slice(length_bytes);
    checksum.copy_from_slice(checksum_bytes);

    let header = UdpHeader {
        src_port: u16::from_be_bytes(src_port),
        dst_port: u16::from_be_bytes(dst_port),
        length: u16::from_be_bytes(length),
        checksum: u16::from_be_bytes(checksum),
    };

    Some(UdpPacket { header, body: rest })
}

One of your goals is productivity, and this code is easy to verify, so it was fast to write and will be fast to change in the future! But you also care about performance, and you're doing a lot more bounds checking and copying than you were before. Maybe the optimizer will improve things for you, but there's no way to be sure without benchmarking it, and even if the optimizer is smart enough this time, you might get unlucky with a future change that makes the code just confusing enough to stump the optimizer, leading to unexpected performance cliffs.

You think back on all of these attempts. You wanted fast code, so you used unsafe, but that made you worried about correctness. You also wanted correct code, so you spent a long time reasoning about your code's correctness and you wrote down that reasoning so others could check your work, but that took an entire day and resulted in code that would be slow to change in the future. You wanted to be productive, so you got rid of all of the unsafe, but that made your code slow again. It seems like you just can't win!

Moral

The moral of this story is that, when it comes to operations that touch memory directly, the Rust language and standard library are not on their own sufficient to achieve "Fast, Reliable, Productive. Pick Three." While the basic ingredients are all there, putting them together unavoidably requires sacrifices along one of the dimensions of speed, reliability, and productivity. Zerocopy aims to fill this gap. In the Design section, we outline the current state of zerocopy, identify the gaps between zerocopy's current state and its aspirational future, and outline the steps required to reach that future.

Design

As mentioned above, zerocopy's mission is to make good on the slogan "Fast, Reliable, Productive. Pick Three." by making it so that 100% safe Rust code is just as fast and ergonomic as unsafe Rust code. Using zerocopy, you could write the parsing code from the previous section like this:

use zerocopy::{FromBytes, Ref, Unaligned};

#[derive(FromBytes, Unaligned)]
#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

struct UdpPacket<'a> {
    header: &'a UdpHeader,
    body: &'a [u8],
}

fn parse_udp_packet(bytes: &[u8]) -> Option<UdpPacket<'_>> {
    let (header, body) = Ref::new_unaligned_from_prefix(bytes)?;
    Some(UdpPacket { header: header.into_ref(), body })
}

This is already a big step beyond what you can do with the standard library alone, and it illustrates what it's like to have an API that takes care of all of these details for you.

Thanks to ergonomics and safety like this, the building blocks that zerocopy provides are already being used in a diverse array of domains. Networking is zerocopy's origin and its bread and butter, but it is also used in embedded security firmware, in software emulation, in hypervisors, in filesystems, in high-frequency trading, and much more. However, it still has a ways to go before it can replace most of the unsafe code in the Rust ecosystem.

Gaps

User model

In order to identify gaps, it's helpful to say a bit about who we hope to reach with zerocopy.

Not looking to use unsafe code

A lot of use of unsafe code is by programmers who conceive of themselves primarily as trying to solve some practical problem. If they think about it at all, they think about unsafe code as a tool, not as an object of contemplation. They may have a vague sense of what the phrase "memory safe" means, and they may even know that pointers need to be aligned. They likely don't know that, in order to be able to convert a type to a byte slice, the type must not contain any uninitialized bytes, and they almost certainly have never heard of pointer provenance.

Often, these users don't know a priori that unsafe code is a tool they should consider. Instead, in trying to solve a particular problem, they may come across a crate or a Google search result which points them towards unsafe, or at least points them towards a crate which makes use of unsafe.

In order to reach users in this camp, we must:

  • Frame our APIs in terms that make sense for their use cases instead of in terms of the language semantics concepts that underlie them. For example, the AsBytes trait should speak primarily about viewing a type as bytes; details about uninitialized bytes should be saved for the "Safety" section of the doc comment.
  • Advertise zerocopy in terms that these users will recognize as describing their needs. This is an area of active development, and threading the needle correctly is difficult.

Security-conscious

On the other end of the spectrum, many of our users come from domains which generally have a high bar for correctness - kernels, hypervisors, cryptography, security hardware, etc. These users are extremely wary of taking external dependencies, and only take dependencies when they absolutely need to or when they have a high degree of trust in an external software artifact.

In order to reach users in this camp, we must:

  • Hold ourselves to a high standard for correctness and soundness
  • Articulate this standard concisely but in sufficient technical detail that a user in this camp can come away from our docs comfortable with taking a dependency on zerocopy

Care about the open-source ecosystem

Many potential users are the authors of crates which are published on crates.io. These users have concerns which are specific to publishing software in an open-source ecosystem. For example:

  • They care about API stability, especially when their use of zerocopy would be visible in their own API
  • They care about compile times
  • They care about the optics of relying on pre-1.0 crates

In order to reach users in this camp, we must have good open-source hygiene. We must:

  • Provide the ability to disable features which are expensive to compile, especially including zerocopy-derive
  • Document and test compliance with a minimum supported Rust version (MSRV)
  • Decide what it would take for us to reach a 1.0 release; while versioning like this may not matter in some worlds (such as monorepos like Google's, where zerocopy was first developed), version numbers are taken as indicators of quality and stability in the open-source world. We need to think about what sorts of long-term API stability guarantees we're willing to make, and then be serious about it when we make them.

Memory model instability and zerocopy's future-soundness guarantee

Rust doesn't have a well-defined memory model. As a result, it's possible that code which is sound under today's compiler may become unsound at some point in the future. If zerocopy wants to be a trustworthy replacement for unsafe code, and ask its users not to worry about soundness, it needs to promise not only soundness, but soundness under any future compiler behavior and under any future memory model.

This work is tracked in #61.

Feature-completeness

Building-block API

Currently, we have a lot of support for combinations of operations. For example, if you want to convert a &mut [u8] to a &mut [T], and you want to check at runtime that your byte slice has the right size and alignment, you would do Ref::new_slice(bytes)?.into_mut_slice(). If you wanted to do the same, but first zero the bytes of the &mut [u8], you'd use the new_slice_zeroed constructor. Even though most of the logic is the same, there's an entirely different constructor.

This has a few downsides:

  • Operations are often fallible when they don't need to be. For example, casting from &[u8; size_of::<T>()] to &T where T: FromBytes + Unaligned can in principle be an infallible operation. However, since all of our APIs take the more general &[u8] type, we have no choice but to perform a bounds check, and thus to return an Option<&T> instead of just &T. This forces the user to .unwrap() or similar, and provides fewer guarantees about codegen.
  • Only explicitly-supported combinations are expressible. If we haven't gotten around to supporting a particular combination, there is no alternative.
  • Users must first find the one API whose very_long_name_that_describes_exactly_what_they_want matches their use case, and there are a ton to choose from.
  • Our API doesn't encourage users to understand which smaller operations their code can be decomposed into.
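The first downside can be illustrated with the standard library alone. In this sketch, the `Header` type and both functions are hypothetical, and unlike zerocopy they copy the fields rather than returning a reference. Converting from a `&[u8; 4]` needs no runtime check because the length is part of the type, while converting from a `&[u8]` has to check the length and return an `Option`:

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct Header {
    src_port: u16,
    dst_port: u16,
}

// Infallible: the array type guarantees exactly 4 bytes, so there is no
// length check and no `Option` in the signature.
fn header_from_array(bytes: &[u8; 4]) -> Header {
    let [a, b, c, d] = *bytes;
    Header {
        src_port: u16::from_be_bytes([a, b]),
        dst_port: u16::from_be_bytes([c, d]),
    }
}

// Fallible: a `&[u8]` can have any length, so it must be checked at runtime
// (here: exactly 4 bytes) and the caller has to handle `None`.
fn header_from_slice(bytes: &[u8]) -> Option<Header> {
    let array = <&[u8; 4]>::try_from(bytes).ok()?;
    Some(header_from_array(array))
}

fn main() {
    let bytes = [0x00, 0x35, 0x04, 0xd2];
    println!("{:?}", header_from_array(&bytes));
    println!("{:?}", header_from_slice(&bytes[..3]));
}
```

A caller that already holds a `&[u8; 4]` never has to `unwrap`; the same reasoning is what would allow a reference-returning cast from a fixed-size byte array to be infallible.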

To address these issues, we want to move towards a world in which there are small "building blocks" which can be combined to perform larger operations. Convenience methods for common combinations will probably still be supported, but we may remove some of the less-frequently used bits of the API so long as users can still express the same behavior using the new building blocks. So far, we intend to build:

  • ByteArray<T> - a polyfill for [u8; size_of::<T>()] until the latter type is stable in a generic context
  • Align<T, A> - a T whose alignment is rounded up to that of A
  • Various conversions which use the ByteArray, Unalign, and Align types to elide length and alignment checks. A few examples:
    • fn unaligned_ref_from_bytes(bytes: &ByteArray<T>) -> &Unalign<T> where T: FromBytes + Sized
    • fn mut_from_bytes(bytes: &mut ByteArray<T>) -> Option<&mut T> where T: FromBytes + AsBytes + Sized
    • fn as_byte_array(&self) -> &ByteArray<Self> where Self: AsBytes + Sized
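One way an `Align<T, A>` type could be built in today's Rust is sketched below; this is an assumption about the approach, not zerocopy's actual definition, and the `new` and `get` methods are hypothetical. Under `#[repr(C)]`, a zero-length `[A; 0]` field takes up no space but still contributes `A`'s alignment, so the struct's alignment is the larger of `T`'s and `A`'s:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical sketch: a `T` whose alignment is raised to at least that of `A`.
#[repr(C)]
struct Align<T, A> {
    // Zero-sized, but forces the struct's alignment up to `align_of::<A>()`.
    _align: [A; 0],
    t: T,
}

impl<T, A> Align<T, A> {
    fn new(t: T) -> Self {
        Align { _align: [], t }
    }

    fn get(&self) -> &T {
        &self.t
    }
}

fn main() {
    let buf: Align<[u8; 8], u64> = Align::new([1, 2, 3, 4, 5, 6, 7, 8]);
    // The byte buffer now starts at an address suitably aligned for a `u64`.
    let addr = buf.get().as_ptr() as usize;
    println!("aligned to u64: {}", addr % align_of::<u64>() == 0);
    println!(
        "align = {}, size = {}",
        align_of::<Align<[u8; 8], u64>>(),
        size_of::<Align<[u8; 8], u64>>()
    );
}
```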

Another added benefit of these building blocks is that they will make it easier to reason about the soundness of our implementations. Since many of our functions/methods encode complex behavior (exactly what we're talking about in this section), their safety arguments are similarly complex. If we were instead able to decompose these into smaller (still unsafe) operations, we could make it easier to reason about the safety of the resulting implementations.

For example, currently, the implementation of Ref::into_ref looks like this:

Current impl
impl<'a, B, T> Ref<B, T>
where
    B: 'a + ByteSlice,
    T: FromBytes,
{
    /// Converts this `Ref` into a reference.
    ///
    /// `into_ref` consumes the `Ref`, and returns a reference to
    /// `T`.
    pub fn into_ref(self) -> &'a T {
        // SAFETY: This is sound because `B` is guaranteed to live for the
        // lifetime `'a`, meaning that a) the returned reference cannot outlive
        // the `B` from which `self` was constructed and, b) no mutable methods
        // on that `B` can be called during the lifetime of the returned
        // reference. See the documentation on `deref_helper` for what
        // invariants we are required to uphold.
        unsafe { self.deref_helper() }
    }
}

impl<B, T> Ref<B, T>
where
    B: ByteSlice,
    T: FromBytes,
{
    /// Creates an immutable reference to `T` with a specific lifetime.
    ///
    /// # Safety
    ///
    /// The type bounds on this method guarantee that it is safe to create an
    /// immutable reference to `T` from `self`. However, since the lifetime `'a`
    /// is not required to be shorter than the lifetime of the reference to
    /// `self`, the caller must guarantee that the lifetime `'a` is valid for
    /// this reference. In particular, the referent must exist for all of `'a`,
    /// and no mutable references to the same memory may be constructed during
    /// `'a`.
    unsafe fn deref_helper<'a>(&self) -> &'a T {
        &*self.0.as_ptr().cast::<T>()
    }
}

I'm sure that this is sound, but I've always been a bit nervous about how complex the argument is. By contrast, we can simplify this using the building blocks we intend to introduce. In 2c67380 (this commit hasn't been merged, and may be deleted at some point), we change the above code to:

New impl
impl<'a, B, T> Ref<B, T>
where
    B: ByteSlice + Into<&'a [u8]>,
    T: FromBytes,
{
    /// Converts this `Ref` into a reference.
    ///
    /// `into_ref` consumes the `Ref`, and returns a reference to
    /// `T`.
    pub fn into_ref(self) -> &'a T {
        let bytes = self.0.into();
        // SAFETY: `Ref` upholds the invariant that `.0`'s length is
        // equal to `size_of::<T>()`. `size_of::<ByteArray<T>>() ==
        // size_of::<T>()`, so this call is sound.
        let byte_array = unsafe { ByteArray::from_slice_unchecked(bytes) };
        // SAFETY: `Ref` upholds the invariant that `.0` satisfies
        // `T`'s alignment requirement.
        unsafe { T::ref_from_bytes_unchecked(byte_array) }
    }
}

I find this implementation much easier to reason about. The safety invariants on ByteArray::from_slice_unchecked and FromBytes::ref_from_bytes_unchecked are straightforward, and it is much more obvious from reading those functions that the lifetimes are propagated correctly. (Note that this commit also adds a requirement to ByteSlice about what an Into<&'a [u8]> impl is required to return.)

Simplify ByteSlice's definition and make it un-sealed

Currently, ByteSlice has both a Deref<Target=[u8]> bound and an as_ptr(&self) -> *const u8 method. The latter is probably redundant given the former, and adds another method that we have to document safety invariants for. ByteSlice's safety invariants are somewhat subtle, so getting rid of as_ptr would be very nice.

It would also make it easier for others to implement ByteSlice for their own types. We've had users request this, but it's currently impossible because ByteSlice is sealed. While we are confident that our existing impls of ByteSlice and ByteSliceMut are sound for our use cases, we would need to formalize the safety requirements that any implementing type must satisfy before we un-seal these traits. This is probably a good idea anyway because it may surface ways that we can simplify the API.
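"Sealed" refers to a common Rust pattern: the public trait has a supertrait that lives in a private module, so code outside the defining module (in practice, outside the defining crate) can use the trait but cannot name the supertrait, and therefore cannot implement it. A minimal std-only sketch of the pattern, with hypothetical module and trait names rather than zerocopy's real ones:

```rust
mod byte_traits {
    use std::ops::Deref;

    // Private module: nothing outside `byte_traits` can name `Sealed`.
    mod private {
        pub trait Sealed {}
    }

    /// Public trait that others can use in bounds but cannot implement,
    /// because every implementor must also implement the unnameable `Sealed`.
    pub trait ByteSliceLike: private::Sealed + Deref<Target = [u8]> {}

    impl<'a> private::Sealed for &'a [u8] {}
    impl<'a> ByteSliceLike for &'a [u8] {}
}

use byte_traits::ByteSliceLike;

// Using the sealed trait as a bound works fine from outside the module.
fn total<B: ByteSliceLike>(bytes: B) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let data: &[u8] = &[1, 2, 3];
    println!("{}", total(data));
}
```

Un-sealing would mean removing the private supertrait, which is why the safety requirements on implementers would need to be written down first.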

Split ByteSlice so that split_at is in a different trait (#1)

Currently, ByteSlice has a split_at(self, mid: usize) -> (Self, Self) method analogous to the slice method of the same name. Our performance design requires this method to be very cheap, which precludes implementing ByteSlice for types like Vec, for which split_at would require allocation.

Instead, #1 tracks splitting ByteSlice into two traits so a type such as Vec can implement the base ByteSlice trait without needing to implement split_at. Most of the zerocopy API can operate on this simpler trait, while a few functions and methods would still require the ability to call split_at.
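A minimal sketch of what such a split could look like, using hypothetical trait and function names (the actual design is tracked in #1). The base trait only requires `Deref<Target = [u8]>`, so `Vec<u8>` can implement it; `split_at` moves to an extension trait that only cheaply splittable types such as `&[u8]` implement:

```rust
use std::ops::Deref;

/// Hypothetical base trait: anything that derefs to a byte slice.
trait ByteSlice: Deref<Target = [u8]> + Sized {}

/// Hypothetical extension trait for byte slices that can be split in two
/// without allocating.
trait SplitByteSlice: ByteSlice {
    fn split_at(self, mid: usize) -> (Self, Self);
}

impl<'a> ByteSlice for &'a [u8] {}

impl<'a> SplitByteSlice for &'a [u8] {
    fn split_at(self, mid: usize) -> (Self, Self) {
        // Splitting a borrowed slice just produces two borrowed sub-slices.
        <[u8]>::split_at(self, mid)
    }
}

// `Vec<u8>` only implements the base trait: splitting it would require
// allocating a second buffer.
impl ByteSlice for Vec<u8> {}

/// Needs only the base trait, so it also accepts `Vec<u8>`.
fn first_byte<B: ByteSlice>(bytes: &B) -> Option<u8> {
    bytes.first().copied()
}

/// Needs splitting, so it requires the extension trait.
fn split_header<B: SplitByteSlice>(bytes: B, header_len: usize) -> Option<(B, B)> {
    if bytes.len() < header_len {
        return None;
    }
    Some(SplitByteSlice::split_at(bytes, header_len))
}

fn main() {
    let owned: Vec<u8> = vec![0x00, 0x35, 0xaa];
    println!("{:?}", first_byte(&owned));
    let borrowed: &[u8] = &owned;
    if let Some((header, body)) = split_header(borrowed, 2) {
        println!("header {:?}, body {:?}", header, body);
    }
}
```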

Elide length or alignment checks when they can be verified statically

Tracked in #280.

Support types which are not FromBytes, but which can be converted from a sequence of zeroes

Tracked in #30.

Support fallible conversions

Tracked in #5; in progress.
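A fallible conversion is one where only some byte patterns are valid for the target type, so the bytes must be validated before they can be used as that type. zerocopy's own API for this is the subject of #5; the std-only sketch below only illustrates the concept, using a hypothetical `Protocol` enum (6 and 17 are the IP protocol numbers for TCP and UDP) for which most byte values are invalid:

```rust
use std::convert::TryFrom;

/// Hypothetical example type: only 6 and 17 are valid discriminants, so an
/// arbitrary byte cannot be treated as a `Protocol` without a check.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protocol {
    Tcp = 6,
    Udp = 17,
}

impl TryFrom<u8> for Protocol {
    /// On failure, the rejected byte is handed back to the caller.
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            6 => Ok(Protocol::Tcp),
            17 => Ok(Protocol::Udp),
            other => Err(other),
        }
    }
}

fn main() {
    println!("{:?}", Protocol::try_from(17u8));
    println!("{:?}", Protocol::try_from(42u8));
}
```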

Support conversions in const fn

Tracked in #115.
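For a sense of what converting in a `const fn` looks like, the standard library already offers some byte-to-integer conversions that are usable at compile time, such as the `const fn` `u16::from_be_bytes`. In the sketch below, `DNS_PORT`, `EXAMPLE`, and `dst_port` are hypothetical names; supporting zerocopy's generic conversions in the same way is what #115 tracks:

```rust
/// Big-endian bytes decoded into an integer at compile time (53 is DNS).
const DNS_PORT_BYTES: [u8; 2] = [0x00, 0x35];
const DNS_PORT: u16 = u16::from_be_bytes(DNS_PORT_BYTES);

/// Hypothetical helper: decodes the destination port from the first four
/// bytes of a UDP header. Being a `const fn`, it can run in const contexts.
const fn dst_port(header: &[u8; 4]) -> u16 {
    u16::from_be_bytes([header[2], header[3]])
}

// Evaluated entirely at compile time.
const EXAMPLE: u16 = dst_port(&[0x04, 0xd2, 0x00, 0x35]);

fn main() {
    println!("{} {}", DNS_PORT, EXAMPLE);
}
```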

Support converting &[[u8; size_of::<T>()]] to &[T]

What it says on the tin.

Rename LayoutVerified to Ref (#68)

What it says on the tin. LayoutVerified is descriptive if you understand type theory and the concept of a "witness" (although we probably should have put "witness" in the name...), but it's a meaningless term for most users. We should rename it to Ref or similar - after all, it's just a reference with a few niceties.

Miscellaneous features

API polish

Documentation is complete, thorough, and up-to-date (#32)

High confidence in correctness and soundness

Tested and stable on all platforms

Usable in Cargo and crates.io ecosystem

Compile-time performance

Known bugs are fixed

Code quality

Developer experience

@joshlf joshlf changed the title [WIP] 2022 Q4 Roadmap 2022 Q4 Roadmap Oct 27, 2022
@qwandor

qwandor commented Apr 3, 2023

Hello, is there a plan to release a 1.0 version once this roadmap is complete?

@joshlf
Member Author

joshlf commented Apr 26, 2023

There are no concrete plans to release a 1.0 version. Currently we're working on version 0.7 (our latest alpha release is 0.7.0-alpha.3), and I expect that a lot of the work on this roadmap will go into that release (although not necessarily all of it).

@joshlf joshlf changed the title 2022 Q4 Roadmap 2023 Roadmap Apr 26, 2023
@KodrAus

KodrAus commented Apr 27, 2023

Does the 0.7 version of zerocopy support implementing AsBytes and FromBytes manually without using a custom derive? Requiring a custom derive makes optional integration more difficult, especially for libraries like bitflags where code is generated in end users' crates.

@joshlf
Member Author

joshlf commented Aug 1, 2023

Yes, and so does 0.6.x. The only speedbump is the #[doc(hidden)] method, only_derive_is_allowed_to_implement_this_trait, which you can of course implement yourself. It's just meant to steer folks towards the custom derives.

@joshlf
Member Author

joshlf commented Aug 7, 2023

Hey @qwandor and @KodrAus, just wanted to let you know that we just released 0.6.2, which includes all of the non-breaking changes from 0.7.0.

@joshlf joshlf changed the title 2023 Roadmap Roadmap Aug 20, 2023
@joshlf joshlf mentioned this issue Sep 3, 2023
@briansmith

These users are extremely wary of taking external dependencies, and only take dependencies when they absolutely need to or when they have a high degree of trust in an external software artifact.

This describes me well. I considered multiple times using zerocopy and extending zerocopy. Pulling it out of Fuchsia and making it accessible here on GitHub was a big step towards that. But then the formation of the "safe transmute" working group in the rust-lang organization made it seem like it would be just better to wait until the most important functionality is in libstd and guaranteed by the language/lib spec more explicitly.

@joshlf made a PR for ring that shows how zerocopy benefits ring by reducing its unsafe code that's duplicative of zerocopy. And zerocopy has a higher bar for documenting and validating its correctness than my more informal approach in my own code. OTOH zerocopy is so large that I pretty much just have to take it on faith that y'all care so much about getting all the details right that I don't need to bother looking through it, because I can't (don't have time).

Ultimately, I think people just would rather have as much of the functionality w.r.t. transmuting and casting in the standard library as possible. I think if this is seen as a prototype that helps that happen, more people will be eager to adopt it.

@joshlf
Member Author

joshlf commented Oct 10, 2023

Just wanted to give you a heads up that we've added documentation that addresses a lot of this; if you get a chance to take a look, let me know if you have any feedback. ring is exactly the kind of consumer we're hoping to target with our policies - we hope to get to a point where crates like ring feel comfortable taking a dependency on zerocopy - so your feedback here has been really helpful, and it helped shape what we wrote in these PRs.

See #405, #485, and #484. These are available on main in README.md and POLICIES.md.

@joshlf
Member Author

joshlf commented Oct 6, 2024

We're not using this issue as a tracking issue anymore.

@joshlf joshlf closed this as completed Oct 6, 2024