migrate to rkyv-0.8 #574

bunnie · 2024-09-17T17:41:16Z

bunnie
Sep 17, 2024
Maintainer

Xous currently uses rkyv 0.4.3 as a method for serializing Rust objects into memory messages, so that structures can be conveniently passed between processes.

The subject of upgrading rkyv was tabled until rkyv "stabilized" - according to the maintainer, 0.8 is the big stabilization release (see rkyv/rkyv#67). As a side note, there's a whole bunch of dependabot flags generated (https://github.com/betrusted-io/xous-core/security/dependabot) related to rkyv. None of them actually apply to how we use it, but a side effect of upgrading is maybe we'll clear some of those too.

So, now it's time to have a look inside rkyv and see where it's at.

I've taken a stab at doing a trial integration of rkyv with the Xous API in a new directory I've created called planning. The intention of the planning directory is to be a place where we can stick experiments that aren't built into the tree, but that live in the main branch of the repo so we can refer to them easily in discussions.

The Opportunities

There are some significant opportunities that come from upgrading to rkyv-0.8:

We can now potentially sling heap-allocated structures (such as Vec, String) between processes without having to manually serialize/deserialize them.
The flat version of the structures - i.e., the zerocopy versions that are created with the as_flat() method - are much more usable due to the ability to derive traits like Eq and PartialEq on them.
Potential to remove an unsafe operation in the transmutation of the Buffers because the latest rkyv offers buffer validation. This comes at a computational overhead.

A little more on flat versus original structures - as a reminder, this is the routine that generates flat versions:

xous-core/xous-ipc/src/buffer.rs

Lines 228 to 238 in fd0eaaa

    
    /// Zero-copy representation of the data on the receiving side, wrapped in an "Archived" trait and left in
    /// the heap. Cheap so uses "as_" prefix.
    #[allow(dead_code)]
    pub fn as_flat<T, U>(&self) -> core::result::Result<&U, ()>
    where
        T: rkyv::Archive<Archived = U>,
    {
        let pos = self.offset.map(|o| o.get()).unwrap_or_default();
        let r = unsafe { rkyv::archived_value::<T>(self.slice, pos) };
        Ok(r)
    }

And this generates the original:

xous-core/xous-ipc/src/buffer.rs

Lines 243 to 252 in fd0eaaa

    
        pub fn to_original<T, U>(&self) -> core::result::Result<T, ()>
        where
            T: rkyv::Archive<Archived = U>,
            U: rkyv::Deserialize<T, dyn Fallible<Error = XousUnreachable>>,
        {
            let pos = self.offset.map(|o| o.get()).unwrap_or_default();
            let r = unsafe { rkyv::archived_value::<T>(self.slice, pos) };
            Ok(r.deserialize(&mut XousDeserializer {}).unwrap())
        }
    }

Flat versions map the ArchivedT-style trait directly onto the received data pages mapped into the receiver's process space. This means that no time or space is wasted deserializing the data.

Original versions do a full deserialization, creating effectively a second, redundant copy of the passed structure.

Unfortunately, to a first order, 98% of the IPC calls in Xous use the original version, because the flat version has type ArchivedT, but we actually need T. The latest rkyv creates an opportunity for more flat usage by Derive-ing more useful attributes on ArchivedT, such as Eq, PartialEq, etc. which means we can now pass some Enum of say, EnumT and on the receiving side do more operations on ArchivedEnumT because the traits now exist to do that.
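To make the flat-versus-original distinction concrete, here is a minimal stdlib-only sketch. It does not use rkyv at all: the wire layout, `Message`, `FlatMessage`, and `encode` are hypothetical stand-ins invented for illustration. The point is only that a borrowed view becomes much more useful once it can be compared directly against the owned type.

```rust
use std::str;

/// Owned form: building it copies the text onto the heap.
#[derive(Debug, PartialEq)]
pub struct Message {
    pub opcode: u8,
    pub text: String,
}

/// Borrowed ("flat") form: `text` points into the received bytes.
#[derive(Debug, PartialEq)]
pub struct FlatMessage<'a> {
    pub opcode: u8,
    pub text: &'a str,
}

/// Toy wire layout (an assumption for this sketch, not rkyv's format):
/// byte 0 = opcode, bytes 1..3 = text length (little-endian u16), then UTF-8 text.
pub fn encode(msg: &Message) -> Vec<u8> {
    assert!(msg.text.len() <= u16::MAX as usize, "text too long for the toy format");
    let len = msg.text.len() as u16;
    let mut out = Vec::with_capacity(3 + msg.text.len());
    out.push(msg.opcode);
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(msg.text.as_bytes());
    out
}

/// Analogous to `as_flat()`: interpret the buffer in place, without copying the text.
pub fn as_flat<'a>(buf: &'a [u8]) -> Option<FlatMessage<'a>> {
    let opcode = *buf.first()?;
    let len = u16::from_le_bytes([*buf.get(1)?, *buf.get(2)?]) as usize;
    let text = str::from_utf8(buf.get(3..3 + len)?).ok()?;
    Some(FlatMessage { opcode, text })
}

/// Analogous to `to_original()`: build a second, owned copy of the data.
pub fn to_original(buf: &[u8]) -> Option<Message> {
    let flat = as_flat(buf)?;
    Some(Message { opcode: flat.opcode, text: flat.text.to_owned() })
}

/// Cross-type comparison: lets the receiver work on the flat form directly.
impl<'a> PartialEq<Message> for FlatMessage<'a> {
    fn eq(&self, other: &Message) -> bool {
        self.opcode == other.opcode && self.text == other.text
    }
}
```

With the cross-type `PartialEq` in place, a receiver can write `as_flat(&buf)? == expected` and never allocate; only callers that genuinely need an owned `Message` pay for `to_original()`.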

The Difficulties

An attempt was made to refactor in a planning crate the Buffer routine from xous-ipc, which forms the backbone of IPC calls between crates in Xous. The code can be seen here:

https://github.com/betrusted-io/xous-core/blob/main/planning/rkyv-migration2/src/buffer.rs

This version of Buffer creates a "sham" memory region whose intent is to demonstrate a complex, compound structure being mapped into a binary format, manipulated, and returned. I picked the TextView object as it is one of the richest objects in the system which also includes a String-like object (uses the Xous IPC String in fact).

The Buffer routine patches the send_message routine to "fake" the sending and instead just parse the TextView.

The supporting definitions for the TextView were copied here:

https://github.com/betrusted-io/xous-core/blob/main/planning/rkyv-migration2/src/testcases.rs

In general, the good news is that the API more or less makes it through the migration intact. However, there's a couple of changes that might make sense, but I'm unsure on how to proceed, hence, the discussion.

Do we deprecate the IPC String routine entirely, or leave it around for backward compatibility?
Do we rework the core into_buf() routine so that it is "infallible"?

Deprecating IPC String

The xous-ipc String was always a bit weird because you had to define the maximum String length and the serialization is fallible in case you exceed the length of the String. For example, in xous-names, a Registration had to have a defined maximum length for the server name:

xous-core/api/xous-api-names/src/api.rs

Lines 86 to 90 in fd0eaaa

    
    #[derive(Debug, rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
    pub struct Registration {
        pub name: xous_ipc::String<64>,
        pub conn_limit: Option<u32>,
    }

In some cases, this is a feature, because if you are mapping to a fixed-length target, this exposes that limitation at the API level. But in other places, it's just an arbitrary number we whack in as a guess to try and round out the size of the mapped structure to not exceed a page, like this arbitrary limit on String length in USB console strings:

xous-core/services/usb-device-xous/src/api.rs

Lines 150 to 156 in fd0eaaa

    
    pub const SERIAL_ASCII_BUFLEN: usize = 512;
    pub const SERIAL_BINARY_BUFLEN: usize = 128;
    #[derive(Debug, rkyv::Archive, rkyv::Serialize, rkyv::Deserialize, Copy, Clone)]
    pub struct UsbSerialAscii {
        pub s: xous_ipc::String<SERIAL_ASCII_BUFLEN>,
        pub delimiter: Option<char>,
    }

or this wart in TextView:

xous-core/services/graphics-server/src/api/text.rs

Lines 116 to 120 in fd0eaaa

    
        // this field tracks the state of a busy animation, if `Some`
        pub busy_animation_state: Option<u32>,
        pub text: String<3072>,
    }

I think there is some merit to just remove the xous-ipc version of String and use only Rust native String, but it would touch almost every crate to strip out that function, so looking for thoughts and opinions about doing that.

Infallible `into_buf()`

This is the current implementation of into_buf():

xous-core/xous-ipc/src/buffer.rs

Lines 169 to 179 in fd0eaaa

    
    pub fn into_buf<S>(src: S) -> core::result::Result<Self, ()>
    where
        S: rkyv::Serialize<rkyv::ser::serializers::BufferSerializer<Buffer<'a>>>,
    {
        let buf = Self::new(core::mem::size_of::<S>());
        let mut ser = rkyv::ser::serializers::BufferSerializer::new(buf);
        let pos = ser.serialize_value(&src).or(Err(()))?;
        let mut buf = ser.into_inner();
        buf.offset = MemoryAddress::new(pos);
        Ok(buf)
    }

It fails only if there is some failure in serialization, which in practice I think there isn't ever one, so it's a layer of error handling code that never gets used. Also, what do you do anyways if there's a serialization error? Is there even a reasonable way to propagate that up the stack?

A thought I have is perhaps to rework the API to make into_buf() infallible-or-panic. This has at least two ramifications off the top of my head.

Error handling in serialization would need to be handled inside into_buf(). But I think that's a reasonable take? Because rkyv now supports variable-length structures, serialization can clearly fail. However, the recovery from that seems like it should be algorithmic and handled within the into_buf() function instead of kicking it back to every caller. The method would work like this:

1. Try to serialize the object into the smallest chunk of memory, e.g. one page. 99% of the serializations in Xous fit in this, and would not fail.
2. If it doesn't fit in the page, allocate another page, and retry the serialization.
3. Repeat (2) until there is either a hard-panic due to OOM, or you succeed.
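The retry loop described above could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: `SerializeInto` is a hypothetical stand-in for the real rkyv serializer, `PagedBuf` stands in for `Buffer`, and a plain `Vec<u8>` stands in for mapped pages. The 4096-byte page size is also an assumption of the sketch.

```rust
/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Hypothetical stand-in for a serializer: writes `self` into `dst` and
/// returns the number of bytes used, or `Err(())` if `dst` is too small.
pub trait SerializeInto {
    fn serialize_into(&self, dst: &mut [u8]) -> Result<usize, ()>;
}

impl SerializeInto for String {
    fn serialize_into(&self, dst: &mut [u8]) -> Result<usize, ()> {
        let bytes = self.as_bytes();
        if bytes.len() > dst.len() {
            return Err(());
        }
        dst[..bytes.len()].copy_from_slice(bytes);
        Ok(bytes.len())
    }
}

/// Page-granular buffer plus the number of bytes actually used.
pub struct PagedBuf {
    pub pages: Vec<u8>,
    pub used: usize,
}

/// Infallible from the caller's point of view: start with one page and
/// grow by one page per failed attempt. If memory runs out, the allocation
/// failure inside `resize` terminates the process rather than returning.
pub fn into_buf_autosize<S: SerializeInto>(src: &S) -> PagedBuf {
    let mut pages = vec![0u8; PAGE_SIZE];
    loop {
        match src.serialize_into(&mut pages[..]) {
            Ok(used) => return PagedBuf { pages, used },
            Err(()) => {
                let new_len = pages.len() + PAGE_SIZE;
                pages.resize(new_len, 0);
            }
        }
    }
}
```

Every failed attempt redoes the serialization from scratch, so an object that needs N pages costs N attempts; that is the price of never asking the caller for a size.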

This may or may not be the right behavior. The advantage of "serialize or die" is that it's simple and works in almost every case. The disadvantage is that you're deprived of the opportunity to modify the serialization request in case there was a reasonable thing you could do to try to make it work. For example, perhaps if it was a block of text being passed around, you could break it into paragraphs and process it in smaller chunks. But I think in practice this is arguably a check that should be happening anyways, and perhaps upstream of a serialization attempt - basically look at an incoming text block and if it's larger than some hundreds of kilobytes, handle it differently.

The other ramification is it makes handling very large chunks of data less efficient, because you iteratively, with O(N) time, "discover" the true size of an object.

Perhaps one option is to leave into_buf() with the exact same behavior as before, but create new methods for into_buf_autosize(), and/or into_buf_sized(), where the autosize option will do the iterative guessing of the object's size, and sized would take an explicit argument where the user says "this will take X pages to send" and always allocates that number of pages, and if it doesn't fit, it gives up.
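For comparison, the explicitly sized variant might be sketched as below. The names and signature are illustrative only; `write` is a hypothetical serializer callback and a `Vec<u8>` again stands in for mapped pages. The key difference from the autosize idea is that there is exactly one allocation and one attempt, and the error goes back to the caller.

```rust
/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Page-granular buffer plus the number of bytes actually used.
pub struct PagedBuf {
    pub pages: Vec<u8>,
    pub used: usize,
}

/// Allocate exactly `page_count` pages, try once, and give up if the
/// payload does not fit. `write` is a hypothetical serializer callback
/// that returns the number of bytes it wrote.
pub fn into_buf_sized<F>(page_count: usize, write: F) -> Result<PagedBuf, ()>
where
    F: FnOnce(&mut [u8]) -> Result<usize, ()>,
{
    let mut pages = vec![0u8; page_count * PAGE_SIZE];
    let used = write(&mut pages[..])?;
    Ok(PagedBuf { pages, used })
}

/// Example payload writer: copies `bytes` into `dst` if they fit.
pub fn write_bytes(bytes: &[u8], dst: &mut [u8]) -> Result<usize, ()> {
    if bytes.len() > dst.len() {
        return Err(());
    }
    dst[..bytes.len()].copy_from_slice(bytes);
    Ok(bytes.len())
}
```

A caller who knows the payload is a bit over one page would call `into_buf_sized(2, |dst| write_bytes(&payload, dst))` and handle the `Err(())` case itself, which preserves the chance to chunk or shrink the request.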

Summary

The open questions are thus:

Is it worth it to migrate to a new rkyv?
What should we do about the xous-ipc String type, now that we have support for Rust String?
Are there any API modifications we can do to make things more ergonomic (e.g. changing behavior of into_buf())?
Should we endeavor to turn on the bytecheck feature and make Buffer transformations safe? It might seem like an obvious thing to do but it incurs a computational and code size cost on a slow system with limited memory, and we primarily use rkyv to send data between trusted servers inside the same machine. This would be a feature you'd want on any data coming over the network or stored somewhere you didn't trust, so it's not exactly clear to me what we'd be getting by turning it on.

And perhaps some other questions I haven't thought of, will edit this as things come up.
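On the validation question, the trade-off can be illustrated with a toy example that has nothing to do with rkyv's actual format or the bytecheck API. The layout (4-byte little-endian length, then UTF-8 text) and both function names are assumptions made up for this sketch. The checked path pays for bounds checks and a full UTF-8 scan on every message; the unchecked path trusts the sender.

```rust
use std::str;

/// Validating path: every field is checked before the buffer is interpreted.
/// Toy layout (assumption): 4-byte little-endian length, then UTF-8 text.
pub fn view_checked(buf: &[u8]) -> Result<&str, &'static str> {
    let header = buf.get(..4).ok_or("buffer shorter than header")?;
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let end = 4usize.checked_add(len).ok_or("length overflows")?;
    let body = buf.get(4..end).ok_or("length field points past end of buffer")?;
    // This scan touches every byte of the payload; it is the main runtime cost.
    str::from_utf8(body).map_err(|_| "text is not valid UTF-8")
}

/// Trusting path: no UTF-8 scan is performed.
///
/// # Safety
/// The caller must guarantee that `buf` was produced by a well-behaved
/// sender using the same layout, so that the text is valid UTF-8. The
/// slice indexing still panics if the length field is out of range.
pub unsafe fn view_unchecked(buf: &[u8]) -> &str {
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    unsafe { str::from_utf8_unchecked(&buf[4..4 + len]) }
}
```

Between servers that trust each other, the unchecked path is what the current `unsafe { rkyv::archived_value(..) }` call amounts to in spirit; validation would mostly matter once untrusted or differently compiled senders enter the picture.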

bunnie · 2024-09-18T06:25:10Z

bunnie
Sep 18, 2024
Maintainer Author

I thought maybe we could get away with a String::<N> that "just worked" so we could have API backward compatibility by implementing a direct serialization of String::<N>: https://github.com/betrusted-io/xous-core/blob/rkyv-0.8-planning/xous-ipc/src/string.rs

This is a bad idea because actually, you need to be able to derive an rkyv-able String. So in fact probably a better approach is to simply just implement String::<N> with a pass-through of an rkyv String-native type or something. Or implement a custom serializer for String:: that's used in parallel to rkyv String just for the sake of backward compatibility.

I kind of don't like any of those approaches and am leaning in favor of just eliminating the IPC String and using the rkyv String, because you get a much richer String type to play with, even though it is a much bigger rewrite of the code base.

This does open the door to the discussion about how we deal with messages that take multiple pages to send, and how we deal with Strings that really should be strictly limited to a length because of other API issues. But I think maybe the right thing to do is to make the length limiting a thing that is done at a level of abstraction above String. Again, a bigger re-write of the code base, but I think cleaner and more maintainable overall.
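One way the "length limit above String" idea could be expressed is a thin newtype that checks the limit at construction and otherwise behaves like a plain String. This is only a sketch: `Bounded`, `TooLong`, and `ServerName` are hypothetical names, and the limit of 64 is borrowed from the `Registration` example purely for illustration.

```rust
/// Error returned when a string exceeds its limit (lengths are in bytes).
#[derive(Debug, PartialEq, Eq)]
pub struct TooLong {
    pub len: usize,
    pub max: usize,
}

/// A native `String` whose byte length was checked against `N` when the
/// value was built. The limit lives in the type, not in the storage.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Bounded<const N: usize>(String);

impl<const N: usize> Bounded<N> {
    /// Accepts the string only if it is at most `N` bytes long.
    pub fn new(s: String) -> Result<Self, TooLong> {
        if s.len() > N {
            Err(TooLong { len: s.len(), max: N })
        } else {
            Ok(Bounded(s))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Hand back the plain `String`, e.g. to place it in a message struct.
    pub fn into_inner(self) -> String {
        self.0
    }
}

/// Hypothetical alias for APIs with a hard 64-byte name limit.
pub type ServerName = Bounded<64>;
```

APIs with a real limit keep it visible in their signatures, while message structs that only had an arbitrary cap can carry an ordinary String.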

xobs · 2024-09-18T08:27:36Z

xobs
Sep 18, 2024
Maintainer

The ability to work with flattened versions is very nice indeed.

When this was first designed, we didn't even have libstd, so we didn't even have std::string::String. The xous_ipc::String<N> was a kind of minimum viable string, and it would be nice to see that gone.

For performant messages, the best approach is still to manually serialize and deserialize like we do in libstd itself for networking. rkyv is used for convenience, so I'm largely okay with the sledgehammer approach.

One thing to note -- libstd manually packs these structs in at least one case. The dns lookup process writes into a buffer and reads the response, so in that sense the existing behavior will need to be maintained. Fortunately I think that's the only place where that's used, since the other instances of lend_mut() getting called are all to the network server, which doesn't use rkyv.

Nice job getting the API to work with the new version.

xobs · 2024-09-19T02:11:10Z

xobs
Sep 19, 2024
Maintainer

I think there's actually a very important pair of questions to ask: What is this trying to accomplish, and what are the threat models?

When you link two crates together, Rust will ensure the types map between them. It may decide to reorder fields, but even within the same crate things like pointers are valid.

With C, when you link a library, you're relying on the header to not lie. It's perfectly valid to call a function with completely invalid arguments, and you might even get away with it some of the time. C does not enforce ABI compatibility, and considers the ABI boundary to be trustable.

The use of a crate like rkyv gives two nice properties:

It prevents leaking pointers from one process to another, and even allows for sending some pointer-like values; and
You can trust the values coming from the other side.

There are a number of options:

1. Use rkyv. It's a bit heavy, but it works. It will allow you to trust both sides at the cost of some overhead.
2. Just transmute the object. Potentially very dangerous, but very fast. It should be fine for primitive types and anything marked #[repr(C)] without pointers.
3. Only support &[u8]. This is for performant operations.
4. Come up with our own specialized system.
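As an illustration of the "just transmute" option, here is a minimal sketch for a pointer-free #[repr(C)] struct. `Point` and both helpers are hypothetical. The unsafe blocks are only sound under the conditions stated in the comments: no padding bytes and every bit pattern valid for every field. Those are exactly the kind of subtleties that make this approach fragile for richer types.

```rust
use std::{mem, ptr, slice};

/// Two `i32` fields: fixed layout, no padding, no pointers, and every
/// bit pattern is a valid value.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// View the struct as raw bytes (sender side).
pub fn as_bytes(p: &Point) -> &[u8] {
    // SAFETY: `Point` is `repr(C)` with no padding, so all
    // `size_of::<Point>()` bytes are initialized.
    unsafe { slice::from_raw_parts(p as *const Point as *const u8, mem::size_of::<Point>()) }
}

/// Rebuild the struct from raw bytes (receiver side).
pub fn from_bytes(buf: &[u8]) -> Option<Point> {
    if buf.len() < mem::size_of::<Point>() {
        return None;
    }
    // SAFETY: the length was checked above, any bit pattern is a valid
    // `Point`, and `read_unaligned` tolerates a misaligned source.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr() as *const Point) })
}
```

Add a `bool`, an enum, a reference, or a field that introduces padding, and at least one of the safety comments stops being true without the compiler saying anything.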

bunnie · 2024-09-19T03:03:07Z

bunnie
Sep 19, 2024
Maintainer Author

I think the threat model in the case of rkyv is primarily rustc vs the developer, and the developer vs. themselves. A third model might present itself once we allow loadable apps, which is different versions of rustc versus each other.

I recall early on we tried using transmute on objects, but kept on getting tripped up on some subtleties - alignment, uninit, pinned things and pointers. It feels fragile, and it is unsafe after all so I prefer to avoid it, because "anything" could happen across a transmute boundary.

&[u8] is something I do use sometimes, but it's subject to "me versus myself" threats - transcribing objects manually into &[u8] is error-prone, and different versions of my own code aren't going to work with each other. I know we do this in the std APIs because we want to avoid pulling rkyv in as a dependency for std itself, and there's a bunch of things we do like hard-coding constants into the ABI that are fine for something as stable as std but I think it's not appropriate for every user application, especially ones that have a somewhat fluid API.

I think the main benefit of rkyv is it gives us a simple and (hopefully) reliable way to serialize objects between processes while enjoying all of the strong guarantees of Rust. In this context, it's not meant to be used as a way to store data to the PDDB or transmit across the network (although users are welcome to make that choice, it's outside the scope of this discussion).

I suppose rkyv would ensure some inter-rustc inter-operability? Because its serialization format does not depend on the compiler, if the one process was compiled with a different version of rustc, as long as the same version of rkyv was used between the two you wouldn't have to worry about compiler incompatibilities (as you would if you did a transmute).

Own specialized system - I think we'd be basically re-inventing the derive macro system of rkyv and it wouldn't be nearly as feature-rich as it is today. That we're even discussing the possibility of having std String and Vec as first-class objects to send between processes is pretty cool.

gsora · 2024-09-19T13:45:09Z

gsora
Sep 19, 2024

Hi!

> Is it worth it to migrate to a new rkyv?

If the new rkyv behavior allows for fewer unsafe usages and more flexibility in what types can be passed around through the messaging system, I'd say go for it.

> The latest rkyv creates an opportunity for more flat usage by Derive-ing more useful attributes on ArchivedT, such as Eq, PartialEq, etc. which means we can now pass some Enum of say, EnumT and on the receiving side do more operations on ArchivedEnumT because the traits now exist to do that.

Is there a specific example in which this functionality comes in handy?

> What should we do about the xous-ipc String type, now that we have support for Rust String?

> I think there is some merit to just remove the xous-ipc version of String and use only Rust native String, but it would touch almost every crate to strip out that function, so looking for thoughts and opinions about doing that.

I've had some exposure to the xous_ipc::String type, and while the API itself isn't hard to use it brings some cognitive overload for the average developer: questions like "why are they using it instead of String?", "am I better off using xous_ipc::String or a plain String and call conversion methods?" or "how big should this xous_ipc::String be?" were surely in my mind when I first approached Xous.

IMO getting rid of xous_ipc::String is the better course of action now that we have a full version of std. The downside is the gargantuan amount of human-hours needed to refactor its usage.

We could leave it in place and discourage its usage via documentation, and follow the boyscout rule when we encounter usage in the wild: if new code touches a pre-existing xous_ipc::String, the developer should also take care of refactoring it out in favor of plain Strings.

I'm also expecting a natural decrease of String usage: new developers are inclined to use Rust-specific types, if they never encounter a xous_ipc::String they won't bother using it.

> Are there any API modifications we can do to make things more ergonomic (e.g. changing behavior of `into_buf()`)?

> Also, what do you do anyways if there's a serialization error?

In my experience (outside Xous, that is) serialization errors are almost always fatal - so far I've seen OOM and bugs inside the serializer code, which are two things you can't work around.

> The disadvantage is that you're deprived of the opportunity to modify the serialization request in case there was a reasonable thing you could do to try to make it work.

I'm in favor of having an infallible into_buf() provided there are tools for developers to recognize onerous calls.

What if into_buf() printed a debug log if a call does too many iterations?

This way a developer could notice they're doing something wrong performance-wise and optimize their code accordingly.

I think Android does something similar when developers write inefficient ListView code, which is onerous from the view data container creation point of view.
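A rough sketch of what that diagnostic could look like, with made-up names and a made-up threshold. `try_write` is a hypothetical callback that reports whether the payload fit, and `eprintln!` stands in for whatever logging macro the real code would use.

```rust
/// Assumed page size for this sketch.
pub const PAGE_SIZE: usize = 4096;

/// Hypothetical threshold: more attempts than this is worth a log line.
pub const WARN_AFTER_ATTEMPTS: usize = 4;

/// Grow-and-retry loop that also counts attempts. `try_write` returns
/// `true` once the payload fits into the slice it was given.
/// Returns the final buffer and the number of attempts it took.
pub fn autosize_with_warning<F>(mut try_write: F) -> (Vec<u8>, usize)
where
    F: FnMut(&mut [u8]) -> bool,
{
    let mut buf = vec![0u8; PAGE_SIZE];
    let mut attempts = 1;
    while !try_write(&mut buf[..]) {
        let new_len = buf.len() + PAGE_SIZE;
        buf.resize(new_len, 0);
        attempts += 1;
    }
    if attempts > WARN_AFTER_ATTEMPTS {
        // Stand-in for a debug-level log call.
        eprintln!(
            "into_buf: needed {} attempts ({} bytes); consider pre-sizing this call",
            attempts,
            buf.len()
        );
    }
    (buf, attempts)
}
```

The common one-page case stays silent, and only callers that repeatedly re-serialize large objects get nudged toward an explicitly sized call.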

xobs · 2024-09-20T04:24:00Z

xobs
Sep 20, 2024
Maintainer

As a second option, I've come up with flatipc that combines (2) and (4). It allows for transparent usage of traits via Deref and DerefMut, so for example it's possible to compare an Ipc and a non-Ipc version of the type by simply adding * to the front of the Ipc version.

The #[derive(Ipc)] macro doesn't work unless the type is #[repr(C)], ensuring it has a well-defined representation. Additionally, it adds safety in the form of signature checking of types, though collisions are not impossible as it's only a 32-bit value.
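The general shape of that idea can be sketched by hand, with the caveat that this is not flatipc's actual API or generated code. `Rect`, `IpcRect`, the FNV-1a hash, and the layout string are all assumptions chosen for illustration: a #[repr(C)] wrapper carries a 32-bit signature next to the payload and forwards to the inner type through Deref and DerefMut.

```rust
use std::ops::{Deref, DerefMut};

/// 32-bit FNV-1a hash, used here as a stand-in type signature.
pub fn fnv1a32(s: &str) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for b in s.bytes() {
        hash ^= b as u32;
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub w: u32,
    pub h: u32,
}

/// Hand-written wrapper of the kind a derive macro might generate.
#[repr(C)]
pub struct IpcRect {
    signature: u32,
    inner: Rect,
}

impl IpcRect {
    /// Hypothetical layout description hashed into the signature.
    const LAYOUT: &'static str = "Rect{w:u32,h:u32}";

    pub fn new(inner: Rect) -> Self {
        IpcRect { signature: fnv1a32(Self::LAYOUT), inner }
    }

    /// Receiver-side check that both ends agree on the layout.
    /// Only 32 bits, so collisions are unlikely but possible.
    pub fn signature_matches(&self) -> bool {
        self.signature == fnv1a32(Self::LAYOUT)
    }
}

impl Deref for IpcRect {
    type Target = Rect;
    fn deref(&self) -> &Rect {
        &self.inner
    }
}

impl DerefMut for IpcRect {
    fn deref_mut(&mut self) -> &mut Rect {
        &mut self.inner
    }
}
```

With the Deref impls, `*ipc == plain` compares the wrapped value against an ordinary `Rect`, and field access such as `ipc.w = 10` goes straight through to the inner struct.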
