Limit the use of unsafe Rust code #499

Expyron · 2020-07-04T18:42:36Z

Rust-protobuf currently has 33 uses of unsafe Rust code.
Most of them, although not all, appear to be for performance reasons.

As this crate can be used to parse untrusted protobuf inputs, security is an important topic.
It is also a general recommendation in the Rust community to reduce unsafe code as much as possible.

Ideally, every use of unsafe should be clearly commented with an explanation of why it is safe and necessary.
As I went through the code, a lot of them couldn't be immediately understood.

I am not saying that this crate currently has security issues, but right now it is very difficult to audit it to show that it doesn't.

Below is a run-through all the unsafe uses I found (I only checked on the master branch, as the latest version 2.15.1 still uses unsafe in generated code).

./protobuf/src/buf_read_iter.rs: 7 uses
./protobuf/src/stream.rs: 5 uses

These two are obviously for performance reasons, although I believe they can be rewritten in safe code without impacting performance at all.
I am willing to try and work on it if wanted.

./protobuf/src/chars.rs: 2 uses
./protobuf/src/json/base64.rs: 1 use

These are examples of "good" uses of unsafe.
I recommend adding a short comment for each one explaining the reason behind it.

./protobuf-codegen-pure/src/convert.rs: 2 uses

Can be replaced trivially by f64::to_bits() and f32::to_bits()

./protobuf/src/lazy.rs: 4 uses

After reading the code, I am not able to understand the use behind this lazy initialization code, as it is never used in any of the 'hot' code paths (i.e. parsing or writing).
This is also storing global state, which is usually frowned upon in Rust, and not something that users expect from a parsing library.
Either way, using an dedicated crate, like lazy-static or once_cell, would make this code much easier to audit.

./protobuf/src/core.rs: 4 uses
./protobuf/src/misc.rs: 3 uses
./protobuf/src/reflect/enums.rs: 4 uses
./protobuf/src/reflect/runtime_type_dynamic.rs: 1 use
./protobuf/src/reflect/transmute_eq.rs: 1 use

That is a lot of transmutes and raw pointer dereferences, and the whole code appears to be built around it.
Rust is usually not a dynamically typed language, but this use of std::any everywhere makes it look like one.
In addition, because the code is using raw dyn traits everywhere, and not Box<dyn Trait>, the whole downcast and transmute code was re-implemented instead of using the standard Box::downcast(), which would have been safe.

Overall, I believe this to be the most problematic part for anyone trying to perform a security audit of this library.
Using dyn Trait everywhere (instead of impl Trait) also has huge performance impacts, as virtual function calls are slow...

The text was updated successfully, but these errors were encountered:

stepancheg · 2020-07-06T05:41:39Z

These two are obviously for performance reasons, although I believe they can be rewritten in safe code without impacting performance at all.

Yes. Basically, CodedInputStream and CodedOutputStream are non-generic types, but still want to work with raw bytes regardless of what actually input is. So they need to "pin" byte arrays, which is not possible in safe Rust.

Also, some unsafe are range checks which are quite expensive when checked over and over again.

I am willing to try and work on it if wanted.

If you can remove any unsafe without sacrificing performance and without breaking backwards compatibility, these patches are certainly welcome.

If performance degrades, or we need to change API, it might be more complicated.

These are examples of "good" uses of unsafe.
I recommend adding a short comment for each one explaining the reason behind it.

Added a couple comments.

Can be replaced trivially by f64::to_bits() and f32::to_bits()

Done. Thanks!

Either way, using a dedicated crate, like lazy-static or once_cell, would make this code much easier to audit.

Dependency on a third-party crate is a liability, many times I was hit by incompatible changes in third-party crates.

I'm waiting for this RFC rust-lang/rfcs#1977 to be implemented to be able to use third-party dependencies freely.

That is a lot of transmutes and raw pointer dereferences
Rust is usually not a dynamically typed language, but this use of std::any everywhere makes it look like one
...

It's a complicated issue.

Runtime reflection is nice, but it is hard to do in safe code.

If I started the library from scratch, maybe I'd do something differently, I don't know.

If you have suggestions on how to change something, please let me know.

Using dyn Trait everywhere (instead of impl Trait) also has huge performance impacts

Default serialization/deserialization paths happen without virtual calls. And virtual calls should make code smaller.

stepancheg · 2022-02-08T01:25:30Z

Closing due to inactivity.

PRs which do not regress performance are welcome.

stepancheg closed this as completed Feb 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit the use of unsafe Rust code #499

Limit the use of unsafe Rust code #499

Expyron commented Jul 4, 2020

stepancheg commented Jul 6, 2020

stepancheg commented Feb 8, 2022

Limit the use of unsafe Rust code #499

Limit the use of unsafe Rust code #499

Comments

Expyron commented Jul 4, 2020

stepancheg commented Jul 6, 2020

stepancheg commented Feb 8, 2022