Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the use of unsafe Rust code #499

Closed
Expyron opened this issue Jul 4, 2020 · 2 comments
Closed

Limit the use of unsafe Rust code #499

Expyron opened this issue Jul 4, 2020 · 2 comments

Comments

@Expyron
Copy link

Expyron commented Jul 4, 2020

Rust-protobuf currently has 33 uses of unsafe Rust code.
Most of them, although not all, appear to be for performance reasons.

As this crate can be used to parse untrusted protobuf inputs, security is an important topic.
It is also a general recommendation in the Rust community to reduce unsafe code as much as possible.

Ideally, every use of unsafe should be clearly commented with an explanation of why it is safe and necessary.
As I went through the code, a lot of them couldn't be immediately understood.

I am not saying that this crate currently has security issues, but right now it is very difficult to audit it to show that it doesn't.


Below is a run-through all the unsafe uses I found (I only checked on the master branch, as the latest version 2.15.1 still uses unsafe in generated code).

  • ./protobuf/src/buf_read_iter.rs: 7 uses
    ./protobuf/src/stream.rs: 5 uses

These two are obviously for performance reasons, although I believe they can be rewritten in safe code without impacting performance at all.
I am willing to try and work on it if wanted.

  • ./protobuf/src/chars.rs: 2 uses
    ./protobuf/src/json/base64.rs: 1 use

These are examples of "good" uses of unsafe.
I recommend adding a short comment for each one explaining the reason behind it.

  • ./protobuf-codegen-pure/src/convert.rs: 2 uses

Can be replaced trivially by f64::to_bits() and f32::to_bits()

  • ./protobuf/src/lazy.rs: 4 uses

After reading the code, I am not able to understand the use behind this lazy initialization code, as it is never used in any of the 'hot' code paths (i.e. parsing or writing).
This is also storing global state, which is usually frowned upon in Rust, and not something that users expect from a parsing library.
Either way, using an dedicated crate, like lazy-static or once_cell, would make this code much easier to audit.

  • ./protobuf/src/core.rs: 4 uses
    ./protobuf/src/misc.rs: 3 uses
    ./protobuf/src/reflect/enums.rs: 4 uses
    ./protobuf/src/reflect/runtime_type_dynamic.rs: 1 use
    ./protobuf/src/reflect/transmute_eq.rs: 1 use

That is a lot of transmutes and raw pointer dereferences, and the whole code appears to be built around it.
Rust is usually not a dynamically typed language, but this use of std::any everywhere makes it look like one.
In addition, because the code is using raw dyn traits everywhere, and not Box<dyn Trait>, the whole downcast and transmute code was re-implemented instead of using the standard Box::downcast(), which would have been safe.

Overall, I believe this to be the most problematic part for anyone trying to perform a security audit of this library.
Using dyn Trait everywhere (instead of impl Trait) also has huge performance impacts, as virtual function calls are slow...

@stepancheg
Copy link
Owner

These two are obviously for performance reasons, although I believe they can be rewritten in safe code without impacting performance at all.

Yes. Basically, CodedInputStream and CodedOutputStream are non-generic types, but still want to work with raw bytes regardless of what actually input is. So they need to "pin" byte arrays, which is not possible in safe Rust.

Also, some unsafe are range checks which are quite expensive when checked over and over again.

I am willing to try and work on it if wanted.

If you can remove any unsafe without sacrificing performance and without breaking backwards compatibility, these patches are certainly welcome.

If performance degrades, or we need to change API, it might be more complicated.

These are examples of "good" uses of unsafe.
I recommend adding a short comment for each one explaining the reason behind it.

Added a couple comments.

Can be replaced trivially by f64::to_bits() and f32::to_bits()

Done. Thanks!

Either way, using a dedicated crate, like lazy-static or once_cell, would make this code much easier to audit.

Dependency on a third-party crate is a liability, many times I was hit by incompatible changes in third-party crates.

I'm waiting for this RFC rust-lang/rfcs#1977 to be implemented to be able to use third-party dependencies freely.

That is a lot of transmutes and raw pointer dereferences
Rust is usually not a dynamically typed language, but this use of std::any everywhere makes it look like one
...

It's a complicated issue.

Runtime reflection is nice, but it is hard to do in safe code.

If I started the library from scratch, maybe I'd do something differently, I don't know.

If you have suggestions on how to change something, please let me know.

Using dyn Trait everywhere (instead of impl Trait) also has huge performance impacts

Default serialization/deserialization paths happen without virtual calls. And virtual calls should make code smaller.

@stepancheg
Copy link
Owner

Closing due to inactivity.

PRs which do not regress performance are welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants