Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fuzzing support with docs; RawVal comparison proptests #957

Merged
merged 16 commits into from
Jun 21, 2023

Conversation

brson
Copy link
Contributor

@brson brson commented May 20, 2023

What

Preliminary support for fuzzing Soroban contracts with cargo-fuzz. This patch is primarily concerned with introducing a pattern for implementing the Arbitrary trait, by which fuzz tests generate semi-random Rust values.

This is a new revision of previous pr #878. Little has changed functionally since that pr.

This patch additionally uses the Arbitrary impls with proptest-arbitrary-interop crate to add proptests that help ensure that:
RawVals and ScVals can be converted between each other and their their comparison functions are equivalent, which closes stellar/rs-soroban-env#743.

This patch introduces the SorobanArbitrary trait which looks like this:

    pub trait SorobanArbitrary:
        TryFromVal<Env, Self::Prototype> + IntoVal<Env, RawVal> + TryFromVal<Env, RawVal>
    {
        type Prototype: for<'a> Arbitrary<'a>;
    }

Basically every type relevant to soroban contracts implements (or should) this trait, including i32, u32, i64, u64, i128, u128, (), bool, I256Val, U256Val, Error, Bytes, BytesN, Vec, Map, Address, Symbol, TimepointVal, DurationVal.

The #[contracttype] macro automatically derives an implementation, along with a type that implements SorobanArbitrary::Prototype.

In use the trait looks like

use soroban_sdk::{Address, Env, Vec};
use soroban_sdk::contracttype;
use soroban_sdk::arbitrary::{Arbitrary, SorobanArbitrary};
use std::vec::Vec as RustVec;

#[derive(Arbitrary, Debug)]
struct TestInput {
    deposit_amount: i128,
    claim_address: <Address as SorobanArbitrary>::Prototype,
    time_bound: <TimeBound as SorobanArbitrary>::Prototype,
}

#[contracttype]
pub struct TimeBound {
    pub kind: TimeBoundKind,
    pub timestamp: u64,
}

#[contracttype]
pub enum TimeBoundKind {
    Before,
    After,
}

fuzz_target!(|input: TestInput| {
    let env = Env::default();
    let claim_address: Address = input.claim_address.into_val(&env);
    let time_bound: TimeBound = input.time_bound.into_val(&env).
    // fuzz the program based on the input
});

A more complete example is at https://github.com/brson/rs-soroban-sdk/blob/val-fuzz/soroban-sdk/fuzz/fuzz_targets/fuzz_rawval_cmp.rs

Why

This patch assumes it is desirable to fuzz Soroban contracts with cargo-fuzz.

Soroban reference types can only be constructed with an Env, but the Arbitrary trait constructs values from nothing but bytes. The SorobanArbitrary trait provides a pattern to bridge this gap, expecting fuzz tests to construct SorobanArbitrary::Prototype types, and then convert them to their final type with FromVal/IntoVal.

There are a lot of docs here and hopefully they explain what's going on well enough.

fuzz_catch_panic

This patch also introduces a helper function, fuzz_catch_panic, which is built off of the call_with_suppressed_panic_hook function in soroban-env-host.

The fuzz_target! macro overrides the Rust panic hook to abort on panic, assuming all panics are bugs, but Soroban contracts fail by panicking, and I have found it desirable to fuzz failing contracts. fuzz_catch_panic temporarily prevents the fuzzer from aborting on panic.

Known limitations

The introduction of SorobanArbitrary requires a bunch of extra documentation to explain why Soroban contracts can't just use the stock Arbitrary trait.

As an alternative to this approach, we could instead expect users to construct XDR types, not SorobanArbitrary::Prototype types, and convert those to RawVals. I have been assuming that contract authors should rather concern themselves with contract types, and not the serialization types; and presently there are a number of XDR types that have values which can't be converted to contract types. The SorobanArbitrary::Prototype approach does allow more flexibility in the generation of contract types, e.g. we can generate some adversarial types like invalid object references and bad tags.

Contracts must use IntoVal to create the final types, but these traits are under-constrained for this purpose and always require type hints:

fuzz_target!(|input: <Address as SorobanArbitrary>::Prototype| {
    let env = Env::default();
    let address: Address = input.into_val(&env);
    // fuzz the program based on the input
});

This is quite unfortunate because it means the real type must be named twice.

This patch introduces a new private module arbitrary_extra which simply defines a bunch of new conversions like

impl TryFromVal<Env, u32> for u32 {
    type Error = ConversionError;
    fn try_from_val(_env: &Env, v: &u32) -> Result<Self, Self::Error> {
        Ok(*v)
    }
}

These conversions are required by SorobanArbitrary, which is only defined when cfg(feature = "testutils"); the arbitrary_extra module defines these conversions for all cfgs to ensure type inference is always the same, but the impls are probably useless outside of the SorobanArbitrary prototype pattern.

Crates that use #[contracttype] need to define a "testutils" feature if they want to use Arbitrary.

This patch doesn't generate "adversarial" values, which might include:

  • RawVals with bad tags
  • objects that have invalid references
  • objects that are reused multiple times
  • deeply nested vecs and maps
  • vecs and maps that contain heterogenous element types

The arbitrary module has unit tests, and these help especially ensure the macros for custom types are correct, but the tests are not exhaustive.

@brson brson changed the title Add fuzzing support with docs, RawVal comparison proptests Add fuzzing support with docs; RawVal comparison proptests May 20, 2023
@brson
Copy link
Contributor Author

brson commented May 20, 2023

The cargo-deny CI error is about duplicate crates. I'll investigate another time.

Copy link
Member

@leighmcculloch leighmcculloch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this is awesome. I've started reviewing but haven't finished yet and will have a play with it. The integration with proptest will be helpful.

I'm also particularly interested in some of those invalid RawVal cases, especially Vec (and Map) types where the contents aren't all the same type, because that's a valid Vec, for things like tuples or argument lists.

@brson
Copy link
Contributor Author

brson commented May 23, 2023

I looked into the cargo-deny errors, and pushed two commits to resolve them, but they may not be acceptable.

The first commit just upgrades proptest, which resolves the quick-error dupe.

The second adds rand 0.8 and its dependencies to the deny.toml ban whitelist, allowing dupes of rand, rand_core, rand_chacha, and getrandom. The problem here seems to be that all revisions of ed25519-dalek are on rand 0.7, while all revisions of proptest are on rand 0.8. I added comments to deny.toml indicating that proptest is only a dev-dependency so the dupes won't cause contract bloat, but it's possible that by being on a whitelist rand dupes could sneak into contracts in the future.

I'm still poking at other solutions to the duplicated rand crate, but am not super optimistic.

edit: it looks like curve25519-dalek 4.0 upgrades rand, but that crate is not released yet.

@leighmcculloch leighmcculloch requested review from graydon and removed request for graydon June 3, 2023 03:37
@brson
Copy link
Contributor Author

brson commented Jun 9, 2023

I found something I need to change here: I have implemented Arbitrary for U256Val, but I should be using U256 instead.

@brson brson force-pushed the arbitrary-for-upstream branch from 684bf79 to ea350a0 Compare June 10, 2023 18:51
@brson
Copy link
Contributor Author

brson commented Jun 10, 2023

I rebased this and made the following changes:

  • Implemented SorobanArbitrary for I256/U256 instead of I256Val, U256Val
  • Removed the implementations for DurationVal and TimepointVal. These are not user-facing types. I am assuming there are two types yet to be implemented that wrap these and/or convert to them.
  • Added some test cases.

I also discovered that Error cannot be stored in user-defined types, and posted a fix for that here: stellar/rs-soroban-env#846

@brson
Copy link
Contributor Author

brson commented Jun 10, 2023

It's also worth noting that RawVal cannot be stored in UDTs: #939

@brson
Copy link
Contributor Author

brson commented Jun 10, 2023

It looks like this is the issue for Timepoint and Duration: stellar/rs-soroban-env#724

Also indicates that the standard Rust String will be expected to be usable in contracts in the future, so will need implementations for Soroban Arbitrary.

It's worth noting that this SorobanArbitrary is definitely a maintenance burden, as every new type that is expected to be used in contracts needs to be given explicit compatibility; and every type that might be stored in a UDT needs an implementation, and probably tests verifying they work, and I am still finding bugs and omissions like mentioned in previous comments. I expect that there will be bugs reported against it by people using UDTs in ways I have not anticipated.

@brson brson force-pushed the arbitrary-for-upstream branch from 8b6210b to 6017207 Compare June 19, 2023 18:12
@brson
Copy link
Contributor Author

brson commented Jun 19, 2023

Rebased.

@leighmcculloch leighmcculloch self-requested a review June 20, 2023 16:28
Copy link
Member

@leighmcculloch leighmcculloch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏻 Thanks @brson! This looks great.

There's a couple minor things with the testutils features I've commented inline, and I'll push fixes for because they're trivial to fix.

I've been playing around with it with a simple contract, and run into a couple issues. What I'm going to do is push a contract into this PR with some comments in the contract with the issue.

soroban-sdk/src/arbitrary.rs Outdated Show resolved Hide resolved
soroban-sdk/src/tests/proptest_scval_cmp.rs Outdated Show resolved Hide resolved
soroban-sdk/src/lib.rs Show resolved Hide resolved
Copy link
Member

@leighmcculloch leighmcculloch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed an example that uses the fuzzer in 22dd445. I think it'd be helpful for us to have a complete example in the test vectors of this repo. We've tried to keep the test vectors few and wherever possible keep any tests inside the sdk crate, but given that fuzzing is an out-of-crate experience, I think a test vector is suitable.

I've hit some compile errors I'm not sure what to do with though. See inline comment.

Also note the example I pushed isn't a particularly good one, I was just trying to get it working before doing anything meaningful.

tests/fuzz/Cargo.toml Show resolved Hide resolved
tests/fuzz/fuzz/fuzz_targets/fuzz_target_1.rs Outdated Show resolved Hide resolved
@leighmcculloch
Copy link
Member

leighmcculloch commented Jun 20, 2023

I pushed some more changes so that the sdk doesn't ever generate testutils dependent code if it isn't dependent on testutils. Also, cleaned up the example that I had added.

@leighmcculloch
Copy link
Member

leighmcculloch commented Jun 20, 2023

I think there's three things left we need to figure a story for:

  • The arbitrary re-export wasn't sufficient for me in the test_fuzz example's fuzzing target code. What do we need to do to fix that?

  • What do we do about the rustc / cargo multiple crate-type issue with LTO disappearing from the cdylib type? Do we recommend folks build their contract in an rlib, and re-export their contract in a cdylib that imports the rlib? Feel pretty heavy. Tell people to temporarily add rlib, but to remove it when finished fuzzing? 😕

  • Any objection to renaming the module from arbitrary to fuzz, or should we keep the name arbitrary?

@graydon
Copy link
Contributor

graydon commented Jun 21, 2023

Pushed a fix for the arbitrary re-export issue. It's not beautiful, but it works. The Arbitrary macro doesn't path-qualify the code it generates, so it assumes there'll be a module arbitrary in its expansion environment. Yay (lack of) hygiene.

@graydon
Copy link
Contributor

graydon commented Jun 21, 2023

The solution to the multiple crate-types situation suggested by cargo devs is to build our wasms with cargo rustc --crate-type cdylib, overriding whatever's written in Cargo.toml. This does work (it produces the LTO build) and I think it might be enough of a workaround to just .. include in the docs somewhere, maybe set as more of a default pattern in examples (i.e. remove the crate-type specifications from Cargo.toml and change the Makefile that builds wasms to call cargo rustc --crate-type cdylib explicitly). WDYT?

@leighmcculloch
Copy link
Member

I think that's reasonable. We can make that the default in docs for how to build .wasm files, and contracts can be rlib otherwise. We can keep the builds in this repo as is for the convenience of the workspace, and update the soroban-docs to show a setup without a crate-type in the toml, and using that new command.

@graydon
Copy link
Contributor

graydon commented Jun 21, 2023

@leighmcculloch as far as renaming it fuzz I don't think that's appropriate -- the same arbitrary machinery here is being used for fuzzing and proptesting (which I'm actually slightly more excited about than fuzzing, tbh!)

Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that seems a little odd / unmotivated to me is the specialized typed vec and map<foo,foo> types rather than just vec and map<val,val> but it's not a big deal and I can imagine them plausibly being useful at some point, no need to delay further on that basis.

Overall this is great stuff! Thanks so much.

@brson
Copy link
Contributor Author

brson commented Jun 22, 2023

The only thing that seems a little odd / unmotivated to me is the specialized typed vec and map<foo,foo> types rather than just vec and map<val,val> but it's not a big deal and I can imagine them plausibly being useful at some point, no need to delay further on that basis.

If this is referring to the giant list of types that might be generated for an arbitrary Val, then I agree it's not ideal. Just getting something working that covers a bunch of cases. It can be improved in the future. The current Val generator can be considered an implementation detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify that equivalent RawVals and ScVals compare the same
3 participants