Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(forge): improve fuzz corpus #5246

Merged
merged 2 commits into from
Jun 29, 2023

Conversation

plotchy
Copy link
Contributor

@plotchy plotchy commented Jun 28, 2023

This pr provides a small update to the fuzz corpus that should allow higher effectiveness in fuzzing to find bugs, hit assertions, cause failures, etc.

Motivation

The fuzz corpus uses the values for the push, slot, slot_value, and log topics for relevant accounts in the state db.

The fuzz corpus can be minorly improved by also inserting values that are +/- 1 of the value, in order to pass through conditional statements that use < or >.

For example:

Challenge:

pragma solidity ^0.8.13;
contract MagicNumber {
    uint256 public target = 0;

    function check_against_magic(uint256 number) public {

        if (number > 100_098) { 
            if (number < 100_100) {
                target = 1;
            }
        }
    }

    function is_solved() public view returns (bool){
        return target == 1;
    }
}

Test:

pragma solidity ^0.8.13;

import "forge-std/Test.sol";
import "../src/MagicNumber.sol";

contract MagicNumberTest is Test {
    MagicNumber public magic_number;

    function setUp() public {
        magic_number = new MagicNumber();
    }

    function testFuzz(uint number) public {
        magic_number.check_against_magic(number);
        assertFalse(magic_number.is_solved());
    }
}

Fuzzing profile of:

[fuzz]
runs = 100_000
dictionary = 90

Current fuzzer:

❯ forge test --fuzz-runs 100000 --mc MagicNumber                    
[⠢] Compiling...
No files changed, compilation skipped

Running 1 test for test/MagicNumber.t.sol:MagicNumberTest
[PASS] testFuzz(uint256) (runs: 100000, μ: 8346, ~: 8340)
Test result: ok. 1 passed; 0 failed; finished in 2.48s

After PR Changes:

❯ forge test --fuzz-runs 100000 --mc MagicNumber                    
[⠢] Compiling...
No files changed, compilation skipped

Running 1 test for test/MagicNumber.t.sol:MagicNumberTest
[FAIL. Reason: Assertion failed. Counterexample: calldata=0x32f92a0300000000000000000000000000000000000000000000000000000000000188fb, args=[100603 [1.006e5]]] testFuzz(uint256) (runs: 305, μ: 8345, ~: 8340)
Test result: FAILED. 0 passed; 1 failed; 0 skipped; finished in 384.89ms

Failing tests:
Encountered 1 failing test in test/MagicNumber.t.sol:MagicNumberTest
[FAIL. Reason: Assertion failed. Counterexample: calldata=0x32f92a0300000000000000000000000000000000000000000000000000000000000188fb, args=[100603 [1.006e5]]] testFuzz(uint256) (runs: 305, μ: 8345, ~: 8340)

Encountered a total of 1 failing tests, 0 tests succeeded

Solution

My initial feeling is that only the push and slot_value values should be considered for the +/- 1 insertion. Practically, log topics and slot_keys are never used for logical flow or comparisons, and adding them would increase bloat in the fuzzing corpus which drags down effectiveness.

I don't like how I polluted the collect_push_values() fn with returning more than just the push values. I think that the push and slot_value values should instead be iterated over separately and have the +/- 1 values inserted.

Better Solution

While this +/- 1 business definitely aids the initial skyrocket of finding bugs in a fuzzing marathon, it could also lead to a flatter tail where the excess bloat of the dictionary leads to less useful values being picked over a long period of time.

image [Source: Daedaluzz](https://consensys.net/diligence/blog/2023/04/benchmarking-smart-contract-fuzzers/)

A better solution entirely would be to do a static analysis pass over the bytecode and track push values that are concretely modified + consumed by conditional statements. A static pass can be a surgical way to find the real values that are checked against.

Without doing a static pass, there could also be interesting REVM inspectors that track the stack values consumed in lt, gt, eq, etc opcodes, and add these values to the dictionary when the run is finished.

Example where a static pass shines:

pragma solidity ^0.8.13;
contract StaticPassRequired {
    uint256 public target = 0;
    uint256 constant magic_number = 1337;

    function check_against_magic(uint256 number) public {
        if (number == magic_number * 1337) { // value not found in push
            target = 1;
        }
    }

    function is_solved() public view returns (bool){
        return target == 1;
    }
}

@Evalir Evalir self-requested a review June 28, 2023 21:24
Copy link
Member

@Evalir Evalir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing! Thank you @plotchy for the detailed writeup and very good example of how this improves the fuzzer. I think the way this is implemented is fine—it's not a lot of pollution.

Once we get to doing a fuzzer (and larger test runner rewrite) we can absolutely reconsider the strategies we're using for fuzzing and try and introduce things like static analysis. We have a lot of room to improve and tooling around solidity has improved considerably.

Deferring to @mattsse for final approve + merge

@Evalir Evalir requested a review from mattsse June 29, 2023 02:31
Copy link
Member

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet!

tysm for this

@mattsse mattsse merged commit 33acf8a into foundry-rs:master Jun 29, 2023
@mattsse mattsse added A-testing Area: testing T-perf Type: performance labels Jun 29, 2023
Comment on lines +118 to +125
if value != U256::zero() {
let below_value = value - U256::one();
state.values_mut().insert(utils::u256_to_h256_be(below_value).into());
}
if value != U256::max_value() {
let above_value = value + U256::one();
state.values_mut().insert(utils::u256_to_h256_be(above_value).into());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on removing the bounds checks here? e.g. if the value is 0, subtracting 1 then adding U256::max_value() to the dict seems ok

OTOH, perhaps I can nerd-snipe you into also doing #3521 (comment), in which case we'd already have these bounds in the dict anyway 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed the 0 and max values should be in the dict from the start. I like the bounds checking for the reason of keeping the intent of the values to get through comparison operations. The wrapping is a neat trick to get a useful value, but isn't the original intent.

@mds1
Copy link
Collaborator

mds1 commented Jun 29, 2023

A better solution entirely would be to do a static analysis pass over the bytecode and track push values that are concretely modified + consumed by conditional statements

Agreed this does seem useful, should we create an issue to track this?


Also, while I'm supportive of this PR, merging changes to the fuzzer always makes me nervous because we don't have benchmarks or a strong test suite to objectively measure how the changes affect performance. Short term, perhaps we require a daedaluzz run to see how the results changed (I'd be curious to see the before/after on the changes for this PR). Longer term we should incorporate that plus the echidna test cases into CI

@Evalir
Copy link
Member

Evalir commented Jun 29, 2023

  • 1 to creating an issue to track this idea, and we should also have a tracking issue for incorporating echidna's test cases into CI

@pcaversaccio
Copy link
Contributor

pcaversaccio commented Jul 1, 2023

My Foundry fuzz time doubled twice, I guess that's because of this PR? https://twitter.com/pcaversaccio/status/1675085794963337216

Going from 28mins to over 1hr is a steep increase...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Area: testing T-perf Type: performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants