fix(evm): fuzzing not properly collecting data #2724

Merged
merged 22 commits into from
Aug 15, 2022


@joshieDo (Collaborator) commented Aug 11, 2022

It seems we weren't using any of the values collected from PUSH opcodes and storage, for either fuzzing or invariants!

Fuzz & Invariant

  • PUSH values were being collected left-aligned (4442...0000 instead of 0000...4442). testNeedle(uint256) now breaks successfully; before, it didn't.
  • Storage values were being collected with the wrong endianness (efbe...0000 instead of 0000...beef). testIncrement(address) now breaks successfully; before, it didn't.
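To make the two alignment bugs concrete, here is a minimal Rust sketch of correct versus buggy word construction. It is an illustration only, not Foundry's actual code: the helper names are hypothetical, and storage values are limited to u128 so the sketch needs no 256-bit integer crate.

```rust
/// Right-align a PUSH immediate (1..=32 bytes) in a 32-byte word, which is how
/// the EVM places it on the stack: PUSH2 0x4442 becomes 0000...4442.
fn push_to_word(immediate: &[u8]) -> [u8; 32] {
    assert!(!immediate.is_empty() && immediate.len() <= 32, "PUSH immediates are 1..=32 bytes");
    let mut word = [0u8; 32];
    word[32 - immediate.len()..].copy_from_slice(immediate);
    word
}

/// Buggy variant: left alignment yields 4442...0000, a value the contract
/// never actually compares against.
fn push_to_word_left_aligned(immediate: &[u8]) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[..immediate.len()].copy_from_slice(immediate);
    word
}

/// Encode a storage value as a big-endian 32-byte word (EVM word order).
/// Assumption: the value fits in a u128; real code would use a U256 type.
fn storage_to_word(value: u128) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[16..].copy_from_slice(&value.to_be_bytes());
    word
}

/// Buggy variant: little-endian bytes yield efbe...0000 for 0xbeef.
fn storage_to_word_wrong_endianness(value: u128) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[..16].copy_from_slice(&value.to_le_bytes());
    word
}

/// Lowercase hex encoding without a 0x prefix.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    println!("{}", hex(&push_to_word(&[0x44, 0x42]))); // 0000...4442
    println!("{}", hex(&push_to_word_left_aligned(&[0x44, 0x42]))); // 4442...0000
    println!("{}", hex(&storage_to_word(0xbeef))); // 0000...beef
    println!("{}", hex(&storage_to_word_wrong_endianness(0xbeef))); // efbe...0000
}
```

Only the right-aligned, big-endian forms match the values a contract compares against, which is why the needle tests only break after the fix.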

Invariant Dictionary Flooding

  • Disabled memory value collection until we find a better way to parse the values. Values like ab12...000..12ab...0000 are mostly spam.
  • Removed the randomly generated sender from the StateChangeset before collect_state_from_call. Otherwise, the dictionary gets flooded with irrelevant addresses.

Some current limitations and follow-ups:

  • Packed values break storage collection. If a slot packs a bool(true) and an address(0xbeef), you might collect a value like 0000...beef01, which is useless.
  • A parameter-aware dictionary would be nice. At the moment, when we want an address, we might get a value from the dictionary that was collected as a uint.
  • Values larger than 32 bytes (strings, bytes) are not properly collected or used.
  • We collect PUSH bytes on every call. Caching them per ABI/address instead of re-collecting them might improve speed in some cases.
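The packed-slot limitation can be illustrated with a small sketch, assuming Solidity's packing rule that the first variable declared in a slot occupies its lowest-order bytes (so a bool at offset 0 and an address at offset 1). extract_field is a hypothetical helper showing how known offsets and sizes, such as those a storage layout provides, could recover useful dictionary values.

```rust
/// Pack a bool (offset 0, 1 byte) and an address (offset 1, 20 bytes) into one
/// big-endian 32-byte slot. Offset 0 is the lowest-order (last) byte.
fn pack_bool_and_address(flag: bool, addr: [u8; 20]) -> [u8; 32] {
    let mut slot = [0u8; 32];
    slot[31] = flag as u8; // offset 0
    slot[11..31].copy_from_slice(&addr); // offsets 1..=20
    slot
}

/// Hypothetical helper: extract a field at a known byte offset and size (as a
/// storage layout would describe it) and right-align it in a fresh word.
fn extract_field(slot: &[u8; 32], offset: usize, size: usize) -> [u8; 32] {
    assert!(size >= 1 && offset + size <= 32, "field must fit in the slot");
    let end = 32 - offset; // exclusive end index, counting from the high-order side
    let start = end - size;
    let mut word = [0u8; 32];
    word[32 - size..].copy_from_slice(&slot[start..end]);
    word
}

/// Lowercase hex encoding without a 0x prefix.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let mut addr = [0u8; 20];
    addr[18] = 0xbe;
    addr[19] = 0xef;
    let slot = pack_bool_and_address(true, addr);
    println!("raw slot: {}", hex(&slot)); // 0000...beef01, not a useful dictionary value
    println!("address:  {}", hex(&extract_field(&slot, 1, 20))); // 0000...beef
    println!("bool:     {}", hex(&extract_field(&slot, 0, 1))); // 0000...01
}
```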

--

Added a quick way to inspect the dictionary. Example:

$ ~/projects/foundry/testdata$ RUST_LOG=trace tforge test -c fuzz/FuzzCollection.t.sol  -m invariantCounter -vvvvvv | grep dictionary
2022-08-11T15:57:44.060025Z TRACE contract{name=fuzz/FuzzCollection.t.sol:SampleContractTest}:invariant-test: forge::test::invariant::dictionary:["0000000000000000000000000000000000000000000000000000000000000000", "0000000000000000000000000000000000000000000000000000000000000001", "0000000000000000000000000000000000000000000000000000000000000002", "0000000000000000000000000000000000000000000000000000000000000003", "0000000000000000000000000000000000000000000000000000000000000004", "000000000000000000000000000000000000000000000000000000000000000a", "000000000000000000000000000000000000000000000000000000000000000c", "0000000000000000000000000000000000000000000000000000000000000010", "0000000000000000000000000000000000000000000000000000000000000011", "0000000000000000000000000000000000000000000000000000000000000014", "0000000000000000000000000000000000000000000000000000000000000020", "0000000000000000000000000000000000000000000000000000000000000024", "0000000000000000000000000000000000000000000000000000000000000039", "0000000000000000000000000000000000000000000000000000000000000040", "0000000000000000000000000000000000000000000000000000000000000044", "000000000000000000000000000000000000000000000000000000000000005b", "0000000000000000000000000000000000000000000000000000000000000064", "000000000000000000000000000000000000000000000000000000000000007d", "0000000000000000000000000000000000000000000000000000000000000080", "0000000000000000000000000000000000000000000000000000000000000082", "0000000000000000000000000000000000000000000000000000000000000084", "000000000000000000000000000000000000000000000000000000000000008a", "000000000000000000000000000000000000000000000000000000000000008b", "0000000000000000000000000000000000000000000000000000000000000095", "000000000000000000000000000000000000000000000000000000000000009e", "00000000000000000000000000000000000000000000000000000000000000a0", "00000000000000000000000000000000000000000000000000000000000000a4", 
"00000000000000000000000000000000000000000000000000000000000000ac", "00000000000000000000000000000000000000000000000000000000000000b1", "00000000000000000000000000000000000000000000000000000000000000b3", "00000000000000000000000000000000000000000000000000000000000000c7", "00000000000000000000000000000000000000000000000000000000000000d7", "00000000000000000000000000000000000000000000000000000000000000e0", "00000000000000000000000000000000000000000000000000000000000000e5", "00000000000000000000000000000000000000000000000000000000000000f3", "00000000000000000000000000000000000000000000000000000000000000ff", "000000000000000000000000000000000000000000000000000000000000010b", "0000000000000000000000000000000000000000000000000000000000000119", "000000000000000000000000000000000000000000000000000000000000011e", "000000000000000000000000000000000000000000000000000000000000012c", "0000000000000000000000000000000000000000000000000000000000000131", "000000000000000000000000000000000000000000000000000000000000017c", "000000000000000000000000000000000000000000000000000000000000018d", "000000000000000000000000000000000000000000000000000000000000019e", "00000000000000000000000000000000000000000000000000000000000001af", "00000000000000000000000000000000000000000000000000000000000001ed", "00000000000000000000000000000000000000000000000000000000000001fc", "00000000000000000000000000000000000000000000000000000000000001ff", "000000000000000000000000000000000000000000000000000000000000021e", "0000000000000000000000000000000000000000000000000000000000000230", "0000000000000000000000000000000000000000000000000000000000000237", "000000000000000000000000000000000000000000000000000000000000024d", "0000000000000000000000000000000000000000000000000000000000000260", "0000000000000000000000000000000000000000000000000000000000000265", "000000000000000000000000000000000000000000000000000000000000027f", "0000000000000000000000000000000000000000000000000000000000004446", 
"0000000000000000000000000000000000000000000000000000000000005556", "000000000000000000000000000000000000000000000000000000000000beef", "0000000000000000000000000000000000000000000000000000000000461bcd", "000000000000000000000000000000000000000000000000000000000108aff0", "0000000000000000000000000000000000000000000000000000000003df179c", "000000000000000000000000000000000000000000000000000000002f2e16d6", "000000000000000000000000000000000000000000000000000000004e487b71", "0000000000000000000000000000000000000000000000000000000061bc221a", "000000000000000000000000000000000000000000000000000000008da5cb5b", "000000000000000000000000000000000000000000000000000000009f16168e", "00000000000000000000000000000000000000000000000000000000c4b8c257", "00000000000000000000000000000000000000000000000000000000fd4459b1", "0000000000000000000000000000000000000000000000000000006970667358", "0000000000000000000000000000000000000000000027a7262cafa7aba722a9", "0000000000000000000000000000000000000000ffffffffffffffffffffffff", "0000000000000000000000000000000000004e6a5e87b5a058cdb06137a7099f", "000000000000000000000000006060e164736f6c634300080f00330000000000", "00000000000000000000000000a329c0648769a73afac7f9381e08fb43dbea72", "0000000000000000000000001804c8ab1f12e6bbf3894d4083f33e07309d1f38", "0000000000000000000000003fab184622dc19b6109349b94811493bf2a45362", "0000000000000000000000004e59b44847b379578588920ca78fbf26c0b4956c", "0000000000000000000000007109709ecfa91a80626ff3989d68f67f5b1dd12d", "000000000000000000000000ce71065d4017f316ec606fe4422e11eb2c47c246", "000000000000000000000001000000000000000000000000000000000000beef", "20d8a6f5a693f9d1d627a598e8820f7a55ee74c183aa8f1a30e8d4e8dd9a8d84", "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe0"]

),
(
90,
fuzz_param_from_state(&ParamType::Address, fuzz_state)
Review comment (Collaborator):

Looks like the weights are 90/10, but the comment above this method says 80/20. It should probably be exposed as a config option at some point (not necessarily in this PR).

@gakonst (Member) commented Aug 11, 2022

PUSH values were being collected with a left alignment (4442...0000 vs 0000...4442). testNeedle(uint256) now successfully breaks, before it wouldn't.

Storage values were being collected in the wrong endianness (efbe...0000 vs 0000...beef). testIncrement(address) now successfully breaks, before it wouldn't.

@transmissions11 this explains the storage/constants issue we encountered the other day during the invariants demo.

Packed values kinda break the storage collection. If you have a bool(true) & address(0xbeef), you might collect a value like 0000...beef01, which is useless.

@brockelmore I think you were looking into storage packing/unpacking for forge-std?

Disabled memory value collection until we find a better way to parse the values. Stuff like ab12...000..12ab...0000 is mostly spam.

Sounds good. Is there a nice way to find memory values that don't look like spam?

Remove from the StateChangeSet the randomly generated sender before collect_state_from_call. Otherwise, the dictionary gets flooded with irrelevant addresses.

What does that mean? Do we use the dictionary to populate potential senders?

Some parameter-aware dictionary would be nice. At the moment, if we want an address we might be getting a value from the dictionary which was an uint.
Values bigger than 32 bytes are not properly collected/used. (strings, bytes)

Should this be a matter of refactoring the dictionary so it holds values per data type, with the proptest selector drawing from the matching type? That should also address the 32-byte limitation.
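A typed dictionary along these lines might look like the following sketch. ValueKind, TypedDictionary, count, and pick are hypothetical names; values are stored as variable-length byte vectors so that strings and bytes longer than 32 bytes fit, and pick takes an index in place of the fuzzer's random choice.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical value categories; a real version would likely follow the ABI
/// parameter types the fuzzer already knows about.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum ValueKind {
    Address,
    Uint,
    Bytes, // strings/bytes, possibly longer than 32 bytes
}

/// Values bucketed by kind, so a strategy asking for an address only sees
/// values that were collected as addresses.
#[derive(Default)]
struct TypedDictionary {
    values: BTreeMap<ValueKind, BTreeSet<Vec<u8>>>,
}

impl TypedDictionary {
    fn insert(&mut self, kind: ValueKind, value: Vec<u8>) {
        self.values.entry(kind).or_default().insert(value);
    }

    /// Number of distinct values stored for a kind.
    fn count(&self, kind: ValueKind) -> usize {
        self.values.get(&kind).map_or(0, |bucket| bucket.len())
    }

    /// Pick a value of the requested kind; `index` stands in for the random
    /// index a fuzzing strategy would draw. Returns None if the bucket is empty.
    fn pick(&self, kind: ValueKind, index: usize) -> Option<&[u8]> {
        let bucket = self.values.get(&kind)?;
        if bucket.is_empty() {
            return None;
        }
        bucket.iter().nth(index % bucket.len()).map(|v| v.as_slice())
    }
}

fn main() {
    let mut dict = TypedDictionary::default();
    let mut owner = vec![0u8; 20];
    owner[19] = 0x64;
    dict.insert(ValueKind::Address, owner);
    dict.insert(ValueKind::Uint, vec![0x44, 0x42]);
    dict.insert(ValueKind::Bytes, b"a string that is longer than 32 bytes".to_vec());
    println!("address candidate: {:?}", dict.pick(ValueKind::Address, 7));
    println!("bytes values: {}", dict.count(ValueKind::Bytes));
}
```

Because each bucket is still a set, duplicates within a kind are absorbed, while a uint can no longer be handed out where an address is expected.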

We collect PUSH bytes all the time on all calls. Adding some cache for ABI/addresses before re-collecting them, might increase speed in some cases.

Yep, this would be good. Could we add an LRU cache?
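A minimal sketch of caching PUSH collection per bytecode, assuming a hypothetical PushCache type and a caller-supplied collector function. It keys by the bytecode itself for simplicity (real code would more likely key by the account's code hash) and is a plain unbounded map; an LRU, as suggested, would add an eviction policy on top.

```rust
use std::collections::HashMap;

/// Hypothetical memo of collected PUSH values, keyed by bytecode.
struct PushCache {
    entries: HashMap<Vec<u8>, Vec<Vec<u8>>>,
    /// How many times the collector actually ran.
    misses: usize,
}

impl PushCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Return the cached PUSH values for `code`, running `collect` only the
    /// first time a given bytecode is seen.
    fn get_or_collect<F>(&mut self, code: &[u8], collect: F) -> &[Vec<u8>]
    where
        F: FnOnce(&[u8]) -> Vec<Vec<u8>>,
    {
        if !self.entries.contains_key(code) {
            self.misses += 1;
            let values = collect(code);
            self.entries.insert(code.to_vec(), values);
        }
        &self.entries[code]
    }
}

fn main() {
    let mut cache = PushCache::new();
    let code: Vec<u8> = vec![0x61, 0x44, 0x42, 0x00]; // PUSH2 0x4442, STOP
    // Stand-in collector: pretend the only PUSH immediate is at bytes 1..3.
    let collect = |c: &[u8]| vec![c[1..3].to_vec()];
    for _ in 0..3 {
        let values = cache.get_or_collect(&code, collect);
        println!("{:?}", values);
    }
    println!("collector ran {} time(s)", cache.misses); // 1
}
```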

@joshieDo (Collaborator, author) commented Aug 11, 2022

What does that mean? Do we use the dictionary to populate potential senders?

A randomly generated sender (e.g. 0x1337) would be part of the changeset because its nonce increases, so its address would be added to the dictionary. This only applies to EOAs, and we no longer allow it.
In a long session, the dictionary would fill up with these addresses, burying other potentially important values.

sounds good. is there a nice way to find memory that's not spam-looking?

One option would be to inspect individual steps and look for MSTORE and MLOAD, though I wonder whether that would make things very slow.

@@ -84,11 +86,27 @@ fn generate_call(

/// Strategy to select a sender address:
/// * If `senders` is empty, then it's a completely random address.
Review comment (Collaborator):

Just to confirm: it's not "completely random", right? Instead, it's either random or from the dictionary, with the same weights as other fuzz values?

Review comment (Member):

@joshieDo same question: what's the difference between fuzz_param_from_state, which has a 90% chance of being selected in fuzz_strategy, and the senders: Vec<Address> passed into the fn?

Review comment (Collaborator, author):

Updated the docs to:

/// Strategy to select a sender address:
/// * If `senders` is empty, then it's either a random address (10%) or from the dictionary (90%).
/// * If `senders` is not empty, then there's an 80% chance that one from the list is selected. The
///   remaining 20% will either be a random address (10%) or from the dictionary (90%).

Does that help?
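The documented selection weights can be sketched as follows. select_sender, Source, and the tiny xorshift generator are hypothetical stand-ins for the proptest strategies, used only to keep the example dependency-free. One simplification: when the dictionary is empty, the sketch always falls back to a random address.

```rust
/// Where a chosen sender came from (for illustration only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Source {
    Senders,
    Dictionary,
    Random,
}

/// Minimal xorshift64 PRNG (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..n (modulo bias ignored for a sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next() % n
    }
}

type Address = [u8; 20];

/// 80%: one of `senders` (if any). Otherwise 90% dictionary, 10% random.
fn select_sender(rng: &mut XorShift, senders: &[Address], dictionary: &[Address]) -> (Source, Address) {
    if !senders.is_empty() && rng.below(100) < 80 {
        let i = rng.below(senders.len() as u64) as usize;
        return (Source::Senders, senders[i]);
    }
    if !dictionary.is_empty() && rng.below(100) < 90 {
        let i = rng.below(dictionary.len() as u64) as usize;
        return (Source::Dictionary, dictionary[i]);
    }
    // Fill 20 bytes from the generator: chunks of 8, 8 and 4 bytes.
    let mut addr = [0u8; 20];
    for chunk in addr.chunks_mut(8) {
        let bytes = rng.next().to_be_bytes();
        chunk.copy_from_slice(&bytes[..chunk.len()]);
    }
    (Source::Random, addr)
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let senders = [[0x11u8; 20], [0x22u8; 20]];
    let dictionary = [[0xbeu8; 20]];
    let mut counts = [0usize; 3];
    for _ in 0..100_000 {
        let (source, _addr) = select_sender(&mut rng, &senders, &dictionary);
        counts[source as usize] += 1;
    }
    // Expect roughly 80% senders, 18% dictionary (0.2 * 0.9), 2% random (0.2 * 0.1).
    println!("senders={} dictionary={} random={}", counts[0], counts[1], counts[2]);
}
```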

@brockelmore (Member) left a comment

See the test contract comments.

let mut state: BTreeSet<[u8; 32]> = BTreeSet::new();
for (address, account) in db.accounts.iter() {
// We don't want to collect data from the test contract.
if *address == test_address {
Review comment (Member):

Hmm, I actually disagree here. Often the test contract will be the owner of a protocol's contract, and people put relevant state in the test contract as well (especially with invariant tests: target contracts, etc.).

Review comment (Collaborator):

Ultimately, these should probably be flags exposed in the config: there are cases where collecting data from the test contract will flood your dictionary, and other times where it may be valuable.

Review comment (Member):

IMO it's very unlikely to flood your dictionary: we collect PUSH bytes, storage, and basic account info. Many of those values will either duplicate the subcontracts' values or be relevant to the subcontracts (e.g. their addresses, assertion values, etc.). Duplicate values don't expand the dictionary, since it's a set.
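The deduplication argument holds because the dictionary is a set (the collection code above uses BTreeSet<[u8; 32]>). A small sketch, with a hypothetical word helper that right-aligns a u64 in a 32-byte word:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: right-align a u64 in a 32-byte word.
fn word(v: u64) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[24..].copy_from_slice(&v.to_be_bytes());
    w
}

fn main() {
    let mut dictionary: BTreeSet<[u8; 32]> = BTreeSet::new();
    // Values collected from a protocol contract...
    for v in [0xbeef, 0x64, 0xc8] {
        dictionary.insert(word(v));
    }
    // ...and the same values seen again in the test contract's state.
    for v in [0xbeef, 0x64] {
        dictionary.insert(word(v));
    }
    println!("{}", dictionary.len()); // 3
}
```

Only values that are genuinely new to the test contract grow the set; repeats of subcontract values cost nothing.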

Review comment (Member):

@joshieDo if you really mean CALLER instead of the test contract, I agree: we don't want the caller. But we do want the test contract's state and address in the dictionary.

Review comment (Collaborator):

Hmm, OK, so e.g. stack/memory from a large setUp method or test contract helper methods wouldn't be collected? If so, I agree it's unlikely to flood the dictionary and should be OK.

Review comment (Member):

Stack/memory collection is in a separate file; this file only collects PUSH bytes and storage.

Review comment (Collaborator, author):

Stack/memory values would be collected for invariant tests (through the inspector) but not for normal fuzzing (there's no collector inspector there).

Hmm, I see your overall point. I'll revert it. (I really did mean the test contract.)

logs: &[Log],
state_changeset: &StateChangeset,
state: EvmFuzzState,
) {
let mut state = state.write();

for (address, account) in state_changeset {
// We don't want to collect data from the test contract.
if *address == test_address {
Review comment (Member):

Again, not sure I agree.

@onbjerg (Member) commented Aug 11, 2022

Could this have been the cause of #1168? If I understand correctly, we were collecting PUSH bytes etc., but they were collected in the wrong order (i.e. reversed)?

Edit: Nvm, this does not necessarily resolve #1168

@onbjerg added the T-bug (Type: bug) label Aug 11, 2022
@brockelmore (Member) commented Aug 11, 2022

@brockelmore I think you were looking into storage packing/unpacking for forge-std?

Yes, but that is really a suboptimal method, used in forge-std only because we likely won't have access to the storage layout from solc in fork tests. Here, you rarely run fuzz tests in a forked environment, so we should have all storage layouts available.

As for a typed dictionary, it should be easy in limited cases, and it can get infinitely complicated as we try to be more intelligent. For example, we can (relatively) easily type return data and storage (the changeset) using just the storage layout and ABIs. This would improve and expand dynamic type support (string, bytes, lists). Typing the changeset with the storage layout is a bit annoying for sha3-derived slots, but it can probably be done à la banteg's preimage magic, at an execution-time cost (though it would be nice to have for the debugger anyway).

The infinitely more complicated version also ties into the debugger. If we get variable-to-stack association (i.e. stack item 2 is uint256 x from the AST), we can type the stack perfectly (well, almost: compiler-generated pushes are still opaque). We could then also potentially drop things like JUMPDEST locations that are likely irrelevant, and better understand what is in memory. I think that work is much longer term, but it has strong synergies with the debugger.

@onbjerg (Member) commented Aug 11, 2022

I think we need to be extra careful about storage layout assumptions, or about any language-specific assumptions of some complexity, since a long-term goal is still to support more languages, and we cannot guarantee their storage layout is the same. If we build large, complex solutions to decode storage layout for the fuzzer, we're potentially shooting ourselves in the foot. I'm also not sure how much it would improve the fuzzer compared to something like coverage-guided fuzzing.

@brockelmore (Member):

On the typed dictionary front, are we sure it's good for the fuzzer overall? There is something nice about type obfuscation as a way to try to break things. Long term, it may be worthwhile to have some percentage of the strategy obfuscate input types to increase "relevant" randomness, instead of typing everything and only using known types (especially while typing is limited).

@joshieDo (Collaborator, author) commented Aug 11, 2022

Seems like this might have been the cause of #1168? If I understand correctly, we were collecting push bytes etc, but they were just collected in the wrong order (i.e. in reverse)?

Edit: Nvm, this does not necessarily resolve #1168

It does solve it though:

[⠒] Compiling...
No files changed, compilation skipped

Running 2 tests for test/SampleContract.t.sol:FuzzerDictTest
[FAIL. Reason: Undefined. Counterexample: calldata=0xe53491ce0000000000000000000000000000000000000000000000000000000000000064, args=[0x0000000000000000000000000000000000000064]] testImmutableOwner(address) (runs: 67, μ: 6176, ~: 6176)
[FAIL. Reason: Undefined. Counterexample: calldata=0x5f9789a200000000000000000000000000000000000000000000000000000000000000c8, args=[0x00000000000000000000000000000000000000c8]] testStorageOwner(address) (runs: 97, μ: 8243, ~: 8243)
Test result: FAILED. 0 passed; 2 failed; finished in 66.78ms

Failing tests:
Encountered 2 failing tests in test/SampleContract.t.sol:FuzzerDictTest
[FAIL. Reason: Undefined. Counterexample: calldata=0xe53491ce0000000000000000000000000000000000000000000000000000000000000064, args=[0x0000000000000000000000000000000000000064]] testImmutableOwner(address) (runs: 67, μ: 6176, ~: 6176)
[FAIL. Reason: Undefined. Counterexample: calldata=0x5f9789a200000000000000000000000000000000000000000000000000000000000000c8, args=[0x00000000000000000000000000000000000000c8]] testStorageOwner(address) (runs: 97, μ: 8243, ~: 8243)

Encountered a total of 2 failing tests, 0 tests succeeded

@gakonst (Member) left a comment

Solid. Should we move ahead with this as-is? What follow-ups do we have?

@mds1 mentioned config for:

  • percentage of the time the dictionary vs. random vs. edge values are used (edge + dict should be merged)
  • include-stack
  • include-memory
  • include-storage-keys
  • include-storage-values
  • include-push-bytes (constants, immutables)
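A hypothetical configuration struct covering the options listed above might look like this. Field names and defaults are illustrative assumptions, not Foundry's actual config keys; the single dictionary_weight reflects the suggestion to merge edge values into the dictionary.

```rust
/// Hypothetical dictionary settings mirroring the list above.
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct FuzzDictionaryConfig {
    /// Percent of draws taken from the dictionary (edge values merged in);
    /// the rest are uniformly random.
    dictionary_weight: u32,
    include_stack: bool,
    include_memory: bool,
    include_storage_keys: bool,
    include_storage_values: bool,
    /// Constants and immutables found in bytecode.
    include_push_bytes: bool,
}

impl Default for FuzzDictionaryConfig {
    fn default() -> Self {
        Self {
            dictionary_weight: 90, // the 90/10 split noted in review
            include_stack: true,
            include_memory: false, // memory collection is disabled in this PR
            include_storage_keys: true,
            include_storage_values: true,
            include_push_bytes: true,
        }
    }
}

impl FuzzDictionaryConfig {
    /// Reject weights that are not a valid percentage.
    fn validate(&self) -> Result<(), String> {
        if self.dictionary_weight > 100 {
            return Err(format!(
                "dictionary_weight must be in 0..=100, got {}",
                self.dictionary_weight
            ));
        }
        Ok(())
    }

    /// Percent of draws that are uniformly random (saturates at 0 if invalid).
    fn random_weight(&self) -> u32 {
        100u32.saturating_sub(self.dictionary_weight)
    }
}

fn main() {
    let config = FuzzDictionaryConfig { include_push_bytes: false, ..Default::default() };
    config.validate().expect("valid config");
    println!("{:?}, random weight = {}%", config, config.random_weight());
}
```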

Comment on lines +491 to +501
let mut has_code = false;
if let Some(Some(code)) = state_changeset.get(sender).map(|account| account.info.code.as_ref())
{
has_code = !code.is_empty();
}

// We keep the nonce changes to apply later.
let mut sender_changeset = None;
if !has_code {
sender_changeset = state_changeset.remove(sender);
}
Review comment (Member):

Do we want to do this? What if there's, e.g., a smart contract wallet that makes a call and hits an isContract check that triggers a re-entrancy via an onERC721/onERC1155 fallback?

Review comment (Collaborator, author):

This code only prevents this address from being added to the dictionary through the state changeset; the changes are still applied. So, unless I've misunderstood something, it wouldn't prevent that scenario.
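The flow described here can be sketched in a simplified model: the code-less (EOA) sender's entry is taken out of the changeset before collection and put back afterward, so its nonce change is still committed. Account, collect_state, and collect_excluding_eoa_sender are hypothetical simplifications, not the real revm types or Foundry functions.

```rust
use std::collections::{BTreeSet, HashMap};

type Address = [u8; 20];

/// Simplified stand-in for an account entry in a state changeset.
#[derive(Clone, Debug, Default)]
struct Account {
    code: Vec<u8>,
    nonce: u64,
    storage: Vec<([u8; 32], [u8; 32])>, // (slot, value)
}

/// Right-align an address in a 32-byte word.
fn address_word(a: &Address) -> [u8; 32] {
    let mut w = [0u8; 32];
    w[12..].copy_from_slice(a);
    w
}

/// Collect addresses, storage slots, and storage values from every account.
fn collect_state(changeset: &HashMap<Address, Account>, dict: &mut BTreeSet<[u8; 32]>) {
    for (address, account) in changeset {
        dict.insert(address_word(address));
        for (slot, value) in &account.storage {
            dict.insert(*slot);
            dict.insert(*value);
        }
    }
}

/// Skip a code-less sender during collection, then restore its entry so the
/// nonce change is still applied when the changeset is committed.
fn collect_excluding_eoa_sender(
    changeset: &mut HashMap<Address, Account>,
    sender: &Address,
    dict: &mut BTreeSet<[u8; 32]>,
) {
    let has_code = changeset.get(sender).map_or(false, |acc| !acc.code.is_empty());
    let sender_entry = if has_code { None } else { changeset.remove(sender) };
    collect_state(changeset, dict);
    if let Some(entry) = sender_entry {
        changeset.insert(*sender, entry);
    }
}

fn main() {
    let mut sender = [0u8; 20];
    sender[18] = 0x13;
    sender[19] = 0x37;
    let mut target = [0u8; 20];
    target[18] = 0xbe;
    target[19] = 0xef;
    let mut value = [0u8; 32];
    value[31] = 0x64;

    let mut changeset = HashMap::new();
    changeset.insert(sender, Account { nonce: 1, ..Default::default() }); // EOA: nonce bump only
    changeset.insert(target, Account { code: vec![0x00], nonce: 0, storage: vec![([0u8; 32], value)] });

    let mut dict = BTreeSet::new();
    collect_excluding_eoa_sender(&mut changeset, &sender, &mut dict);
    println!("dictionary size: {}", dict.len()); // 3: target address, slot 0, value 0x64
    println!("sender still in changeset: {}", changeset.contains_key(&sender)); // true
}
```

A sender that does have code is left in place and collected normally, so a smart contract wallet's state still reaches the dictionary.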

evm/src/fuzz/strategies/invariants.rs (resolved)
evm/src/fuzz/strategies/param.rs (resolved)
evm/src/fuzz/strategies/state.rs (outdated, resolved)
@gakonst (Member) commented Aug 11, 2022

On the typed dictionary front, are we positive that is overall good for the fuzzer? There is something nice about type obfuscation to try to screw things up. May be worthwhile long term to have some percentage of the strategy do type obfuscation for inputs to increase "relevant" randomness, instead of typing everything and only using known types (especially while typing is limited)

This is a good discussion point. I think the most valuable area where this could help is handcrafting a bytes memory value that's abi.encode/abi.encodePacked from a bunch of other values (e.g. the Nomad hack). I'm not sure how valuable that is compared to a more targeted typed dictionary. Worth discussing in a separate issue.

@joshieDo (Collaborator, author):

solid, should we move ahead with this as-is? what follow ups do we have?

I'm hitting a test failure on forge-std:
[FAIL. Reason: Too many global rejects] testAssertEq_BytesErr_Pass(bytes,bytes) (runs: 53, μ: 20801, ~: 21464)

Need to investigate.

@gakonst (Member) commented Aug 11, 2022

This makes me think our dict got too big / noisy?

@mds1 (Collaborator) commented Aug 11, 2022

This makes me think our dict got too big / noisy?

Hmm, maybe. That failing test (and others in forge-std) takes two inputs a and b and has vm.assume(a == b). So a failure here implies this PR makes the fuzzer less likely to pass the same value for both inputs, and passing the same value for both/all inputs is a valuable thing for the fuzzer to try.
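A back-of-the-envelope model shows why a larger dictionary hurts vm.assume(a == b). It rests on simplifying assumptions, not on how Foundry actually generates bytes inputs: each argument independently comes from the dictionary with probability w and is then chosen uniformly among N entries, and collisions between randomly generated values are ignored (which may not hold for short byte strings).

```latex
P(a = b) \approx w^2 \cdot \frac{1}{N}, \qquad
w = 0.9:\quad N = 100 \Rightarrow P \approx 0.0081, \qquad N = 1000 \Rightarrow P \approx 0.00081
```

Under this model, growing the dictionary tenfold cuts the chance of equal inputs tenfold, so the assume rejects correspondingly more runs.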

@joshieDo (Collaborator, author) commented Aug 11, 2022

Okay, figured it out, but I'm unsure how to proceed:

I added PUSH data collection to build_initial_state(). I guess this leads to collecting too many PUSH values from the test contract. They don't get collected during collect_state_from_data either, because the test contract doesn't show up in the state changeset.

Once I remove it, the forge-std test passes fine.

Do we disregard the PUSH values in this scenario? @brockelmore @mds1 Following the earlier logic, I think I'd rather include them and change the forge-std test.

@gakonst (Member) commented Aug 11, 2022

Once I remove it, the forge-std test passes fine.
Do we disregard the PUSH values in this scenario ? @brockelmore @mds1 Following the logic from before, I'd rather include them and change the forge-std test I think.

Should we maybe collect only PUSHes that are greater than or equal to, e.g., 1 byte? That would cover u8 -> u256 / address etc., whereas it'd omit things like small masks or function selectors etc.
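A sketch of a minimum-length heuristic: scan the bytecode, skip over PUSH immediates so their bytes are not misread as opcodes, and keep only immediates of at least min_len bytes. collect_push_values is a hypothetical name; it ignores a truncated PUSH at the end of the code and does not separate trailing metadata from executable code.

```rust
/// Return the immediates of PUSH1..PUSH32 (0x60..=0x7f) that are at least
/// `min_len` bytes long, each right-aligned in a 32-byte word.
fn collect_push_values(code: &[u8], min_len: usize) -> Vec<[u8; 32]> {
    let mut out = Vec::new();
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        if (0x60..=0x7f).contains(&op) {
            // PUSH1 (0x60) carries 1 byte, ..., PUSH32 (0x7f) carries 32 bytes.
            let len = (op - 0x5f) as usize;
            let start = pc + 1;
            let end = start + len;
            // Keep only complete immediates that meet the length threshold.
            if end <= code.len() && len >= min_len {
                let mut word = [0u8; 32];
                word[32 - len..].copy_from_slice(&code[start..end]);
                out.push(word);
            }
            // Skip the immediate so its bytes are never decoded as opcodes.
            pc = end;
        } else {
            pc += 1;
        }
    }
    out
}

fn main() {
    // PUSH1 0x80, PUSH1 0x40, MSTORE, PUSH4 0x8da5cb5b, PUSH2 0xbeef
    let code: Vec<u8> = vec![0x60, 0x80, 0x60, 0x40, 0x52, 0x63, 0x8d, 0xa5, 0xcb, 0x5b, 0x61, 0xbe, 0xef];
    for min_len in [1, 2, 5] {
        println!("min_len={} -> {} values", min_len, collect_push_values(&code, min_len).len());
    }
}
```

With min_len = 2, the 1-byte memory offsets are dropped while the 4-byte selector and 0xbeef survive; with min_len = 5, 4-byte selectors such as 0x8da5cb5b are dropped as well.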

@mds1 (Collaborator) commented Aug 13, 2022

It sounds like we may ultimately need to get smarter about which PUSH bytes are collected. @gakonst's idea is an interesting heuristic. Curious to hear what @brockelmore thinks, but I'd be OK with changing the forge-std tests for now if required. Though, as mentioned above, occasionally passing the same value for both inputs is a good property, so I don't think we want to lose that in the long term.

@gakonst (Member) commented Aug 13, 2022

Though as mentioned above occasionally passing the same value for both inputs is a good property so I don't think we want to lose that in the long term.

Yeah, we want this. Unrelated, but I also think we should be able to generate the same call twice in invariant tests, e.g. transfer(0x1234, 100) -> transfer(0x1234, 100); not sure how possible this is right now. I wonder if we should add an extra strategy for that.

@gakonst (Member) commented Aug 15, 2022

@joshieDo are we still getting too many rejects after not collecting balances/nonces?

@joshieDo (Collaborator, author) commented Aug 15, 2022

CI is failing on something unrelated, but I tried locally and it still gets too many rejects. I also tried including only PUSH5 and larger, with the same result.

@gakonst (Member) commented Aug 15, 2022

OK, let's remove PUSH bytes collection from the dictionary for now and get this merged with the endianness fix.

Let's investigate a smart technique for collecting PUSH values in a separate PR.

@gakonst gakonst merged commit e7077e2 into foundry-rs:master Aug 15, 2022
iFrostizz pushed a commit to iFrostizz/foundry that referenced this pull request Nov 9, 2022
* always enable tracing on forge script

* wip

* add tests for storage collection during fuzz

* remove dbg statements

* fix ignored invariant_runs

* temporarily disable memory collection

* curate data we collect for dictionary

* clippy

* add test fuzz/invariant test for data collection

* add seed to test_fuzz_collection

* fix comments

* exclude tests from test_fuzz

* add seed to test_invariant_storage

* revert ignoring test_contract on state collection

* fix select_random_sender docs

* dont collect balance or nonces

* fix tests

* disable push collection on build_initial_state

* fix exclusion statement
Labels: T-bug (Type: bug)
Projects: None yet
5 participants