-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(evm): fuzzing not properly collecting data #2724
Conversation
), | ||
( | ||
90, | ||
fuzz_param_from_state(&ParamType::Address, fuzz_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like weights are 90/10 but the comment above this method says 80/20. Probably should be exposed as a config option at some point (not necessarily in this PR)
@transmissions11 this explains the storage/constants issue we encountered the other day during the invariants demo.
@brockelmore I think you were looking into storage packing/unpacking for forge-std?
sounds good. is there a nice way to find memory that's not spam-looking?
what does that mean? do we will use the dictionary to populate potential senders?
this should be a matter of a type-refactor of the dictionary to contain each data type? and have the proptest selector use from that? should also address the 32 bytes thing
yep this would be good we could add a LRU cache? |
A randomly generated sender
One option would be to inspect in-between |
@@ -84,11 +86,27 @@ fn generate_call( | |||
|
|||
/// Strategy to select a sender address: | |||
/// * If `senders` is empty, then it's a completely random address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: It's not "completely random" right? But instead is either random OR from the dict, with the same weights as with other fuzz values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshieDo same Q, what's the difference between fuzz_param_from_state
that has a 90% chance of being selected in fuzz_strategy
vs the senders: Vec<Address>
being passed in the fn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the docs to:
/// Strategy to select a sender address:
/// * If `senders` is empty, then it's either a random address (10%) or from the dictionary (90%).
/// * If `senders` is not empty, then there's an 80% chance that one from the list is selected. The
/// remaining 20% will either be a random address (10%) or from the dictionary (90%).
does it help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see test contract comments
evm/src/fuzz/strategies/state.rs
Outdated
let mut state: BTreeSet<[u8; 32]> = BTreeSet::new(); | ||
for (address, account) in db.accounts.iter() { | ||
// We don't want to collect data from the test contract. | ||
if *address == test_address { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm i actually disagree here. A lot of times the test contract will be an owner of a protocol's contract. And people throw relevant state in the test contract as well (esp w/ invariant tests, things like target contracts, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ultimately these should probably be flags exposed in the config—there are cases where collecting data from the test contract will flood your dict, and other times where it may be valuable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo very unlikely it will flood your dict - we collect push bytes, storage and basic account info. a lot of those values will either be duplicate of subcontracts or be relevant to the subcontracts (e.g. their addresses, assertion values etc.). if values are duplicate they don't expand the dictionary since it's a set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshieDo if you really mean CALLER
instead of test contract
I agree, we dont wan't the caller. But we do want test contract state + address in the dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm ok so e.g. stack/mem from a large setUp method or test contract helper methods wouldn't be collected? if so then I agree it's not likely to flood and should be ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack/mem is in a separate file, this file only collects push bytes + storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack/mem
would be collected for invariants (through the inspector) but not normal fuzzing (there's no inspector collector).
Hmm, I see your overall point. I'll revert it. ( i really meant the test contract).
evm/src/fuzz/strategies/state.rs
Outdated
logs: &[Log], | ||
state_changeset: &StateChangeset, | ||
state: EvmFuzzState, | ||
) { | ||
let mut state = state.write(); | ||
|
||
for (address, account) in state_changeset { | ||
// We don't want to collect data from the test contract. | ||
if *address == test_address { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again not sure i agree
Yes but that is really a sub-optimal method only used in forge-std because we likely wont have access to storage layout from solc for fork tests. Here, you rarely run fuzz tests in a forked environment so we should have all storage layouts available As for typed dictionary, should be possible in limited cases easily, and can get infinitely complicated trying to get more intelligent there. For example, we can (relatively) easily type return data and storage (changeset) just via storage layout and ABIs. This would improve and expand dynamic type support (string, bytes, lists). Typing the changeset with storage layout is a bit annoying for sha3-ed slots, but probably can be done a la banteg's method of preimage magic but would come at execution time cost (but would be nice to have for the debugger anyway). The infinitely more complicated version also has ties to the debugger. If we get variable to stack association (i.e. stack item 2 is |
I think we need to be extra careful about storage layout assumptions, or any language-specific assumptions in general if they have some amount of complexity, since a long term goal is still to support more languages, and we cannot guarantee that the storage layout is the same. If we start making huge complex solutions to decode storage layout for the fuzzer, we're potentially shooting ourselves in the foot. Also not even sure how much it would improve the fuzzer vs. something like a coverage guided fuzzing |
On the typed dictionary front, are we positive that is overall good for the fuzzer? There is something nice about type obfuscation to try to screw things up. May be worthwhile long term to have some percentage of the strategy do type obfuscation for inputs to increase "relevant" randomness, instead of typing everything and only using known types (especially while typing is limited) |
It does solve it though: [⠒] Compiling...
No files changed, compilation skipped
Running 2 tests for test/SampleContract.t.sol:FuzzerDictTest
[FAIL. Reason: Undefined. Counterexample: calldata=0xe53491ce0000000000000000000000000000000000000000000000000000000000000064, args=[0x0000000000000000000000000000000000000064]] testImmutableOwner(address) (runs: 67, μ: 6176, ~: 6176)
[FAIL. Reason: Undefined. Counterexample: calldata=0x5f9789a200000000000000000000000000000000000000000000000000000000000000c8, args=[0x00000000000000000000000000000000000000c8]] testStorageOwner(address) (runs: 97, μ: 8243, ~: 8243)
Test result: FAILED. 0 passed; 2 failed; finished in 66.78ms
Failing tests:
Encountered 2 failing tests in test/SampleContract.t.sol:FuzzerDictTest
[FAIL. Reason: Undefined. Counterexample: calldata=0xe53491ce0000000000000000000000000000000000000000000000000000000000000064, args=[0x0000000000000000000000000000000000000064]] testImmutableOwner(address) (runs: 67, μ: 6176, ~: 6176)
[FAIL. Reason: Undefined. Counterexample: calldata=0x5f9789a200000000000000000000000000000000000000000000000000000000000000c8, args=[0x00000000000000000000000000000000000000c8]] testStorageOwner(address) (runs: 97, μ: 8243, ~: 8243)
Encountered a total of 2 failing tests, 0 tests succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solid, should we move ahead with this as-is? what follow ups do we have?
@mds1 mentioned config for:
- percent of time dict vs. random vs. edge is used (edge + dict should be merged)
- include-stack
- include-memory
- include-storage-keys
- include-storage-values
- include-push-bytes (constants, immutables)
let mut has_code = false; | ||
if let Some(Some(code)) = state_changeset.get(sender).map(|account| account.info.code.as_ref()) | ||
{ | ||
has_code = !code.is_empty(); | ||
} | ||
|
||
// We keep the nonce changes to apply later. | ||
let mut sender_changeset = None; | ||
if !has_code { | ||
sender_changeset = state_changeset.remove(sender); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to do this? what if there's e.g. a smart contract wallet that would make a call and hit e.g. a if isContract
check that would trigger a re-entrancy via onERC721/onERC1155 fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code only prevents adding this address to the dictionary through the statechangeset
. The changes are still applied. So, unless I misunderstood something, it wouldn't prevent that scenario.
@@ -84,11 +86,27 @@ fn generate_call( | |||
|
|||
/// Strategy to select a sender address: | |||
/// * If `senders` is empty, then it's a completely random address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshieDo same Q, what's the difference between fuzz_param_from_state
that has a 90% chance of being selected in fuzz_strategy
vs the senders: Vec<Address>
being passed in the fn?
This is a good discussion point. I think the most valuable areas where this can happen is when trying to handcraft a |
I'm hitting a test failure on need to investigate |
This makes me think our dict got too big / noisy? |
Hmm, maybe. That failed test (and others in forge-std) take two inputs |
Okay figured it out, but I'm unsure on how to proceed: I added Once I remove it, the Do we disregard the PUSH values in this scenario ? @brockelmore @mds1 Following the logic from before, I'd rather include them and change the |
Should we maybe collect PUSHes that are bigger/equal than e.g. 1 byte? That would cover u8 -> u256 / address etc., whereas it'd omit things like small masks or function selectors etc. |
It sounds like ultimately we may need to get smarter about which PUSH bytes are collected? @gakonst's idea is an interesting heuristic. Curious to hear what @brockelmore thinks but I'd be ok with changing the forge-std tests if required for now. Though as mentioned above occasionally passing the same value for both inputs is a good property so I don't think we want to lose that in the long term. |
Yeah we want this. Unrelated, but I also think we should still be generating the same call twice, e.g. |
2a9f1f9
to
6648e8c
Compare
@joshieDo are we still getting too many rejects after not collecting balances/nonces? |
The CI is failing on something unrelated, but I tried locally and it's still getting too many rejects. Also tried to only include >= |
OK - let's remove the PUSH bytes collection in the dictionary for now & get this merged with the endianness fix. Let's investigate a smart technique for collecting PUSH values in a separate PR. |
* always enable tracing on forge script * wip * add tests for storage collection during fuzz * remove dbg statements * fix ignored invariant_runs * temporarily disable memory collection * curate data we collect for dictionary * clippy * add test fuzz/invariant test for data collection * add seed to test_fuzz_collection * fix comments * exclude tests from test_fuzz * add seed to test_invariant_storage * revert ignoring test_contract on state collection * fix select_random_sender docs * dont collect balance or nonces * fix tests * disable push collection on build_initial_state * fix exclusion statement
Seems like we weren't using any of the collected values from PUSH and storage. For both fuzzing and invariants!
Fuzz & Invariant
PUSH
values were being collected with a left alignment. (4442...0000
vs0000...4442
) .testNeedle(uint256)
now successfully breaks, before it wouldnt.efbe...0000
vs0000...beef
).testIncrement(address)
now successfully breaks, before it wouldnt.Invariant Dictionary Flooding
memory
value collection until we find a better way to parse the values. Stuff likeab12...000..12ab...0000
is mostly spam.StateChangeSet
the randomly generated sender beforecollect_state_from_call
. Otherwise, the dictionary gets flooded with irrelevant addresses.Some current limitations/follow-ups :
bool(true)
&address(0xbeef)
, you might collect a value like0000...beef01
, which is useless.PUSH
bytes all the time on all calls. Adding some cache for ABI/addresses before re-collecting them, might increase speed in some cases.--
Added a quick way to inspect the dictionary. Example