Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzz: new fuzz package and lnwire parsing harnesses #1895

Merged
merged 3 commits into from
Mar 10, 2020

Conversation

Crypt-iQ
Copy link
Collaborator

@Crypt-iQ Crypt-iQ commented Sep 13, 2018

This PR updates the fuzzing harness and converts it into one harness per message, updates the README for updated building instructions, and updates the corpus to include support for the newer messages. A new fuzz package was introduced which contains all of the message harnesses.

TODO:

  • Update the README with newer build instructions.

  • Build script

  • Update the go-fuzz fuzzing corpus to include support for the newer lnwire messages.

  • Add a fuzzing harness for each lnwire message so that things like correct encodings / signatures can be added and so the fuzzer doesn't get "confused".

Future Work:

@halseth halseth added documentation Documentation changes that do not affect code behaviour testing Improvements/modifications to the test suite P3 might get fixed, nice to have labels Sep 13, 2018
@halseth halseth modified the milestones: 0.5.1, 0.5.2 Sep 13, 2018
@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from 8fef0ae to f73abf3 Compare September 23, 2018 10:21
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Nov 17, 2018

I fuzzed watchtower messages and a couple of the lnwire messages for ~70+ hours each, but came up short. I haven't been having much luck fuzzing the standard lnwire messages - I will add more updates to this branch when I have time like formatting correct messages that require encoding and stuff like that. I think I will design a new fuzzing harness in the same fashion that bitcoin-core does theirs.

Besides the lnwire and wtwire messages, I am open to suggestions for fuzzing other things in the project. I currently don't have the ability to fuzz consistently so this is hampering any bug finding - if anybody has computational resources, HIT ME UP. I have also been toying with the idea of using llgo with klee to run symbolic execution tests. This could even be combined with fuzzing as was done in the Darpa CGC (see: Driller)

@Roasbeef
Copy link
Member

Roasbeef commented Dec 4, 2018

Ping on this PR, is it good to go/review as is, or do you plan to address the lingering TODO's (or update the list)?

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Dec 5, 2018

I'm going to do the first TODO and commit that and then it should be ready to review. The second TODO should in a different PR because it can get quite involved generating better inputs (see above comment). For the third TODO, I'll be able to start fuzzing at work so I can run for a long time and it doesn't need a PR.

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Dec 18, 2018

Experiencing this problem: dvyukov/go-fuzz#195 because of go modules. Was able to get around it though, but not ideal / neat because it doesn't put the binaries in the project directory.

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Dec 19, 2018

I wasn't entirely sure up until this point that separating harnesses was a good idea, but I am now positive that it is. I was using Bitcoin Core as a guide, which switches on input even though it isn't as efficient because it can confuse the fuzzer and use input intended for one target on a different target. This also lets us know when we have reached the maximum coverage for a specific target (like the init message for example). Though it can be argued that we should continue to fuzz even once we reach a point where coverage is not changing, see this comment.
I am currently fuzzing all protocol messages and have noticed a ceiling for the coverage (~1850), but I think this is because go-fuzz is in this "confused" state and that our coverage could easily increase with better and split-up harnesses. So far, I have made only made a fuzzing harness for the init message.

I am maybe a bit confused on what to do about the "better corpus messages" because I have read (I'll try to find the link) that an "ideal" corpus is one that is small in size yet generates the maximum coverage. I am not sure how I feel about this, but I can leave that for a later time.

As far as my comment about symbolic execution, that is totally a bust. The library that converts golang to LLVM will only build in docker with a super old version of go and it will not work with KLEE (the symex). In general, I don't know enough about symbolic execution anyways. And it's sort of unrelated.

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from 84f79b0 to 7064f9d Compare December 21, 2018 16:00
@Crypt-iQ
Copy link
Collaborator Author

I was talking to @tuxx42 and we can get very good coverage information similar to AFL by using gccgo - so if the fuzzer is trying to guess a magic value, we will know and can add this input to the corpus.

@Crypt-iQ
Copy link
Collaborator Author

Should be ready for review @Roasbeef

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from 9c0543d to 4eb7e40 Compare January 14, 2019 12:59
@Roasbeef Roasbeef removed this from the 0.5.2 milestone Jan 16, 2019
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jan 16, 2019

Removed message prefixes from the seed corpus, so everything is good to go now. Going to do some coverage comparison tests on a compute engine instance (new approach vs. old, general harness). Will post results

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jan 23, 2019

Old, slightly patched harness fuzzes some messages more than others (24 hour test on an 8 core machine running 2 workers ~14k execs/sec, resulting corpus size of 72 which is extremely small given the time). Specifically openchannel, acceptchannel, nodeannouncement, queryshortchanids, & replychannelrange (bad - we want an even distribution!!). These are pretty shallow fuzzing targets (no state machines or anything fancy, just deserialization + serialization), so individually fuzzing messages might not give us much more coverage. The decodeShortChanIDs function doesn't hit the EncodingSortedZlib block, so that might be worth fuzzing individually... :neckbeard: Will add a brontide fuzzing PR soonish that depends on this one!

Coverage here: https://crypt-iq.github.io/count_wirefuzz.html

@Crypt-iQ
Copy link
Collaborator Author

Added zlib harnesses to hit the EncodingSortedZlib block

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from 0502b2a to f612ef6 Compare August 9, 2019 16:00
@Crypt-iQ Crypt-iQ changed the title docs+lnwire: Updating corpus, fuzz test, README fuzz: new fuzz package and lnwire parsing harnesses Aug 9, 2019
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Aug 9, 2019

Updated the PR with just 2 commits and included better building instructions since go-fuzz-build master fails to build the targets. Also included a one-liner in fuzz.md that builds all fuzzing binaries for a given package. The only thing left to do would be to generate a decent corpus for each message and zip it up in each harness directory. However, I don't have the resources to do that right now so maybe somebody else could do it 😉

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from f612ef6 to c7f74b4 Compare January 8, 2020 01:53
@Roasbeef Roasbeef requested a review from cfromknecht January 14, 2020 07:34
@Roasbeef Roasbeef added this to the 0.10.0 milestone Jan 14, 2020
@Roasbeef Roasbeef added the v0.10 label Jan 17, 2020
@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from c7f74b4 to 2445034 Compare January 18, 2020 05:33
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jan 18, 2020

Removed the corpus from the PR as it's overly burdensome to review with hundreds of files in the way. Updated coverage is here.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stoked to finally see this land. Mostly some high level questions w.r.t methodology, also haven't yet run the latest versions of the per-message fuzzers.

)

// Fuzz is used by go-fuzz.
func Fuzz(data []byte) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level question: why do certain message targets like AcceptChannel use a custom scenario rather than the fuzz.Harness scenario as many of these below do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so this could have been more explicit. it's because of parsing UpfrontShutdownScript - before the test starts, if it's empty (nil slice), it will be converted into a non-nil slice at the very end of the test (the decoding+encoding changes it). since there's no way to only reflect.DeepEqual subsets of a struct (or like switch on error / determine which field the error occured), had to do it this way

@@ -0,0 +1,21 @@
// +build gofuzz

package fundinglocked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for each of them having a new package rather than just a series of files in a blanket package?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah so at the time of writing the tests you couldn't have multiple Fuzz funcs in the same package, but that's changed now so can change it to this - would make the directory look cleaner too..

// PrefixWithMsgType takes []byte and adds a wire protocol prefix
// to make the []byte into an actual message to be used in fuzzing.
func PrefixWithMsgType(data []byte, prefix lnwire.MessageType) []byte {
prefixBytes := make([]byte, 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we can just declare it as:

var prefixBytes [2]byte

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jan 27, 2020

Will get rid of all the packages. NOTE: Some of the harnesses are a bit different (acceptchannel, nodeannouncement, openchannel) since the channel ones have a quirk during serialization +deserialization with the UpfrontShutdownScript. nodeannouncement contains []net.Addr which don't serialize + deserialize back to the byte-by-byte same address (ipv6 address can be converted into an ipv4 address to be minimally encoded)

@Crypt-iQ
Copy link
Collaborator Author

Flattened all packages to single lnwirefuzz package. Had to modify the decodeShortChanIds function so that invalid EncodingType returns an error if there are no short channel ids rather than no error.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🦅

I think this is ready to land after the fixup commits have been squashed, and this PR rebased to master.

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch 2 times, most recently from c0f2ef6 to cdd33f8 Compare February 4, 2020 01:48
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Feb 4, 2020

@Roasbeef rebased+squashed, not sure why Travis is failing. I would also like to point to your attention 0b78f59 which errors if the decoded EncodingType is unknown. I think the fix more accurately follows the ideal behavior of decoding+mimics the encoding, plus it also helps in fuzz testing since the general harness will panic if WriteMessage fails, which it would, here.

@Crypt-iQ Crypt-iQ requested a review from Roasbeef February 4, 2020 03:00
docs/gofuzz/fuzz.md Outdated Show resolved Hide resolved
// Check whether the encodingType is valid or not.
if encodingType > EncodingSortedZlib {
return 0, nil, ErrUnknownShortChanIDEncoding(encodingType)
}
Copy link
Contributor

@cfromknecht cfromknecht Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible just to remove this whole check, i.e. the zero length check? to me it looks like both of the existing sections below handle the zero-length case gracefully, and then error will instead be produced by the default case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can just remove this check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the check, but had to add a special check here. The first ReadMessage call would succeed and return an empty list of ShortChannelIDs (it would succeed because there would be bytes, but not enough to be a zlib-encoded *ShortChannelID); the subsequent call would error on ReadMessage when creating the zlib reader because there were no more bytes left to be read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, removing this check makes TestReplyChannelRangeEmpty fail. Now any QueryShortChanIDs or ReplyChannelRange message with zlib encoding and no ShortChannelIDs will fail with unable to create zlib reader: unexpected EOF. I don't think the peer should fail on a legitimate message like this.

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch 2 times, most recently from e303410 to 7d28d7e Compare February 19, 2020 03:15
// At this point, if there's no body remaining, then only the encoding
// type was specified, meaning that there're no further bytes to be
// parsed.
if len(queryBody) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we just need to move the check to the beginning of the zlib case and the tests should pass again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup that should work

@Roasbeef
Copy link
Member

New test failure?

--- FAIL: TestReplyChannelRangeEmpty (0.00s)
    --- FAIL: TestReplyChannelRangeEmpty/empty_zlib_encoding (0.00s)
        reply_channel_range_test.go:86: unable to decode req: unable to create zlib reader: unexpected EOF

@Crypt-iQ
Copy link
Collaborator Author

New test failure?

--- FAIL: TestReplyChannelRangeEmpty (0.00s)
    --- FAIL: TestReplyChannelRangeEmpty/empty_zlib_encoding (0.00s)
        reply_channel_range_test.go:86: unable to decode req: unable to create zlib reader: unexpected EOF

Have to do this: #1895 (comment)

@Crypt-iQ Crypt-iQ force-pushed the lnwire_fuzzing_09_11_2018 branch from 7d28d7e to 363bdc4 Compare March 3, 2020 18:59
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Mar 3, 2020

Ran all the fuzzers one last time and no more bugs with the fuzzing harness from what i can tell. Should be g2g now

@Roasbeef Roasbeef merged commit 462048a into lightningnetwork:master Mar 10, 2020
@Crypt-iQ Crypt-iQ deleted the lnwire_fuzzing_09_11_2018 branch March 10, 2020 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation changes that do not affect code behaviour fuzzing P3 might get fixed, nice to have testing Improvements/modifications to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants