Add a libFuzzer fuzzing harness #744

elichai · 2020-04-18T16:13:19Z

Hi,
As a result of #739, I wrote a fuzzing harness so people can fuzz the library themselves if they want so.

This is my first time playing with libFuzzer so any feedback/criticism is very welcome.

For the autotools part it's mostly copied from bitcoin core with the caveat of automatically adding -fsanitize=fuzzer, because this is only libFuzzer so and that's required to run it anyway.

Currently it's fuzzing only the public API of the library, but in the future we can add fuzzing for internal functions, which is why I created a fuzz directory.

One thing I did that I know isn't quite standard is the Fuzz Garbage through parsing functions part, where I loop over the input byte by byte and feed it into the parsing functions, I think this is a good way to throw a lot of data into these functions and make sure that: A. they don't crash, B. they return only 1 or 0.

cc @practicalswift

elichai · 2020-04-19T08:44:38Z

Maybe I should add more "rainy day" scenarios? (ie if I can decode a pubkey+signature from the input, try to verify it and check that it returns 0)

practicalswift · 2020-04-19T10:04:06Z

@elichai

This is an excellent initiative! Thanks a lot for doing this! ❤️

Strongest possible concept ACK from a fellow fuzzing enthusiast :)

Yesterday I setup a very basic fuzzing infrastructure which does continuous fuzzing of master using the code in this PR. It has been running for a few CPU hours by now and it seems to be chugging along fine. I hope to be able ramp up the number of cores dedicated to this and keep it running permanently (with periodic automatic re-builds against master) :)

I'd be happy to share all coverage increasing inputs I'm able to find. (In the extremely unlikely event that I find a crashing input (crashing with or without sanitizers enabled) it will be responsibly reported to the maintainers like I've done in the past when I've found issues in Bitcoin Core or any of its dependencies (CVE-2017-18350, CVE-2018-20586, CVE-2019-18936, etc.). I believe in responsible disclosure and the policy I'm following myself is that the project maintainer decides when to go public with any findings and ideally that the maintainer handles all such public disclosure to avoid having the researcher accidentally disclosing more than the maintainer signed off for due to miscommunication.)

A few comments/suggestions regarding the implementation that I thought of when I tinkered with your code while setting up the continuous fuzzing:

The configure option semantics are not matching the ones in Bitcoin Core. In Bitcoin Core --enable-fuzz does not automatically add -fsanitize=fuzzer so if you wantlibFuzzer with say ASAN/UBSAN you'll have to do --enable-fuzz --with-sanitizers=address,fuzzer,undefined. The rationale being that you might want to use another fuzzer such as afl-fuzz or honggfuzz. What do you think about matching how it works in Bitcoin Core?
What do you think about going from the current single-binary-with-many-fuzzing-harnesses to the one-binary-per-independent-fuzzing-harness model that Bitcoin Core uses? My experience is that the latter model works better for modern coverage-guided fuzzers. See also the comment in Offering my Bitcoin fuzzers bitcoin/bitcoin#11045 (comment) from @Dor1s.
Regarding adding more "rainy day" scenarios: yes, please! :)

Again: @elichai - this is a great initative. Thanks!

elichai · 2020-04-19T10:12:57Z

Thanks :)

The configure option semantics are not matching the ones in Bitcoin Core. In Bitcoin Core --enable-fuzz does not automatically add -fsanitize=fuzzer so if you wantlibFuzzer with say ASAN/UBSAN you'll have to do --enable-fuzz --with-sanitizers=address,fuzzer,undefined. The rationale being that you might want to use another fuzzer such as afl-fuzz or honggfuzz. What do you think about matching how it works in Bitcoin Core?

I've done it like that because I only added support for libFuzzer right now, didn't want to add the complexity Bitcoin Core has for AFL because I wasn't sure if anyone was using it and it can be done in a future PR, but if it's a common setup I can definitely adapt.
(FYI, clang doesn't care if you specify a sanitizer multiple times so you can still run with --enable-fuzz --with-sanitizers=address,fuzzer,undefined)

What do you think about going from the current single-binary-with-many-fuzzing-harnesses to the one-binary-per-independent-fuzzing-harness model that Bitcoin Core uses? My experience is that the latter model works better for modern coverage-guided fuzzers. See also the comment in bitcoin/bitcoin#11045 (comment) from @Dor1s.

I also thought about it, right now we have different benchmarks binary per module, but one test binary for all modules.
I used the same one for all because they share common characteristics (need 32 bytes, need to verify a seckey etc.) but if there's a practical advantage to separate binaries I can do that.

P.S. what do you think about the parsing loops, are they a good idea or that's not really how the fuzzing is designed to be used? (I'm iterating byte by byte over the whole input trying it as input)

practicalswift · 2020-04-30T15:20:50Z

@elichai

Have you checked what parts of the code your fuzzer is able to reach deeply in to, and what parts that are only shallowly fuzzed when starting from an empty seed corpus?

Due to the nature of the library there are quite a few natural "fuzzing road blocks" in the code base that will need to be handled to get good/deep coverage when doing coverage-based fuzzing starting from an empty corpus. John Regehr has a great piece on writing fuzzable code which also covers "fuzzer blockers".

I highly recommend Regehr's blog if you're interested in modern fuzzing and modern fuzzing research. Here are some good reads:

https://blog.regehr.org/archives/1700 (Helping Generative Fuzzers Avoid Looking Only Where the Light is Good, Part 1)
https://blog.regehr.org/archives/1687 (Write Fuzzable Code)
https://blog.regehr.org/archives/1284 (Reducers are Fuzzers)
https://blog.regehr.org/archives/1269 (API Fuzzing vs. File Fuzzing: A Cautionary Tale)
https://blog.regehr.org/archives/1039 (Levels of Fuzzing)
https://blog.regehr.org/archives/896 (How to Fuzz an ADT Implementation)
https://blog.regehr.org/archives/856 (Oracles for Random Testing)

Happy fuzzing! :)

elichai · 2020-04-30T15:23:54Z

I'll read, thank you.

Have you checked what parts of the code your fuzzer is able to reach deeply in to, and what parts that are only shallowly fuzzed when starting from an empty seed corpus?

How do I check that? And it should be able to reach pretty deeply in the "happy path"(because a valid seckey is pretty probable, and then I sign, verify etc with it)

practicalswift · 2020-04-30T15:52:55Z

How do I check that?

This is the type of output you would like to be looking at when searching for opportunities to tweak your fuzzer to reach deeper:

https://marcofalke.github.io/btc_cov/fuzz.coverage/index.html

It shows the current fuzzing coverage of Bitcoin Core. It is currently at 50% line coverage but I'm far from done. I'm aiming for 1.) beating the unit tests with regards to coverage (short term), 2.) beating the functional tests with regards to coverage (medium term) and 3.) in the longer term the final goal is full coverage of course :)

See if you can create such a page with lcov output for your secp256k1 fuzzer. That would be really nice to see!

I would suggest first fuzzing normally using say:

$ mkdir -p thin-air-corpus/
$ ./fuzz_public -use_value_profile=1 -max_len=8192 thin-air-corpus/

Let it run until you get bored waiting for entries marked NEW in the libFuzzer output: in other words when the increases in the cov and ft columns are infrequent.

Stop the fuzzer.

Now go build a coverage instrumented build of fuzz_public which is not using -fsanitize=fuzzer.

Call the resulting binary fuzz_public_cov_without_libfuzzer.

Do ...

for INPUT in thin-air-corpus/*; do
    ./fuzz_public_cov_without_libfuzzer < $INPUT
done

... to collect coverage data.

Use lcov, gcov, gcovr or your favourite coverage analysis tool to see the achieved coverage :)

Tweak your fuzzing harness to achieve even better coverage.

Repeat the whole process.

Again.

And again.

And again until you've reached a level of coverage you're pleased with :)

And it should be able to reach pretty deeply in the "happy path"(because a valid seckey is pretty probable, and then I sign, verify etc with it)

All paths are good to fuzz, but generally speaking if I'd have to choose between fuzzing only "happy paths" or only non-"happy paths" to find vulnerabilities I'd go for non-happy paths (paths likely to invoke error handling, etc.). Those are usually relatively under-tested and thus the places where you are most likely to find interesting stuff :)

practicalswift · 2020-04-30T23:26:21Z

@elichai

I've now created a repo (https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus) where we can share coverage increasing inputs :)

I've added a first batch of the 2911 coverage increasing inputs I've found using a fuzzing farm that has been running non-stop for 12 days (calendar time, CPU time much much more of course).

Hope this first set of inputs can help you find opportunities where the fuzzing harness can be tweaked to reach deeper in to the code base and thus achieve better coverage :)

Really excited about your fuzzing work!

Quick start guide if anyone else wants to follow along on this fuzzing journey:

# check out https://github.com/bitcoin-core/secp256k1/pull/744
$ ./autogen.sh
$ CC=clang CXX=clang++ ./configure --enable-fuzz --with-asm=no --enable-experimental --enable-module-ecdh --enable-module-recovery
$ make
$ git clone https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus
$ mkdir -p coverage-increasing-inputs-to-submit-upstream/
$ ./fuzz_public -use_value_profile=1 coverage-increasing-inputs-to-submit-upstream/ libsecp256k1-fuzzing-seed-corpus/seeds/
INFO: Seed: 747237488
INFO: Loaded 2 modules   (717 inline 8-bit counters): 625 [0x7f66c8ed90c0, 0x7f66c8ed9331), 92 [0x6f9120, 0x6f917c),
INFO: Loaded 2 PC tables (717 PCs): 625 [0x7f66c8ed9338,0x7f66c8edba48), 92 [0x4d2bc0,0x4d3180),
INFO:        0 files found in coverage-increasing-inputs-to-submit-upstream/
INFO:     2911 files found in libsecp256k1-fuzzing-seed-corpus/seeds/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 8191 bytes
INFO: seed corpus: files: 2911 min: 1b max: 8191b total: 259749b rss: 25Mb
#2048   pulse  cov: 425 ft: 13712 corp: 1712/54Kb exec/s: 682 rss: 26Mb
#2912   INITED cov: 445 ft: 14943 corp: 2533/239Kb exec/s: 582 rss: 27Mb
…
# Please submit any coverage increasing inputs to:
# * https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus

elichai · 2020-05-03T19:36:47Z

I've now created a repo (https://github.com/practicalswift/libsecp256k1-fuzzing-seed-corpus) where we can share coverage increasing inputs :)

Thanks! This is awesome.
I think I'll reformat this into multiple seperate binaries per module as you and others have suggested, and then I'll manually add corpuses to increase coverage(and PR them into your repo), but it's already looking pretty good.

practicalswift · 2020-05-03T19:42:07Z

@elichai Excellent! Let me know if I can help in any way :)

elichai · 2020-05-04T10:57:10Z

@elichai Excellent! Let me know if I can help in any way :)

There's one thing I'm trying to figure out
Is it ok to reuse the same fuzzing input on multiple functions or should I "pull" more data for each function I call? should I minimize the amount of fuzzing input I'm using or maximize it?

Right now I'm mostly reusing the same data for a lot of operations, and the whole "Fuzz Garbage through parsing functions" basically throws all the input into those functions.

But if I shouldn't use the same input for multiple functions(which if I look at FuzzedDataProvider https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/fuzzer/FuzzedDataProvider.h it sure looks like it) then I should add an abstraction that keeps an index, and remove the loops that go over all the data.
And if I do that what should I do if I'm out of data? return 0;? loop to the beginning? something else?

(P.S. Are you on IRC or any chat platform?)

practicalswift · 2020-05-06T13:37:38Z

@elichai

Is it ok to reuse the same fuzzing input on multiple functions or should I "pull" more data for each function I call? should I minimize the amount of fuzzing input I'm using or maximize it?

Re-using the same parts of the input for independent parts of the fuzzing harness is likely to confuse the coverage-guided fuzzer and should be avoided if possible.

Example:

If you're fuzzing the f and g using input from fuzzer controlled stream s then …

f(s.read(10))
g(s.read(10))

… is much better than …

f(s.read(10))
s.reset()
g(s.read(10)) # re-use the same ten bytes used for f

… since the fuzzer in the first case can try to figure out the relation between s[0:10] vs coverage achieved in f, and s[10:20] vs coverage achieved in g independently.

Generally you want to make everything as dumb as possible for the coverage-guided fuzzer to work: no input re-use which may obscure the true interaction between input provided and coverage reached, and many small target binaries (testing a single unit of of functionality each) instead of one big target binary trying to cover all functionality.

In the example above then f and g should ideally be fuzzed in different binaries assuming they are unrelated/independent of each other.

Hope that helps :)

(P.S. Are you on IRC or any chat platform?)

I don't use IRC, but I'll send you an e-mail right away :)

elichai · 2020-05-07T15:16:56Z

I believe this is getting better.
Should I split fuzz_public into multiple binaries? (fuzz_ecdsa, fuzz_arithmetic, fuzz_sigs, fuzz_key_parsing)

practicalswift · 2020-05-22T21:14:19Z

@elichai Yes, please split in multiple binaries: see the last point in these recommendations from the libFuzzer documentation :)

Some important things to remember about fuzz targets:

The fuzzing engine will execute the fuzz target many times with different inputs in the same process.

It must tolerate any kind of input (empty, huge, malformed, etc).

It must not exit() on any input.

It may use threads but ideally all threads should be joined at the end of the function.

It must be as deterministic as possible. Non-determinism (e.g. random decisions not based on the input bytes) will make fuzzing inefficient.

It must be fast. Try avoiding cubic or greater complexity, logging, or excessive memory consumption. Ideally, it should not modify any global state (although that’s not strict).

Usually, the narrower the target the better. E.g. if your target can parse several data formats, split it into several targets, one per format.

practicalswift · 2020-09-08T21:50:01Z

Anyone wants to chime in with Concept ACK:s or Concept NACK:s?

(FWIW I'm super enthusiastic about your work @elichai, and if it doesn't end up in this repo I'd like to collaborate with you on this effort in another repo (say secp256k1-fuzz :))

real-or-random · 2020-09-09T14:45:52Z

Anyone wants to chime in with Concept ACK:s or Concept NACK:s?

I rally don't know a lot about fuzzing so I doubt I'm qualified enough to judge this. What's the cost of maintaining this? Does it have advantages to keep this in this repo here (because it may be extended to the internal API) or can this live equally good in another repo?

practicalswift · 2020-11-30T14:22:49Z

@elichai

I think a rebase is needed :)

Somewhat related to this PR:

@guidovranken is doing some really interesting differential fuzzing work which AFAIK haven't found any bugs in libsecp256k1, but in some other libraries like Trezor: trezor/trezor-firmware#1374.

Perhaps it could serve as inspiration for your fuzzing work @elichai, or you could join forces? :)

elichai · 2020-11-30T15:04:27Z

@guidovranken is doing some really interesting differential fuzzing work which AFAIK haven't found any bugs in libsecp256k1, but in some other libraries like Trezor: trezor/trezor-firmware#1374.

Perhaps it could serve as inspiration for your fuzzing work @elichai, or you could join forces? :)

Small note, I'm not sure it's that obvious that a zero message should pass ECDSA verification (see rust-bitcoin/rust-secp256k1#207)

but yeah I don't mind helping fuzzing / rebasing, and I should probably add schnorrsig coverage here at some point

sipa · 2023-05-09T14:34:24Z

Sorry for the late response, but Concept ACK.

While the general advice is indeed to have multiple fuzzing binaries for different tests (and specifically, not read the "which fuzz test to run" from the test input, as the fuzzing engine will try to mutate pointlessly between the tests), in Bitcoin Core we discovered this quickly becomes unwieldy when there are lots of tests. We instead settled for reading the test name from the environment (so, one binary, but it needs to be told through FUZZ="fuzz_test_name" which test to run). I think it'd make sense to do something similar here, as I can envision having many tests (I'll write ones myself too).

Going further than this, I think the most interesting types of test will be more low-level tests (field, scalar arithmetic, perhaps group operations), and higher-level tests for the exhaustive test field. Having this is a great start, though.

Rebase?

Add a fuzz option to autotools

20983eb

elichai force-pushed the 2020-04-fuzz branch from 3cde9a1 to 956a024 Compare April 30, 2020 08:47

elichai added 5 commits May 7, 2020 17:48

Create helpers for writing fuzzers

7678916

Add a fuzzing harness for the public API

86cd7c1

Add a fuzzing harness for the Recovery module API

a17f2b3

Add a fuzzing harness for the ECDH module API

f894863

Build fuzz harness in travis

c1ce466

elichai force-pushed the 2020-04-fuzz branch from 956a024 to c1ce466 Compare May 7, 2020 15:15

real-or-random mentioned this pull request Dec 2, 2020

API tests #187

Open

real-or-random mentioned this pull request Feb 1, 2021

fix uninitialized read in tests #889

Merged

real-or-random added the assurance label May 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a libFuzzer fuzzing harness #744

Add a libFuzzer fuzzing harness #744

elichai commented Apr 18, 2020

elichai commented Apr 19, 2020

practicalswift commented Apr 19, 2020

elichai commented Apr 19, 2020 •

edited

Loading

practicalswift commented Apr 30, 2020

elichai commented Apr 30, 2020

practicalswift commented Apr 30, 2020 •

edited

Loading

practicalswift commented Apr 30, 2020 •

edited

Loading

elichai commented May 3, 2020

practicalswift commented May 3, 2020

elichai commented May 4, 2020 •

edited

Loading

practicalswift commented May 6, 2020 •

edited

Loading

elichai commented May 7, 2020

practicalswift commented May 22, 2020

practicalswift commented Sep 8, 2020

real-or-random commented Sep 9, 2020

practicalswift commented Nov 30, 2020

elichai commented Nov 30, 2020

sipa commented May 9, 2023 •

edited

Loading

Add a libFuzzer fuzzing harness #744

Are you sure you want to change the base?

Add a libFuzzer fuzzing harness #744

Conversation

elichai commented Apr 18, 2020

elichai commented Apr 19, 2020

practicalswift commented Apr 19, 2020

elichai commented Apr 19, 2020 • edited Loading

practicalswift commented Apr 30, 2020

elichai commented Apr 30, 2020

practicalswift commented Apr 30, 2020 • edited Loading

practicalswift commented Apr 30, 2020 • edited Loading

elichai commented May 3, 2020

practicalswift commented May 3, 2020

elichai commented May 4, 2020 • edited Loading

practicalswift commented May 6, 2020 • edited Loading

elichai commented May 7, 2020

practicalswift commented May 22, 2020

practicalswift commented Sep 8, 2020

real-or-random commented Sep 9, 2020

practicalswift commented Nov 30, 2020

elichai commented Nov 30, 2020

sipa commented May 9, 2023 • edited Loading

elichai commented Apr 19, 2020 •

edited

Loading

practicalswift commented Apr 30, 2020 •

edited

Loading

practicalswift commented Apr 30, 2020 •

edited

Loading

elichai commented May 4, 2020 •

edited

Loading

practicalswift commented May 6, 2020 •

edited

Loading

sipa commented May 9, 2023 •

edited

Loading