diff --git a/pages/public/assets/logo_aflplusplus.png b/pages/public/assets/logo_aflplusplus.png new file mode 100644 index 0000000..80cd016 Binary files /dev/null and b/pages/public/assets/logo_aflplusplus.png differ diff --git a/sources/aflplusplus/FAQ.md b/sources/aflplusplus/FAQ.md new file mode 100644 index 0000000..69b3ff5 --- /dev/null +++ b/sources/aflplusplus/FAQ.md @@ -0,0 +1,376 @@ +--- +status: collected +title: "Frequently asked questions (FAQ)" +author: AFLplusplus Community +collector: Souls-R +collected_date: 20240827 +link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/FAQ.md +--- +# Frequently asked questions (FAQ) + +If you find an interesting or important question missing, submit it via +[https://github.com/AFLplusplus/AFLplusplus/discussions](https://github.com/AFLplusplus/AFLplusplus/discussions). + +## General + +
+ What is the difference between AFL and AFL++?

+ + AFL++ is a superior fork to Google's AFL - more speed, more and better + mutations, more and better instrumentation, custom module support, etc. + + American Fuzzy Lop (AFL) was developed by Michał "lcamtuf" Zalewski starting + in 2013/2014, and when he left Google at the end of 2017 he stopped developing it. + + At the end of 2019, the Google fuzzing team took over maintenance of AFL; + however, it only accepts PRs from the community and no longer develops + enhancements. + + In the second quarter of 2019, 1 1/2 years later, when no further development + of AFL had happened and it became clear that none would be coming, AFL++ was + born. Initially, community patches were collected and applied for bug + fixes and enhancements. Then features from various AFL spin-offs - mostly academic + research - were integrated. This already resulted in a much more advanced + AFL. + + By the end of 2019, the AFL++ team had grown to four active developers + who then implemented their own research and features, making it now by far + the most flexible and feature-rich guided fuzzer available as open source. + In independent fuzzing benchmarks it is one of the best fuzzers available, + e.g., + [Fuzzbench Report](https://www.fuzzbench.com/reports/2020-08-03/index.html). 

+ +
+ Is AFL++ a whitebox, graybox, or blackbox fuzzer?

+ + The definition of the terms whitebox, graybox, and blackbox fuzzing varies + from one source to another. For example, "graybox fuzzing" could mean + binary-only or source code fuzzing, or something completely different. + Therefore, we try to avoid these terms. + + [The Fuzzing Book](https://www.fuzzingbook.org/html/GreyboxFuzzer.html#AFL:-An-Effective-Greybox-Fuzzer) + describes the original AFL as a graybox fuzzer. In that sense, AFL++ is + also a graybox fuzzer. 

+ +
+ Where can I find tutorials?

+ + We compiled a list of tutorials and exercises, see + [tutorials.md](tutorials.md). +

+ +
+ What is an "edge"?

+ + A program contains `functions`, and `functions` contain the compiled machine code. + The compiled machine code in a `function` can be in a single or many `basic + blocks`. A `basic block` is the **largest possible number of subsequent machine + code instructions** that has **exactly one entry point** (which can be entered by + multiple other basic blocks) and runs linearly **without branching or jumping to + other addresses** (except at the end). + + ``` + function() { + A: + some + code + B: + if (x) goto C; else goto D; + C: + some code + goto E + D: + some code + goto B + E: + return + } + ``` + + Every code block between two jump locations is a `basic block`. + + An `edge` is then the unique relationship between two directly connected + `basic blocks` (from the code example above): + + ``` + Block A + | + v + Block B <------+ + / \ | + v v | + Block C Block D --+ + \ + v + Block E + ``` + + Every line between two blocks is an `edge`. Note that a basic block can also loop + to itself; this too would be an edge. 
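As a rough illustration of the definition above, the following Python sketch lists the edges of the example function. It is not AFL++ code: the `successors` dictionary is a hand-written model of the blocks A to E, and `list_edges` is a hypothetical helper name.

```python
# Hand-written model of the example function: each basic block maps to
# the blocks it can jump to. This is an illustration, not AFL++ code.
successors = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["B"],
    "E": [],
}


def list_edges(graph):
    """Return every (source, destination) pair, i.e. every edge."""
    return [(src, dst) for src, dsts in graph.items() for dst in dsts]


edges = list_edges(successors)
for src, dst in edges:
    print(f"{src} -> {dst}")
print(f"{len(edges)} edges")
```

Each pair corresponds to one line in the diagram; the pair `("D", "B")` is the loop back from block D to block B.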

+ +
+ Should you ever stop afl-fuzz, minimize the corpus and restart?

+ + To stop afl-fuzz, minimize its corpus, and restart, you would usually do: + + ``` + Control-C # to terminate afl-fuzz + $ afl-cmin -T nproc -i out/default/queue -o minimized_queue -- ./target + $ AFL_FAST_CAL=1 AFL_CMPLOG_ONLY_NEW=1 afl-fuzz -i minimized_queue -o out2 [other options] -- ./target + ``` + + Whether this improves fuzzing is debated; no consensus has been reached + and no in-depth analysis has been performed. + + On the pro side: + * The queue/corpus is reduced (up to 20%) by removing intermediate paths + that may no longer be needed. + + On the con side: + * Fuzzing time is lost while the fuzzer is stopped, the corpus is minimized, and + the fuzzer is restarted. + + The big question: + * Does a minimized queue/corpus improve finding new coverage or does it + hinder it? + + The AFL++ team's own limited analysis seems to show that keeping + intermediate paths helps to find more coverage, at least for afl-fuzz. + + For honggfuzz, in comparison, it is a good idea to restart it from time to + time if you have other fuzzers (e.g., AFL++) running in parallel, to sync + the finds of other fuzzers to honggfuzz, as it has no syncing feature like + AFL++ or libfuzzer. + 

+ +## Targets + +
+ How can I fuzz a binary-only target?

+ + AFL++ is a great fuzzer if you have the source code available. + + However, if there is only the binary program and no source code available, + then the standard non-instrumented mode is not effective. + + To learn how these binaries can be fuzzed, read + [fuzzing_binary-only_targets.md](fuzzing_binary-only_targets.md). +

+ +
+ How can I fuzz a network service?

+ + The short answer is - you cannot, at least not "out of the box". + + For more information on fuzzing network services, see + [best_practices.md#fuzzing-a-network-service](best_practices.md#fuzzing-a-network-service). +

+ +
+ How can I fuzz a GUI program?

+ + Not all GUI programs are suitable for fuzzing. If the GUI program can read the + fuzz data from a file without needing any user interaction, then it would be + suitable for fuzzing. + + For more information on fuzzing GUI programs, see + [best_practices.md#fuzzing-a-gui-program](best_practices.md#fuzzing-a-gui-program). +

+ +## Performance + +
+ What makes a good performance?

+ + Good performance generally means "making the fuzzing results better". This can + be influenced by various factors, for example, speed (finding lots of paths + quickly) or thoroughness (working with decreased speed, but finding better + mutations). +

+ +
+ How can I improve the fuzzing speed?

+ + There are a few things you can do to improve the fuzzing speed, see + [best_practices.md#improving-speed](best_practices.md#improving-speed). +

+ +
+ Why is my stability below 100%?

+ + Stability is measured as the percentage of edges in the target that are + "stable". Sending the same input again and again should take the exact same + path through the target every time. If that is the case, the stability is + 100%. + + If, however, randomness happens, e.g., a thread reading other external data, + reaction to timing, etc., then some of the re-executions with the same data + will produce a different edge coverage result. Those edges that + change are then flagged "unstable". + + The more "unstable" edges there are, the harder it is for AFL++ to identify + valid new paths. + + If you fuzz in persistent mode (`AFL_LOOP` or `LLVMFuzzerTestOneInput()` + harnesses), a large number of unstable edges can mean that the target keeps + internal state and therefore it is possible that crashes cannot be replayed. + In such a case, either do **not** fuzz in persistent mode (remove `AFL_LOOP()` + from your harness or call `LLVMFuzzerTestOneInput()` harnesses with `@@`), + or set a low `AFL_LOOP` value, e.g. 100, and enable `AFL_PERSISTENT_RECORD` + in `config.h` with the same value. + + A value above 90% is usually fine and a value above 80% is also still ok, and + even a value above 20% can still result in successful finds of bugs. However, + for values below 80% to 90%, it is recommended to take + countermeasures to improve stability. + + For more information on stability and how to improve the stability value, see + [best_practices.md#improving-stability](best_practices.md#improving-stability). 
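The idea of "stable" versus "unstable" edges can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the calculation afl-fuzz performs internally: each run is represented as a dictionary from an edge name to its hit count, and the `stability` function and the sample data are hypothetical.

```python
def stability(runs):
    """Percentage of observed edges whose hit count is identical in every run.

    `runs` is a list of dicts mapping an edge id to its hit count for one
    execution of the same input. Simplified model, not afl-fuzz's own code.
    """
    all_edges = set().union(*(run.keys() for run in runs))
    if not all_edges:
        return 100.0
    stable = sum(
        1 for edge in all_edges
        if len({run.get(edge, 0) for run in runs}) == 1
    )
    return 100.0 * stable / len(all_edges)


# Three executions of the same input; the third run takes the D -> B loop.
runs = [
    {"A->B": 1, "B->C": 1, "C->E": 1},
    {"A->B": 1, "B->C": 1, "C->E": 1},
    {"A->B": 1, "B->D": 2, "D->B": 2, "B->C": 1, "C->E": 1},
]
print(f"stability: {stability(runs):.2f}%")
```

In this sample, three of the five observed edges behave identically in every run, so the sketch reports 60%.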

+ +
+ What are power schedules?

+ + Not every item in our queue/corpus is the same; some are more interesting, + others provide little value. + A power schedule measures how "interesting" a queue item is and, depending on + the calculated value, spends more or less time mutating it. + + AFL++ comes with several power schedules, initially ported from + [AFLFast](https://github.com/mboehme/aflfast) but modified to be more + effective, with several more modes added. + + The most effective modes are `-p fast` (default) and `-p explore`. + + If you fuzz with several parallel afl-fuzz instances, then it is beneficial + to assign a different schedule to each instance; however, the majority should + be `fast` and `explore`. + + It does not make sense to explain the details of the calculation and + reasoning behind all of the schedules here. If you are interested, read the source + code and the AFLFast paper. 

+ +## Troubleshooting + +
+ FATAL: forkserver is already up but an instrumented dlopen library loaded afterwards

+ + It can happen that you see this error on startup when fuzzing a target: + + ``` + [-] FATAL: forkserver is already up, but an instrumented dlopen() library + loaded afterwards. You must AFL_PRELOAD such libraries to be able + to fuzz them or LD_PRELOAD to run outside of afl-fuzz. + To ignore this set AFL_IGNORE_PROBLEMS=1. + ``` + + As the error describes, a dlopen() call is happening in the target that is + loading an instrumented library after the forkserver is already in place. This + is a problem for afl-fuzz because when the forkserver is started, we must know + the map size already and it can't be changed later. + + The best solution is to simply set `AFL_PRELOAD=foo.so` to the libraries that + are dlopen'ed (e.g., use `strace` to see which), or to set a manual forkserver + after the final dlopen(). + + If this is not a viable option, you can set `AFL_IGNORE_PROBLEMS=1`, but then + the existing map will also be used for the newly loaded libraries. This + allows it to work; however, the efficiency of the fuzzing will be partially + degraded. Note that there is also `AFL_IGNORE_PROBLEMS_COVERAGE` to + tell AFL++ to ignore any coverage from the late-loaded libraries. 

+ +
+ I got a weird compile error from clang.

+ + If you see this kind of error when trying to instrument a target with + afl-cc/afl-clang-fast/afl-clang-lto: + + ``` + /prg/tmp/llvm-project/build/bin/clang-13: symbol lookup error: /usr/local/bin/../lib/afl//cmplog-instructions-pass.so: undefined symbol: _ZNK4llvm8TypeSizecvmEv + clang-13: error: unable to execute command: No such file or directory + clang-13: error: clang frontend command failed due to signal (use -v to see invocation) + clang version 13.0.0 (https://github.com/llvm/llvm-project 1d7cf550721c51030144f3cd295c5789d51c4aad) + Target: x86_64-unknown-linux-gnu + Thread model: posix + InstalledDir: /prg/tmp/llvm-project/build/bin + clang-13: note: diagnostic msg: + ******************** + ``` + + Then this means that your OS updated the clang installation from an upgrade + package and because of that the AFL++ llvm plugins do not match anymore. + + Solution: `git pull ; make clean install` of AFL++. +

+ +
+ AFL++ map size warning.

+ + When you run a large instrumented program stand-alone or via afl-showmap + you might see a warning like the following: + + ``` + Warning: AFL++ tools might need to set AFL_MAP_SIZE to 223723 to be able to run this instrumented program if this crashes! + ``` + + Depending on how the target works, it might also crash afterwards. + + Solution: just do an `export AFL_MAP_SIZE=(the value in the warning)`. 
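If you drive the AFL++ tools from a script, the value can be picked out of the warning text automatically. The following is a minimal sketch that assumes the warning has the exact wording shown above; `map_size_from_warning` is a hypothetical helper, not part of AFL++.

```python
import os
import re

warning = (
    "Warning: AFL++ tools might need to set AFL_MAP_SIZE to 223723 to be "
    "able to run this instrumented program if this crashes!"
)


def map_size_from_warning(text):
    """Return the suggested AFL_MAP_SIZE as a string, or None if absent."""
    match = re.search(r"AFL_MAP_SIZE to (\d+)", text)
    return match.group(1) if match else None


size = map_size_from_warning(warning)
# Copy of the current environment with AFL_MAP_SIZE added; it could be
# passed to a later tool invocation, e.g. subprocess.run(..., env=env).
env = dict(os.environ, AFL_MAP_SIZE=size)
print(env["AFL_MAP_SIZE"])
```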

+ +
+ Linker errors.

+ + If you compile C++ harnesses and see `undefined reference` errors for + variables named `__afl_...`, e.g.: + + ``` + /usr/bin/ld: /tmp/test-d3085f.o: in function `foo::test()': + test.cpp:(.text._ZN3fooL4testEv[_ZN3fooL4testEv]+0x35): undefined reference to `foo::__afl_connected' + clang: error: linker command failed with exit code 1 (use -v to see invocation) + ``` + + Then you use AFL++ macros like `__AFL_LOOP` within a namespace and this + will not work. + + Solution: Move that harness portion to the global namespace, e.g. before: + ``` + #include + namespace foo { + static void test() { + while(__AFL_LOOP(1000)) { + foo::function(); + } + } + } + + int main(int argc, char** argv) { + foo::test(); + return 0; + } + ``` + after: + ``` + #include + static void mytest() { + while(__AFL_LOOP(1000)) { + foo::function(); + } + } + namespace foo { + static void test() { + mytest(); + } + } + int main(int argc, char** argv) { + foo::test(); + return 0; + } + ``` +

diff --git a/sources/aflplusplus/INSTALL.md b/sources/aflplusplus/INSTALL.md new file mode 100644 index 0000000..533c763 --- /dev/null +++ b/sources/aflplusplus/INSTALL.md @@ -0,0 +1,180 @@ +--- +status: collected +title: "Building and installing AFL++" +author: AFLplusplus Community +collector: Souls-R +collected_date: 20240827 +link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/INSTALL.md +--- +# Building and installing AFL++ + +## Linux on x86 + +An easy way to install AFL++ with everything compiled is available via docker: +You can use the [Dockerfile](../Dockerfile) or just pull directly from the +Docker Hub (for x86_64 and arm64): + +```shell +docker pull aflplusplus/aflplusplus:latest +docker run -ti -v /location/of/your/target:/src aflplusplus/aflplusplus +``` + +This image is automatically generated when a push to the stable branch happens. +You will find your target source code in `/src` in the container. + +Note: you can also pull `aflplusplus/aflplusplus:dev` which is the most current +development state of AFL++. + +If you want to build AFL++ yourself, you have many options. The easiest choice +is to build and install everything: + +NOTE: depending on your Debian/Ubuntu/Kali/... release, replace `-14` with +whatever llvm version is available. We recommend llvm 13 or newer. 
+ +```shell +sudo apt-get update +sudo apt-get install -y build-essential python3-dev automake cmake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools cargo libgtk-3-dev +# try to install llvm 14 and install the distro default if that fails +sudo apt-get install -y lld-14 llvm-14 llvm-14-dev clang-14 || sudo apt-get install -y lld llvm llvm-dev clang +sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/\..*//'|sed 's/.* //')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/\..*//'|sed 's/.* //')-dev +sudo apt-get install -y ninja-build # for QEMU mode +sudo apt-get install -y cpio libcapstone-dev # for Nyx mode +sudo apt-get install -y wget curl # for Frida mode +sudo apt-get install python3-pip # for Unicorn mode +git clone https://github.com/AFLplusplus/AFLplusplus +cd AFLplusplus +make distrib +sudo make install +``` + +It is recommended to install the newest available gcc, clang and llvm-dev +possible in your distribution! + +Note that `make distrib` also builds FRIDA mode, QEMU mode, unicorn_mode, and +more. If you just want plain AFL++, then do `make all`. 
If you want some +assisting tooling compiled but are not interested in binary-only targets, then +instead choose: + +```shell +make source-only +``` + +These build targets exist: + +* all: the main AFL++ binaries and llvm/gcc instrumentation +* binary-only: everything for binary-only fuzzing: frida_mode, nyx_mode, + qemu_mode, unicorn_mode, coresight_mode, libdislocator, + libtokencap +* source-only: everything for source code fuzzing: nyx_mode, libdislocator, + libtokencap +* distrib: everything (for both binary-only and source code fuzzing) +* man: creates simple man pages from the help option of the programs +* install: installs everything you have compiled with the build options above +* clean: cleans everything compiled, not downloads (unless not on a checkout) +* deepclean: cleans everything including downloads +* code-format: format the code, do this before you commit and send a PR please! +* tests: runs test cases to ensure that all features are still working as they + should +* unit: perform unit tests (based on cmocka) +* help: shows these build options + +[Unless you are on macOS](https://developer.apple.com/library/archive/qa/qa1118/_index.html), +you can also build statically linked versions of the AFL++ binaries by passing +the `STATIC=1` argument to make: + +```shell +make STATIC=1 +``` + +These build options exist: + +* PERFORMANCE - compile with performance options that make the binary not transferable to other systems. Recommended (except on macOS)! 
+* STATIC - compile AFL++ static (does not work on macOS) +* CODE_COVERAGE - compile the target for code coverage (see [README.llvm.md](../instrumentation/README.llvm.md)) +* ASAN_BUILD - compiles AFL++ with address sanitizer for debug purposes +* UBSAN_BUILD - compiles AFL++ tools with undefined behaviour sanitizer for debug purposes +* DEBUG - no optimization, -ggdb3, all warnings and -Werror +* LLVM_DEBUG - shows llvm deprecation warnings +* PROFILING - compile afl-fuzz with profiling information +* INTROSPECTION - compile afl-fuzz with mutation introspection +* NO_PYTHON - disable python support +* NO_SPLICING - disables splicing mutation in afl-fuzz, not recommended for normal fuzzing +* NO_UTF - do not use UTF-8 for line rendering in status screen (fallback to G1 box drawing, of vanilla AFL) +* NO_NYX - disable building nyx mode dependencies +* NO_CORESIGHT - disable building coresight (arm64 only) +* NO_UNICORN_ARM64 - disable building unicorn on arm64 +* AFL_NO_X86 - if compiling on non-Intel/AMD platforms +* LLVM_CONFIG - if your distro doesn't use the standard name for llvm-config (e.g., Debian) + +e.g.: `make LLVM_CONFIG=llvm-config-14` + +## macOS on x86_64 and arm64 + +macOS has some gotchas due to the idiosyncrasies of the platform. + +macOS supports SYSV shared memory used by AFL++'s instrumentation, but the +default settings aren't sufficient. Before even building, increase +them by running the provided script: + +```shell +sudo afl-system-config +``` + +See +[https://www.spy-hill.com/help/apple/SharedMemory.html](https://www.spy-hill.com/help/apple/SharedMemory.html) +for documentation for the shared memory settings and how to make them permanent. + +Next, to build AFL++, install the following packages from brew: + +```shell +brew install wget git make cmake llvm gdb coreutils +``` + +Depending on your macOS system + brew version, brew may be installed in different places. 
+You can check with `brew info llvm` to find out where, then create a variable for it: + +```shell +export HOMEBREW_BASE="/opt/homebrew/opt" +``` + +or + +```shell +export HOMEBREW_BASE="/usr/local/opt" +``` + +Set `PATH` to point to the brew clang, clang++, llvm-config, gmake and coreutils. +Also use the brew clang compiler; the Xcode clang compiler must not be used. + +```shell +export PATH="$HOMEBREW_BASE/coreutils/libexec/gnubin:/usr/local/bin:$HOMEBREW_BASE/llvm/bin:$PATH" +export CC=clang +export CXX=clang++ +``` + +Then build following the general Linux instructions. + +If everything worked, you should then have `afl-clang-fast` installed, which you can check with: + +```shell +which afl-clang-fast +``` + +Note that `afl-clang-lto`, `afl-gcc-fast` and `qemu_mode` do not work on macOS. + +The crash reporting daemon that comes by default with macOS will cause +problems with fuzzing. You need to turn it off, which you can do with `afl-system-config`. + +The `fork()` semantics on macOS are a bit unusual compared to other unix systems +and definitely don't look POSIX-compliant. This means two things: + + - Fuzzing will probably be slower than on Linux. In fact, some folks report + considerable performance gains by running the jobs inside a Linux VM on + macOS. + - Some non-portable, platform-specific code may be incompatible with the AFL++ + forkserver. If you run into any problems, set `AFL_NO_FORKSRV=1` in the + environment before starting afl-fuzz. + +User emulation mode of QEMU does not appear to be supported on macOS, so +black-box instrumentation mode (`-Q`) will not work. However, FRIDA mode (`-O`) +works on both x86 and arm64 macOS boxes. 
diff --git a/sources/aflplusplus/afl-fuzz_approach.md b/sources/aflplusplus/afl-fuzz_approach.md new file mode 100644 index 0000000..5d9e716 --- /dev/null +++ b/sources/aflplusplus/afl-fuzz_approach.md @@ -0,0 +1,556 @@ +--- +status: collected +title: "The afl-fuzz approach" +author: AFLplusplus Community +collector: Souls-R +collected_date: 20240827 +link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/afl-fuzz_approach.md +--- +# The afl-fuzz approach + +AFL++ is a brute-force fuzzer coupled with an exceedingly simple but rock-solid +instrumentation-guided genetic algorithm. It uses a modified form of edge +coverage to effortlessly pick up subtle, local-scale changes to program control +flow. + +Note: If you are interested in a more up-to-date deep dive into how AFL++ +works, then we recommend this blog post: +[https://blog.ritsec.club/posts/afl-under-hood/](https://blog.ritsec.club/posts/afl-under-hood/) + +Simplifying a bit, the overall algorithm can be summed up as: + +1) Load user-supplied initial test cases into the queue. + +2) Take the next input file from the queue. + +3) Attempt to trim the test case to the smallest size that doesn't alter the + measured behavior of the program. + +4) Repeatedly mutate the file using a balanced and well-researched variety of + traditional fuzzing strategies. + +5) If any of the generated mutations resulted in a new state transition recorded + by the instrumentation, add the mutated output as a new entry in the queue. + +6) Go to 2. + +The discovered test cases are also periodically culled to eliminate ones that +have been obsoleted by newer, higher-coverage finds; and undergo several other +instrumentation-driven effort minimization steps. + +As a side result of the fuzzing process, the tool creates a small, +self-contained corpus of interesting test cases. 
These are extremely useful for +seeding other, labor- or resource-intensive testing regimes - for example, for +stress-testing browsers, office applications, graphics suites, or closed-source +tools. + +The fuzzer is thoroughly tested to deliver out-of-the-box performance far +superior to blind fuzzing or coverage-only tools. + +## Understanding the status screen + +This section provides an overview of the status screen - plus tips for +troubleshooting any warnings and red text shown in the UI. + +For the general instruction manual, see [README.md](README.md). + +### A note about colors + +The status screen and error messages use colors to keep things readable and +attract your attention to the most important details. For example, red almost +always means "consult this doc" :-) + +Unfortunately, the UI will only render correctly if your terminal is using +traditional un*x palette (white text on black background) or something close to +that. + +If you are using inverse video, you may want to change your settings, say: + +- For GNOME Terminal, go to `Edit > Profile` preferences, select the "colors" + tab, and from the list of built-in schemes, choose "white on black". +- For the MacOS X Terminal app, open a new window using the "Pro" scheme via the + `Shell > New Window` menu (or make "Pro" your default). + +Alternatively, if you really like your current colors, you can edit config.h to +comment out USE_COLORS, then do `make clean all`. + +We are not aware of any other simple way to make this work without causing other +side effects - sorry about that. + +With that out of the way, let's talk about what's actually on the screen... + +### The status bar + +``` +american fuzzy lop ++3.01a (default) [fast] {0} +``` + +The top line shows you which mode afl-fuzz is running in (normal: "american +fuzzy lop", crash exploration mode: "peruvian rabbit mode") and the version of +AFL++. 
Next to the version is the banner, which, if not set with -T by hand, +will either show the binary name being fuzzed, or the -M/-S main/secondary name +for parallel fuzzing. Second to last is the power schedule mode being run +(default: fast). Finally, the last item is the CPU id. + +### Process timing + +``` + +----------------------------------------------------+ + | run time : 0 days, 8 hrs, 32 min, 43 sec | + | last new find : 0 days, 0 hrs, 6 min, 40 sec | + | last uniq crash : none seen yet | + | last uniq hang : 0 days, 1 hrs, 24 min, 32 sec | + +----------------------------------------------------+ +``` + +This section is fairly self-explanatory: it tells you how long the fuzzer has +been running and how much time has elapsed since its most recent finds. This is +broken down into "paths" (a shorthand for test cases that trigger new execution +patterns), crashes, and hangs. + +When it comes to timing: there is no hard rule, but most fuzzing jobs should be +expected to run for days or weeks; in fact, for a moderately complex project, +the first pass will probably take a day or so. Every now and then, some jobs +will be allowed to run for months. + +There's one important thing to watch out for: if the tool is not finding new +paths within several minutes of starting, you're probably not invoking the +target binary correctly and it never gets to parse the input files that are +thrown at it; other possible explanations are that the default memory limit +(`-m`) is too restrictive and the program exits after failing to allocate a +buffer very early on; or that the input files are patently invalid and always +fail a basic header check. 
+ +If there are no new paths showing up for a while, you will eventually see a big +red warning in this section, too :-) + +### Overall results + +``` + +-----------------------+ + | cycles done : 0 | + | total paths : 2095 | + | uniq crashes : 0 | + | uniq hangs : 19 | + +-----------------------+ +``` + +The first field in this section gives you the count of queue passes done so far +- that is, the number of times the fuzzer went over all the interesting test + cases discovered so far, fuzzed them, and looped back to the very beginning. + Every fuzzing session should be allowed to complete at least one cycle; and + ideally, should run much longer than that. + +As noted earlier, the first pass can take a day or longer, so sit back and +relax. + +To help make the call on when to hit `Ctrl-C`, the cycle counter is color-coded. +It is shown in magenta during the first pass, progresses to yellow if new finds +are still being made in subsequent rounds, then blue when that ends - and +finally, turns green after the fuzzer hasn't been seeing any action for a longer +while. + +The remaining fields in this part of the screen should be pretty obvious: +there's the number of test cases ("paths") discovered so far, and the number of +unique faults. The test cases, crashes, and hangs can be explored in real-time +by browsing the output directory, see +[#interpreting-output](#interpreting-output). + +### Cycle progress + +``` + +-------------------------------------+ + | now processing : 1296 (61.86%) | + | paths timed out : 0 (0.00%) | + +-------------------------------------+ +``` + +This box tells you how far along the fuzzer is with the current queue cycle: it +shows the ID of the test case it is currently working on, plus the number of +inputs it decided to ditch because they were persistently timing out. + +The "*" suffix sometimes shown in the first line means that the currently +processed path is not "favored" (a property discussed later on). 
+ +### Map coverage + +``` + +--------------------------------------+ + | map density : 10.15% / 29.07% | + | count coverage : 4.03 bits/tuple | + +--------------------------------------+ +``` + +The section provides some trivia about the coverage observed by the +instrumentation embedded in the target binary. + +The first line in the box tells you how many branch tuples already were hit, in +proportion to how much the bitmap can hold. The number on the left describes the +current input; the one on the right is the value for the entire input corpus. + +Be wary of extremes: + +- Absolute numbers below 200 or so suggest one of three things: that the program + is extremely simple; that it is not instrumented properly (e.g., due to being + linked against a non-instrumented copy of the target library); or that it is + bailing out prematurely on your input test cases. The fuzzer will try to mark + this in pink, just to make you aware. +- Percentages over 70% may very rarely happen with very complex programs that + make heavy use of template-generated code. Because high bitmap density makes + it harder for the fuzzer to reliably discern new program states, we recommend + recompiling the binary with `AFL_INST_RATIO=10` or so and trying again (see + [env_variables.md](env_variables.md)). The fuzzer will flag high percentages + in red. Chances are, you will never see that unless you're fuzzing extremely + hairy software (say, v8, perl, ffmpeg). + +The other line deals with the variability in tuple hit counts seen in the +binary. In essence, if every taken branch is always taken a fixed number of +times for all the inputs that were tried, this will read `1.00`. As we manage to +trigger other hit counts for every branch, the needle will start to move toward +`8.00` (every bit in the 8-bit map hit), but will probably never reach that +extreme. 
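To make the two numbers more concrete, here is a small Python model of how a density and a bits-per-tuple figure could be derived for a whole corpus. It is a sketch under stated assumptions rather than afl-fuzz's actual implementation: traces are plain dictionaries from a tuple id to a raw hit count, the map is tiny, and `bucket` and `summarize` are hypothetical names. The bucket limits follow the classic AFL hit-count ranges (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+).

```python
def bucket(hits):
    """Map a raw hit count (>= 1) to one of eight hit-count buckets (0..7)."""
    limits = [1, 2, 3, 7, 15, 31, 127]
    for index, limit in enumerate(limits):
        if hits <= limit:
            return index
    return 7


def summarize(traces, map_size):
    """Return (map density in %, count coverage in bits/tuple) for a corpus."""
    seen = {}  # tuple id -> set of buckets observed so far
    for trace in traces:
        for tuple_id, hits in trace.items():
            if hits > 0:
                seen.setdefault(tuple_id, set()).add(bucket(hits))
    if not seen:
        return 0.0, 0.0
    density = 100.0 * len(seen) / map_size
    bits_per_tuple = sum(len(b) for b in seen.values()) / len(seen)
    return density, bits_per_tuple


# Three inputs hitting the same three tuples with different hit counts.
traces = [
    {10: 1, 20: 1, 30: 4},
    {10: 1, 20: 2, 30: 9},
    {10: 1, 20: 3, 30: 40},
]
density, bits = summarize(traces, map_size=64)
print(f"map density: {density:.2f}%, count coverage: {bits:.2f} bits/tuple")
```

Tuple 10 is always hit exactly once and contributes a single bit, while tuples 20 and 30 are seen in three different buckets each, which pulls the average above `1.00`.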
+ +Together, the values can be useful for comparing the coverage of several +different fuzzing jobs that rely on the same instrumented binary. + +### Stage progress + +``` + +-------------------------------------+ + | now trying : interest 32/8 | + | stage execs : 3996/34.4k (11.62%) | + | total execs : 27.4M | + | exec speed : 891.7/sec | + +-------------------------------------+ +``` + +This part gives you an in-depth peek at what the fuzzer is actually doing right +now. It tells you about the current stage, which can be any of: + +- calibration - a pre-fuzzing stage where the execution path is examined to + detect anomalies, establish baseline execution speed, and so on. Executed very + briefly whenever a new find is being made. +- trim L/S - another pre-fuzzing stage where the test case is trimmed to the + shortest form that still produces the same execution path. The length (L) and + stepover (S) are chosen in general relationship to file size. +- bitflip L/S - deterministic bit flips. There are L bits toggled at any given + time, walking the input file with S-bit increments. The current L/S variants + are: `1/1`, `2/1`, `4/1`, `8/8`, `16/8`, `32/8`. +- arith L/8 - deterministic arithmetics. The fuzzer tries to subtract or add + small integers to 8-, 16-, and 32-bit values. The stepover is always 8 bits. +- interest L/8 - deterministic value overwrite. The fuzzer has a list of known + "interesting" 8-, 16-, and 32-bit values to try. The stepover is 8 bits. +- extras - deterministic injection of dictionary terms. This can be shown as + "user" or "auto", depending on whether the fuzzer is using a user-supplied + dictionary (`-x`) or an auto-created one. You will also see "over" or + "insert", depending on whether the dictionary words overwrite existing data or + are inserted by offsetting the remaining data to accommodate their length. +- havoc - a sort-of-fixed-length cycle with stacked random tweaks. 
The + operations attempted during this stage include bit flips, overwrites with + random and "interesting" integers, block deletion, block duplication, plus + assorted dictionary-related operations (if a dictionary is supplied in the + first place). +- splice - a last-resort strategy that kicks in after the first full queue cycle + with no new paths. It is equivalent to 'havoc', except that it first splices + together two random inputs from the queue at some arbitrarily selected + midpoint. +- sync - a stage used only when `-M` or `-S` is set (see + [fuzzing_in_depth.md:3c) Using multiple cores](fuzzing_in_depth.md#c-using-multiple-cores)). + No real fuzzing is involved, but the tool scans the output from other fuzzers + and imports test cases as necessary. The first time this is done, it may take + several minutes or so. + +The remaining fields should be fairly self-evident: there's the exec count +progress indicator for the current stage, a global exec counter, and a benchmark +for the current program execution speed. This may fluctuate from one test case +to another, but the benchmark should be ideally over 500 execs/sec most of the +time - and if it stays below 100, the job will probably take very long. + +The fuzzer will explicitly warn you about slow targets, too. If this happens, +see the [best_practices.md#improving-speed](best_practices.md#improving-speed) +for ideas on how to speed things up. + +### Findings in depth + +``` + +--------------------------------------+ + | favored paths : 879 (41.96%) | + | new edges on : 423 (20.19%) | + | total crashes : 0 (0 unique) | + | total tmouts : 24 (19 unique) | + +--------------------------------------+ +``` + +This gives you several metrics that are of interest mostly to complete nerds. 
+The section includes the number of paths that the fuzzer likes the most based on +a minimization algorithm baked into the code (these will get considerably more +air time), and the number of test cases that actually resulted in better edge +coverage (versus just pushing the branch hit counters up). There are also +additional, more detailed counters for crashes and timeouts. + +Note that the timeout counter is somewhat different from the hang counter; this +one includes all test cases that exceeded the timeout, even if they did not +exceed it by a margin sufficient to be classified as hangs. + +### Fuzzing strategy yields + +``` + +-----------------------------------------------------+ + | bit flips : 57/289k, 18/289k, 18/288k | + | byte flips : 0/36.2k, 4/35.7k, 7/34.6k | + | arithmetics : 53/2.54M, 0/537k, 0/55.2k | + | known ints : 8/322k, 12/1.32M, 10/1.70M | + | dictionary : 9/52k, 1/53k, 1/24k | + |havoc/splice : 1903/20.0M, 0/0 | + |py/custom/rq : unused, 53/2.54M, unused | + | trim/eff : 20.31%/9201, 17.05% | + +-----------------------------------------------------+ +``` + +This is just another nerd-targeted section keeping track of how many paths were +netted, in proportion to the number of execs attempted, for each of the fuzzing +strategies discussed earlier on. This serves to convincingly validate +assumptions about the usefulness of the various approaches taken by afl-fuzz. + +The trim strategy stats in this section are a bit different than the rest. The +first number in this line shows the ratio of bytes removed from the input files; +the second one corresponds to the number of execs needed to achieve this goal. +Finally, the third number shows the proportion of bytes that, although not +possible to remove, were deemed to have no effect and were excluded from some of +the more expensive deterministic fuzzing steps. 
+ +Note that when deterministic mutation mode is off (which is the default because +it is not very efficient) the first five lines display "disabled (default, +enable with -D)". + +Only strategies that are activated will have their counters shown. + +### Path geometry + +``` + +---------------------+ + | levels : 5 | + | pending : 1570 | + | pend fav : 583 | + | own finds : 0 | + | imported : 0 | + | stability : 100.00% | + +---------------------+ +``` + +The first field in this section tracks the path depth reached through the guided +fuzzing process. In essence: the initial test cases supplied by the user are +considered "level 1". The test cases that can be derived from that through +traditional fuzzing are considered "level 2"; the ones derived by using these as +inputs to subsequent fuzzing rounds are "level 3"; and so forth. The maximum +depth is therefore a rough proxy for how much value you're getting out of the +instrumentation-guided approach taken by afl-fuzz. + +The next field shows you the number of inputs that have not gone through any +fuzzing yet. The same stat is also given for "favored" entries that the fuzzer +really wants to get to in this queue cycle (the non-favored entries may have to +wait a couple of cycles to get their chance). + +Next are the number of new paths found during this fuzzing session and the number imported +from other fuzzer instances when doing parallelized fuzzing; and the extent to +which identical inputs appear to sometimes produce variable behavior in the +tested binary. + +That last bit is actually fairly interesting: it measures the consistency of +observed traces. If a program always behaves the same for the same input data, +it will earn a score of 100%. When the value is lower but still shown in purple, +the fuzzing process is unlikely to be negatively affected. If it goes into red, +you may be in trouble, since AFL++ will have difficulty discerning between +meaningful and "phantom" effects of tweaking the input file. 
+ +Now, most targets will just get a 100% score, but when you see lower figures, +there are several things to look at: + +- The use of uninitialized memory in conjunction with some intrinsic sources of + entropy in the tested binary. Harmless to AFL, but could be indicative of a + security bug. +- Attempts to manipulate persistent resources, such as left over temporary files + or shared memory objects. This is usually harmless, but you may want to + double-check to make sure the program isn't bailing out prematurely. Running + out of disk space, SHM handles, or other global resources can trigger this, + too. +- Hitting some functionality that is actually designed to behave randomly. + Generally harmless. For example, when fuzzing sqlite, an input like `select + random();` will trigger a variable execution path. +- Multiple threads executing at once in semi-random order. This is harmless when + the 'stability' metric stays over 90% or so, but can become an issue if not. + Here's what to try: + * Use afl-clang-fast from [instrumentation](../instrumentation/) - it uses a + thread-local tracking model that is less prone to concurrency issues, + * See if the target can be compiled or run without threads. Common + `./configure` options include `--without-threads`, `--disable-pthreads`, or + `--disable-openmp`. + * Replace pthreads with GNU Pth (https://www.gnu.org/software/pth/), which + allows you to use a deterministic scheduler. +- In persistent mode, minor drops in the "stability" metric can be normal, + because not all the code behaves identically when re-entered; but major dips + may signify that the code within `__AFL_LOOP()` is not behaving correctly on + subsequent iterations (e.g., due to incomplete clean-up or reinitialization of + the state) and that most of the fuzzing effort goes to waste. + +The paths where variable behavior is detected are marked with a matching entry +in the `/queue/.state/variable_behavior/` directory, so you can look +them up easily. 
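As a rough illustration of looking those marker entries up, the sketch below counts and lists them for one fuzzer instance. The helper name `list_variable_paths` is hypothetical, and the argument is assumed to be whichever directory contains that instance's `queue/` folder; adjust it to your own output layout.

```shell
# Hypothetical helper: list queue entries flagged as showing variable behavior.
# $1 = directory that contains the instance's queue/ folder (an assumption
#      about your layout, not something afl-fuzz provides as a command).
list_variable_paths() {
  dir="$1/queue/.state/variable_behavior"
  # A missing directory means nothing was flagged (or the path is wrong).
  if [ ! -d "$dir" ]; then
    echo "variable entries: 0"
    return 0
  fi
  count=$(ls -1 "$dir" | wc -l | tr -d ' ')
  echo "variable entries: $count"
  # Each entry is named after the queue item it refers to.
  ls -1 "$dir"
}
```

Calling `list_variable_paths` with the instance directory prints the count on the first line, followed by one entry name per line.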
+ +### CPU load + +``` + [cpu: 25%] +``` + +This tiny widget shows the apparent CPU utilization on the local system. It is +calculated by taking the number of processes in the "runnable" state, and then +comparing it to the number of logical cores on the system. + +If the value is shown in green, you are using fewer CPU cores than available on +your system and can probably parallelize to improve performance; for tips on how +to do that, see +[fuzzing_in_depth.md:3c) Using multiple cores](fuzzing_in_depth.md#c-using-multiple-cores). + +If the value is shown in red, your CPU is *possibly* oversubscribed, and running +additional fuzzers may not give you any benefits. + +Of course, this benchmark is very simplistic; it tells you how many processes +are ready to run, but not how resource-hungry they may be. It also doesn't +distinguish between physical cores, logical cores, and virtualized CPUs; the +performance characteristics of each of these will differ quite a bit. + +If you want a more accurate measurement, you can run the `afl-gotcpu` utility +from the command line. + +## Interpreting output + +See [#understanding-the-status-screen](#understanding-the-status-screen) for +information on how to interpret the displayed stats and monitor the health of +the process. Be sure to consult this file especially if any UI elements are +highlighted in red. + +The fuzzing process will continue until you press Ctrl-C. At a minimum, you want +to allow the fuzzer to complete at least one queue cycle without any new finds, which may +take anywhere from a couple of hours to a week or so. + +There are three subdirectories created within the output directory and updated +in real time: + +- queue/ - test cases for every distinctive execution path, plus all the + starting files given by the user. This is the synthesized corpus. + + Before using this corpus for any other purposes, you can shrink + it to a smaller size using the afl-cmin tool. 
The tool will find + a smaller subset of files offering equivalent edge coverage. + +- crashes/ - unique test cases that cause the tested program to receive a fatal + signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are grouped by + the received signal. + +- hangs/ - unique test cases that cause the tested program to time out. The + default time limit before something is classified as a hang is the + larger of 1 second and the value of the -t parameter. The value can + be fine-tuned by setting AFL_HANG_TMOUT, but this is rarely + necessary. + +Crashes and hangs are considered "unique" if the associated execution paths +involve any state transitions not seen in previously-recorded faults. If a +single bug can be reached in multiple ways, there will be some count inflation +early in the process, but this should quickly taper off. + +The file names for crashes and hangs are correlated with the parent, +non-faulting queue entries. This should help with debugging. + +## Visualizing + +If you have gnuplot installed, you can also generate some pretty graphs for any +active fuzzing task using afl-plot. For an example of what this looks like, see +[https://lcamtuf.coredump.cx/afl/plot/](https://lcamtuf.coredump.cx/afl/plot/). + +You can also manually build and install afl-plot-ui, which is a helper utility +for showing the graphs generated by afl-plot in a graphical window using GTK. +You can build and install it as follows: + +```shell +sudo apt install libgtk-3-0 libgtk-3-dev pkg-config +cd utils/plot_ui +make +cd ../../ +sudo make install +``` + +To learn more about remote monitoring and metrics visualization with StatsD, see +[rpc_statsd.md](rpc_statsd.md). + +### Addendum: status and plot files + +For unattended operation, some of the key status screen information can also be +found in a machine-readable format in the fuzzer_stats file in the output +directory. 
This includes: + +- `start_time` - unix time indicating the start time of afl-fuzz +- `last_update` - unix time corresponding to the last update of this file +- `run_time` - run time in seconds to the last update of this file +- `fuzzer_pid` - PID of the fuzzer process +- `cycles_done` - queue cycles completed so far +- `cycles_wo_finds` - number of cycles without any new paths found +- `time_wo_finds` - longest time in seconds no new path was found +- `execs_done` - number of execve() calls attempted +- `execs_per_sec` - overall number of execs per second +- `corpus_count` - total number of entries in the queue +- `corpus_favored` - number of queue entries that are favored +- `corpus_found` - number of entries discovered through local fuzzing +- `corpus_imported` - number of entries imported from other instances +- `max_depth` - number of levels in the generated data set +- `cur_item` - currently processed entry number +- `pending_favs` - number of favored entries still waiting to be fuzzed +- `pending_total` - number of all entries waiting to be fuzzed +- `corpus_variable` - number of test cases showing variable behavior +- `stability` - percentage of bitmap bytes that behave consistently +- `bitmap_cvg` - percentage of edge coverage found in the map so far +- `saved_crashes` - number of unique crashes recorded +- `saved_hangs` - number of unique hangs encountered +- `last_find` - seconds since the last find was found +- `last_crash` - seconds since the last crash was found +- `last_hang` - seconds since the last hang was found +- `execs_since_crash` - execs since the last crash was found +- `exec_timeout` - the -t command line value +- `slowest_exec_ms` - real time of the slowest execution in ms +- `peak_rss_mb` - max rss usage reached during fuzzing in MB +- `edges_found` - how many edges have been found +- `var_byte_count` - how many edges are non-deterministic +- `afl_banner` - banner text (e.g., the target name) +- `afl_version` - the version of AFL++ used 
+- `target_mode` - default, persistent, qemu, unicorn, non-instrumented +- `command_line` - full command line used for the fuzzing session + +Most of these map directly to the UI elements discussed earlier on. + +On top of that, you can also find an entry called `plot_data`, containing a +plottable history for most of these fields. If you have gnuplot installed, you +can turn this into a nice progress report with the included `afl-plot` tool. + +### Addendum: automatically sending metrics with StatsD + +In a CI environment or when running multiple fuzzers, it can be tedious to log +into each of them or deploy scripts to read the fuzzer statistics. Using +`AFL_STATSD` (and the other related environment variables `AFL_STATSD_HOST`, +`AFL_STATSD_PORT`, `AFL_STATSD_TAGS_FLAVOR`) you can automatically send metrics +to your favorite StatsD server. Depending on your StatsD server, you will be +able to monitor, trigger alerts, or perform actions based on these metrics +(e.g., alert on slow exec/s for a new build, threshold of crashes, time since +last crash > X, etc.). + +The selected metrics are a subset of all the metrics found in the status and in +the plot file. The list is the following: `cycle_done`, `cycles_wo_finds`, +`execs_done`, `execs_per_sec`, `corpus_count`, `corpus_favored`, `corpus_found`, +`corpus_imported`, `max_depth`, `cur_item`, `pending_favs`, `pending_total`, +`corpus_variable`, `saved_crashes`, `saved_hangs`, `total_crashes`, +`slowest_exec_ms`, `edges_found`, `var_byte_count`, `havoc_expansion`. Their +definitions can be found in the addendum above. + +When using multiple fuzzer instances with StatsD, it is *strongly* recommended +to set up the flavor (`AFL_STATSD_TAGS_FLAVOR`) to match your StatsD server. This +will allow you to see individual fuzzer performance, detect bad ones, see the +progress of each strategy... 
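For scripted monitoring without a StatsD server, the `fuzzer_stats` fields listed in the status-file addendum above can also be read with standard tools. The sketch below assumes the file holds one `key : value` pair per line; the helper name `stat_value` is hypothetical and not part of AFL++.

```shell
# Hypothetical helper: print the value of one field from a fuzzer_stats file.
# $1 = path to the fuzzer_stats file, $2 = field name (e.g., execs_per_sec)
stat_value() {
  awk -v key="$2" '
    {
      name = $0
      sub(/[ \t]*:.*$/, "", name)       # keep the text before the first colon
    }
    name == key {
      value = $0
      sub(/^[^:]*:[ \t]*/, "", value)   # keep the text after the first colon
      print value
      exit
    }
  ' "$1"
}
```

A monitoring script could then call, for example, `stat_value path/to/fuzzer_stats saved_crashes` and compare the result against a threshold. Only the first colon is treated as the separator, so values that contain colons themselves (such as `command_line`) are kept whole.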
\ No newline at end of file diff --git a/sources/aflplusplus/fuzzing_binary-only_targets.md b/sources/aflplusplus/fuzzing_binary-only_targets.md new file mode 100644 index 0000000..2724b39 --- /dev/null +++ b/sources/aflplusplus/fuzzing_binary-only_targets.md @@ -0,0 +1,312 @@ +--- +status: collected +title: "Fuzzing binary-only targets" +author: AFLplusplus Community +collector: Souls-R +collected_date: 20240827 +link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/fuzzing_binary-only_targets.md +--- +# Fuzzing binary-only targets + +AFL++, libfuzzer, and other fuzzers are great if you have the source code of the +target. This allows for very fast and coverage guided fuzzing. + +However, if there is only the binary program and no source code available, then +standard `afl-fuzz -n` (non-instrumented mode) is not effective. + +For fast, on-the-fly instrumentation of black-box binaries, AFL++ still offers +various support. The following is a description of how these binaries can be +fuzzed with AFL++. + +## TL;DR: + +FRIDA mode and QEMU mode in persistent mode are the fastest - if persistent mode +is possible and the stability is high enough. + +Otherwise, try Zafl, RetroWrite, Dyninst, and if these fail, too, then try +standard FRIDA/QEMU mode with `AFL_ENTRYPOINT` to where you need it. + +If your target is non-Linux, then use unicorn_mode. + +## Fuzzing binary-only targets with AFL++ + +### QEMU mode + +QEMU mode is the "native" solution to the problem. It is available in the +./qemu_mode/ directory and, once compiled, it can be accessed by the afl-fuzz -Q +command line option. It is the easiest to use alternative and even works for +cross-platform binaries. + +For Linux programs and their libraries, this is accomplished with a version of +QEMU running in the lesser-known "user space emulation" mode. 
QEMU is a project +separate from AFL++, but you can conveniently build the feature by doing: + +```shell +cd qemu_mode +./build_qemu_support.sh +``` + +The following setup to use QEMU mode is recommended: + +* run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`) +* run 1 afl-fuzz -Q instance with QASAN (`AFL_USE_QASAN=1`) +* run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` + + `AFL_COMPCOV_LEVEL=2`), alternatively you can use FRIDA mode, just switch `-Q` + with `-O` and remove the LAF instance + +Then run as many instances as you have cores left with either -Q mode or - even +better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc. +The binary rewriters all have their own advantages and caveats. +ZAFL is the best but cannot be used in a business/commercial context. + +If a binary rewriter works for your target then you can use afl-fuzz normally +and it will have twice the speed compared to QEMU mode (but slower than QEMU +persistent mode). + +The speed decrease of QEMU mode is at about 50%. However, various options exist +to increase the speed: +- using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in + the binary (+5-10% speed) +- using persistent mode + [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md) this will + result in a 150-300% overall speed increase - so 3-8x the original QEMU mode + speed! +- using AFL_CODE_START/AFL_CODE_END to only instrument specific parts + +For additional instructions and caveats, see +[qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use the +persistent mode, see +[qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). The mode is +approximately 2-5x slower than compile-time instrumentation, and is less +conducive to parallelization. + +Note that there is also honggfuzz: +[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz) which +now has a QEMU mode, but its performance is just 1.5% ... 
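The recommended three-instance QEMU-mode setup listed above could be launched roughly as follows. This is a sketch rather than an official launcher: the instance names (`cmplog`, `qasan`, `laf`), the `in`/`out` directories, `./target`, and the `@@` file argument are placeholders, and the `AFL_FUZZ` shell variable is introduced here only so that the command lines can be previewed with `AFL_FUZZ=echo`.

```shell
# Placeholders (assumptions): adjust the corpus, output dir, and target binary.
AFL_FUZZ="${AFL_FUZZ:-afl-fuzz}"
IN_DIR="${IN_DIR:-in}"
OUT_DIR="${OUT_DIR:-out}"
TARGET="${TARGET:-./target}"

# Instance 1: CMPLOG (-c 0) together with AFL_COMPCOV_LEVEL=2.
run_cmplog() {
  AFL_COMPCOV_LEVEL=2 "$AFL_FUZZ" -Q -c 0 -M cmplog \
    -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
}

# Instance 2: QASAN.
run_qasan() {
  AFL_USE_QASAN=1 "$AFL_FUZZ" -Q -S qasan \
    -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
}

# Instance 3: LAF via the preloaded libcmpcov.so (a full path may be needed).
run_laf() {
  AFL_PRELOAD=libcmpcov.so AFL_COMPCOV_LEVEL=2 "$AFL_FUZZ" -Q -S laf \
    -i "$IN_DIR" -o "$OUT_DIR" -- "$TARGET" @@
}
```

Each function would typically be started in its own terminal or `screen`/`tmux` window, since afl-fuzz runs in the foreground. Any remaining cores can then be filled with further `-Q` secondary instances or with a rewritten binary, as described above.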
+ +If you would like to code a customized fuzzer without much work, we highly recommend +checking out our sister project libafl which supports QEMU, too: +[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL) + +### WINE+QEMU + +Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs +Wine, python3, and the pefile python package installed. + +It is included in AFL++. + +For more information, see +[qemu_mode/README.wine.md](../qemu_mode/README.wine.md). + +### FRIDA mode + +In FRIDA mode, you can fuzz binary-only targets as easily as with QEMU mode. +FRIDA mode is most of the time slightly faster than QEMU mode. It is also +newer, and has the advantage that it works on macOS (both Intel and M1). + +To build FRIDA mode: + +```shell +cd frida_mode +gmake +``` + +For additional instructions and caveats, see +[frida_mode/README.md](../frida_mode/README.md). + +If possible, you should use the persistent mode, see +[instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md). +The mode is approximately 2-5x slower than compile-time instrumentation, and is +less conducive to parallelization. But for binary-only fuzzing, it gives a huge +speed improvement if it is possible to use. + +You can also perform remote fuzzing with frida, e.g., if you want to fuzz on +iPhone or Android devices. For this, you can use +[https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/) as +an intermediate that uses AFL++ for fuzzing. + +If you would like to code a customized fuzzer without much work, we highly recommend +checking out our sister project libafl which supports Frida, too: +[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL). +Working examples already exist :-) + +### Nyx mode + +Nyx is a full system emulation fuzzing environment with snapshot support that is +built upon KVM and QEMU. It is only available on Linux and currently restricted +to x86_64. 
+ +For binary-only fuzzing a special 5.10 kernel is required. + +See [nyx_mode/README.md](../nyx_mode/README.md). + +### Unicorn + +Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In +contrast to QEMU, Unicorn does not offer a full system or even userland +emulation. Runtime environment and/or loaders have to be written from scratch, +if needed. On top of that, block chaining has been removed. This means the speed boost +introduced in the patched QEMU Mode of AFL++ cannot be ported over to Unicorn. + +For non-Linux binaries, you can use AFL++'s unicorn_mode which can emulate +anything you want - for the price of speed and user written scripts. + +To build unicorn_mode: + +```shell +cd unicorn_mode +./build_unicorn_support.sh +``` + +For further information, check out +[unicorn_mode/README.md](../unicorn_mode/README.md). + +### Shared libraries + +If the goal is to fuzz a dynamic library, then there are two options available. +For both, you need to write a small harness that loads and calls the library. +Then you fuzz this with either FRIDA mode or QEMU mode and either use +`AFL_INST_LIBS=1` or `AFL_QEMU/FRIDA_INST_RANGES`. + +Another, less precise and slower option is to fuzz it with utils/afl_untracer/ +and use afl-untracer.c as a template. It is slower than FRIDA mode. + +For more information, see +[utils/afl_untracer/README.md](../utils/afl_untracer/README.md). + +### Coresight + +Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a coresight +tracer implementation available in `coresight_mode/` which is faster than QEMU, +however, cannot run in parallel. Currently, only one process can be traced, it +is WIP. + +For more information, see +[coresight_mode/README.md](../coresight_mode/README.md). + +## Binary rewriters + +Binary rewriters are an alternative solution. They are faster than the solutions +native to AFL++ but don't always work. 
+ +### ZAFL + +ZAFL is a static rewriting platform supporting x86-64 C/C++, +stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional +instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel, +context sensitivity, InsTrim, etc.). + +Its baseline instrumentation speed typically averages 90-95% of +afl-clang-fast's. + +[https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl) + +### RetroWrite + +RetroWrite is a static binary rewriter that can be combined with AFL++. If you +have an x86_64 or arm64 binary that does not contain C++ exceptions and - if +x86_64 - still has its symbols and was compiled with position independent code +(PIC/PIE), then the RetroWrite solution might be for you. +It decompiles to ASM files which can then be instrumented with afl-gcc. + +Binaries that are statically instrumented for fuzzing using RetroWrite are close +in performance to compiler-instrumented binaries and outperform the QEMU-based +instrumentation. + +[https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite) + +### Dyninst + +Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO. +However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the +target at load time and then lets it run - or saves the binary with the changes. +This is great for some things, e.g., fuzzing, and not so effective for others, +e.g., malware analysis. + +So, what you can do with Dyninst is take every basic block and put AFL++'s +instrumentation code in there - and then save the binary. Afterwards, just fuzz +the newly saved target binary with afl-fuzz. Sounds great? It is. The issue +though - it is a non-trivial problem to insert instructions, which change +addresses in the process space, so that everything is still working afterwards. +Hence, more often than not binaries crash when they are run. 
+ +The speed decrease is about 15-35%, depending on the optimization options used +with afl-dyninst. + +[https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst) + +### Mcsema + +Theoretically, you can also decompile to llvm IR with mcsema, and then use +llvm_mode to instrument the binary. Good luck with that. + +[https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema) + +## Binary tracers + +### Pintool & DynamoRIO + +Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for +getting basic block information at runtime. Pintool is only available for Intel +x32/x64 on Linux, Mac OS, and Windows, whereas DynamoRIO is additionally +available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool. + +The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO +has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%. + +Hence, DynamoRIO is the option to go for if everything else fails and Pintool +only if DynamoRIO fails, too. + +DynamoRIO solutions: +* [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio) +* [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL) +* [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/) + <= very good but windows only + +Pintool solutions: +* [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin) +* [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin) +* [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode) + <= only old Pintool version supported + +### Intel PT + +If you have a newer Intel CPU, you can make use of Intel's processor trace. The +big issue with Intel's PT is the small buffer size and the complex encoding of +the debug information collected through PT. This makes the decoding very CPU +intensive and hence slow. 
As a result, the overall speed decrease is about +70-90% (depending on the implementation and other factors). + +There are two AFL intel-pt implementations: + +1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt) + => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel. + +2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer) + => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be + used. This one is faster than the other. + +Note that there is also honggfuzz: +[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz). But +its IPT performance is just 6%! + +## Non-AFL++ solutions + +There are many binary-only fuzzing frameworks. Some are great for CTFs but don't +work with large binaries, others are very slow but have good path discovery, +some are very hard to set up... + +* Jackalope: + [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope) +* Manticore: + [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore) +* QSYM: + [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym) +* S2E: [https://github.com/S2E](https://github.com/S2E) +* TinyInst: + [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst) +* ... please send me any missing that are good + +## Closing words + +That's it! News, corrections, updates? Send an email to vh@thc.org. 
diff --git a/sources/aflplusplus/fuzzing_in_depth.md b/sources/aflplusplus/fuzzing_in_depth.md new file mode 100644 index 0000000..222272c --- /dev/null +++ b/sources/aflplusplus/fuzzing_in_depth.md @@ -0,0 +1,984 @@ +--- +status: collected +title: "Fuzzing with AFL++" +author: AFLplusplus Community +collector: Souls-R +collected_date: 20240827 +link: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/fuzzing_in_depth.md +--- +# Fuzzing with AFL++ + +The following describes how to fuzz with a target if source code is available. +If you have a binary-only target, go to +[fuzzing_binary-only_targets.md](fuzzing_binary-only_targets.md). + +Fuzzing source code is a three-step process: + +1. Compile the target with a special compiler that prepares the target to be + fuzzed efficiently. This step is called "instrumenting a target". +2. Prepare the fuzzing by selecting and optimizing the input corpus for the + target. +3. Perform the fuzzing of the target by randomly mutating input and assessing if + that input was processed on a new path in the target binary. + +## 0. Common sense risks + +Please keep in mind that, similarly to many other computationally-intensive +tasks, fuzzing may put a strain on your hardware and on the OS. In particular: + +- Your CPU will run hot and will need adequate cooling. In most cases, if + cooling is insufficient or stops working properly, CPU speeds will be + automatically throttled. That said, especially when fuzzing on less suitable + hardware (laptops, smartphones, etc.), it's not entirely impossible for + something to blow up. + +- Targeted programs may end up erratically grabbing gigabytes of memory or + filling up disk space with junk files. AFL++ tries to enforce basic memory + limits, but can't prevent each and every possible mishap. The bottom line is + that you shouldn't be fuzzing on systems where the prospect of data loss is + not an acceptable risk. + +- Fuzzing involves billions of reads and writes to the filesystem. 
On modern + systems, this will be usually heavily cached, resulting in fairly modest + "physical" I/O - but there are many factors that may alter this equation. It + is your responsibility to monitor for potential trouble; with very heavy I/O, + the lifespan of many HDDs and SSDs may be reduced. + + A good way to monitor disk I/O on Linux is the `iostat` command: + + ```shell + $ iostat -d 3 -x -k [...optional disk ID...] + ``` + + Using the `AFL_TMPDIR` environment variable and a RAM-disk, you can have the + heavy writing done in RAM to prevent the aforementioned wear and tear. For + example, the following line will run a Docker container with all this preset: + + ```shell + # docker run -ti --mount type=tmpfs,destination=/ramdisk -e AFL_TMPDIR=/ramdisk aflplusplus/aflplusplus + ``` + +## 1. Instrumenting the target + +### a) Selecting the best AFL++ compiler for instrumenting the target + +AFL++ comes with a central compiler `afl-cc` that incorporates various different +kinds of compiler targets and instrumentation options. The following +evaluation flow will help you to select the best possible. + +It is highly recommended to have the newest llvm version possible installed, +anything below 9 is not recommended. 
+ +``` ++--------------------------------+ +| clang/clang++ 11+ is available | --> use LTO mode (afl-clang-lto/afl-clang-lto++) ++--------------------------------+ see [instrumentation/README.lto.md](instrumentation/README.lto.md) + | + | if not, or if the target fails with LTO afl-clang-lto/++ + | + v ++---------------------------------+ +| clang/clang++ 3.8+ is available | --> use LLVM mode (afl-clang-fast/afl-clang-fast++) ++---------------------------------+ see [instrumentation/README.llvm.md](instrumentation/README.llvm.md) + | + | if not, or if the target fails with LLVM afl-clang-fast/++ + | + v + +--------------------------------+ + | gcc 5+ is available | -> use GCC_PLUGIN mode (afl-gcc-fast/afl-g++-fast) + +--------------------------------+ see [instrumentation/README.gcc_plugin.md](instrumentation/README.gcc_plugin.md) and + [instrumentation/README.instrument_list.md](instrumentation/README.instrument_list.md) + | + | if not, or if you do not have a gcc with plugin support + | + v + use GCC mode (afl-gcc/afl-g++) (or afl-clang/afl-clang++ for clang) +``` + +Clickable README links for the chosen compiler: + +* [LTO mode - afl-clang-lto](../instrumentation/README.lto.md) +* [LLVM mode - afl-clang-fast](../instrumentation/README.llvm.md) +* [GCC_PLUGIN mode - afl-gcc-fast](../instrumentation/README.gcc_plugin.md) +* GCC/CLANG modes (afl-gcc/afl-clang) have no README as they have no own + features + +You can select the mode for the afl-cc compiler by one of the following methods: + +* Using a symlink to afl-cc: afl-gcc, afl-g++, afl-clang, afl-clang++, + afl-clang-fast, afl-clang-fast++, afl-clang-lto, afl-clang-lto++, + afl-gcc-fast, afl-g++-fast (recommended!). +* Using the environment variable `AFL_CC_COMPILER` with `MODE`. +* Passing --afl-`MODE` command line options to the compiler via + `CFLAGS`/`CXXFLAGS`/`CPPFLAGS`. 
+ +`MODE` can be one of the following: + +* LTO (afl-clang-lto*) +* LLVM (afl-clang-fast*) +* GCC_PLUGIN (afl-g*-fast) or GCC (afl-gcc/afl-g++) +* CLANG (afl-clang/afl-clang++) + +Because no AFL++ specific command-line options are accepted (besides the +--afl-MODE command), the compile-time tools make fairly broad use of environment +variables, which can be listed with `afl-cc -hh` or looked up in +[env_variables.md](env_variables.md). + +### b) Selecting instrumentation options + +If you instrument with LTO or LLVM mode (afl-clang-fast/afl-clang-lto), the following +options are available: + +* Splitting integer, string, float, and switch comparisons so AFL++ can solve +these more easily. This is an important option if you do not have a very good and + large input corpus. This technique is called laf-intel or COMPCOV. To use + this, set the following environment variable before compiling the target: + `export AFL_LLVM_LAF_ALL=1`. You can read more about this in + [instrumentation/README.laf-intel.md](../instrumentation/README.laf-intel.md). +* A different technique (and usually a better one than laf-intel) is to + instrument the target so that any compare values in the target are sent to + AFL++ which then tries to put these values into the fuzzing data at different + locations. This technique is very fast and good - if the target does not + transform input data before comparison. Therefore, this technique is called + `input to state` or `redqueen`. If you want to use this technique, then you + have to compile the target twice, once specifically with/for this mode by + setting `AFL_LLVM_CMPLOG=1`, and pass this binary to afl-fuzz via the `-c` + parameter. Note that you can also compile just a cmplog binary and use that + for both, however, there will be a performance penalty. You can read more + about this in + [instrumentation/README.cmplog.md](../instrumentation/README.cmplog.md). 
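The two options above translate into roughly the following build and run steps. This is a sketch under assumptions: a single-file target `target.c`, the output names `target`, `target.laf`, and `target.cmplog`, the `in`/`out` directories, and the `@@` file argument are all placeholders, and `AFL_CC`/`AFL_FUZZ` are local shell variables introduced only to allow a dry run with `echo`.

```shell
# Placeholders (assumptions): swap in afl-clang-lto or your real build system.
AFL_CC="${AFL_CC:-afl-clang-fast}"
AFL_FUZZ="${AFL_FUZZ:-afl-fuzz}"

# laf-intel / COMPCOV: split comparisons at compile time.
build_laf() {
  AFL_LLVM_LAF_ALL=1 "$AFL_CC" -o target.laf target.c
}

# CMPLOG needs two builds: first a regular instrumented binary ...
build_plain() {
  "$AFL_CC" -o target target.c
}

# ... then a second binary compiled with AFL_LLVM_CMPLOG=1.
build_cmplog() {
  AFL_LLVM_CMPLOG=1 "$AFL_CC" -o target.cmplog target.c
}

# Fuzz the regular binary and hand the CMPLOG binary to afl-fuzz via -c.
fuzz_with_cmplog() {
  "$AFL_FUZZ" -i in -o out -c ./target.cmplog -- ./target @@
}
```

For a project built with `make` or `configure`, the same environment variables would be exported before the build instead of being set per command. Keeping the laf-intel and CMPLOG binaries separate lets each run in its own afl-fuzz instance.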
+ +If you use LTO, LLVM, or GCC_PLUGIN mode +(afl-clang-fast/afl-clang-lto/afl-gcc-fast), you have the option to selectively +instrument _parts_ of the target that you are interested in. For afl-clang-fast, +you have to use an llvm version newer than 10.0.0 or a mode other than +DEFAULT/PCGUARD. + +This step can be done either by explicitly including parts to be instrumented or +by explicitly excluding parts from instrumentation. + +* To instrument _only specified parts_, create a file (e.g., `allowlist.txt`) + with all the filenames and/or functions of the source code that should be + instrumented and then: + + 1. Just put one filename or function (prefixing with `fun: `) per line (no + directory information necessary for filenames) in the file `allowlist.txt`. + + Example: + + ``` + foo.cpp # will match foo/foo.cpp, bar/foo.cpp, barfoo.cpp etc. + fun: foo_func # will match the function foo_func + ``` + + 2. Set `export AFL_LLVM_ALLOWLIST=allowlist.txt` to enable selective positive + instrumentation. + +* Similarly to _exclude_ specified parts from instrumentation, create a file + (e.g., `denylist.txt`) with all the filenames of the source code that should + be skipped during instrumentation and then: + + 1. Same as above. Just put one filename or function per line in the file + `denylist.txt`. + + 2. Set `export AFL_LLVM_DENYLIST=denylist.txt` to enable selective negative + instrumentation. + +**NOTE:** During optimization functions might be +inlined and then would not match the list! See +[instrumentation/README.instrument_list.md](../instrumentation/README.instrument_list.md). + +There are many more options and modes available, however, these are most of the +time less effective. 
See: + +* [instrumentation/README.llvm.md#6) AFL++ Context Sensitive Branch Coverage](../instrumentation/README.llvm.md#6-afl-context-sensitive-branch-coverage) +* [instrumentation/README.llvm.md#7) AFL++ N-Gram Branch Coverage](../instrumentation/README.llvm.md#7-afl-n-gram-branch-coverage) + +AFL++ performs "never zero" counting in its bitmap. You can read more about this +here: +* [instrumentation/README.llvm.md#8-neverzero-counters](../instrumentation/README.llvm.md#8-neverzero-counters) + +### c) Selecting sanitizers + +It is possible to use sanitizers when instrumenting targets for fuzzing, which +allows you to find bugs that would not necessarily result in a crash. + +Note that sanitizers have a huge impact on CPU (= less executions per second) +and RAM usage. Also, you should only run one afl-fuzz instance per sanitizer +type. This is enough because e.g. a use-after-free bug will be picked up by ASAN +(address sanitizer) anyway after syncing test cases from other fuzzing +instances, so running more than one address sanitized target would be a waste. + +The following sanitizers have built-in support in AFL++: + +* ASAN = Address SANitizer, finds memory corruption vulnerabilities like + use-after-free, NULL pointer dereference, buffer overruns, etc. Enabled with + `export AFL_USE_ASAN=1` before compiling. +* MSAN = Memory SANitizer, finds read accesses to uninitialized memory, e.g., a + local variable that is defined and read before it is even set. Enabled with + `export AFL_USE_MSAN=1` before compiling. +* UBSAN = Undefined Behavior SANitizer, finds instances where - by the C and C++ + standards - undefined behavior happens, e.g., adding two signed integers where + the result is larger than what a signed integer can hold. Enabled with `export + AFL_USE_UBSAN=1` before compiling. +* CFISAN = Control Flow Integrity SANitizer, finds instances where the control + flow is found to be illegal. 
Originally this was rather to prevent return + oriented programming (ROP) exploit chains from functioning. In fuzzing, this + is mostly reduced to detecting type confusion vulnerabilities - which is, + however, one of the most important and dangerous C++ memory corruption + classes! Enabled with `export AFL_USE_CFISAN=1` before compiling. +* TSAN = Thread SANitizer, finds thread race conditions. Enabled with `export + AFL_USE_TSAN=1` before compiling. +* LSAN = Leak SANitizer, finds memory leaks in a program. This is not really a + security issue, but for developers this can be very valuable. Note that unlike + the other sanitizers above this needs `__AFL_LEAK_CHECK();` added to all areas + of the target source code where you find a leak check necessary! Enabled with + `export AFL_USE_LSAN=1` before compiling. To ignore the memory-leaking check + for certain allocations, `__AFL_LSAN_OFF();` can be used before memory is + allocated, and `__AFL_LSAN_ON();` afterwards. Memory allocated between these + two macros will not be checked for memory leaks. + +It is possible to further modify the behavior of the sanitizers at run-time by +setting `ASAN_OPTIONS=...`, `LSAN_OPTIONS` etc. - the available parameters can +be looked up in the sanitizer documentation of llvm/clang. afl-fuzz, however, +requires some specific parameters important for fuzzing to be set. If you want +to set your own, it might bail and report what it is missing. + +Note that some sanitizers cannot be used together, e.g., ASAN and MSAN, and +others often cannot work together because of target weirdness, e.g., ASAN and +CFISAN. You might need to experiment which sanitizers you can combine in a +target (which means more instances can be run without a sanitized target, which +is more effective). + +### d) Modifying the target + +If the target has features that make fuzzing more difficult, e.g., checksums, +HMAC, etc., then modify the source code so that checks for these values are +removed. 
This can even be done safely for source code used in operational +products by eliminating these checks within these AFL++ specific blocks: + +``` +#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION + // say that the checksum or HMAC was fine - or whatever is required + // to eliminate the need for the fuzzer to guess the right checksum + return 0; +#endif +``` + +All AFL++ compilers will set this preprocessor definition automatically. + +### e) Instrumenting the target + +In this step, the target source code is compiled so that it can be fuzzed. + +Basically, you have to tell the target build system that the selected AFL++ +compiler is used. Also - if possible - you should always configure the build +system in such way that the target is compiled statically and not dynamically. +How to do this is described below. + +The #1 rule when instrumenting a target is: avoid instrumenting shared libraries +at all cost. You would need to set `LD_LIBRARY_PATH` to point to these, you +could accidentally type "make install" and install them system wide - so don't. +Really don't. **Always compile libraries you want to have instrumented as static +and link these to the target program!** + +Then build the target. (Usually with `make`.) + +**NOTES** + +1. Sometimes configure and build systems are fickle and do not like stderr + output (and think this means a test failure) - which is something AFL++ likes + to do to show statistics. It is recommended to disable AFL++ instrumentation + reporting via `export AFL_QUIET=1`. + +2. Sometimes configure and build systems error on warnings - these should be + disabled (e.g., `--disable-werror` for some configure scripts). + +3. In case the configure/build system complains about AFL++'s compiler and + aborts, then set `export AFL_NOOPT=1` which will then just behave like the + real compiler and run the configure step separately. For building the target + afterwards this option has to be unset again! 
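One way to follow notes 1 and 3 without forgetting to unset `AFL_NOOPT` is to set it only for the configure command itself. The sketch below assumes an autotools-style project in the current directory and `afl-clang-fast` in `PATH`; it skips the build when either is missing.

```shell
#!/bin/sh
# Keep configure checks free of AFL++ statistics on stderr.
export AFL_QUIET=1

if [ -x ./configure ] && command -v afl-clang-fast >/dev/null 2>&1; then
  # AFL_NOOPT=1 is scoped to this single command, so the AFL++ compiler
  # behaves like the real compiler only while configure runs its checks.
  AFL_NOOPT=1 CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --disable-shared

  # AFL_NOOPT is not set here, so make builds an instrumented target.
  make
else
  echo "no ./configure or afl-clang-fast found, skipping build"
fi
```

Because the variable is never exported, there is nothing to unset before the actual build.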
+ +#### configure + +For `configure` build systems, this is usually done by: + +``` +CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --disable-shared +``` + +Note that if you are using the (better) afl-clang-lto compiler, you also have to +set `AR` to llvm-ar[-VERSION] and `RANLIB` to llvm-ranlib[-VERSION] - as is +described in [instrumentation/README.lto.md](../instrumentation/README.lto.md). + +#### CMake + +For CMake build systems, this is usually done by: + +``` +mkdir build; cd build; cmake -DCMAKE_C_COMPILER=afl-cc -DCMAKE_CXX_COMPILER=afl-c++ .. +``` + +Note that if you are using the (better) afl-clang-lto compiler you also have to +set AR to llvm-ar[-VERSION] and RANLIB to llvm-ranlib[-VERSION] - as is +described in [instrumentation/README.lto.md](../instrumentation/README.lto.md). + +#### Meson Build System + +For the Meson Build System, you have to set the AFL++ compiler with the very +first command! + +``` +CC=afl-cc CXX=afl-c++ meson +``` + +#### Other build systems or if configure/cmake didn't work + +Sometimes `cmake` and `configure` do not pick up the AFL++ compiler or the +`RANLIB`/`AR` that is needed - because this was just not foreseen by the +developer of the target. Or they have non-standard options. Figure out if there +is a non-standard way to set this, otherwise set up the build normally and edit +the generated build environment afterwards manually to point it to the right +compiler (and/or `RANLIB` and `AR`). 
+ +In complex, weird, alien build systems you can try this neat project: +[https://github.com/fuzzah/exeptor](https://github.com/fuzzah/exeptor) + +#### Linker scripts + +If the project uses linker scripts to hide the symbols exported by the +binary, then you may see errors such as: + +``` +undefined symbol: __afl_area_ptr +``` + +The solution is to modify the linker script to add: + +``` +{ + global: + __afl_*; +} +``` + +### f) Better instrumentation + +If you just fuzz a target program as-is, you are wasting a great opportunity for +much more fuzzing speed. + +This variant requires the usage of afl-clang-lto, afl-clang-fast or +afl-gcc-fast. + +It is the so-called `persistent mode`, which is much, much faster but requires +that you code a source file that is specifically calling the target functions +that you want to fuzz, plus a few specific AFL++ functions around it. See +[instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md) +for details. + +Basically, if you do not fuzz a target in persistent mode, then you are just +doing it for a hobby and not professionally :-). + +### g) libfuzzer fuzzer harnesses with LLVMFuzzerTestOneInput() + +libfuzzer `LLVMFuzzerTestOneInput()` harnesses are the de facto standard for +fuzzing, and they can be used with AFL++ (and honggfuzz) as well! + +Compiling them is as simple as: + +``` +afl-clang-fast++ -fsanitize=fuzzer -o harness harness.cpp targetlib.a +``` + +You can even use advanced libfuzzer features like `FuzzedDataProvider`, +`LLVMFuzzerInitialize()` etc. and they will work! + +The generated binary is fuzzed with afl-fuzz like any other fuzz target. + +Bonus: the target is already optimized for fuzzing due to persistent mode and +shared-memory test cases and hence gives you the fastest speed possible. + +For more information, see +[utils/aflpp_driver/README.md](../utils/aflpp_driver/README.md). + +## 2. 
Preparing the fuzzing campaign + +As you fuzz the target with mutated input, having as diverse inputs for the +target as possible improves the efficiency a lot. + +### a) Collecting inputs + +To operate correctly, the fuzzer requires one or more starting files that +contain a good example of the input data normally expected by the targeted +application. + +Try to gather valid inputs for the target from wherever you can. E.g., if it is +the PNG picture format, try to find as many PNG files as possible, e.g., from +reported bugs, test suites, random downloads from the internet, unit test case +data - from all kind of PNG software. + +If the input format is not known, you can also modify a target program to write +normal data it receives and processes to a file and use these. + +You can find many good examples of starting files in the +[testcases/](../testcases) subdirectory that comes with this tool. + +### b) Making the input corpus unique + +Use the AFL++ tool `afl-cmin` to remove inputs from the corpus that do not +produce a new path/coverage in the target: + +1. Put all files from [step a](#a-collecting-inputs) into one directory, e.g., + `INPUTS`. +2. Run afl-cmin: + * If the target program is to be called by fuzzing as `bin/target INPUTFILE`, + replace the INPUTFILE argument that the target program would read from with + `@@`: + + ``` + afl-cmin -i INPUTS -o INPUTS_UNIQUE -- bin/target -someopt @@ + ``` + + * If the target reads from stdin (standard input) instead, just omit the `@@` + as this is the default: + + ``` + afl-cmin -i INPUTS -o INPUTS_UNIQUE -- bin/target -someopt + ``` + +This step is highly recommended, because afterwards the testcase corpus is not +bloated with duplicates anymore, which would slow down the fuzzing progress! + +### c) Minimizing all corpus files + +The shorter the input files that still traverse the same path within the target, +the better the fuzzing will be. 
This minimization is done with `afl-tmin`, +however, it is a long process as this has to be done for every file: + +``` +mkdir input +cd INPUTS_UNIQUE +for i in *; do + afl-tmin -i "$i" -o "../input/$i" -- bin/target -someopt @@ +done +``` + +This step can also be parallelized, e.g., with `parallel`. + +Note that this step is rather optional though. + +### Done! + +The INPUTS_UNIQUE/ directory from [step b](#b-making-the-input-corpus-unique) - +or even better the directory input/ if you minimized the corpus in +[step c](#c-minimizing-all-corpus-files) - is the resulting input corpus +directory to be used in fuzzing! :-) + +## 3. Fuzzing the target + +In this final step, fuzz the target. There are not that many important options +to run the target - unless you want to use many CPU cores/threads for the +fuzzing, which will make the fuzzing much more useful. + +If you just use one instance for fuzzing, then you are fuzzing just for fun and +not seriously :-) + +### a) Running afl-fuzz + +Before you do even a test run of afl-fuzz, execute `sudo afl-system-config` (on +the host if you execute afl-fuzz in a Docker container). This reconfigures the +system for optimal speed - which afl-fuzz checks and bails otherwise. Set +`export AFL_SKIP_CPUFREQ=1` for afl-fuzz to skip this check if you cannot run +afl-system-config with root privileges on the host for whatever reason. + +Note: + +* There is also `sudo afl-persistent-config` which sets additional permanent + boot options for a much better fuzzing performance. +* Both scripts improve your fuzzing performance but also decrease your system + protection against attacks! So set strong firewall rules and only expose SSH + as a network service if you use these (which is highly recommended). + +If you have an input corpus from [step 2](#2-preparing-the-fuzzing-campaign), +then specify this directory with the `-i` option. Otherwise, create a new +directory and create a file with any content as test data in there. 
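If no corpus exists yet, a placeholder seed is enough to get a first run going. A minimal sketch, where the directory name `input` and the file name `seed` are arbitrary choices:

```shell
#!/bin/sh
# Create a seed directory containing a single small, non-empty file.
mkdir -p input
printf 'hello\n' > input/seed

# Show what afl-fuzz would be given via -i.
ls -l input
```

Any content works as a starting point, but a valid example of the expected input format will usually get the fuzzer further.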
+ +If you do not want anything special, the defaults are already usually best, +hence all you need is to specify the seed input directory with the result of +step [2a) Collecting inputs](#a-collecting-inputs): + +``` +afl-fuzz -i input -o output -- bin/target -someopt @@ +``` + +Note that the directory specified with `-o` will be created if it does not +exist. + +It can be valuable to run afl-fuzz in a `screen` or `tmux` shell so you can log +off, or afl-fuzz is not aborted if you are running it in a remote ssh session +where the connection fails in between. Only do that though once you have +verified that your fuzzing setup works! Run it like `screen -dmS afl-main -- +afl-fuzz -M main-$HOSTNAME -i ...` and it will start away in a screen session. +To enter this session, type `screen -r afl-main`. You see - it makes sense to +name the screen session same as the afl-fuzz `-M`/`-S` naming :-) For more +information on screen or tmux, check their documentation. + +If you need to stop and re-start the fuzzing, use the same command line options +(or even change them by selecting a different power schedule or another mutation +mode!) and switch the input directory with a dash (`-`): + +``` +afl-fuzz -i - -o output -- bin/target -someopt @@ +``` + +Adding a dictionary is helpful. You have the following options: + +* See the directory +[dictionaries/](../dictionaries/), if something is already included for your +data format, and tell afl-fuzz to load that dictionary by adding `-x +dictionaries/FORMAT.dict`. +* With `afl-clang-lto`, you have an autodictionary generation for which you need + to do nothing except to use afl-clang-lto as the compiler. +* With `afl-clang-fast`, you can set + `AFL_LLVM_DICT2FILE=/full/path/to/new/file.dic` to automatically generate a + dictionary during target compilation. + Adding `AFL_LLVM_DICT2FILE_NO_MAIN=1` to not parse main (usually command line + parameter parsing) is often a good idea too. 
+* You also have the option to generate a dictionary yourself during an + independent run of the target, see + [utils/libtokencap/README.md](../utils/libtokencap/README.md). +* Finally, you can also write a dictionary file manually, of course. + +afl-fuzz has a variety of options that help to work around target quirks like +very specific locations for the input file (`-f`), performing deterministic +fuzzing (`-D`) and many more. Check out `afl-fuzz -h`. + +We highly recommend that you set a memory limit for running the target with `-m` +which defines the maximum memory in MB. This prevents a potential out-of-memory +problem for your system plus helps you detect missing `malloc()` failure +handling in the target. Play around with various `-m` values until you find one +that safely works for all your input seeds (if you have good ones), and then +double or quadruple that. + +By default, afl-fuzz never stops fuzzing. To terminate AFL++, press Control-C or +send a signal SIGINT. You can also limit the number of executions or the approximate +runtime in seconds with options. + +When you start afl-fuzz, you will see a user interface that shows what the +status is: + +![resources/screenshot.png](resources/screenshot.png) + +All labels are explained in +[afl-fuzz_approach.md#understanding-the-status-screen](afl-fuzz_approach.md#understanding-the-status-screen). + +### b) Keeping memory use and timeouts in check + +Memory limits are not enforced by afl-fuzz by default and the system may run out +of memory. You can decrease the memory with the `-m` option, the value is in MB. +If this is too small for the target, you can usually see this by afl-fuzz +bailing with the message that it could not connect to the forkserver. + +Consider setting low values for `-m` and `-t`. + +For programs that are nominally very fast, but get sluggish for some inputs, you +can also try setting `-t` values that are more punishing than what `afl-fuzz` +dares to use on its own.
On fast and idle machines, going down to `-t 5` may be +a viable plan. + +The `-m` parameter is worth looking at, too. Some programs can end up spending a +fair amount of time allocating and initializing megabytes of memory when +presented with pathological inputs. Low `-m` values can make them give up sooner +and not waste CPU time. + +### c) Using multiple cores + +If you want to seriously fuzz, then use as many cores/threads as possible to +fuzz your target. + +On the same machine - due to the design of how AFL++ works - there is a maximum +number of CPU cores/threads that are useful; use more and the overall +performance degrades instead. This value depends on the target, and the limit is +between 32 and 64 cores per machine. + +If you have the RAM, it is highly recommended to run the instances with a cache +for the test cases. Depending on the average test case size (and those found +during fuzzing) and their number, a value between 50-500MB is recommended. You +can set the cache size (in MB) by setting the environment variable +`AFL_TESTCACHE_SIZE`. + +There should be one main fuzzer (`-M main-$HOSTNAME` option - set also +`AFL_FINAL_SYNC=1`) and as many secondary fuzzers (e.g., `-S variant1`) as you +have cores that you use. Every `-M`/`-S` entry needs a unique name (that can be +whatever), however, the same `-o` output directory location has to be used for +all instances. + +For every secondary fuzzer there should be a variation, e.g.: +* one should fuzz the target that was compiled with sanitizers activated + (`export AFL_USE_ASAN=1 ; export AFL_USE_UBSAN=1 ; export AFL_USE_CFISAN=1`) +* one or two should fuzz the target with CMPLOG/redqueen (see above), at least + one cmplog instance should follow transformations (`-l 2AT`) +* one to three fuzzers should fuzz a target compiled with laf-intel/COMPCOV (see + above).
Important note: If you run more than one laf-intel/COMPCOV fuzzer and + you want them to share their intermediate results, the main fuzzer (`-M`) must + be one of them (although this is not really recommended). + +The other secondaries should be run like this: +* 10% with the MOpt mutator enabled: `-L 0` +* 10% should use the old queue cycling with `-Z` +* 50-70% should run with `AFL_DISABLE_TRIM` +* 40% should run with `-P explore` and 20% with `-P exploit` +* If you use `-a` then set 30% of the instances to not use `-a`; if you did + not set `-a` (why??), then set 30% to `-a ascii` and 30% to `-a binary`. +* run each with a different power schedule, recommended are: `fast` (default), + `explore`, `coe`, `lin`, `quad`, `exploit`, and `rare` which you can set with + the `-p` option, e.g., `-p explore`. See the + [FAQ](FAQ.md#what-are-power-schedules) for details. + +It can be useful to set `AFL_IGNORE_SEED_PROBLEMS=1` to skip over seeds that +crash or timeout during startup. + +Also, it is recommended to set `export AFL_IMPORT_FIRST=1` to load test cases +from other fuzzers in the campaign first. But note that this can slow down the start +of the first fuzz by quite a lot if you have many fuzzers and/or many seeds. + +If you have a large corpus, a corpus from a previous run or are fuzzing in a CI, +then also set `export AFL_CMPLOG_ONLY_NEW=1` and `export AFL_FAST_CAL=1`. +If the queue in the CI is huge and/or the execution time is slow then you can +also add `AFL_NO_STARTUP_CALIBRATION=1` to skip the initial queue calibration +phase and start fuzzing at once - but only do this if the calibration phase +would be too long for your fuzz run time. + +You can also use different fuzzers. If you are using AFL spinoffs or AFL +conforming fuzzers, then just use the same -o directory and give it a unique +`-S` name.
Examples are: +* [Fuzzolic](https://github.com/season-lab/fuzzolic) +* [symcc](https://github.com/eurecom-s3/symcc/) +* [Eclipser](https://github.com/SoftSec-KAIST/Eclipser/) +* [AFLsmart](https://github.com/aflsmart/aflsmart) +* [FairFuzz](https://github.com/carolemieux/afl-rb) +* [Neuzz](https://github.com/Dongdongshe/neuzz) +* [Angora](https://github.com/AngoraFuzzer/Angora) + +A long list can be found at +[https://github.com/Microsvuln/Awesome-AFL](https://github.com/Microsvuln/Awesome-AFL). + +However, you can also sync AFL++ with honggfuzz, libfuzzer with `-entropic=1`, +etc. Just point the main fuzzer (`-M`) to the queue/work directory of the +other fuzzer with the `-F` option, e.g., `-F /src/target/honggfuzz`. Using +honggfuzz (with `-n 1` or `-n 2`) and libfuzzer in parallel is highly +recommended! + +### d) Using multiple machines for fuzzing + +Maybe you have more than one machine you want to fuzz the same target on. Start +the `afl-fuzz` (and perhaps libfuzzer, honggfuzz, ...) orchestra as you like, +just ensure that you have one and only one `-M` instance per server, and that +its name is unique, hence the recommendation for `-M main-$HOSTNAME`. + +Now there are three strategies on how you can sync between the servers: +* never: sounds weird, but this makes every server an island and has the chance + that each follows different paths into the target. You can make this even more + interesting by even giving different seeds to each server. +* regularly (~4h): this ensures that all fuzzing campaigns on the servers "see" + the same thing. It is like fuzzing on a huge server. +* in intervals of 1/10th of the overall expected runtime of the fuzzing you + sync. This tries a bit to combine both. Have some individuality of the paths + each campaign on a server explores, on the other hand if one gets stuck where + another found progress this is handed over making it unstuck. + +The syncing process itself is very simple.
As the `-M main-$HOSTNAME` instance +syncs to all `-S` secondaries as well as to other fuzzers, you have to copy only +this directory to the other machines. + +Let's say all servers have the `-o out` directory in /target/foo/out, and you +created a file `servers.txt` which contains the hostnames of all participating +servers, plus you have an ssh key deployed to all of them (so that every server +can log in to every other one), then run: + +```bash +for FROM in $(cat servers.txt); do + for TO in $(cat servers.txt); do + # nothing to copy from a server to itself + [ "$FROM" = "$TO" ] && continue + # rsync cannot copy between two remote hosts, so run it on $FROM via ssh + ssh "$FROM" rsync -rlpogtz --rsh=ssh "/target/foo/out/main-$FROM" "$TO:/target/foo/out/" + done +done +``` + +You can run this manually or as a cron job, as needed. There is a more +complex and configurable script in +[utils/distributed_fuzzing](../utils/distributed_fuzzing). + +### e) The status of the fuzz campaign + +AFL++ comes with the `afl-whatsup` script to show the status of the fuzzing +campaign. + +Just supply the directory that afl-fuzz is given with the `-o` option and you +will see a detailed status of every fuzzer in that campaign plus a summary. + +To have only the summary, use the `-s` switch, e.g., `afl-whatsup -s out/`. + +If you have multiple servers, then use the command after a sync or you have to +execute this script per server. + +Another tool to inspect the current state and history of a specific instance is +afl-plot, which generates an index.html file and graphs that show how the +fuzzing instance is performing. The syntax is `afl-plot instance_dir web_dir`, +e.g., `afl-plot out/default /srv/www/htdocs/plot`. + +### f) Stopping fuzzing, restarting fuzzing, adding new seeds + +To stop an afl-fuzz run, press Control-C. + +To restart an afl-fuzz run, just reuse the same command line but replace the `-i +directory` with `-i -` or set `AFL_AUTORESUME=1`. 
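The choice between a fresh start and a resume can be scripted. The helper below is only an illustration: the function name `afl_input_arg` is made up, and it assumes that an existing, non-empty output directory means a previous run should be resumed.

```shell
#!/bin/sh
# Print the value to pass to afl-fuzz -i:
#   "-"           if the output directory already holds data (resume)
#   the seed dir  otherwise (fresh start)
afl_input_arg() {
  seed_dir="$1"
  out_dir="$2"
  if [ -d "$out_dir" ] && [ -n "$(ls -A "$out_dir" 2>/dev/null)" ]; then
    echo "-"
  else
    echo "$seed_dir"
  fi
}

# Example: only print the resulting command line (placeholder paths).
echo "afl-fuzz -i $(afl_input_arg input output) -o output -- bin/target -someopt @@"
```

Setting `AFL_AUTORESUME=1` achieves the same effect without a wrapper.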
+ +If you want to add new seeds to a fuzzing campaign, you can run a temporary +fuzzing instance, e.g., when your main fuzzer is using `-o out` and the new +seeds are in `newseeds/` directory: + +``` +AFL_BENCH_JUST_ONE=1 AFL_FAST_CAL=1 afl-fuzz -i newseeds -o out -S newseeds -- ./target +``` + +### g) Checking the coverage of the fuzzing + +The `corpus count` value is a bad indicator for checking how good the coverage +is. + +A better indicator - if you use default llvm instrumentation with at least +version 9 - is to use `afl-showmap` with the collect coverage option `-C` on the +output directory: + +``` +$ afl-showmap -C -i out -o /dev/null -- ./target -params @@ +... +[*] Using SHARED MEMORY FUZZING feature. +[*] Target map size: 9960 +[+] Processed 7849 input files. +[+] Captured 4331 tuples (highest value 255, total values 67130596) in '/dev/null'. +[+] A coverage of 4331 edges were achieved out of 9960 existing (43.48%) with 7849 input files. +``` + +It is even better to check out the exact lines of code that have been reached - +and which have not been found so far. + +An "easy" helper script for this is +[https://github.com/vanhauser-thc/afl-cov](https://github.com/vanhauser-thc/afl-cov), +just follow the README of that separate project. + +If you see that an important area or a feature has not been covered so far, then +try to find an input that is able to reach that and start a new secondary in +that fuzzing campaign with that seed as input, let it run for a few minutes, +then terminate it. The main node will pick it up and make it available to the +other secondary nodes over time. Set `export AFL_NO_AFFINITY=1` or `export +AFL_TRY_AFFINITY=1` if you have no free core. + +Note that in nearly all cases you can never reach full coverage. A lot of +functionality is usually dependent on exclusive options that would need +individual fuzzing campaigns each with one of these options set.
E.g., if you +fuzz a library to convert image formats and your target is the png to tiff API, +then you will not touch any of the other library APIs and features. + +### h) How long to fuzz a target? + +This is a difficult question. Basically, if no new path is found for a long time +(e.g., for a day or a week), then you can expect that your fuzzing won't be +fruitful anymore. However, often this just means that you should switch out +secondaries for others, e.g., custom mutator modules, sync to very different +fuzzers, etc. + +Keep the queue/ directory (for future fuzzings of the same or similar targets) +and use them to seed other good fuzzers like libfuzzer with the -entropic switch +or honggfuzz. + +### i) Improve the speed! + +* Use [persistent mode](../instrumentation/README.persistent_mode.md) (x2-x20 + speed increase). +* If you do not use shmem persistent mode, use `AFL_TMPDIR` to point the input + file on a tempfs location, see [env_variables.md](env_variables.md). +* Linux: Improve kernel performance: modify `/etc/default/grub`, set + `GRUB_CMDLINE_LINUX_DEFAULT="ibpb=off ibrs=off kpti=off l1tf=off mds=off + mitigations=off no_stf_barrier noibpb noibrs nopcid nopti + nospec_store_bypass_disable nospectre_v1 nospectre_v2 pcid=off pti=off + spec_store_bypass_disable=off spectre_v2=off stf_barrier=off"`; then + `update-grub` and `reboot` (warning: makes the system more insecure) - you can + also just run `sudo afl-persistent-config`. +* Linux: Running on an `ext2` filesystem with `noatime` mount option will be a + bit faster than on any other journaling filesystem. +* Use your cores! See [3c) Using multiple cores](#c-using-multiple-cores). +* Run `sudo afl-system-config` before starting the first afl-fuzz instance after + a reboot. + +### j) Going beyond crashes + +Fuzzing is a wonderful and underutilized technique for discovering non-crashing +design and implementation errors, too. 
Quite a few interesting bugs have been +found by modifying the target programs to call `abort()` when say: + +- Two bignum libraries produce different outputs when given the same + fuzzer-generated input. + +- An image library produces different outputs when asked to decode the same + input image several times in a row. + +- A serialization/deserialization library fails to produce stable outputs when + iteratively serializing and deserializing fuzzer-supplied data. + +- A compression library produces an output inconsistent with the input file when + asked to compress and then decompress a particular blob. + +Implementing these or similar sanity checks usually takes very little time; if +you are the maintainer of a particular package, you can make this code +conditional with `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` (a flag also +shared with libfuzzer and honggfuzz) or `#ifdef __AFL_COMPILER` (this one is +just for AFL++). + +### k) Known limitations & areas for improvement + +Here are some of the most important caveats for AFL++: + +- AFL++ detects faults by checking for the first spawned process dying due to a + signal (SIGSEGV, SIGABRT, etc.). Programs that install custom handlers for + these signals may need to have the relevant code commented out. In the same + vein, faults in child processes spawned by the fuzzed target may evade + detection unless you manually add some code to catch that. + +- As with any other brute-force tool, the fuzzer offers limited coverage if + encryption, checksums, cryptographic signatures, or compression are used to + wholly wrap the actual data format to be tested. + + To work around this, you can comment out the relevant checks (see + utils/libpng_no_checksum/ for inspiration); if this is not possible, you can + also write a postprocessor, one of the hooks of custom mutators. See + [custom_mutators.md](custom_mutators.md) on how to use + `AFL_CUSTOM_MUTATOR_LIBRARY`. 
+ +- There are some unfortunate trade-offs with ASAN and 64-bit binaries. This + isn't due to any specific fault of afl-fuzz. + +- There is no direct support for fuzzing network services, background daemons, + or interactive apps that require UI interaction to work. You may need to make + simple code changes to make them behave in a more traditional way. Preeny or + libdesock may offer a relatively simple option, too - see: + [https://github.com/zardus/preeny](https://github.com/zardus/preeny) or + [https://github.com/fkie-cad/libdesock](https://github.com/fkie-cad/libdesock) + + Some useful tips for modifying network-based services can also be found at: + [https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop](https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop) + +- Occasionally, sentient machines rise against their creators. If this happens + to you, please consult + [https://lcamtuf.coredump.cx/prep/](https://lcamtuf.coredump.cx/prep/). + +Beyond this, see [INSTALL.md](INSTALL.md) for platform-specific tips. + +## 4. Triaging crashes + +The coverage-based grouping of crashes usually produces a small data set that +can be quickly triaged manually or with a very simple GDB or Valgrind script. +Every crash is also traceable to its parent non-crashing test case in the queue, +making it easier to diagnose faults. + +Having said that, it's important to acknowledge that some fuzzing crashes can be +difficult to quickly evaluate for exploitability without a lot of debugging and +code analysis work. To assist with this task, afl-fuzz supports a unique +"crash exploration" mode enabled with the `-C` flag. + +In this mode, the fuzzer takes one or more crashing test cases as the input and +uses its feedback-driven fuzzing strategies to very quickly enumerate all code +paths that can be reached in the program while keeping it in the crashing state.
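
As a rough illustration of the crash exploration mode described above, the following sketch assembles an `afl-fuzz -C` invocation. The directory names `crash_inputs` and `explore_out` and the binary `./target` are hypothetical placeholders, not names taken from AFL++ itself; the script only runs the fuzzer if `afl-fuzz`, the input directory, and the target are actually present, and otherwise just prints the command.

```shell
#!/bin/sh
# Sketch of a crash exploration run; every path below is a hypothetical placeholder.
CRASH_DIR="${CRASH_DIR:-crash_inputs}"     # directory holding one or more crashing test cases
EXPLORE_OUT="${EXPLORE_OUT:-explore_out}"  # output directory for the exploration corpus
TARGET="${TARGET:-./target}"               # target binary to explore

# -C selects crash exploration mode; @@ is replaced with the input file path.
# Note: the unquoted expansion below assumes the paths contain no spaces.
CMD="afl-fuzz -C -i $CRASH_DIR -o $EXPLORE_OUT -- $TARGET @@"

if command -v afl-fuzz >/dev/null 2>&1 && [ -d "$CRASH_DIR" ] && [ -x "$TARGET" ]; then
  $CMD
else
  # Nothing to run here: print the command instead of executing it.
  echo "dry run: $CMD"
fi
```

The placeholders can be overridden through the environment, e.g. `CRASH_DIR=my_crashes TARGET=./my_target sh explore.sh`, where `explore.sh` is an assumed file name for the script.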
+ +Mutations that do not result in a crash are rejected; so are any changes that do +not affect the execution path. + +The output is a small corpus of files that can be very rapidly examined to see +what degree of control the attacker has over the faulting address, or whether it +is possible to get past an initial out-of-bounds read - and see what lies +beneath. + +Oh, one more thing: for test case minimization, give afl-tmin a try. The tool +can be operated in a very simple way: + +```shell +./afl-tmin -i test_case -o minimized_result -- /path/to/program [...] +``` + +The tool works with crashing and non-crashing test cases alike. In the crash +mode, it will happily accept instrumented and non-instrumented binaries. In the +non-crashing mode, the minimizer relies on standard AFL++ instrumentation to +make the file simpler without altering the execution path. + +The minimizer accepts the `-m`, `-t`, `-f`, and `@@` syntax in a manner +compatible with afl-fuzz. + +Another tool in AFL++ is afl-analyze. It takes an input file, attempts +to sequentially flip bytes, and observes the behavior of the tested program. It +then color-codes the input based on which sections appear to be critical and +which are not; while not bulletproof, it can often offer quick insights into +complex file formats. + +`casr-afl` from [CASR](https://github.com/ispras/casr) tools provides +comfortable triaging for crashes found by AFL++. Reports are clustered and +contain severity and other information. +```shell +casr-afl -i /path/to/afl/out/dir -o /path/to/casr/out/dir +``` + +## 5. CI fuzzing + +Some notes on continuous integration (CI) fuzzing - this kind of fuzzing differs from +normal fuzzing campaigns because the runs are much shorter. + +If the queue in the CI is huge and/or execution is slow, then you can +also add `AFL_NO_STARTUP_CALIBRATION=1` to skip the initial queue calibration +phase and start fuzzing at once.
But only do that if the calibration time is +too long for your overall available fuzz run time. + +1. Always: + * LTO has a much longer compile time, which is at odds with short fuzzing runs - + hence use afl-clang-fast instead. + * If you compile with CMPLOG, then you can save compilation time and reuse + that compiled target with the `-c` option and as the main fuzz target. + This will impact the speed by ~15% though. + * `AFL_FAST_CAL` - enables fast calibration, which halves the time needed to + load the saturated corpus. + * `AFL_CMPLOG_ONLY_NEW` - only perform cmplog on new finds, not the initial + corpus as this very likely has been done for them already. + * Keep the generated corpus, use afl-cmin and reuse it every time! + +2. Additionally randomize the AFL++ compilation options, e.g.: + * 30% for `AFL_LLVM_CMPLOG` + * 5% for `AFL_LLVM_LAF_ALL` + +3. Also randomize the afl-fuzz runtime options, e.g.: + * 65% for `AFL_DISABLE_TRIM` + * 50% for `AFL_KEEP_TIMEOUTS` + * 50% use a dictionary generated by `AFL_LLVM_DICT2FILE` + `AFL_LLVM_DICT2FILE_NO_MAIN=1` + * 10% use MOpt (`-L 0`) + * 40% for `AFL_EXPAND_HAVOC_NOW` + * 20% for old queue processing (`-Z`) + * for CMPLOG targets, 70% for `-l 2`, 10% for `-l 3`, 20% for `-l 2AT` + +4. Do *not* run any `-M` modes, just running `-S` modes is better for CI + fuzzing. `-M` enables old queue handling etc. which is good for a fuzzing + campaign but not good for short CI runs. + +What this can look like can be seen, e.g., in AFL++'s setup in Google's +[previous oss-fuzz version](https://github.com/google/oss-fuzz/blob/3e2c5312417d1a6f9564472f3df1fd27759b289d/infra/base-images/base-builder/compile_afl) +and +[clusterfuzz](https://github.com/google/clusterfuzz/blob/master/src/clusterfuzz/_internal/bot/fuzzers/afl/launcher.py). + +## The End + +Check out the [FAQ](FAQ.md). Maybe it answers your question (that you might not +even have known you had ;-) ).
+ +This is basically all you need to know to professionally run fuzzing campaigns. +If you want to know more, the tons of texts in [docs/](./) will have you +covered. + +Note that there are also a lot of tools out there that help fuzzing with AFL++ +(some might be deprecated or unsupported), see +[third_party_tools.md](third_party_tools.md).