Skip to content

Fuzzing ruamel yaml (Python) project with sydr fuzz (Atheris backend)

Alexey Vishnyakov edited this page Feb 1, 2023 · 21 revisions

Introduction

In this article I'll share my experience of fuzzing Python projects. For this purpose I use sydr-fuzz with Atheris backend. Sydr-fuzz was originally designed as a hybrid fuzzer that combines Sydr (DSE tool) and top world fuzzers like AFLplusplus and libFuzzer. Also, sydr-fuzz supports some useful features like crash triage by casr, ability to check security predicates, and some convenient subcommands for corpus minimization and code coverage collection. Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code and also native extensions written for CPython. Atheris is based on libFuzzer. Atheris looks and works like libFuzzer, so we decided to support it in sydr-fuzz, why not? Though we don't have symbolic execution for Python code but we still could do fuzzing, crash triage, corpus minimization, and coverage collection using sydr-fuzz interface.

Preparing Fuzz Target

Atheris github page has a nice instruction about installing and using it. We will fuzz yaml project from it's examples. There is a docker container already prepared for building with all needed fuzzing environment. I'll use it for my fuzzing experiments, but for now let's look more precisely at fuzz target and build script.

import atheris

with atheris.instrument_imports():
  from ruamel import yaml as ruamel_yaml
  import sys
  import warnings

# Suppress all warnings.
warnings.simplefilter("ignore")

ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
ryaml.allow_duplicate_keys = True


@atheris.instrument_func
def TestOneInput(input_bytes):
  fdp = atheris.FuzzedDataProvider(input_bytes)
  data = fdp.ConsumeUnicode(sys.maxsize)

  try:
    iterator = ryaml.load_all(data)
    for _ in iterator:
      pass
  except ruamel_yaml.error.YAMLError:
    return

  except Exception:
    input_type = str(type(data))
    codepoints = [hex(ord(x)) for x in data]
    sys.stderr.write(
        "Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
            input_type=input_type, data=data, codepoints=codepoints))
    raise


def main():
  atheris.Setup(sys.argv, TestOneInput)
  atheris.Fuzz()


if __name__ == "__main__":
  main()

First, we define which modules we want to instrument. There is also an ability to instrument everything: atheris.instrument_all(). It might be useful when your project has many dependencies. Then we need to implement def TestOneInput(input_bytes), this is similar to int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) for C/C++. It is important to catch exceptions that are thrown by the target function. But you need to catch only those exceptions that are specified by developers or that the function throws directly. For example, IndexError doesn't need to be caught if it is not specified in documentation. Atheris catches it as a crash. At last, we need to write some code to start fuzzing process:

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

As for build, we could just use pip install . from project directory and install instrumented project in our fuzzing environment. Ok, let's build docker container and start fuzzing!

Fuzzing

Before we begin, let's look at yaml_fuzzer.toml:

exit-on-time = 3600

[atheris]
path = "/yaml_fuzzer.py"
args = "/corpus -dict=yaml.dict -jobs=1000 -workers=4"

It's pretty simple.

exit-on-time - is an optional parameter that takes time in seconds. If during this time (1 hour in our case) the coverage does not increase, fuzzing is automatically terminated.

I'll use 4 workers for fuzzing till 1000 crashes are found or exit-on-time is triggered. Let's start fuzzing with this command:

# sydr-fuzz -c yaml_fuzzer.toml run
[2023-01-11 17:22:47] [INFO] #3582      RELOAD cov: 1178 ft: 5252 corp: 478/64Kb lim: 487 exec/s: 275 rss: 713Mb                                                     
[2023-01-11 17:22:48] [INFO] Uncaught Python exception: KeyError: (0, 1) /fuzz/yaml_fuzzer-out/crashes/crash-a0acd109aef7675ce2268eec4e0901759f4e1edc                      
[2023-01-11 17:22:50] [INFO] #17540     REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 343 rss: 677Mb L: 13/481 MS: 2 CrossOver-EraseBytes-                                                     
[2023-01-11 17:22:50] [INFO] #17573     REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 344 rss: 677Mb L: 58/481 MS: 3 ChangeBit-ManualDict-EraseBytes- DE: "'"- 
[2023-01-11 17:22:50] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-4230d57dcf9dce49804ffd9abbc43a751068c6a2                                                          
[2023-01-11 17:22:55] [INFO] #1024      pulse  cov: 1171 ft: 4926 corp: 413/36Kb exec/s: 204 rss: 710Mb                                                                                                            
[2023-01-11 17:22:55] [INFO] [ATHERIS]         run time : 0 days, 0 hrs, 0 min, 57 sec                                                                                                                             
[2023-01-11 17:22:55] [INFO] [ATHERIS]    last new find : 0 days, 0 hrs, 0 min, 8 sec                                                                                                                              
[2023-01-11 17:22:57] [INFO] #1268      INITED cov: 1178 ft: 5265 corp: 477/60Kb exec/s: 181 rss: 710Mb

After some amount of time we have found some crashes. Let's wait till fuzzing is finished.

[2023-01-11 19:21:27] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-1f71bdb8cbba856923a45f50c7873bdb7ef64e2d
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-c3152f019e73c4c01925ae1533f47583fe3006df
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-fed9747bec6197c9c8cbc0cf10c051c8f807d407
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 0) /fuzz/yaml_fuzzer-out/crashes/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 19:21:41] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing corpus is saved in /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 19:21:42] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/407
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing results are saved in /fuzz/yaml_fuzzer-out/crashes

Nice, our fuzzing experiment is ended by exit-on-time. We've got 407 crashes to analyze! This is a job for casr.

Let's minimize corpus first:

# sydr-fuzz -c yaml_fuzzer.toml cmin
[2023-01-11 20:30:08] [INFO] Original fuzzing corpus saved as /fuzz/yaml_fuzzer-out/corpus-old
[2023-01-11 20:30:08] [INFO] Minimizing corpus /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:30:08] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:30:08] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:30:08] [INFO] Launching atheris: "/yaml_fuzzer.py" "-merge=1" "-artifact_prefix=/fuzz/yaml_fuzzer-out/crashes/" "-close_fd_mask=2" "-verbosity=2" "-detect_leaks=0" "-dict=/fuzz/yaml.dict" "/fuzz/yaml_fuzzer-out/corpus" "/fuzz/yaml_fuzzer-out/corpus-old"
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: 8719 files, 0 in the initial corpus, 0 processed earlier
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: attempt 1
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: the control file has 982127 bytes
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: consumed 0Mb (120Mb rss) to parse the control file
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: 913 new files with 7301 new features added; 1249 new coverage edges

We've narrowed 8719 files to 913 files, nice! Now we can collect the code coverage!

Coverage

For code coverage we use well-known coverage python module and this instruction from Atheris GitHub. Of course, we've wrapped it into sydr-fuzz pycov subcommand. Let's get html coverage report:

# sydr-fuzz -c yaml_fuzzer.toml pycov html
[2023-01-11 20:37:47] [INFO] Running pycov html "/fuzz/yaml_fuzzer.toml"
[2023-01-11 20:37:47] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:37:47] [INFO] Saving coverage data to /fuzz/yaml_fuzzer-out/coverage/html/.coverage
[2023-01-11 20:37:47] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:37:47] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:37:47] [INFO] Collecting coverage: "coverage" "run" "/yaml_fuzzer.py" "-atheris_runs=914"
[2023-01-11 20:37:51] [INFO] Running coverage html: "coverage" "html" "-d" "/fuzz/yaml_fuzzer-out/coverage/html" "--data-file=/fuzz/yaml_fuzzer-out/coverage/html/.coverage"
Wrote HTML report to /fuzz/yaml_fuzzer-out/coverage/html/index.html

Good, we've got the coverage, let's look at it and move on further!

cov-html

Crash Triage

As I said before, I'll use casr via sydr-fuzz casr subcommand for crash triage:

# sydr-fuzz -c yaml_fuzzer.toml casr

You can learn more about casr from it's repository or from my other fuzzing tutorial.

Let's look at casr output:

[2023-01-11 20:47:14] [INFO] Casr-cluster: deduplication of casr reports...
[2023-01-11 20:47:16] [INFO] Reports before deduplication: 407; after: 16
[2023-01-11 20:47:16] [INFO] Casr-cluster: clustering casr reports...
[2023-01-11 20:47:16] [INFO] Reports before clustering: 16. Clusters: 8
[2023-01-11 20:47:16] [INFO] Copying inputs...
[2023-01-11 20:47:16] [INFO] Done!
[2023-01-11 20:47:16] [INFO] ==> <cl1>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl1/crash-bf5829959ccf0211640314bb30de19bc9bafdeb3
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl2>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl2/crash-e126eb63b0bc1aefac72c3f56dea8484577f1007
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: RecursionError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/events.py:78
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> RecursionError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl3>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl3/crash-017ee5d1bb2bee51263f083eb12a60711a3c84f1
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl4>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl4/crash-3637416d80df3c5961e05b0bd459b79009e2a182
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 2
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl5>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl5/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO]   Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl6>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-0ea90a02b95f99e850b036e49419a43103a54149
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:533
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-988f305721849b6a75af3b3f424b4593901630c3
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl7>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl7/crash-3f369c580ac61eded9d05eb06bc1ad6d0e90bfe1
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl8>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl8/crash-05451dc00f42aa97a064d2e08153bb84af113717
[2023-01-11 20:47:16] [INFO]   casr-python: UNDEFINED: TypeError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:273
[2023-01-11 20:47:16] [INFO]   Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> TypeError: 1
[2023-01-11 20:47:16] [INFO] SUMMARY -> RecursionError: 1 KeyError: 11 ValueError: 3 TypeError: 1
[2023-01-11 20:47:16] [INFO] Crashes and Casr reports are saved in /fuzz/yaml_fuzzer-out/casr

After deduplication we have 16 crashes splitted into 8 clusters. Nice, now we can get down to manual analysis. Let's look at some report, for example from cl6: casrep An unhandled exception has occurred while converting string to float. Looks like an issue:).

Conclusion

In conclusion I want to say that Atheris is a cool fuzzer for Python code. Sydr-fuzz interface is neat. And of course casr, that can triage crashes for Python, helps a lot!


Andrey Fedotov