Fuzzing json-sanitizer (Java) project with sydr-fuzz (Jazzer backend)

Savidov Georgy edited this page Jul 18, 2023 · 1 revision

Introduction

This short article demonstrates an approach to fuzzing Java software using sydr-fuzz with Jazzer support. Sydr-fuzz combines Sydr's dynamic symbolic execution with modern fuzzers such as libFuzzer and AFLplusplus. The tool provides a convenient interface for corpus minimization, security predicate checking, coverage collection, and crash analysis using casr. We will use Jazzer, a coverage-guided, in-process fuzzer for the JVM platform that is based on libFuzzer and developed by Code Intelligence. Sydr itself cannot be used on Java code, but all the other sydr-fuzz features that are useful for analysis remain available.

Preparing Fuzz Target

Let's use json-sanitizer as a demo project. Fuzzing Java code requires Jazzer; installation instructions can be found in its repository. You will need the jazzer binary itself and its jazzer_standalone_deploy.jar library. Detailed instructions for building Java projects and preparing them for fuzzing can also be found at oss-fuzz. A Docker container with the complete fuzzing environment is available on the project page at oss-sydr-fuzz. Let's look at the fuzz target:

import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import com.code_intelligence.jazzer.api.FuzzerSecurityIssueHigh;
import com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium;
import com.google.json.JsonSanitizer;

public class DenylistFuzzer {
  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
    String input = data.consumeRemainingAsString();
    String output;
    try {
      output = JsonSanitizer.sanitize(input, 10);
    } catch (ArrayIndexOutOfBoundsException e) {
      // ArrayIndexOutOfBoundsException is expected if nesting depth is
      // exceeded.
      return;
    }

    // Check for forbidden substrings. As these would enable Cross-Site
    // Scripting, treat every finding as a high severity vulnerability.
    assert !output.contains("</script")
        : new FuzzerSecurityIssueHigh("Output contains </script");
    assert !output.contains("]]>")
        : new FuzzerSecurityIssueHigh("Output contains ]]>");

    // Check for more forbidden substrings. As these would not directly enable
    // Cross-Site Scripting in general, but may impact script execution on the
    // embedding page, treat each finding as a medium severity vulnerability.
    assert !output.contains("<script")
        : new FuzzerSecurityIssueMedium("Output contains <script");
    assert !output.contains("<!--")
        : new FuzzerSecurityIssueMedium("Output contains <!--");
  }
}

We need to create a file named *Fuzzer.java that contains a Java class with the same name. The class must implement the static method fuzzerTestOneInput(byte[]) or fuzzerTestOneInput(FuzzedDataProvider). We will use the latter because it makes it convenient to turn the fuzzer input into Java types.
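
For comparison, the byte[] form of the entry point could look roughly like the sketch below. It is not the article's target: the class name RawBytesTargetSketch and the helper sanitizeStandIn are hypothetical, and the stand-in replaces JsonSanitizer.sanitize only so that the snippet compiles without any dependency. In a real target the class would be named to match its *Fuzzer.java file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: neither this class nor sanitizeStandIn comes from
// Jazzer or json-sanitizer.
final class RawBytesTargetSketch {
  // Stand-in for the library call under test; it only escapes "</script".
  static String sanitizeStandIn(String input) {
    return input.replace("</script", "<\\/script");
  }

  // Entry point with the byte[] signature: the target decodes the bytes itself.
  public static void fuzzerTestOneInput(byte[] data) {
    String input = new String(data, StandardCharsets.UTF_8);
    String output = sanitizeStandIn(input);
    // A plain exception is used instead of Jazzer's FuzzerSecurityIssueHigh
    // to keep the sketch dependency-free.
    if (output.contains("</script")) {
      throw new IllegalStateException("Output contains </script");
    }
  }
}
```

With FuzzedDataProvider, the manual decoding step is replaced by data.consumeRemainingAsString(), as in DenylistFuzzer above.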

We also need to install Maven to build json-sanitizer:

# wget https://dlcdn.apache.org/maven/maven-3/3.9.3/binaries/apache-maven-3.9.3-bin.tar.gz
# tar -xvf apache-maven-*-bin.tar.gz
# rm apache-maven-*-bin.tar.gz
# mv apache-maven-* /opt/
# export PATH="$PATH:/opt/apache-maven-3.9.3/bin"

Next, run the following command in the project directory:

# mvn package

Now all that's left is to build the fuzz target:

# javac -cp /path/to/target/dir:/json-sanitizer/target/json-sanitizer-<version>.jar:/path/to/jazzer_standalone_deploy.jar DenylistFuzzer.java

We are ready to start fuzzing!

Fuzzing

Before we start, let's look at the configuration file DenylistFuzzer.toml:

[jazzer]
target_class = "DenylistFuzzer"
args = "--cp=/out/:/out/json-sanitizer.jar -jobs=400 -workers=4 -rss_limit_mb=4096 /out/corpus -dict=/out/DenylistFuzzer.dict"

The target_class field names the class that contains the fuzzerTestOneInput method. The arguments must include --cp (the class path), which lists the paths to all dependencies of the fuzz target. The args field can also carry any other Jazzer options as well as libFuzzer options.
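
The two option styles in args can be told apart by their prefix: Jazzer options start with a double dash (--cp), libFuzzer options with a single dash (-jobs), and the remaining tokens are positional arguments such as the corpus directory. The sketch below only illustrates that convention; ArgsSketch and its methods are hypothetical names, and this is not how sydr-fuzz itself processes the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; sydr-fuzz does not expose this class.
final class ArgsSketch {
  // "--name" is the Jazzer option style, "-name" the libFuzzer style;
  // anything else is treated as a positional argument (e.g. a corpus path).
  static String classify(String token) {
    if (token.startsWith("--")) {
      return "jazzer";
    }
    if (token.startsWith("-")) {
      return "libfuzzer";
    }
    return "positional";
  }

  // Labels every whitespace-separated token of an args string.
  static List<String> describe(String args) {
    List<String> labeled = new ArrayList<>();
    for (String token : args.trim().split("\\s+")) {
      labeled.add(classify(token) + ": " + token);
    }
    return labeled;
  }
}
```

Calling ArgsSketch.describe on the args value from DenylistFuzzer.toml labels --cp as a Jazzer option, -jobs, -workers, -rss_limit_mb, and -dict as libFuzzer options, and /out/corpus as positional.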

Let's start fuzzing:

# sydr-fuzz -c DenylistFuzzer.toml run
[2023-07-12 13:17:41] [INFO] #12792     INITED cov: 470 ft: 6683 corp: 2041/10117Kb exec/s: 2132 rss: 966Mb
[2023-07-12 13:17:41] [INFO] #12988     REDUCE cov: 470 ft: 6683 corp: 2041/10117Kb lim: 131336 exec/s: 2164 rss: 966Mb L: 36/131336 MS: 2 EraseBytes-Custom-
[2023-07-12 13:17:41] [INFO] #13214     REDUCE cov: 470 ft: 6683 corp: 2041/10117Kb lim: 131336 exec/s: 2202 rss: 966Mb L: 31/131336 MS: 2 EraseBytes-Custom-
[2023-07-12 13:17:41] [INFO] #13220     REDUCE cov: 470 ft: 6683 corp: 2041/10117Kb lim: 131336 exec/s: 2203 rss: 966Mb L: 27/131336 MS: 2 EraseBytes-Custom-
[2023-07-12 13:17:41] [INFO] #13350     REDUCE cov: 470 ft: 6683 corp: 2041/10117Kb lim: 131336 exec/s: 2225 rss: 966Mb L: 35/131336 MS: 10 CopyPart-Custom-ChangeBinInt-Custom-CMP-Custom-ChangeByte-Custom-EraseBytes-Custom- DE: "A58e5"-
[2023-07-12 13:17:41] [INFO] #13496     REDUCE cov: 470 ft: 6683 corp: 2041/10117Kb lim: 131336 exec/s: 2249 rss: 966Mb L: 68/131336 MS: 2 EraseBytes-Custom-
[2023-07-12 13:17:41] [INFO] #13656     REDUCE cov: 470 ft: 6683 corp: 2041/10108Kb lim: 131336 exec/s: 2276 rss: 966Mb L: 118030/131336 MS: 10 CrossOver-Custom-InsertByte-Custom-CrossOver-Custom-InsertRepeatedBytes-Custom-EraseBytes-Custom-
[2023-07-12 13:17:42] [INFO] #13978     REDUCE cov: 470 ft: 6683 corp: 2041/10108Kb lim: 131336 exec/s: 2329 rss: 966Mb L: 10723/131336 MS: 4 CrossOver-Custom-EraseBytes-Custom-
[2023-07-12 13:17:42] [INFO] == Java Exception: java.lang.IndexOutOfBoundsException: start 25, end 1, length 26 /fuzz/DenylistFuzzer-out/crashes/crash-abce4e6aa46f9c6d172cd241fb4d1e30c21e7763
[2023-07-12 13:58:57] [INFO] #24412	INITED cov: 470 ft: 6943 corp: 2118/9924Kb exec/s: 2034 rss: 994Mb
[2023-07-12 13:58:57] [INFO] #24483	REDUCE cov: 470 ft: 6943 corp: 2118/9924Kb lim: 131336 exec/s: 2040 rss: 994Mb L: 3/131336 MS: 2 EraseBytes-Custom-
[2023-07-12 13:58:59] [INFO] == Java Exception: java.lang.IndexOutOfBoundsException: start 27, end 0, length 27 /fuzz/DenylistFuzzer-out/crashes/crash-f69217eecbb509cfc6dfba87375d5a7595e46be8
[2023-07-12 13:59:00] [INFO] == Java Exception: java.lang.AssertionError: com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium: Output contains <!-- /fuzz/DenylistFuzzer-out/crashes/crash-a881bfe8c2a44ddadb2d28de66f3e9da54fe8034
[2023-07-12 13:59:00] [INFO] == Java Exception: java.lang.IndexOutOfBoundsException: start 36, end 1, length 37 /fuzz/DenylistFuzzer-out/crashes/crash-fc4a045cb55197a9c0416e5e526f197d7abf3d01
[2023-07-12 13:59:13] [INFO] #24430	INITED cov: 470 ft: 6943 corp: 2105/9924Kb exec/s: 2714 rss: 973Mb
[2023-07-12 13:59:13] [INFO] [RESULTS] Fuzzing corpus is saved in /fuzz/DenylistFuzzer-out/corpus
[2023-07-12 13:59:13] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/544
[2023-07-12 13:59:13] [INFO] [RESULTS] Fuzzing results are saved in /fuzz/DenylistFuzzer-out/crashes

544 crashes were found! We definitely need casr here.

Before we continue, let's minimize the corpus:

# sydr-fuzz -c DenylistFuzzer.toml cmin
[2023-07-12 14:56:44] [INFO] Original fuzzing corpus saved as /fuzz/DenylistFuzzer-out/corpus-old
[2023-07-12 14:56:44] [INFO] Minimizing corpus /fuzz/DenylistFuzzer-out/corpus
[2023-07-12 14:56:45] [INFO] Jazzer environment: ASAN_OPTIONS=abort_on_error=0,malloc_context_size=0,allocator_may_return_null=1
[2023-07-12 14:56:45] [INFO] Launching Jazzer: cd "/fuzz/DenylistFuzzer-out/jazzer" && ASAN_OPTIONS="abort_on_error=1,malloc_context_size=0,allocator_may_return_null=1" "/usr/local/bin/jazzer" "-merge=1" "-rss_limit_mb=8192" "-detect_leaks=0" "-artifact_prefix=/fuzz/DenylistFuzzer-out/crashes/" "-use_value_profile=1" "-verbosity=2" "--reproducer_path=/dev/null" "--jvm_args=-Xmx2048m:-Xss1024k" "--target_class=DenylistFuzzer" "--agent_path=/usr/local/lib/jazzer_standalone_deploy.jar" "--cp=/out/:/out/gson-2.8.6.jar:/out/json-sanitizer.jar" "/fuzz/DenylistFuzzer-out/corpus" "/fuzz/DenylistFuzzer-out/corpus-old"
[2023-07-12 14:56:46] [INFO] MERGE-OUTER: 24439 files, 0 in the initial corpus, 0 processed earlier
[2023-07-12 14:56:46] [INFO] MERGE-OUTER: attempt 1
[2023-07-12 14:56:57] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2023-07-12 14:56:57] [INFO] MERGE-OUTER: the control file has 2783113 bytes
[2023-07-12 14:56:57] [INFO] MERGE-OUTER: consumed 1Mb (782Mb rss) to parse the control file
[2023-07-12 14:56:57] [INFO] MERGE-OUTER: 2116 new files with 6973 new features added; 500 new coverage edges
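
In the log above, the merge keeps 2116 of the 24439 files. The idea behind such a step can be sketched as a single pass that keeps an input only if it contributes a coverage feature not seen before. This is a simplified illustration under stated assumptions, not libFuzzer's or sydr-fuzz's actual implementation: CorpusMinSketch, the input names, and the integer feature IDs are hypothetical, and each input's feature set is assumed to be known already.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of coverage-based corpus minimization.
final class CorpusMinSketch {
  // Walks the inputs in the map's iteration order and keeps those that add
  // at least one feature that no previously kept input has covered.
  static List<String> minimize(Map<String, Set<Integer>> featuresByInput) {
    Set<Integer> seen = new HashSet<>();
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, Set<Integer>> entry : featuresByInput.entrySet()) {
      // addAll returns true only if the set of seen features grew.
      if (seen.addAll(entry.getValue())) {
        kept.add(entry.getKey());
      }
    }
    return kept;
  }
}
```

For example, with inputs a = {1, 2}, b = {2}, and c = {2, 3} supplied in that order, the sketch keeps a and c and drops b, because b covers nothing new.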

The corpus has been reduced significantly: of 24439 files, only 2116 remain. The next step is coverage.

Coverage

To collect coverage, we will use the JaCoCo library. Installation instructions are available in the JaCoCo documentation, and the library is already present in our Docker container. Now let's generate an HTML coverage report. To display source code lines in the report, the paths to the source code directories must be specified in the CASR_SOURCE_DIRS environment variable. Let's do this:

# export CASR_SOURCE_DIRS=/json-sanitizer/src/main/java
# sydr-fuzz -c DenylistFuzzer.toml jacov html
[2023-07-12 15:20:05] [INFO] Running jacov html "/fuzz/DenylistFuzzer.toml"
[2023-07-12 15:20:05] [INFO] Collecting coverage data for each file in corpus: /fuzz/DenylistFuzzer-out/corpus
[2023-07-12 15:20:05] [INFO] Jazzer environment: ASAN_OPTIONS=allocator_may_return_null=1,malloc_context_size=0,abort_on_error=1
[2023-07-12 15:20:05] [INFO] Launching Jazzer: cd "/fuzz/DenylistFuzzer-out/jazzer" && ASAN_OPTIONS="allocator_may_return_null=1,malloc_context_size=0,abort_on_error=1" "/usr/local/bin/jazzer" "-detect_leaks=0" "-artifact_prefix=/fuzz/DenylistFuzzer-out/crashes/" "-use_value_profile=1" "-verbosity=2" "--reproducer_path=/dev/null" "--jvm_args=-Xmx2048m:-Xss1024k" "--target_class=DenylistFuzzer" "--agent_path=/usr/local/lib/jazzer_standalone_deploy.jar" "--cp=/out/:/out/gson-2.8.6.jar:/out/json-sanitizer.jar" "-rss_limit_mb=4096" "--additional_jvm_args=-javaagent\\:/usr/local/lib/jacoco/lib/jacocoagent.jar=destfile=/fuzz/DenylistFuzzer-out/coverage/jacoco.exec" "--nohooks" "-runs=0" "/fuzz/DenylistFuzzer-out/corpus"
[2023-07-12 15:20:06] [INFO] Running java to collect coverage html: "/usr/bin/java" "-jar" "/usr/local/lib/jacoco/lib/jacococli.jar" "report" "/fuzz/DenylistFuzzer-out/coverage/jacoco.exec" "--sourcefiles" "/json-sanitizer/src/main/java" "--classfiles" "/out/" "--classfiles" "/out/gson-2.8.6.jar" "--classfiles" "/out/json-sanitizer.jar" "--html" "/fuzz/DenylistFuzzer-out/coverage/html"
[INFO] Loading execution data file /fuzz/DenylistFuzzer-out/coverage/jacoco.exec.
[INFO] Analyzing 170 classes.
[2023-07-12 15:20:07] [INFO] html coverage report is saved to "/fuzz/DenylistFuzzer-out/coverage/html"

The coverage in the HTML report will look like this:

[Figure: guide_cov]

Crash Triage

It's time to analyze crashes with casr. Let's run it:

# sydr-fuzz -c DenylistFuzzer.toml casr

You can learn more about casr from its repository or from another fuzzing tutorial.

The output of the sydr-fuzz casr command looks like this:

[2023-07-12 16:32:54] [INFO] Casr-cluster: deduplication of casr reports...
[2023-07-12 16:32:55] [INFO] Reports before deduplication: 544; after: 9
[2023-07-12 16:32:55] [INFO] Casr-cluster: clustering casr reports...
[2023-07-12 16:32:55] [INFO] Reports before clustering: 9. Clusters: 2
[2023-07-12 16:32:55] [INFO] Copying inputs...
[2023-07-12 16:32:55] [INFO] Done!
[2023-07-12 16:32:55] [INFO] ==> <cl1>
[2023-07-12 16:32:55] [INFO] Crash: /fuzz/DenylistFuzzer-out/casr/cl1/crash-7fc508793e29a20a61c0a590e3c39f4e5c64b194
[2023-07-12 16:32:55] [INFO]   casr-java: NOT_EXPLOITABLE: com.code_intelligence.jazzer.api.FuzzerSecurityIssueHigh: DenylistFuzzer.java:36
[2023-07-12 16:32:55] [INFO]   Similar crashes: 1
[2023-07-12 16:32:55] [INFO] Crash: /fuzz/DenylistFuzzer-out/casr/cl1/crash-25535dd57fa4a95833f18b1856415477c1f9cd88
[2023-07-12 16:32:55] [INFO]   casr-java: NOT_EXPLOITABLE: com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium: DenylistFuzzer.java:44
[2023-07-12 16:32:55] [INFO]   Similar crashes: 1
[2023-07-12 16:32:55] [INFO] Crash: /fuzz/DenylistFuzzer-out/casr/cl1/crash-01322fa015e11aa60b7131cc50ef85950b33b0cf
[2023-07-12 16:32:55] [INFO]   casr-java: NOT_EXPLOITABLE: com.code_intelligence.jazzer.api.FuzzerSecurityIssueHigh: DenylistFuzzer.java:38
[2023-07-12 16:32:55] [INFO]   Similar crashes: 1
[2023-07-12 16:32:55] [INFO] Crash: /fuzz/DenylistFuzzer-out/casr/cl1/crash-03d226b16fd27adf10e4d521afe0cd82dcfbcf2c
[2023-07-12 16:32:55] [INFO]   casr-java: NOT_EXPLOITABLE: com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium: DenylistFuzzer.java:46
[2023-07-12 16:32:55] [INFO]   Similar crashes: 1
[2023-07-12 16:32:55] [INFO] Cluster summary -> com.code_intelligence.jazzer.api.FuzzerSecurityIssueHigh: 2 com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium: 2
[2023-07-12 16:32:55] [INFO] ==> <cl2>
[2023-07-12 16:32:55] [INFO] Crash: /fuzz/DenylistFuzzer-out/casr/cl2/crash-f5d7fc872cabb309c3c7f97cd5a44c698e3beb25
[2023-07-12 16:32:55] [INFO]   casr-java: NOT_EXPLOITABLE: java.lang.IndexOutOfBoundsException: JsonSanitizer.java:717
[2023-07-12 16:32:55] [INFO]   Similar crashes: 5
[2023-07-12 16:32:55] [INFO] Cluster summary -> java.lang.IndexOutOfBoundsException: 5
[2023-07-12 16:32:55] [INFO] SUMMARY -> java.lang.IndexOutOfBoundsException: 5 com.code_intelligence.jazzer.api.FuzzerSecurityIssueMedium: 2 com.code_intelligence.jazzer.api.FuzzerSecurityIssueHigh: 2
[2023-07-12 16:32:55] [INFO] Crashes and Casr reports are saved in /fuzz/DenylistFuzzer-out/casr
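
The drop from 544 reports to 9 in this log comes from treating crashes that fail in the same way as duplicates. The sketch below illustrates the general idea with a deliberately crude key (exception class plus top source location); casr's actual deduplication and clustering criteria are described in its repository and are not reproduced here. DedupSketch and the "class|location" report format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; this is not casr's algorithm.
final class DedupSketch {
  // Each report is a "exceptionClass|location" string. Returns how many
  // reports share each distinct key, in first-seen order.
  static Map<String, Integer> dedup(String[] reports) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String report : reports) {
      counts.merge(report, 1, Integer::sum);
    }
    return counts;
  }
}
```

The number of keys in the returned map corresponds to the report count after deduplication, and each value to the number of crashes folded into that report.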

Before deduplication, we had 544 crashes. After deduplication, only 9 reports remain to be processed, and clustering divided them into 2 groups. Let's look at the report from the second cluster:

[Figure: guide_cli]

An out-of-bounds access occurred while appending part of a string. This looks like a potential bug.
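
The exception messages in the fuzzing log (for example "start 25, end 1, length 26") report a start index greater than the end index. The sketch below shows how a ranged append raises this exception type. It assumes the failing call behaves like StringBuilder.append(CharSequence, int, int), which the article text does not confirm, and it does not call json-sanitizer; RangedAppendSketch is a hypothetical name.

```java
// Hypothetical reproduction of the exception type only.
final class RangedAppendSketch {
  // Returns true if appending text[start, end) is rejected with an
  // IndexOutOfBoundsException, false if the append succeeds.
  static boolean throwsOnRange(CharSequence text, int start, int end) {
    try {
      new StringBuilder().append(text, start, end);
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }
}
```

A valid range such as (1, 4) on a six-character string succeeds, while a reversed range such as (4, 1) is rejected, which is the shape of the failures that casr grouped into the second cluster.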

Conclusion

This article introduced the main aspects of fuzzing Java code: preparing a fuzz target, fuzzing itself, minimizing the corpus, collecting source code coverage, and triaging crashes with casr. Thanks to sydr-fuzz, this analysis can be done with just a few commands, which greatly simplifies the workflow.