
Fuzzing fast-xml-parser (JavaScript) project with sydr-fuzz (Jazzer.js backend)

PaDarochek edited this page Jan 29, 2024 · 1 revision

Introduction

In this article, we explore an approach to fuzzing JavaScript applications using the Sydr-Fuzz interface with the Jazzer.js fuzzer as its backend. Sydr-Fuzz provides a convenient interface for hybrid fuzzing, combining the dynamic symbolic execution capabilities of the Sydr tool with modern fuzzers such as libFuzzer and AFLplusplus. Beyond fuzzing itself, Sydr-Fuzz offers corpus minimization, coverage collection, bug detection by checking security predicates, and crash analysis with Casr. In addition to programs in purely compiled languages, Sydr-Fuzz supports fuzzing applications in Python and Java. The next step in the tool's development was adding support for JavaScript. For this, we chose Jazzer.js, a popular in-process coverage-guided fuzzer based on libFuzzer. It targets projects on the Node.js platform and was developed by Code Intelligence along the same lines as the Jazzer fuzzer for Java. While dynamic symbolic execution is not available for JavaScript projects, all other Sydr-Fuzz functionality is.

Preparing Fuzz Target

We will use the fast-xml-parser project to demonstrate fuzzing JavaScript code via Sydr-Fuzz. Instructions on installing and using Jazzer.js can be found in its GitHub repository, and the OSS-Fuzz repository also explains in detail how to prepare a JavaScript project for fuzzing and set up the necessary environment. Our OSS-Sydr-Fuzz repository already contains a prepared Docker container with a customized environment, which we will use for fuzzing.

We can build and run the Docker container with the following commands:

$ docker build -t oss-sydr-fuzz-fastxmlparser .
$ docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /etc/localtime:/etc/localtime:ro --rm -it -v $PWD:/jazzer.js/fuzz oss-sydr-fuzz-fastxmlparser /bin/bash

Note that, unlike in the other projects, we mount our fuzzing working directory ($PWD) to /jazzer.js/fuzz rather than to a directory directly under the root. To run fuzzing with Jazzer.js, the fuzzer must be installed in one of the parent directories of the working directory. Jazzer.js is installed in /jazzer.js, so fuzzing and all other commands will be run from /jazzer.js/fuzz.
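The parent-directory requirement matches how Node.js resolves packages: a require() call looks for a node_modules folder in the current directory and then in each ancestor up to the filesystem root. The sketch below reproduces that lookup list in simplified form, assuming Jazzer.js was installed with npm inside /jazzer.js (so it lives in /jazzer.js/node_modules). The helper name nodeModulesLookupPaths is hypothetical; it is not a Node.js or Jazzer.js API.

```javascript
// Hypothetical helper: a simplified version of the list of node_modules
// folders that Node.js searches when resolving a package from `dir`.
const path = require("path");

function nodeModulesLookupPaths(dir) {
  const result = [];
  let current = path.posix.resolve(dir);
  while (true) {
    // Node.js skips directories that are themselves named node_modules.
    if (path.posix.basename(current) !== "node_modules") {
      result.push(path.posix.join(current, "node_modules"));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return result;
}

// Lists /jazzer.js/fuzz/node_modules, /jazzer.js/node_modules, /node_modules
console.log(nodeModulesLookupPaths("/jazzer.js/fuzz"));
```

Because /jazzer.js/node_modules appears in this list, a package installed under /jazzer.js is visible from /jazzer.js/fuzz, but it would not be visible from a sibling directory such as /fuzz.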

Let's move on to the fuzz target:

const { FuzzedDataProvider } = require('@jazzer.js/core');
const XMLParser = require('./src/xmlparser/XMLParser');
const XMLBuilder = require('./src/xmlbuilder/json2xml');
const XMLValidator = require('./src/fxp').XMLValidator;

// Message fragments of expected, non-critical parser errors.
const ignored = ['Cannot read properties', 'is not closed', 'Invalid Tag'];

function ignoredError(error) {
  // error.message is already a string: match it against each ignored fragment.
  return !!error && typeof error.message === 'string' &&
    ignored.some((message) => error.message.indexOf(message) !== -1);
}

module.exports.fuzz = function(data) {
  try {
    // Turn up to 1024 bytes of the raw input into a string.
    const provider = new FuzzedDataProvider(data);
    const xmlString = provider.consumeString(1024);
    // Round trip: parse XML into a JS object, build XML back, validate it.
    const parser = new XMLParser();
    const jObj = parser.parse(xmlString);
    const builder = new XMLBuilder();
    const xmlContent = builder.build(jObj);
    XMLValidator.validate(xmlContent, {
      allowBooleanAttributes: true,
    });
  } catch (error) {
    // Re-throw anything not on the ignore list so that Jazzer.js reports it.
    if (!ignoredError(error)) throw error;
  }
};

The fuzz target file must export the fuzz function, module.exports.fuzz = function(data) {...}, which takes a single argument of type Buffer. Jazzer.js also provides FuzzedDataProvider, which converts the raw input bytes into values of target-language types: const { FuzzedDataProvider } = require('@jazzer.js/core');. Some exceptions thrown by the fast-xml-parser library may be deliberately ignored when they correspond to non-critical errors in normal use of the library. In our example, such exceptions are recognized by the ignoredError function.
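Since the ignore list decides what counts as a finding, it can help to check it in isolation, outside Jazzer.js. Below is a hypothetical standalone version of that check (isIgnoredError and IGNORED_MESSAGES are illustrative names, not part of the project), with a guard for thrown values that are not Error objects; the example messages other than "Invalid entity name g", which comes from the fuzzing log, are made up.

```javascript
// Hypothetical standalone copy of the fuzz target's exception filter.
const IGNORED_MESSAGES = ["Cannot read properties", "is not closed", "Invalid Tag"];

function isIgnoredError(error, ignored = IGNORED_MESSAGES) {
  // Only string messages are inspected; anything else (a thrown string,
  // null, ...) is treated as a real finding and should be re-thrown.
  const message = error && typeof error.message === "string" ? error.message : null;
  if (message === null) return false;
  return ignored.some((fragment) => message.includes(fragment));
}

console.log(isIgnoredError(new Error("Tag 'a' is not closed"))); // true
console.log(isIgnoredError(new Error("Invalid entity name g"))); // false
```

The second message does not match any fragment, which is why such errors surface as crashes during fuzzing.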

Building the project for fuzzing requires no special steps beyond cloning the project and installing its dependencies, both of which are already described in the corresponding Dockerfile. Now we can build our Docker container and move on to fuzzing!

Fuzzing

Let's take a look at the xml.toml configuration file:

exit-on-time = 3600

[jazzer_js]
path = "/jazzer.js/fast-xml-parser/fuzz.js"
args = "/corpus --sync -- -workers=2 -jobs=100 -dict=/xml.dict"

Here we see the optional exit-on-time parameter: if no new coverage is found within this time (in seconds), fuzzing stops. The [jazzer_js] table specifies the path to the fuzz target and args, which holds the arguments for Jazzer.js (placed before the -- separator) and for the libFuzzer engine (placed after the -- separator).
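To make the role of the -- separator concrete, here is a sketch that splits an args string into the part meant for Jazzer.js and the part passed on to libFuzzer. The helper splitArgs is hypothetical and uses naive whitespace splitting (no quoting); Sydr-Fuzz's own parsing may differ.

```javascript
// Hypothetical helper: split a [jazzer_js] `args` string at the first
// standalone "--" into Jazzer.js and libFuzzer arguments.
function splitArgs(args) {
  const tokens = args.trim().split(/\s+/).filter((t) => t.length > 0);
  const sep = tokens.indexOf("--"); // exact match, so "--sync" is not a separator
  if (sep === -1) return { jazzer: tokens, libfuzzer: [] };
  return { jazzer: tokens.slice(0, sep), libfuzzer: tokens.slice(sep + 1) };
}

const parts = splitArgs("/corpus --sync -- -workers=2 -jobs=100 -dict=/xml.dict");
console.log(parts.jazzer);    // [ '/corpus', '--sync' ]
console.log(parts.libfuzzer); // [ '-workers=2', '-jobs=100', '-dict=/xml.dict' ]
```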

Pay particular attention to the --sync option, which can increase fuzzing performance in some cases. Jazzer.js supports fuzz targets with asynchronous functions, but this requires additional synchronization between Node.js and the fuzzing thread, which slows down the analysis. Nevertheless, Jazzer.js uses the asynchronous mode by default because it supports all fuzz targets, whether their computation is synchronous or asynchronous. If your fuzz target consists solely of synchronous code, switching to synchronous mode with --sync can speed up fuzzing. You can read more about this in the Jazzer.js documentation.
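Whether a target is synchronous shows up in its return value: a plain function like the fast-xml-parser target returns undefined, whereas any function declared async returns a Promise, even if its body never awaits anything. The hypothetical helper returnsPromise below (not part of Jazzer.js) checks this by calling a target once. Treat it as a rough sanity check before enabling --sync, not as a description of how Jazzer.js itself decides anything.

```javascript
// Hypothetical helper: call a fuzz target once and report whether it
// returned a thenable, i.e. whether it behaves asynchronously.
// Exceptions thrown by the target propagate to the caller.
function returnsPromise(target, sample) {
  const result = target(sample);
  return result !== null && typeof result === "object" && typeof result.then === "function";
}

// Shaped like the fast-xml-parser target: a plain function returning nothing.
function syncTarget(data) {
  JSON.stringify(data.toString("utf8")); // purely synchronous work
}

// Identical body, but declared async: it now always returns a Promise.
async function asyncTarget(data) {
  JSON.stringify(data.toString("utf8"));
}

const sample = Buffer.from("<a>1</a>");
console.log(returnsPromise(syncTarget, sample));  // false
console.log(returnsPromise(asyncTarget, sample)); // true
```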

Let's start fuzzing with the following command:

# sydr-fuzz -c xml.toml run
[2024-01-25 18:35:48] [INFO] #186 INITED cov: 568 ft: 2817 corp: 180/7703b exec/s: 0 rss: 148Mb
[2024-01-25 18:35:48] [INFO] #199 NEW cov: 568 ft: 2818 corp: 181/7717b lim: 527 exec/s: 0 rss: 148Mb L: 14/527 MS: 3 InsertByte-ShuffleBytes-CopyPart-
[2024-01-25 18:35:48] [INFO] #207 NEW cov: 568 ft: 2819 corp: 182/7754b lim: 527 exec/s: 0 rss: 148Mb L: 37/527 MS: 3 CopyPart-ShuffleBytes-ChangeBinInt-.
[2024-01-25 18:35:48] [INFO] #209 NEW cov: 568 ft: 2822 corp: 183/7869b lim: 527 exec/s: 0 rss: 148Mb L: 115/527 MS: 2 CMP-InsertRepeatedBytes- DE: "\001\000\000\000\000\000\000\000\000\000\000\000\000"-
[2024-01-25 18:35:48] [INFO] #210 NEW cov: 568 ft: 2825 corp: 184/7969b lim: 527 exec/s: 0 rss: 148Mb L: 100/527 MS: 1 PersAutoDict- DE: "\001\000\000\000\000\000\000\000\000\000\000\000\000"-
[2024-01-25 18:35:48] [INFO] #223 NEW cov: 568 ft: 2826 corp: 185/7987b lim: 527 exec/s: 0 rss: 148Mb L: 18/527 MS: 3 ManualDict-InsertByte-EraseBytes- DE: "<![IGNORE["-
[2024-01-25 18:35:48] [INFO] #251 NEW cov: 568 ft: 2834 corp: 186/7991b lim: 527 exec/s: 0 rss: 148Mb L: 4/527 MS: 3 CopyPart-ChangeBit-InsertByte-.
[2024-01-25 18:35:48] [INFO] #258 NEW cov: 568 ft: 2843 corp: 187/8177b lim: 527 exec/s: 0 rss: 148Mb L: 186/527 MS: 2 ChangeByte-CrossOver-
[2024-01-25 18:35:48] [INFO] #269 NEW cov: 568 ft: 2862 corp: 188/8190b lim: 527 exec/s: 0 rss: 148Mb L: 13/527 MS: 1 ManualDict- DE: "#PCDATA"-
[2024-01-25 18:35:48] [INFO] #270 NEW cov: 568 ft: 2865 corp: 189/8244b lim: 527 exec/s: 0 rss: 148Mb L: 54/527 MS: 1 ShuffleBytes-
[2024-01-25 18:35:48] [INFO] ==1593== Uncaught Exception: Error: Invalid entity name g /jazzer.js/fuzz/xml-out/crashes/crash-f4be71632a88b6db3c44fd1070cfaa4ea9b8806b
[2024-01-25 18:36:29] [INFO] ==2948== Uncaught Exception: Error: Unclosed DOCTYPE /jazzer.js/fuzz/xml-out/crashes/crash-17e68caf797dea78d2c4ed08a96818e2fbb91db0
[2024-01-25 18:36:29] [INFO] ==2957== Uncaught Exception: Error: Unclosed DOCTYPE /jazzer.js/fuzz/xml-out/crashes/crash-a10dff7915f5d028effd802258cfb8938728408c
[2024-01-25 18:36:29] [INFO] ==2928== Uncaught Exception: Error:  js/fuzz/xml-out/crashes/crash-0c802f19059df64ed1efcf4a662397bbe08cc80e
[2024-01-25 18:36:29] [INFO] ==2901== Uncaught Exception: Error: Unclosed DOCTYPE /jazzer.js/fuzz/xml-out/crashes/crash-4a420f2d637d310cf7d461b658f45f376c481074
[2024-01-25 18:36:29] [INFO] ==2976== Uncaught Exception: Error: Unclosed DOCTYPE /jazzer.js/fuzz/xml-out/crashes/crash-2210bcf04f353535379be3953e84002c10f1341363
[2024-01-25 18:36:30] [INFO] [INFO] [RESULTS] Fuzzing corpus is saved in /jazzer.js/fuzz/xml-out/corpus
[2024-01-25 18:36:30] [INFO] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/100
[2024-01-25 18:36:30] [INFO] [INFO] [RESULTS] Fuzzing results are saved in /jazzer.js/fuzz/xml-out/crashes

Fuzzing found 100 crashes. Since each of the 100 libFuzzer jobs (-jobs=100) stops at its first crash, many of them are likely duplicates, so sorting them with Casr will be very useful. Before that, however, let's minimize the resulting corpus:

# sydr-fuzz -c xml.toml cmin
[2024-01-25 18:39:09] [INFO] Original fuzzing corpus saved as /jazzer.js/fuzz/xml-out/corpus-old
[2024-01-25 18:39:09] [INFO] Minimizing corpus /jazzer.js/fuzz/xml-out/corpus
[2024-01-25 18:39:09] [INFO] Jazzer.js environment: ASAN_OPTIONS=abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=0,malloc_context_size=0
[2024-01-25 18:39:09] [INFO] Launching Jazzer.js: cd "/jazzer.js/fuzz/xml-out/jazzer.js" && ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=0,malloc_context_size=0" "/jazzer.js/fast-xml-parser/fuzz.js" "--sync" "/jazzer.js/fuzz/xml-out/corpus" "/jazzer.js/fuzz/xml-out/corpus-old" "--" "-merge=1" "-rss_limit_mb=8192" "-detect_leaks=0" "-artifact_prefix=/jazzer.js/fuzz/xml-out/crashes/" "-use_value_profile=1" "-verbosity=2""
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: 641 files, 0 in the initial corpus, 0 processed earlier
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: attempt 1
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: the control file has 89857 bytes
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: consumed 0Mb (148Mb rss) to parse the control file
[2024-01-25 18:39:11] [INFO] MERGE-OUTER: 400 new files with 3908 new features added; 606 new coverage edges

We were able to significantly reduce the size of the input corpus: from 641 to 400 files. Good job! Now let's take a look at the coverage achieved.
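The minimization above relies on libFuzzer's -merge=1 mode. Roughly, it processes inputs from smallest to largest and keeps an input only if it contributes coverage features not seen so far. The sketch below shows that greedy idea on made-up data; the record shape { name, size, features } and the function name minimizeCorpus are hypothetical, and real feature sets come from instrumentation.

```javascript
// Greedy corpus minimization in the spirit of libFuzzer's -merge=1.
function minimizeCorpus(inputs) {
  // Smaller inputs first, so a small file wins over a larger one with the
  // same features; ties are broken by name for a deterministic order.
  const sorted = [...inputs].sort((a, b) => a.size - b.size || a.name.localeCompare(b.name));
  const seen = new Set();
  const kept = [];
  for (const input of sorted) {
    const newFeatures = input.features.filter((f) => !seen.has(f));
    if (newFeatures.length > 0) {
      newFeatures.forEach((f) => seen.add(f));
      kept.push(input.name);
    }
  }
  return { kept, features: seen.size };
}

const corpus = [
  { name: "big", size: 300, features: [1, 2, 3] },
  { name: "small", size: 10, features: [1, 2] },
  { name: "dup", size: 50, features: [2] },
  { name: "extra", size: 80, features: [3, 4] },
];
console.log(minimizeCorpus(corpus)); // { kept: [ 'small', 'extra' ], features: 4 }
```

Here "big" is dropped even though it covers three features, because smaller inputs already cover all of them.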

Coverage

To collect coverage, we will use the coverage support built into the Jazzer.js fuzzer itself, via the sydr-fuzz jscov command. This is how we get a coverage report in HTML format:

# sydr-fuzz -c xml.toml jscov html
[2024-01-25 18:41:53] [INFO] Running jscov html "/jazzer.js/fuzz/xml.toml"
[2024-01-25 18:41:53] [INFO] Collecting coverage data for each file in corpus: /jazzer.js/fuzz/xml-out/corpus
[2024-01-25 18:41:53] [INFO] Jazzer.js environment: ASAN_OPTIONS=abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=0,malloc_context_size=0
[2024-01-25 18:41:53] [INFO] Launching Jazzer.js: cd "/jazzer.js/fuzz/xml-out/jazzer.js" && ASAN_OPTIONS="abort_on_error=1,allocator_may_return_null=1,detect_leaks=0,hard_rss_limit_mb=0,malloc_context_size=0" "/jazzer.js/fast-xml-parser/fuzz.js" "--sync" "--cov" "-m regression" "--cov_dir=/jazzer.js/fuzz/xml-out/coverage/html" "--cov_reporters=html" "/jazzer.js/fuzz/xml-out/corpus" "--" "-detect_leaks=0" "-artifact_prefix=/jazzer.js/fuzz/xml-out/crashes/" "-use_value_profile=1" "-verbosity=2" "-dict=/xml.dict" "-runs=0""
[2024-01-25 18:41:54] [INFO] html coverage report is saved to "/jazzer.js/fuzz/xml-out/coverage/html"

The source code coverage will look like this:

[Image: HTML coverage report for fast-xml-parser]
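The HTML report is meant for browsing. If you also want a single number, for example to compare runs, a statement coverage percentage can be computed from a report in Istanbul's coverage-final.json format. Producing such a report (for instance with the json reporter via --cov_reporters) is an assumption here; this guide only uses html. A minimal sketch with made-up sample data; statementCoverage is a hypothetical name.

```javascript
// Sketch: statement coverage from an Istanbul-style coverage object,
// where each file entry has `s`, a map from statement id to hit count.
function statementCoverage(coverageMap) {
  let total = 0;
  let covered = 0;
  for (const file of Object.values(coverageMap)) {
    for (const hits of Object.values(file.s)) {
      total++;
      if (hits > 0) covered++;
    }
  }
  return total === 0 ? 0 : (100 * covered) / total;
}

// Made-up sample: 4 of 5 statements were executed at least once.
const exampleCoverage = {
  "/jazzer.js/fast-xml-parser/src/xmlparser/DocTypeReader.js": { s: { 0: 5, 1: 0, 2: 3, 3: 1 } },
  "/jazzer.js/fast-xml-parser/src/fxp.js": { s: { 0: 1 } },
};
console.log(statementCoverage(exampleCoverage).toFixed(1)); // 80.0
```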

Crash Triage

At the end of our analysis pipeline, let's apply Casr to analyze and triage the crashes found:

# sydr-fuzz -c xml.toml casr

You can learn more about Casr from the Casr repository or from another guide.

Let's look at the output of the command:

[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Deduplicating CASR reports...
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Number of reports before deduplication: 100. Number of reports after deduplication: 3
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Clustering CASR reports...
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Number of clusters: 3
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] ==> <cl1>
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Crash: /jazzer.js/fuzz/xml-out/casr/cl1/crash-021a3f5bf89df1c650bbffc4d028737d8bfc0f4d
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] casrep: NOT_EXPLOITABLE: Error: /jazzer.js/fast-xml-parser/src/xmlparser/DocTypeReader.js:56:19
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Similar crashes: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Cluster summary -> Error: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] ==> <cl2>
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Crash: /jazzer.js/fuzz/xml-out/casr/cl2/crash-0c802f19059df64ed1efcf4a662397bbe08cc80e
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] casrep: NOT_EXPLOITABLE: Error: /jazzer.js/fast-xml-parser/src/xmlparser/DocTypeReader.js:149:15
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Similar crashes: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Cluster summary -> Error: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] ==> <cl3>
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Crash: /jazzer.js/fuzz/xml-out/casr/cl3/crash-45b9da6dab82407a6cf96eec2bb232386cd6360e
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] casrep: NOT_EXPLOITABLE: Error: /jazzer.js/fast-xml-parser/src/xmlparser/DocTypeReader.js:82:46
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Similar crashes: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] Cluster summary -> Error: 1
[2024-01-25 18:57:03] [INFO] [CASR-LIBFUZZER] SUMMARY -> Error: 3
[2024-01-25 18:57:03] [INFO] Crashes and Casr reports are saved in /jazzer.js/fuzz/xml-out/casr

After Casr's deduplication and clustering, only 3 crashes remain, one in each cluster, and they are easy to analyze manually. Let's take a look at the report from the second cluster, for example:

[Image: Casr report for the crash from the second cluster]

We got an unhandled exception caused by an invalid entity name. This looks like a potential bug!
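To see how 100 crashes can collapse into just 3, recall that Casr deduplicates and clusters reports based on their stack traces. The sketch below is a much simpler stand-in, not Casr's algorithm: crashes whose top frames match are grouped together. The function name deduplicate, the crash names, and the frames are hypothetical sample data; only the DocTypeReader.js locations are taken from the Casr output above.

```javascript
// Simplified stack-trace deduplication: crashes with identical top
// `depth` frames fall into the same group (insertion order is kept).
function deduplicate(crashes, depth = 3) {
  const groups = new Map();
  for (const crash of crashes) {
    const key = crash.frames.slice(0, depth).join("|");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(crash.name);
  }
  return [...groups.values()];
}

const crashes = [
  { name: "crash-a", frames: ["DocTypeReader.js:149:15", "hypothetical-caller", "fuzz.js"] },
  { name: "crash-b", frames: ["DocTypeReader.js:149:15", "hypothetical-caller", "fuzz.js"] },
  { name: "crash-c", frames: ["DocTypeReader.js:56:19", "hypothetical-caller", "fuzz.js"] },
];
console.log(deduplicate(crashes)); // [ [ 'crash-a', 'crash-b' ], [ 'crash-c' ] ]
```

Comparing only the top frames is a crude choice: a larger depth separates crashes reached through different callers, while a smaller one merges them more aggressively.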

Conclusion

This article has explored an approach to fuzzing JavaScript applications using the Sydr-Fuzz interface. Fuzzing, corpus minimization, coverage collection, and crash analysis can all be run quickly and conveniently with Sydr-Fuzz. In addition, Casr helps handle a large number of crashes and presents information about them in a convenient form.