Add fuzzer for href setter and getter #657

CarlosEduR · 2024-05-15T21:37:55Z

In order to improve the line coverage for fuzz testing, add fuzzer for set_href and get_href

anonrig · 2024-05-15T22:31:57Z

cc @lemire i have bad news and good news

lemire · 2024-05-16T01:38:50Z

fuzz/href.cc

+#include <memory>
+#include <string>
+
+#include "ada.cpp"


Prior to this these includes, we could add....

#define ADA_DEVELOPMENT_CHECKS 1

This would add additional checks.

lemire · 2024-05-16T01:41:28Z

@anonrig

Indeed, the fuzzer found a bug, here it is...

#659

It is easy to narrow it down, just follow this recipe:

https://gist.github.com/lemire/f501b3b2bb8c33673de4f0a0674a6112

Takes a minute!!!

Fuzzers are great.

lemire · 2024-05-16T15:55:24Z

See #662

lemire · 2024-05-16T15:55:56Z

I expect that I have a fix for the bug in question at #659

lemire · 2024-05-16T19:46:05Z

@CarlosEduR Please rebase. I have merged a bug fix.

CarlosEduR · 2024-05-16T22:44:25Z

thanks @lemire!
I was initially thinking it was something with the .to_string(), but now I see it was with the .set_pathname().

fuzz/href.cc

CarlosEduR · 2024-05-17T03:23:00Z

See #662

@lemire about this comparative fuzzer, are you thinking about comparing the results of the get_href between the ada::url and the ada::url_aggregator in the fuzzer?

lemire · 2024-05-17T04:12:33Z

I am!!!!!!!!!

CarlosEduR · 2024-05-23T15:12:15Z

I am!!!!!!!!!

@lemire assertion added!

fuzz/parse.cc

…) is empty

anonrig · 2024-05-24T21:27:30Z

fuzz/parse.cc

@@ -17,12 +17,23 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  auto parse_url = ada::parse<ada::url>(source);
  auto parse_url_aggregator = ada::parse<ada::url_aggregator>(source);

+  if ((parse_url && !parse_url_aggregator) || (!parse_url && parse_url_aggregator)) {


Using xor would make this a lot simpler:

parse_url ^ parse_url_aggregator

interesting... will update it

I updated a little bit more: parse_url.has_value() ^ parse_url_aggregator.has_value()
but seems it raised the abort function.

Can you log?

================== Job 0 exited with exit code 77 ============ /github/workspace/build-out/parse -dict=/github/workspace/build-out/url.dict -max_len=1024 -timeout=25 -rss_limit_mb=2560 -len_control=0 -seed=1337 -artifact_prefix=/tmp/tmplfz2au7i/ -max_total_time=300 -print_final_stats=1 /github/workspace/cifuzz-corpus/parse >fuzz-2.log 2>&1 Dictionary: 7 entries INFO: Running with entropic power schedule (0xFF, 100). INFO: Seed: 1337 INFO: Loaded 1 modules (12847 inline 8-bit counters): 12847 [0x5562cca46348, 0x5562cca49577), INFO: Loaded 1 PC tables (12847 PCs): 12847 [0x5562cca49578,0x5562cca7b868), INFO: 13333 files found in /github/workspace/cifuzz-corpus/parse INFO: seed corpus: files: 13333 min: 1b max: 4394568b total: 16829628b rss: 43Mb AddressSanitizer:DEADLYSIGNAL ================================================================= ==32==ERROR: AddressSanitizer: ABRT on unknown address 0x000000000020 (pc 0x7fd3f3cc[80](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:81)0b bp 0x7ffec9050880 sp 0x7ffec90504d0 T0) SCARINESS: 10 (signal) #0 0x7fd3f3cc800b in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce) #1 0x7fd3f3ca7858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce) #2 0x5562cc938973 in LLVMFuzzerTestOneInput /src/ada-url/fuzz/parse.cc:21:5 #3 0x5562cc7468b0 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:614:13 #4 0x5562cc7460d5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:516:7 #5 0x5562cc748062 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:[82](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:83)9:7 #6 0x5562cc74[83](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:84)97 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:867:3 #7 0x5562cc7369a6 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:914:6 #8 0x5562cc762ed2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10 #9 0x7fd3f3ca9082 in __libc_start_main (/lib/x[86](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:87)_64-linux-gnu/libc.so.6+0x24082) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce) #10 0x5562cc727b1d in _start (build-out/parse+0x93b1d) DEDUP_TOKEN: raise--abort--LLVMFuzzerTestOneInput AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: ABRT (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: [87](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:88)b331c034a6458c64ce09c03939e947212e18ce) in raise ==32==ABORTING

The issue is relatively simple. Look at what you are feeding the parser:

"Ws:X\xff"

That's not a valid UTF-8 string. Our parsers assume valid UTF-8 inputs.

This breaks the contract: ada::parse<ada::url>(...) will accept it, while ada::parse<ada::url_aggregator>(...) will reject it.

It would be trivial to front-load the parsing with UTF-8 validation, but we specifically do not do it. So the result is effectively undefined when passing an invalid string (because we assume it does not happen).

You can check that the input is valid UTF-8 with a function like so:

bool validate(const char *buf, size_t len) { const uint8_t *data = reinterpret_cast<const uint8_t *>(buf); uint64_t pos = 0; uint32_t code_point = 0; while (pos < len) { uint64_t next_pos = pos + 16; if (next_pos <= len) { // if it is safe to read 16 more bytes, check that they are ascii uint64_t v1; std::memcpy(&v1, data + pos, sizeof(uint64_t)); uint64_t v2; std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t)); uint64_t v{v1 | v2}; if ((v & 0x8080808080808080) == 0) { pos = next_pos; continue; } } unsigned char byte = data[pos]; while (byte < 0b10000000) { if (++pos == len) { return true; } byte = data[pos]; } if ((byte & 0b11100000) == 0b11000000) { next_pos = pos + 2; if (next_pos > len) { return false; } if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; } code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111); if ((code_point < 0x80) || (0x7ff < code_point)) { return false; } } else if ((byte & 0b11110000) == 0b11100000) { next_pos = pos + 3; if (next_pos > len) { return false; } if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; } if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; } code_point = (byte & 0b00001111) << 12 | (data[pos + 1] & 0b00111111) << 6 | (data[pos + 2] & 0b00111111); if ((code_point < 0x800) || (0xffff < code_point) || (0xd7ff < code_point && code_point < 0xe000)) { return false; } } else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000 next_pos = pos + 4; if (next_pos > len) { return false; } if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; } if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; } if ((data[pos + 3] & 0b11000000) != 0b10000000) { return false; } code_point = (byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 | (data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111); if (code_point <= 0xffff || 0x10ffff < code_point) { return false; } } else { return false; } pos = next_pos; } return true; }

Basically, validating that we have UTF-8 is kind of a separate issue. So the fuzzer could just require the inputs to be valid UTF-8 by checking.

Nice, got it, thanks for the investigation and explanation @lemire.
Should I update the fuzzer by adding this function to validate the inputs and only compare the two variables once we have a valid UTF-8 string?

You might give it a try!!!

yeah, looks by adding the function to validate the string, the fuzzer is now passing. thanks @lemire @anonrig

CarlosEduR added 2 commits May 15, 2024 17:29

add fuzzer for href setter and getter

01ce91f

add fuzzer for href setter and getter

e35e46d

lemire mentioned this pull request May 16, 2024

path setter bug #659

Merged

lemire reviewed May 16, 2024

View reviewed changes

CarlosEduR added 2 commits May 16, 2024 19:30

add ADA_DEVELOPMENT_CHECKS

e5279ee

ifix conflict with main

db9fbe8

lemire approved these changes May 16, 2024

View reviewed changes

remove command to delete build folder before creating it, used locally

0b6ecb0

CarlosEduR mentioned this pull request May 16, 2024

Add fuzzer for href setter and getter #658

Closed

CarlosEduR marked this pull request as ready for review May 16, 2024 22:48

anonrig reviewed May 16, 2024

View reviewed changes

fuzz/href.cc Outdated Show resolved Hide resolved

CarlosEduR added 3 commits May 22, 2024 01:04

fix conflicts

c16e6b4

run linter

c7b9ca3

add set_href in fuzz and assertion

d122eea

anonrig approved these changes May 23, 2024

View reviewed changes

lemire reviewed May 23, 2024

View reviewed changes

fuzz/parse.cc Show resolved Hide resolved

add condition to abort if just one of the urls (url or url_aggregator…

b3de9ca

…) is empty

anonrig reviewed May 24, 2024

View reviewed changes

CarlosEduR added 3 commits May 24, 2024 18:35

improve condition by using bitwise xor operator

8a4dc32

fix fuzz parse

fc18b9b

print source before aborting

b6dd46f

add function to validate if string is a valid utf-8 string

315ea60

anonrig approved these changes May 27, 2024

View reviewed changes

anonrig merged commit a404fd8 into ada-url:main May 27, 2024
36 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fuzzer for href setter and getter #657

Add fuzzer for href setter and getter #657

CarlosEduR commented May 15, 2024 •

edited

Loading

anonrig commented May 15, 2024

lemire May 16, 2024

lemire commented May 16, 2024

lemire commented May 16, 2024

lemire commented May 16, 2024

lemire commented May 16, 2024

CarlosEduR commented May 16, 2024

CarlosEduR commented May 17, 2024

lemire commented May 17, 2024

CarlosEduR commented May 23, 2024

anonrig May 24, 2024

CarlosEduR May 24, 2024

CarlosEduR May 24, 2024

anonrig May 24, 2024

CarlosEduR May 24, 2024

lemire May 25, 2024

lemire May 25, 2024

CarlosEduR May 26, 2024

lemire May 26, 2024

CarlosEduR May 27, 2024

Add fuzzer for href setter and getter #657

Add fuzzer for href setter and getter #657

Conversation

CarlosEduR commented May 15, 2024 • edited Loading

anonrig commented May 15, 2024

Choose a reason for hiding this comment

lemire commented May 16, 2024

lemire commented May 16, 2024

lemire commented May 16, 2024

lemire commented May 16, 2024

CarlosEduR commented May 16, 2024

CarlosEduR commented May 17, 2024

lemire commented May 17, 2024

CarlosEduR commented May 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarlosEduR commented May 15, 2024 •

edited

Loading