-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fuzzer for href setter and getter #657
Conversation
cc @lemire i have bad news and good news |
fuzz/href.cc
Outdated
#include <memory> | ||
#include <string> | ||
|
||
#include "ada.cpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to this these includes, we could add....
#define ADA_DEVELOPMENT_CHECKS 1
This would add additional checks.
Indeed, the fuzzer found a bug, here it is... It is easy to narrow it down, just follow this recipe: https://gist.github.com/lemire/f501b3b2bb8c33673de4f0a0674a6112 Takes a minute!!! Fuzzers are great. |
See #662 |
I expect that I have a fix for the bug in question at #659 |
@CarlosEduR Please rebase. I have merged a bug fix. |
thanks @lemire! |
I am!!!!!!!!! |
@lemire assertion added! |
fuzz/parse.cc
Outdated
@@ -17,12 +17,23 @@ extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | |||
auto parse_url = ada::parse<ada::url>(source); | |||
auto parse_url_aggregator = ada::parse<ada::url_aggregator>(source); | |||
|
|||
if ((parse_url && !parse_url_aggregator) || (!parse_url && parse_url_aggregator)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using xor would make this a lot simpler:
parse_url ^ parse_url_aggregator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting... will update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated a little bit more: parse_url.has_value() ^ parse_url_aggregator.has_value()
but seems it raised the abort function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
================== Job 0 exited with exit code 77 ============
/github/workspace/build-out/parse -dict=/github/workspace/build-out/url.dict -max_len=1024 -timeout=25 -rss_limit_mb=2560 -len_control=0 -seed=1337 -artifact_prefix=/tmp/tmplfz2au7i/ -max_total_time=300 -print_final_stats=1 /github/workspace/cifuzz-corpus/parse >fuzz-2.log 2>&1
Dictionary: 7 entries
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1337
INFO: Loaded 1 modules (12847 inline 8-bit counters): 12847 [0x5562cca46348, 0x5562cca49577),
INFO: Loaded 1 PC tables (12847 PCs): 12847 [0x5562cca49578,0x5562cca7b868),
INFO: 13333 files found in /github/workspace/cifuzz-corpus/parse
INFO: seed corpus: files: 13333 min: 1b max: 4394568b total: 16829628b rss: 43Mb
AddressSanitizer:DEADLYSIGNAL
=================================================================
==32==ERROR: AddressSanitizer: ABRT on unknown address 0x000000000020 (pc 0x7fd3f3cc[80](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:81)0b bp 0x7ffec9050880 sp 0x7ffec90504d0 T0)
SCARINESS: 10 (signal)
#0 0x7fd3f3cc800b in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce)
#1 0x7fd3f3ca7858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce)
#2 0x5562cc938973 in LLVMFuzzerTestOneInput /src/ada-url/fuzz/parse.cc:21:5
#3 0x5562cc7468b0 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:614:13
#4 0x5562cc7460d5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:516:7
#5 0x5562cc748062 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:[82](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:83)9:7
#6 0x5562cc74[83](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:84)97 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile>>&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:867:3
#7 0x5562cc7369a6 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:914:6
#8 0x5562cc762ed2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
#9 0x7fd3f3ca9082 in __libc_start_main (/lib/x[86](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:87)_64-linux-gnu/libc.so.6+0x24082) (BuildId: 87b331c034a6458c64ce09c03939e947212e18ce)
#10 0x5562cc727b1d in _start (build-out/parse+0x93b1d)
DEDUP_TOKEN: raise--abort--LLVMFuzzerTestOneInput
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: ABRT (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: [87](https://github.com/ada-url/ada/actions/runs/9230862431/job/25399824163#step:5:88)b331c034a6458c64ce09c03939e947212e18ce) in raise
==32==ABORTING
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is relatively simple. Look at what you are feeding the parser:
"Ws:X\xff"
That's not a valid UTF-8 string. Our parsers assume valid UTF-8 inputs.
This breaks the contract: ada::parse<ada::url>(...)
will accept it, while ada::parse<ada::url_aggregator>(...)
will reject it.
It would be trivial to front-load the parsing with UTF-8 validation, but we specifically do not do it. So the result is effectively undefined when passing an invalid string (because we assume it does not happen).
You can check that the input is valid UTF-8 with a function like so:
bool validate(const char *buf, size_t len) {
const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
uint64_t pos = 0;
uint32_t code_point = 0;
while (pos < len) {
uint64_t next_pos = pos + 16;
if (next_pos <= len) { // if it is safe to read 16 more bytes, check that they are ascii
uint64_t v1;
std::memcpy(&v1, data + pos, sizeof(uint64_t));
uint64_t v2;
std::memcpy(&v2, data + pos + sizeof(uint64_t), sizeof(uint64_t));
uint64_t v{v1 | v2};
if ((v & 0x8080808080808080) == 0) {
pos = next_pos;
continue;
}
}
unsigned char byte = data[pos];
while (byte < 0b10000000) {
if (++pos == len) { return true; }
byte = data[pos];
}
if ((byte & 0b11100000) == 0b11000000) {
next_pos = pos + 2;
if (next_pos > len) { return false; }
if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
code_point = (byte & 0b00011111) << 6 | (data[pos + 1] & 0b00111111);
if ((code_point < 0x80) || (0x7ff < code_point)) { return false; }
} else if ((byte & 0b11110000) == 0b11100000) {
next_pos = pos + 3;
if (next_pos > len) { return false; }
if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; }
code_point = (byte & 0b00001111) << 12 |
(data[pos + 1] & 0b00111111) << 6 |
(data[pos + 2] & 0b00111111);
if ((code_point < 0x800) || (0xffff < code_point) ||
(0xd7ff < code_point && code_point < 0xe000)) {
return false;
}
} else if ((byte & 0b11111000) == 0b11110000) { // 0b11110000
next_pos = pos + 4;
if (next_pos > len) { return false; }
if ((data[pos + 1] & 0b11000000) != 0b10000000) { return false; }
if ((data[pos + 2] & 0b11000000) != 0b10000000) { return false; }
if ((data[pos + 3] & 0b11000000) != 0b10000000) { return false; }
code_point =
(byte & 0b00000111) << 18 | (data[pos + 1] & 0b00111111) << 12 |
(data[pos + 2] & 0b00111111) << 6 | (data[pos + 3] & 0b00111111);
if (code_point <= 0xffff || 0x10ffff < code_point) { return false; }
} else {
return false;
}
pos = next_pos;
}
return true;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, validating that we have UTF-8 is kind of a separate issue. So the fuzzer could just require the inputs to be valid UTF-8 by checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, got it, thanks for the investigation and explanation @lemire.
Should I update the fuzzer by adding this function to validate the inputs and only compare the two variables once we have a valid UTF-8 string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might give it a try!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to improve the line coverage for fuzz testing, add fuzzer for
set_href
andget_href