Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22549 Add RuleBasedBreakIterator fuzzer #2709

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion icu4c/source/test/fuzzer/Makefile.in
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ CPPFLAGS += -I$(srcdir) -I$(top_srcdir)/common -I$(top_srcdir)/i18n -I$(top_srcd
DEFS += -D'U_TOPSRCDIR="$(top_srcdir)/"' -D'U_TOPBUILDDIR="$(BUILDDIR)"'
LIBS = $(LIBCTESTFW) $(LIBICUTOOLUTIL) $(LIBICUIO) $(LIBICUI18N) $(LIBICUUC) $(DEFAULT_LIBS) $(LIB_M)

FUZZER_TARGETS = break_iterator_fuzzer calendar_fuzzer collator_compare_fuzzer collator_rulebased_fuzzer converter_fuzzer date_format_fuzzer list_format_fuzzer locale_fuzzer locale_morph_fuzzer number_format_fuzzer relative_date_time_formatter_fuzzer ucasemap_fuzzer uloc_canonicalize_fuzzer uloc_for_language_tag_fuzzer uloc_get_name_fuzzer uloc_is_right_to_left_fuzzer uloc_open_keywords_fuzzer unicode_string_codepage_create_fuzzer uregex_open_fuzzer
FUZZER_TARGETS = break_iterator_fuzzer calendar_fuzzer collator_compare_fuzzer collator_rulebased_fuzzer converter_fuzzer date_format_fuzzer list_format_fuzzer locale_fuzzer locale_morph_fuzzer number_format_fuzzer relative_date_time_formatter_fuzzer rule_based_break_iterator_fuzzer ucasemap_fuzzer uloc_canonicalize_fuzzer uloc_for_language_tag_fuzzer uloc_get_name_fuzzer uloc_is_right_to_left_fuzzer uloc_open_keywords_fuzzer unicode_string_codepage_create_fuzzer uregex_open_fuzzer

OBJECTS = $(FUZZER_TARGETS:%=%.o)
OBJECTS += fuzzer_driver.o locale_util.o
Expand Down
26 changes: 26 additions & 0 deletions icu4c/source/test/fuzzer/rule_based_break_iterator_fuzzer.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
// © 2023 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html

#include <cstring>

#include "fuzzer_utils.h"
#include "unicode/localpointer.h"
#include "unicode/locid.h"
#include "unicode/rbbi.h"

IcuEnvironment* env = new IcuEnvironment();

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
UErrorCode status = U_ZERO_ERROR;

size_t unistr_size = size/2;
std::unique_ptr<char16_t[]> fuzzbuff(new char16_t[unistr_size]);
std::memcpy(fuzzbuff.get(), data, unistr_size * 2);
icu::UnicodeString fuzzstr(false, fuzzbuff.get(), unistr_size);

UParseError parse;
icu::LocalPointer<icu::RuleBasedBreakIterator> brk(
new icu::RuleBasedBreakIterator(fuzzstr, parse, status));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost always going to give you an error code saying the rule string is invalid. Maybe that's fine, but I wonder if we need a more devious test that takes advantage of knowing what the syntax of RBBI looks like.

Of course, that'd be a lot of work. I think this is probably fine for now, but we should think about smarter/more evil fuzz testing here for some time in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the purpuse for fuzzing is the data is power by AI and try to generate input that is "unexpected" by the programmer to make sure during that condition the code won't break but return error. The smart of the fuzzing is based on the fuzzer and I will leave that smart to the AI to do it. The AI might be able to see all the branching in our code and figure out how to generate better fuzzing data to attack those (if not, then it is not a good AI)


return 0;
}
Loading