
ICU-22549 Add RuleBasedBreakIterator fuzzer #2709

Merged


FrankYFTang
Contributor

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22549
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

Contributor

@richgillam richgillam left a comment


This feels like the bare minimum, but should be fine.


UParseError parse;
icu::LocalPointer<icu::RuleBasedBreakIterator> brk(
new icu::RuleBasedBreakIterator(fuzzstr, parse, status));
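The excerpt does not show how `fuzzstr` is built from the fuzzer's raw input bytes. Here is a minimal sketch, assuming the harness packs each pair of bytes into one UTF-16 code unit. The helper name `BytesToUtf16` is hypothetical and does not come from the PR. The sketch also shows why arbitrary inputs are almost never valid rule syntax: the resulting string can contain any code unit, including unpaired surrogates and control characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the PR): pack each pair of fuzzer bytes into
// one little-endian UTF-16 code unit. A trailing odd byte is dropped.
// No validation is done, so the result may contain unpaired surrogates,
// which is why most generated rule strings fail to parse.
std::u16string BytesToUtf16(const uint8_t* data, size_t size) {
  std::u16string out;
  out.reserve(size / 2);
  for (size_t i = 0; i + 1 < size; i += 2) {
    out.push_back(static_cast<char16_t>(data[i] | (data[i + 1] << 8)));
  }
  return out;
}
```

In a real harness, the result would be wrapped in an `icu::UnicodeString` (which stores UTF-16) before it is passed to the `RuleBasedBreakIterator` constructor.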
Contributor


This is almost always going to give you an error code saying the rule string is invalid. Maybe that's fine, but I wonder if we need a more devious test that takes advantage of knowing what the syntax of RBBI looks like.

Of course, that'd be a lot of work. I think this is probably fine for now, but at some point in the future we should think about smarter, more evil fuzz testing here.
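One way to make a fuzzer "know" the RBBI syntax is to treat the input bytes as choices in a small grammar rather than as raw text. That way, most generated strings are at least lexically plausible rules. The following is a minimal sketch under that assumption. The function name `BuildRuleString` and the token tables are hypothetical. They cover only a tiny subset of the rule syntax: variable definitions such as `$A = [0-9];` and rules with an optional `{status}` tag.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Reads fuzzer bytes one at a time; yields 0 once the input is exhausted.
class ByteReader {
 public:
  ByteReader(const uint8_t* data, size_t size) : data_(data), size_(size) {}
  bool empty() const { return pos_ >= size_; }
  uint8_t next() { return empty() ? 0 : data_[pos_++]; }

 private:
  const uint8_t* data_;
  size_t size_;
  size_t pos_ = 0;
};

// Hypothetical grammar-aware generator: each byte selects the next production
// instead of being copied verbatim, so output stays close to RBBI rule syntax.
std::string BuildRuleString(const uint8_t* data, size_t size) {
  static const char* const kSets[] = {"[\\p{L}]", "[0-9]", "[a-z]",
                                      "[\\p{Han}]", "[^\\p{L}]"};
  static const char* const kOps[] = {"", "+", "*", "?"};
  static const char* const kVars[] = {"$A", "$B", "$C"};
  const size_t nSets = sizeof(kSets) / sizeof(kSets[0]);
  const size_t nOps = sizeof(kOps) / sizeof(kOps[0]);
  const size_t nVars = sizeof(kVars) / sizeof(kVars[0]);

  ByteReader in(data, size);
  std::string rules;
  // Define every variable first so later references are never undefined.
  for (const char* v : kVars) {
    rules += v;
    rules += " = ";
    rules += kSets[in.next() % nSets];
    rules += ";\n";
  }
  // Emit at least one rule, then keep going until the input runs out.
  do {
    uint8_t shape = in.next();
    // Low bit picks variable vs. literal set; high bit adds a status tag.
    std::string term = (shape & 1) ? kVars[(shape >> 1) % nVars]
                                   : kSets[(shape >> 1) % nSets];
    rules += term;
    rules += kOps[in.next() % nOps];
    if (shape & 0x80) {
      rules += " {";
      rules += std::to_string(in.next());
      rules += "}";
    }
    rules += ";\n";
  } while (!in.empty());
  return rules;
}
```

Because variables are always defined before rules use them, and every statement ends in `;`, more generated inputs should get past the rule parser than with raw bytes. The trade-off is that malformed-syntax paths get less exercise. A generator like this would therefore complement the existing byte-level fuzzer rather than replace it.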

Contributor Author


I think the purpose of fuzzing is that the input data is driven by AI, which tries to generate input the programmer did not expect. The goal is to make sure that under those conditions the code won't crash but will return an error instead. How smart the fuzzing is depends on the fuzzer, and I will leave that smartness to the AI. The AI might be able to see all the branches in our code and figure out how to generate better fuzzing data to attack them (if not, then it is not a good AI).
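Coverage-guided fuzzers such as libFuzzer do not read the source code. They "see" branches through compile-time coverage instrumentation, keep any mutated input that reaches new code, and mutate it further. The following is a toy sketch of that feedback loop and a simplification, not libFuzzer's actual algorithm. The target `ToyTarget` and its four-byte "magic" prefix are hypothetical stand-ins for parser branches that are reached only when earlier tokens are well formed.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Hypothetical toy target: one nested check per correct leading byte, like a
// parser that reaches later code only once earlier tokens are well formed.
// Returns how many nested checks passed (0..4), used here as "coverage".
int ToyTarget(const std::vector<uint8_t>& in) {
  static const uint8_t kMagic[] = {'R', 'B', 'B', 'I'};
  size_t depth = 0;
  while (depth < 4 && depth < in.size() && in[depth] == kMagic[depth]) ++depth;
  return static_cast<int>(depth);
}

// Simplified coverage-guided loop: pick a corpus entry, overwrite one random
// byte, and keep the mutant only if it reaches a depth no earlier input
// reached. Returns the iteration at which depth 4 was first hit, or -1.
long FuzzUntilFullCoverage(unsigned seed, long maxIters) {
  std::mt19937 rng(seed);
  std::vector<std::vector<uint8_t>> corpus{std::vector<uint8_t>(4, 0)};
  std::set<int> seen{ToyTarget(corpus[0])};
  for (long it = 1; it <= maxIters; ++it) {
    std::vector<uint8_t> mutant = corpus[rng() % corpus.size()];
    size_t pos = rng() % mutant.size();
    mutant[pos] = static_cast<uint8_t>(rng() & 0xFF);
    int cov = ToyTarget(mutant);
    if (seen.insert(cov).second) {  // new coverage: keep this input
      corpus.push_back(mutant);
      if (cov == 4) return it;
    }
  }
  return -1;
}
```

Blind random generation would need about 256^4 = 2^32 tries on average to hit all four bytes at once. With feedback, each byte is solved separately. Stage k succeeds with probability 1/(k · 4 · 256), so the expected total is 1024 · (1 + 2 + 3 + 4), or roughly ten thousand iterations.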

@FrankYFTang FrankYFTang merged commit 5d3e84a into unicode-org:main Nov 29, 2023
97 checks passed
@FrankYFTang FrankYFTang deleted the ICU-22549-BreakIteratorRule branch November 29, 2023 19:55