Add fixed issues to regex fuzzer #6013

Merged


@anthony-chang anthony-chang commented Jul 18, 2022

Closes #4603

This PR adds the regex choice operator (|), octal characters in character classes, and escaped characters in character class ranges to the fuzzer, and fixes some corner cases that these additions uncovered:

  1. When a hex or octal representation of a metacharacter is used, e.g. \x24 for the character $, Java treats it like the literal \$, but cuDF treats it as the end-of-line anchor, so we need to escape these characters during transpilation.
  2. There was a bug when parsing octal characters in character classes where we would sometimes read one character past the octal digits.
  3. Zero repetitions (i.e. {0} and {0,0}) inside a capture group have some inconsistencies, so I have disabled them.
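To make corner case 1 concrete, here is a minimal Java sketch of the idea, not the code from this PR. The class name `MetaEscapeSketch`, the helper `rewriteHexEscapes`, and the `META` set are hypothetical and chosen for illustration only. The sketch handles only two-digit `\xhh` escapes and ignores character classes, where the escaping rules differ. It rewrites a hex escape into a backslash-escaped literal when the decoded character is a metacharacter, so a downstream engine cannot reinterpret it as an anchor or operator.

```java
import java.util.regex.Pattern;

public class MetaEscapeSketch {
    // Assumed set of characters that are special outside a character class.
    private static final String META = "\\^$.|?*+()[]{}";

    // Rewrites two-digit \xhh escapes that decode to a metacharacter into
    // an explicit backslash-escaped literal; everything else is copied as-is.
    static String rewriteHexEscapes(String pattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            boolean isHexEscape = c == '\\'
                && i + 3 < pattern.length()
                && pattern.charAt(i + 1) == 'x'
                && Character.digit(pattern.charAt(i + 2), 16) != -1
                && Character.digit(pattern.charAt(i + 3), 16) != -1;
            if (isHexEscape) {
                int codePoint = Integer.parseInt(pattern.substring(i + 2, i + 4), 16);
                String ch = String.valueOf((char) codePoint);
                if (META.contains(ch)) {
                    out.append('\\').append(ch);          // e.g. \x24 becomes \$
                } else {
                    out.append(pattern, i, i + 4);        // keep the escape untouched
                }
                i += 4;
            } else if (c == '\\' && i + 1 < pattern.length()) {
                // Copy any other escape pair whole, so an escaped backslash
                // followed by "x24" is not misread as a hex escape.
                out.append(c).append(pattern.charAt(i + 1));
                i += 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Java treats \x24 as a literal dollar sign, not as an anchor.
        System.out.println(Pattern.matches("a\\x24", "a$"));
        // The rewritten pattern makes that literal explicit.
        System.out.println(rewriteHexEscapes("a\\x24"));
    }
}
```

The rewritten pattern matches the same input in Java as the original, which is the property a fuzzer comparing CPU and GPU results would depend on.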

Signed-off-by: Anthony Chang <antchang@nvidia.com>
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang anthony-chang self-assigned this Jul 18, 2022
@anthony-chang
Contributor Author

build

@anthony-chang anthony-chang requested a review from NVnavkumar July 19, 2022 15:09
@sameerz sameerz added the task Work required that improves the product but is not user facing label Jul 20, 2022
@sameerz sameerz requested a review from andygrove July 27, 2022 06:03
@andygrove
Contributor

LGTM. Left one code style suggestion.

andygrove previously approved these changes Aug 2, 2022
Signed-off-by: Anthony Chang <antchang@nvidia.com>
@anthony-chang
Contributor Author

build

@anthony-chang anthony-chang merged commit ac95bf6 into NVIDIA:branch-22.08 Aug 2, 2022
Development

Successfully merging this pull request may close these issues.

[BUG] Improve detection of regexp patterns that cuDF cannot compile
4 participants