[BUG] regexp: \d, \w inconsistencies with non-latin unicode input
#5530
Labels
bug: Something isn't working
cudf_dependency: An issue or PR with this label depends on a new feature in cudf
Describe the bug
Updating our existing fuzz tests to use the full range of Unicode characters for input data has exposed some issues with \d and \w. Everything seems to work fine so far for the upper-case versions, \D and \W.
Steps/Code to reproduce bug
See above.
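The report does not include a standalone reproduction. As an illustration only, the sketch below uses Python's `re` module as a stand-in for two regex engines that handle Unicode differently: the default mode treats `\d` and `\w` as Unicode-aware, while `re.ASCII` restricts them to ASCII. This is an assumption about the kind of mismatch involved, not the actual CPU or GPU code path, and the sample characters and the `disagreements` helper are hypothetical.

```python
import re

# Stand-ins for two engines that disagree on Unicode handling.
# Assumption: this mirrors the kind of mismatch described in the report;
# it does not call the real CPU or GPU regex implementations.
UNICODE_DIGIT = re.compile(r"\d")             # Unicode-aware \d
ASCII_DIGIT = re.compile(r"\d", re.ASCII)     # ASCII-only \d
UNICODE_WORD = re.compile(r"\w")              # Unicode-aware \w
ASCII_WORD = re.compile(r"\w", re.ASCII)      # ASCII-only \w


def disagreements(samples):
    """Return the single-character samples on which the two modes differ."""
    out = []
    for ch in samples:
        digit_differs = bool(UNICODE_DIGIT.fullmatch(ch)) != bool(ASCII_DIGIT.fullmatch(ch))
        word_differs = bool(UNICODE_WORD.fullmatch(ch)) != bool(ASCII_WORD.fullmatch(ch))
        if digit_differs or word_differs:
            out.append(ch)
    return out


# ASCII inputs agree; the Arabic-Indic digit, accented letter and CJK
# ideograph are matched only in Unicode-aware mode.
samples = ["7", "a", "_", "\u0663", "\u00e9", "\u4e2d"]
mismatches = disagreements(samples)
print([f"U+{ord(ch):04X}" for ch in mismatches])
```

Inputs restricted to ASCII never reach the differing branches, which is consistent with the problem only showing up once the fuzz tests were widened to the full Unicode range.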
Expected behavior
Behavior should be consistent between CPU and GPU or we should fall back to CPU.
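One way to picture the fallback option is a pre-check that flags any pattern using the lower-case classes. The sketch below is a hypothetical helper, `needs_cpu_fallback`, written in Python for illustration; it is not the project's actual mechanism, and it assumes that only `\d` and `\w` are affected, as the report states so far.

```python
def needs_cpu_fallback(pattern: str) -> bool:
    """Return True if the pattern contains the escape \\d or \\w.

    Hypothetical helper: walks the pattern one escape at a time so that an
    escaped backslash followed by a literal 'd' or 'w' is not misread.
    Simplification: engine-specific quoting constructs are not handled.
    """
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in ("d", "w"):
                return True
            i += 2  # skip the escaped character, whatever it is
        else:
            i += 1
    return False


print(needs_cpu_fallback(r"\d+"))    # uses \d
print(needs_cpu_fallback(r"\D\W"))   # upper-case classes only
```

A check like this is conservative: it flags the pattern even when the input happens to be pure ASCII, trading some GPU coverage for consistent results.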
Environment details (please complete the following information)
Failed in CI.
Additional context
None