[FEA] Rewrite some regular expressions `RLIKE` cases to faster expressions #10741
Comments
Actually I would rather see us write a custom kernel for |
Yes, this kernel can be more general so these two cases can be combined. Thanks for catching this! |
I still don't see a lot of generality for |
Is your feature request related to a problem? Please describe.
`RLIKE`, `REGEXP`, or `REGEXP_LIKE` are widely used but very expensive, and many use cases are quite simple and similar, like `^abc(.*)`, `(.*)abc$`, `pattern`, and `^pattern$`.
These pattern cases can be replaced with faster expressions like `GpuStartsWith`, `GpuEndsWith`, or `GpuContains` when overriding.
Some commonly used patterns are even worth writing a custom kernel to match, such as `pattern[0-9]{3,4}` (a string followed by 3 to 4 digits) and `[\u4e00-\u9fa5]+` (one or more Chinese characters).
Describe the solution you'd like
We have a regex parser in the plugin code (here) that translates a regex pattern to the cudf-supported style and checks for fallback. If possible, we can reuse it to detect whether a pattern is simple enough to be replaced, and rewrite those cases to the faster expressions.
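As a rough illustration of the detection step described above, here is a minimal Python sketch (the plugin itself is not written in Python, and this is not its regex parser). The names `classify`, `apply_op`, and `agrees_with_regex` are hypothetical. The sketch assumes `RLIKE` searches for a match anywhere in the string, and that inputs contain no line terminators, since `$` and `.` treat those specially and would break the equivalence.

```python
import re

# Characters with special meaning in a regex. A fragment containing none of
# them is treated as a plain literal in this sketch (deliberately conservative).
_META = set(r".^$*+?()[]{}|\\")


def _is_literal(fragment):
    return fragment != "" and not any(ch in _META for ch in fragment)


def classify(pattern):
    """Return (op, literal) for a simple pattern, or None to keep the regex path."""
    body = pattern
    starts = body.startswith("^")
    if starts:
        body = body[1:]
    ends = body.endswith("$")
    if ends:
        body = body[:-1]
    # A leading or trailing (.*) absorbs anything, so it cancels the anchor
    # on that side (assuming no line terminators in the input).
    lead_wild = body.startswith("(.*)")
    if lead_wild:
        body = body[4:]
    trail_wild = body.endswith("(.*)")
    if trail_wild:
        body = body[:-4]
    if not _is_literal(body):
        return None
    anchored_start = starts and not lead_wild
    anchored_end = ends and not trail_wild
    if anchored_start and anchored_end:
        return ("equals", body)
    if anchored_start:
        return ("starts_with", body)
    if anchored_end:
        return ("ends_with", body)
    return ("contains", body)


def apply_op(op, literal, text):
    """Evaluate the cheap string operation chosen by classify()."""
    if op == "equals":
        return text == literal
    if op == "starts_with":
        return text.startswith(literal)
    if op == "ends_with":
        return text.endswith(literal)
    return literal in text


def agrees_with_regex(pattern, text):
    """Check that the rewritten operation matches the regex result."""
    op, literal = classify(pattern)
    return apply_op(op, literal, text) == bool(re.search(pattern, text))
```

Anything the classifier does not recognize returns `None`, which here stands for leaving the expression on the existing regex path.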
Here is a list of planned/possible tasks:
- `pattern[A-B]{X,Y}` (a pattern string followed by X to Y chars in range A - B) in `RLIKE` to a custom kernel #10821
- `pattern1|pattern2|pattern3` to multiple contains in `rlike` #10976
- `^prefix[a-zA-Z0-9]` #11037
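
To make the first two tasks concrete, here is a small CPU-side Python sketch of the matching logic only. It is not the custom kernel and not the plugin's code, and the function names `match_literal_then_range` and `match_any_literal` are hypothetical. It assumes search-anywhere semantics for `RLIKE` and a pattern with nothing after the quantifier. Under those assumptions the upper bound Y does not change the boolean result, because a match can stop after X characters.

```python
import re


def match_literal_then_range(text, literal, lo, hi, min_count):
    """True if `literal` occurs in `text` followed by at least `min_count`
    characters in the inclusive range [lo, hi]. Stands in for
    literal[lo-hi]{min_count,Y} under search-anywhere semantics."""
    start = text.find(literal)
    while start != -1:
        run_begin = start + len(literal)
        tail = text[run_begin:run_begin + min_count]
        # The slice is shorter than min_count when the text ends too early.
        if len(tail) == min_count and all(lo <= ch <= hi for ch in tail):
            return True
        # Try the next occurrence of the literal, overlapping ones included.
        start = text.find(literal, start + 1)
    return False


def match_any_literal(text, literals):
    """Stands in for pattern1|pattern2|pattern3 when every branch is a
    plain literal: one contains check per branch."""
    return any(lit in text for lit in literals)
```

For example, `match_literal_then_range(s, "abc", "0", "9", 3)` is meant to agree with `re.search(r"abc[0-9]{3,4}", s)`, and `match_any_literal(s, ["foo", "bar", "baz"])` with `re.search("foo|bar|baz", s)`.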