stringr functions not working for strings with \r or \n #517
Comments
The regex dot does not match a newline character by default, but see the 'dotall' flag/setting in the stringi paper: https://www.jstatsoft.org/article/view/v103i02
But "@[^(@@@@)]*@" makes it work again.
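A likely reason the bracket pattern behaves differently: `[^(@@@@)]` is a negated character class, so it matches any single character other than `(`, `@`, or `)`, and that includes `\r` and `\n`. The sketch below uses Python's `re` module only as a stand-in for the engines discussed in this thread; the sample string is hypothetical.

```python
import re

# Hypothetical sample: a field delimited by '@' that contains \r\n.
text = "@line one\r\nline two@"

# Inside [...], parentheses and repeated characters are plain set members,
# so [^(@@@@)] describes the same set as [^()@].
long_form = re.search(r"@[^(@@@@)]*@", text)
short_form = re.search(r"@[^()@]*@", text)

# A negated class also matches newline characters, so both patterns
# span the line break and match the whole sample.
print(long_form.group(0) == text)
print(short_form.group(0) == text)
```

Unlike the dot, a negated class does not depend on any dotall setting, which is consistent with this pattern matching across line breaks.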
The meaning of Another good tutorial on regexes is https://www.regular-expressions.info/
But they are behaving differently. Could I ask you for a more specific explanation? Those links are too vague. They should return the same output, but they do not. Don't you agree? Base R behaves differently from stringr here. In my mind the logic is: the regex pattern is the same, so it should return the same output. Where am I getting it wrong?
If you pass
For ICU regexes (used in stringi and hence stringr), see https://unicode-org.github.io/icu/userguide/strings/regexp.html
For PCRE regexes (perl=TRUE) in base R, see https://www.pcre.org/current/doc/html/pcre2pattern.html
For TRE regexes (perl=FALSE, the default), see https://github.com/laurikari/tre/ (but these are not particularly well documented).
I would rather say it is TRE that does things differently, not ICU/PCRE. Even Python regexes (https://docs.python.org/3/howto/regex.html) have the DOTALL distinction.
HTH
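The DOTALL distinction mentioned above can be illustrated with Python's `re` module, which the comment cites as behaving the same way. This is a minimal sketch, not the code from the original report; the sample string and pattern are made up for illustration.

```python
import re

# Hypothetical sample: a field delimited by '@' that spans two lines.
text = "@first line\nsecond line@"

# By default, '.' does not match '\n', so the match cannot cross the line break.
default_match = re.search(r"@.*@", text)

# With re.DOTALL, '.' matches any character, including '\n'.
dotall_match = re.search(r"@.*@", text, flags=re.DOTALL)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline_match = re.search(r"(?s)@.*@", text)

print(default_match)                   # None
print(dotall_match.group(0) == text)   # True
print(inline_match.group(0) == text)   # True
```

In stringr, the corresponding switch should be `regex(pattern, dotall = TRUE)`; the `(?s)` inline flag is also documented in the ICU user guide linked above.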
Hum, interesting, thank you for this. I'm closing the issue!
Also see the
Is it a bug?
PS: str_detect() has this behavior too.
Created on 2023-07-10 with reprex v2.0.2