Does not support (?!...) negative lookahead assertion? #127
Those aren't supported. From the documentation:
The reason why your code panicked is that the regular expression is invalid and your code called |
So when I need to use those unsupported syntax, is there anything I can turn to? |
I don't know because I don't know what problem you're trying to solve. Any of the following things might work depending on your situation:
|
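As one illustration of working without look-around (not necessarily one of the options meant above), a negative lookahead such as `foo(?!bar)` can often be emulated by finding the candidate first and then checking the text that follows it in ordinary code. The sketch below uses only the Rust standard library with literal strings; the function name `find_not_followed_by` and the sample text are hypothetical, and with the regex crate the same "match, then inspect what follows" idea would apply to each match's end offset.

```rust
/// Returns the byte offset of every `needle` in `haystack` that is NOT
/// immediately followed by `forbidden`. For literal strings this is roughly
/// what a pattern like `needle(?!forbidden)` would find.
fn find_not_followed_by(haystack: &str, needle: &str, forbidden: &str) -> Vec<usize> {
    haystack
        .match_indices(needle)
        // Keep a candidate only if the text right after it does not start
        // with the forbidden suffix.
        .filter(|(start, m)| !haystack[start + m.len()..].starts_with(forbidden))
        .map(|(start, _)| start)
        .collect()
}

fn main() {
    // "foo" occurs three times; the first one is followed by "bar".
    let hits = find_not_followed_by("foobar foobaz foo", "foo", "bar");
    println!("{:?}", hits);
}
```

The check after the match is plain code, so it needs no look-around support from the regex engine.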
Does implementing lookahead / lookbehind worsen performance for expressions that don't use them? |
No. |
Hmm... Maybe it'd make sense to implement them, then? To achieve compatibility with other regex engines and with people's expectations. |
There is a reasonable case that someone could make for that, sure. But it's mostly theoretical. You also need to find someone willing to do it. (And by that, I mean, "someone willing to write the code, maintain it and do the necessary API design work." It is significant.) You also need to provide a compelling argument that the added complexity is worth it. For example, "If I want to run a regular expression supplied by an end user and I want to be sure that it completes in a reasonable amount of time, how can I do that?" The answer today is, "that's already true." I probably won't comment much more on this. Plenty of other people have made arguments for and against these features on the Internet. Both sides are reasonable. |
Ah, right, regexes supplied by users. You're right, that's a consideration. One possibility is to provide an option for turning off features that may increase run time significantly, or a feature to detect whether they're present beforehand. As for "maintain it"... there's already someone who maintains the regex package, I hope. And as for API design work, that's not really significant at all, is it? Sure, deciding whether and if so how to handle shutting off the "slow" features might take a bit of consideration, but beyond that there's no extra API design to do at all. The main (only?) real point is that you have to find someone to write the code. Which of course isn't trivial at all, but I don't think entirely forgoing very common features (I'd even dare to say they're expected) for performance is the right choice. |
Yes. That's me. :-)
I included a lot more than API design work in my previous comment.
I disagree. So do a lot of people. Predictable performance is important. I don't think we're going to get very far with this. Here are the facts on the ground:
|
@BurntSushi Completely hypothetical here: if the maintainers of fancy_regex (repo here) were to help and support, would it maybe be suitable to merge it into this project? I am not associated with that repo in any way, but it has been actively maintained since 2016 and is more or less a drop-in for this A builder option Just a thought, curious to see what you think. |
Nope. Just use fancy_regex. |
In some cases using this could work:
|
Thanks for the tip. Usually you can refactor to kind of fake the positive lookarounds just using capture groups, but the negative ones are more difficult. For example:
There just really isn't a solid concept of "not" outside of full PCRE, so these negatives are limitations without good refactoring, where you kind of have to know all possibilities you might get. Consumption is the other issue - i.e., the first |
I'm having trouble converting a regex that uses negative lookahead to Rust's regex syntax. This is my use case, if anyone is interested: Is there a way to convert this? I don't know how to approach this problem. |
Ideally the filter system would let you specify another regex to whitelist your gif URLs. Otherwise, it will be difficult to convert even small, convenient look-arounds. And it's good that Discord doesn't support look-arounds, because otherwise they might not expose the filter system at all, for fear of easy ReDoS. |
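The whitelist idea amounts to two separate checks instead of one pattern with a negative lookahead: first find every link, then reject the message only if some link is not on the allow list. The sketch below shows that shape with the Rust standard library, using prefix tests as stand-ins for the two patterns. The function `is_blocked` and the `gifs.example` and `spam.example` hosts are hypothetical, and this applies only where you control the filtering code, which is not the case for a hosted filter that accepts a single regex.

```rust
/// Returns true when `message` contains at least one link that does not
/// start with any of the allowed prefixes.
fn is_blocked(message: &str, allowed_prefixes: &[&str]) -> bool {
    message
        .split_whitespace()
        // Step 1: the "deny" check, here simply "anything that looks like a link".
        .filter(|tok| tok.starts_with("http://") || tok.starts_with("https://"))
        // Step 2: the "allow" check, applied separately to each link found.
        .any(|url| !allowed_prefixes.iter().any(|p| url.starts_with(*p)))
}

fn main() {
    let allowed = ["https://gifs.example/"];
    println!("{}", is_blocked("look https://gifs.example/cat.gif", &allowed));
    println!("{}", is_blocked("see https://spam.example/x", &allowed));
}
```

Checking each link on its own means an allowed link in the same message does not excuse a disallowed one.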
@BurntSushi Unfortunately, it doesn't have a whitelist, hence having this problem in the first place. I had no luck finding an alternative in one of Discord's official servers (Discord Admin Community) other than using a third-party bot (We're using Dyno) but we want to migrate everything to Discord's AutoMod solution. |
You might have more luck at a general help forum. I'm generally the only one answering questions here, and I don't really have time to convert look-arounds to non-look-arounds. Maybe reddit.com/r/regex? But if you're right, then Discord's regex filtering sounds pretty limited. You might not be able to make it far if you require more sophisticated filtering. On top of that, Discord likely has limits on the size of regexes they allow, so you can't just go as big as you want. |
And in the future, it would be helpful if you post a Discussion question instead of bumping an old issue that is only tangentially related to the problem you're trying to solve. |
Thanks, it works. |
I think this issue has run its course. I'm locking it. If you have questions about the regex crate, please open a new Discussion question. If you need help converting a regex that uses look-around to one that doesn't (which may not be possible or may require more than one regex), then I think a general help forum would be more appropriate. Failing that, you'll want to use either |
Python's re module supports the (?!...) syntax; see https://docs.python.org/2/library/re.html#regular-expression-syntax
The code below compiles but panicked at runtime:
The pattern works fine in Python: