-
Notifications
You must be signed in to change notification settings - Fork 446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support empty alternations #524
Comments
As a workaround, I believe |
I recently ran into this issue making a user agent parser. The nice thing about how I'm trying to go about it is that there's already this community that happens to maintain similar (identical in practice?) libraries that do the same thing and consume the same |
Making the supported syntax be identical to some other arbitrary set of regex engines is not a hard goal of this crate. The are very likely other (possibly very subtle) differences that are being missed too. You will almost certainly need to apply some kind of transform to any pre-existing set of regexes. In any case, this issue is really just about supporting empty alternations, and nothing more than that. This isn't going to happen until the compiler is reworked, as I said above. |
Here's another bug that appears to be fallout from failing to handle empty alternations correctly: fn main() {
let re = regex::Regex::new(r"^(?:(j(?:))|([?]))").unwrap();
let s = "john.doe@www.example.com:123/forum/questions/?tag=networking&order=new%20est#top";
assert!(re.is_match(s));
assert!(re.find(s).is_some());
} The NFA compiler in
I want to hopefully soon replace the matching engines in |
@BurntSushi IIUC FYI (because you asked for some numbers in case I ever had some) with the latest stable |
Sure. That's why I'm working on adding proper APIs for all of the regex engines inside this crate in The idea that I would force all users of this crate to start compiling all regexes to full DFAs ahead of time is a bit crazy. :-)
Could you please file an issue with a complete reproduction? Thanks. (The performance difference may be a wontfix bug, but I'm happy to investigate.) |
Could you elaborate a bit more on this? Why does Mercurial need empty alternation support? I've generally regarded this as a corner case that is "nice to have." But if this is a singular blocker for using this crate inside of Mercurial, then I could be convinced to just try and fix this so that you don't have to wait for the Great Regex Engine Refactor to be done. (Which could take months or even years, depending on how things go.) |
That is also what I thought, thanks for the confirmation!
Sure, I just need to make sure I make all tests pass with
It does not need it per se, but the current prefix used for relative glob patterns in
In the end it's something that people might use in their own |
It does to me as well. I'm not sure why an empty alternation is being used there. I wonder if any of the core Mercurial developers might know?
Aye, okay. For now, I'll plan on having this fixed (along with many other bugs) in the cut-over to
Perfect, thanks. |
Your questioning prompted me to elaborate a bit more on my future plans. Please see #656 for more details. :-) |
I've asked the other devs and only have received a "maybe it's faster?"... but it's not, at least not in Python's
Sounds good.
Glad to see it, thank you! |
For folks monitoring this issue, support for empty sub-expressions is now available in |
@BurntSushi This is the last nail in |
Oh please do! :) |
Today, if you try to compile a regex with an empty alternation, e.g.,
a||b
, then you'll get this error message:When I initially built the regex crate, I don't think I was clear on what an empty alternation meant, so I simply made them illegal. However, an empty alternation should have the same match semantics as an empty regex. That is,
a||b
should matcha
,b
or the empty string.When I rewrote the regex-syntax crate, I specifically made sure to support empty alternations, which I believe were forbidden in the previous version of regex-syntax. The intent was to propagate that through the regex compiler. However, when I did that, I discovered that it did not implement the correct match semantics. Fixing it did not seem easy, so I simply made the compiler return an error if it found an empty alternate:
regex/src/compile.rs
Lines 491 to 500 in 488fe56
Part of my plans for the future are to rethink a lot of the regex internals, and the compiler itself is at the top of that list. So I plan to tackle this problem when I rework the compiler.
The text was updated successfully, but these errors were encountered: