Support for quantifiers and the shuffle operator in regular expressions #109
Comments
@EduardoGoulart1 I don't recall a specific reason, but I am certainly open to any PRs you would be willing to submit! @eliotwrobson would quantifiers (like Eduardo is describing) fit within the theoretical regular expressions this library implements? I only ask because it seems like Perl/PCRE regex and the automata-based regex syntaxes are not the same, otherwise we would be using the stdlib's
@caleb531 the quantifiers would only be syntactic sugar to avoid concatenating the same expression repeatedly. So The shuffle operator might require some rework on the parser. We would need to construct the NFAs and compute their products explicitly.
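To make the "syntactic sugar" point concrete, here is a minimal sketch of how a bounded quantifier could be desugared into plain concatenation. The helper name `expand_quantifier` is hypothetical, and the sketch assumes the target syntax has grouping and an optional operator `?`; it says nothing about how the library's parser actually handles this. The stdlib `re` module is used only to sanity-check the expansion.

```python
import re


def expand_quantifier(expr: str, lower: int, upper: int) -> str:
    """Rewrite expr{lower,upper} using only grouping, concatenation and '?'.

    Hypothetical helper for illustration; not part of any library.
    """
    if lower < 0 or upper < lower:
        raise ValueError("expected 0 <= lower <= upper")
    atom = f"({expr})"
    # `lower` mandatory copies, followed by `upper - lower` optional copies.
    return atom * lower + (atom + "?") * (upper - lower)


expanded = expand_quantifier("a", 2, 4)
print(expanded)  # (a)(a)(a)?(a)?

# Sanity check: the expansion accepts exactly 2, 3 or 4 copies of "a".
for n in range(7):
    assert bool(re.fullmatch(expanded, "a" * n)) == (2 <= n <= 4)
```

The expanded expression grows linearly with the upper bound, which is the repeated concatenation the quantifier syntax is meant to spare the user from writing.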
So I think quantifiers do fit (as they correspond to just concatenating a certain number of copies of an NFA with itself). I actually thought about adding this in before, but it causes some issues with the way that the default alphabet is set for regexes. If no objections to changing this though, I think it's very doable. I also think that the shuffle operator is easy to do. I think this can actually be done with the same logic as the NFA
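For readers unfamiliar with the shuffle product, a rough sketch of the usual pair-state construction follows. `SimpleNFA`, `word_nfa` and `shuffle` are hypothetical names written for this illustration (epsilon transitions are left out for brevity); this is not the library's NFA class or its implementation. The idea is that a product state `(p, q)` reads a symbol by advancing either the left or the right automaton, and accepts when both components accept.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class SimpleNFA:
    """Toy NFA without epsilon moves (illustration only)."""
    transitions: dict  # (state, symbol) -> set of next states
    initial: object
    finals: frozenset

    def accepts(self, word):
        current = {self.initial}
        for symbol in word:
            current = {
                nxt
                for state in current
                for nxt in self.transitions.get((state, symbol), ())
            }
        return any(state in self.finals for state in current)


def states_of(nfa):
    """Collect every state mentioned anywhere in the NFA."""
    states = {nfa.initial} | set(nfa.finals)
    for (src, _), targets in nfa.transitions.items():
        states.add(src)
        states.update(targets)
    return states


def word_nfa(word):
    """NFA accepting exactly one word, with states 0..len(word)."""
    transitions = {(i, ch): {i + 1} for i, ch in enumerate(word)}
    return SimpleNFA(transitions, 0, frozenset({len(word)}))


def shuffle(left, right):
    """Shuffle product: each step advances exactly one of the two NFAs."""
    transitions = {}
    left_states, right_states = states_of(left), states_of(right)
    for (p, symbol), targets in left.transitions.items():
        for q in right_states:  # left moves, right stays put
            transitions.setdefault(((p, q), symbol), set()).update(
                (t, q) for t in targets
            )
    for (q, symbol), targets in right.transitions.items():
        for p in left_states:  # right moves, left stays put
            transitions.setdefault(((p, q), symbol), set()).update(
                (p, t) for t in targets
            )
    finals = frozenset((p, q) for p in left.finals for q in right.finals)
    return SimpleNFA(transitions, (left.initial, right.initial), finals)


# "All permutations of a, b, c" as a shuffle of three one-letter languages.
abc = shuffle(shuffle(word_nfa("a"), word_nfa("b")), word_nfa("c"))
assert all(abc.accepts("".join(p)) for p in permutations("abc"))
assert not abc.accepts("ab") and not abc.accepts("abca")
```

The product has one state per pair of component states, so chaining several shuffles multiplies the state counts together.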
@eliotwrobson but we still want the parser to be based on ASCII characters, right? I can look into which other characters would suit it better.
@EduardoGoulart1 yes, sticking to ASCII characters is best I think. Perhaps the With respect to implementation, the shuffle product is pretty easy to integrate with the existing parser. However, the quantifiers require some changes to what gets passed in to factory functions for the
@eliotwrobson ok thank you, I'll give it a try. Just to understand the design choice: you opted for writing the Lexer following the structure of the tdparser. Was that just to simplify the implementation? I see pyparsing is added as a dependency but not used.
@EduardoGoulart1 This was mainly as a guide to myself while writing. The parser here is actually a port of one in another project, and there we aren't allowed to add external dependencies, so everything had to be written from scratch. I didn't realize that pyparsing was a dependency, that may have been used as part of the old parser. Let me know if you have any issues or want some input. EDIT: @caleb531 do you know why pyparsing is a dependency here? It doesn't look like it's getting used anywhere in the library right now.
@eliotwrobson pyparsing is a dependency of pandoc, which I used to compile my Markdown README to RST for PyPI (see c669875). I removed pandoc a while back since PyPI now has native Markdown support, but I guess I forgot to remove pyparsing as well. @EduardoGoulart1 Looking forward to your PR! Please remember to add the appropriate unit tests and documentation (under
@eliotwrobson @EduardoGoulart1 What's left to do before this issue can be closed? The regex shuffle operator is now in, but don't quantifiers still need to be implemented?
@caleb531 Yeah, still missing quantifiers. Discussed a bit above, but that requires a bit of a refactor to the regex engine. The token class now needs to accept a match object instead of just the text that was matched (so that items from specific match groups can be read). This refactor isn't that bad in terms of the code changes that need to be made, but there are a lot of tests that need to be modified.
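As a rough illustration of why a token would want the match object rather than just the matched text, here is a small sketch. `QuantifierToken`, `from_match` and `QUANTIFIER_PATTERN` are hypothetical names made up for this example and do not reflect the library's actual token classes; only the `{lower,upper}` and `{lower,}` forms are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: group 1 is the lower bound, group 2 the (optional) upper bound.
QUANTIFIER_PATTERN = re.compile(r"\{(\d+),(\d*)\}")


@dataclass
class QuantifierToken:
    """Toy token built from a match object instead of the raw text."""
    text: str
    lower: int
    upper: Optional[int]  # None means "no upper bound"

    @classmethod
    def from_match(cls, match: "re.Match[str]") -> "QuantifierToken":
        # Reading the capture groups avoids re-parsing the matched text.
        lower_text, upper_text = match.group(1), match.group(2)
        upper = int(upper_text) if upper_text else None
        return cls(match.group(0), int(lower_text), upper)


match = QUANTIFIER_PATTERN.match("{2,10}")
assert match is not None
token = QuantifierToken.from_match(match)
assert (token.lower, token.upper) == (2, 10)
```

With only the matched text available, the token would have to split `{2,10}` apart a second time; passing the match object lets the lexer's pattern do that work once.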
@eliotwrobson I'd like to take over that one if it's not too urgent to close the ticket. It would be a good opportunity to get into the code a bit more...
@EduardoGoulart1 Nope, it's not urgent at all. Just wanted to confirm what was still outstanding. 🙂
@EduardoGoulart1 go for it! As I mentioned in #115, this will be a good step towards that refactor (and I'm a bit occupied next week). Let me know if you want any input.
Closed because it turns out I'll be a bit busy over the next days/weeks.
@EduardoGoulart1 @caleb531 I can actually pick this up. I was going to ask about the status since I want to finish out some work for my side-projects by the end of the year. Should be good to reopen.
@eliotwrobson Very well! Issue reopened. 🙂
@caleb531 we can close this as both were implemented, right?
@EduardoGoulart1 Thank you for the nudge. Yes, both quantifiers and the shuffle operator are merged, so we can close this issue now. cc @eliotwrobson
Is there a specific reason why quantifiers for regular expressions are not supported (so things like a{2,10})? If there is no specific reason, would you be open to a PR that implements it?
Also, what do you think of adding a special regex shuffle operator? This would be very handy to express languages like "all permutations of a,b,c". My proposal would be to use the ~ character and implement a separate parser for that.