Partial matches #102

Closed
mrabarnett opened this issue Jan 2, 2014 · 5 comments
Labels
enhancement New feature or request minor

Comments

@mrabarnett
Owner

Original report by Geert Jansen (Bitbucket: geertj, GitHub: geertj).


Partial matches would be very useful.

A partial match occurs when a pattern fails to match only because the input ended, but could have matched if more input had been available. This is very useful, for example, when tokenizing input character by character with regular expressions.

Boost has it here: http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/partial_matches.html

Java has a hitEnd() method: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#hitEnd%28%29
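
To make the definition above concrete, here is a minimal sketch that emulates partial matching with the standard `re` module. It is not how Boost, Java, or this library implement the feature. It assumes the pattern is supplied as a list of single-character atoms, and the helper name `classify` is hypothetical.

```python
import re

def classify(atoms, text):
    """Return 'full', 'partial', or 'none' for text against a sequence of atoms.

    atoms is a list of regex fragments that each match exactly one character,
    e.g. [r"\d", r"\d", "-", r"\d"]. This restriction is an assumption of the
    sketch, not a property of real partial matching.
    """
    if re.fullmatch("".join(atoms), text):
        return "full"
    # Build a pattern that accepts any prefix: (?:a1(?:a2(?:a3)?)?)?
    prefix = ""
    for atom in reversed(atoms):
        prefix = f"(?:{atom}{prefix})?"
    if re.fullmatch(prefix, text):
        return "partial"  # failed only because the input ended
    return "none"         # no amount of extra input can help

atoms = [r"\d", r"\d", "-", r"\d"]
for candidate in ["12-3", "12-", "1a"]:
    print(candidate, classify(atoms, candidate))
```

The nested optional groups only work for such simple patterns, which is part of why native support in the regex engine is attractive.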

@mrabarnett
Owner Author

Original comment by Anonymous.


Partial matching is something I've considered adding, but I think it'll be too difficult to retrofit it into the existing implementation.

@mrabarnett
Owner Author

Original comment by Anonymous.


I recently tried to find this exact feature, and likewise ended up finding it in Boost and Java. I think it would be a very good feature in a language like Python, because it can e.g. be used to traverse a directory tree and match whole path names, or to validate user input, etc.

However in the process I came to think of a more "advanced" version of it. In Boost and Java it is an API feature, i.e. a flag you set and it is applied to the whole match.

Just in case it's worth considering, one could also view it as an alternative for ranges... consider:

\d{1,3} - matches one to three digits.

You could also use any regex instead of \d, but specify allowable amounts of characters to consume (with this or an alternate syntax), for example:

(abc|123){1,3}

This would allow the partial matches a, ab, 1, 12 and the full matches abc and 123. The information about how deep the match went could be stored in the group. It would be a unique feature of the library.

I don't know many use cases (except for "partial matching" the whole regex), but at least it would reuse an existing feature of the language, which simply amounts to counting characters: instead of counting characters from a character class, you would be counting the characters consumed by the regex.

Without looking at the code, it could also make the implementation easier.
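
The character-counting reading of (abc|123){1,3} proposed above can be sketched in plain Python. This only illustrates the idea for literal alternatives. The helper `partial_depth` and its parameters are hypothetical, and nothing here reflects syntax or behaviour the library actually has.

```python
def partial_depth(alternatives, text, lo=1, hi=3):
    """Return how many leading characters of text follow one alternative.

    The result is capped at hi. None is returned when fewer than lo
    characters could be consumed. Only literal alternatives are handled.
    """
    best = 0
    for alt in alternatives:
        depth = 0
        for expected, actual in zip(alt, text):
            if expected != actual:
                break
            depth += 1
        best = max(best, depth)
    best = min(best, hi)
    return best if best >= lo else None

alternatives = ["abc", "123"]
for candidate in ["a", "ab", "abc", "12", "x"]:
    print(candidate, partial_depth(alternatives, candidate))
```

The returned depth plays the role of the "how deep the match went" information that the comment suggests storing in the group.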

@mrabarnett
Owner Author

Original comment by Anonymous.


Probably it would also be the same as allowing up to N deletions at the end... so that would be another possibility.

@mrabarnett
Owner Author

Original comment by Anonymous.


The syntax:

(abc|123){1,3}

already has a meaning; it's 1..3 repeats of (abc|123).
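
A quick check with the standard `re` module shows the existing meaning: the quantifier counts repeats of the whole group, so a prefix such as ab is rejected. The helper name `repeats_ok` is only for illustration.

```python
import re

# {1,3} applies to the group as a whole: one to three repeats of abc or 123.
pattern = re.compile(r"(abc|123){1,3}")

def repeats_ok(text):
    """True if text consists of one to three repeats of 'abc' or '123'."""
    return pattern.fullmatch(text) is not None

for candidate in ["abc", "abc123", "abc123abc", "ab", "abcabcabcabc"]:
    print(candidate, repeats_ok(candidate))
```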

@mrabarnett
Owner Author

Original comment by Anonymous.


Added in regex 2014.04.10.
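
For readers arriving here, a short sketch of how the added feature can be used, based on the `partial` keyword argument and the `partial` attribute of match objects described in the regex module's documentation. The four-digit pattern and the helper `check_pin` are illustrative assumptions, and the guarded import is only there so the sketch still loads where the third-party module is not installed.

```python
try:
    import regex  # third-party package, installed with: pip install regex
except ImportError:
    regex = None

PIN = regex.compile(r"\d{4}") if regex is not None else None

def check_pin(text):
    """Classify text against a four-digit pattern as 'full', 'partial', or 'none'."""
    if PIN is None:
        raise RuntimeError("the third-party regex module is not installed")
    m = PIN.fullmatch(text, partial=True)
    if m is None:
        return "none"      # can never match, whatever input follows
    # m.partial is True when the match was cut short by the end of the input.
    return "partial" if m.partial else "full"

if regex is not None:
    for candidate in ["", "12", "1234", "a"]:
        print(repr(candidate), check_pin(candidate))
```

This covers the input-validation use case mentioned earlier in the thread: a partial result means the user may simply not have finished typing.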
