Partial matches #102

mrabarnett · 2014-01-02T23:33:25Z

Original report by Geert Jansen (Bitbucket: geertj, GitHub: geertj).

Partial matches would be very useful.

A partial match occurs when a pattern fails to match only because the input ended, but could have matched if more input had been available. This is very useful, e.g., when tokenizing input character by character using regular expressions.

Boost has it here: http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/partial_matches.html

Java has a hitEnd() method: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#hitEnd%28%29

mrabarnett · 2014-01-03T12:22:36Z

Original comment by Anonymous.

Partial matching is something I've considered adding, but I think it'll be too difficult to retrofit it into the existing implementation.

mrabarnett · 2014-02-17T03:58:33Z

Original comment by Anonymous.

I recently tried to find this exact feature, and likewise ended up finding it in Boost and Java. I think it would be a very good feature in a language like Python, because it can e.g. be used to traverse a directory tree and match whole path names, or to validate user input, etc.

However, in the process I came to think of a more "advanced" version of it. In Boost and Java it is an API feature, i.e. a flag you set that applies to the whole match.

Just in case it's worth considering, one could also view it as an alternative for ranges... consider:

\d{1,3} - matches up to three digits.

You could also use any regex instead of \d, but specify allowable amounts of characters to consume (with this or an alternate syntax), for example:

(abc|123){1,3}

This would allow the partial matches a, ab, 1, 12 and the full matches abc and 123. The information about how deep the match went could be stored in the group. It would be a unique feature of the library.
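To make the intended behaviour concrete, here is a rough stdlib-only sketch of the classification described above, for the special case where the alternatives are plain literal strings. The function name `classify_prefix` and the three labels are made up for illustration, and treating the empty string as "none" is an assumption (the proposal starts counting at one character). It is not how a regex engine would implement this.

```python
def classify_prefix(text, alternatives=("abc", "123")):
    """Classify text against a fixed set of literal alternatives.

    Returns "full" if text equals one alternative, "partial" if it is a
    non-empty proper prefix of one, and "none" otherwise.
    """
    if text in alternatives:
        return "full"
    # Assumption: an empty input does not count as a partial match here.
    if text and any(alt.startswith(text) for alt in alternatives):
        return "partial"
    return "none"


for sample in ("a", "ab", "abc", "1", "12", "123", "x"):
    print(sample, classify_prefix(sample))
```

This only covers literals; a general pattern would need support inside the matcher itself.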

I don't know many use cases (except for "partial matching" the whole regex), but at least it would use a somewhat existing feature of the language, which simply amounts to counting characters: instead of counting matches of a character class, you would be counting the characters consumed by the regex.

I haven't looked at the code, but it could also make the implementation easier.

mrabarnett · 2014-02-17T04:21:10Z

Original comment by Anonymous.

Probably it would also be the same as allowing up to N deletions at the end... so that would be another possibility.

mrabarnett · 2014-02-17T11:46:00Z

Original comment by Anonymous.

The syntax:

(abc|123){1,3}

already has a meaning; it's 1..3 repeats of (abc|123).
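A quick check of the existing meaning, using the standard library `re` module (the helper name `repeats_ok` is just for illustration):

```python
import re

# {1,3} applies to the whole group: one to three repeats of "abc" or "123".
pattern = re.compile(r'(abc|123){1,3}')


def repeats_ok(text):
    """Return True if text consists of 1..3 repeats of abc or 123."""
    return pattern.fullmatch(text) is not None


print(repeats_ok("abc"))           # one repeat
print(repeats_ok("abc123abc"))     # three repeats
print(repeats_ok("abc123abc123"))  # four repeats, too many
print(repeats_ok("ab"))            # a prefix of "abc" is not a repeat
```

So the prefix "ab" is rejected under the current semantics, and reusing this syntax for partial matching would change the behaviour of existing patterns.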

mrabarnett · 2014-04-10T12:54:00Z

Original comment by Anonymous.

Added in regex 2014.04.10.
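For reference, a minimal usage sketch, assuming the third-party `regex` module is installed. As far as I know from the module's documentation, partial matching is requested with the `partial=True` keyword argument, and the resulting match object has a `partial` attribute. The helper name `classify` and its return labels are made up for this example.

```python
import regex  # third-party module (pip install regex), not the stdlib re

FOUR_DIGITS = regex.compile(r'\d{4}')


def classify(text):
    """Report whether text fully matches, could still match, or cannot match."""
    m = FOUR_DIGITS.fullmatch(text, partial=True)
    if m is None:
        return "no match"  # no continuation of the input could match
    # m.partial is True when the match only ran out of input.
    return "partial" if m.partial else "complete"


for sample in ("", "1", "123", "1234", "12345", "a"):
    print(repr(sample), classify(sample))
```

This is the input-validation use case mentioned earlier in the thread: an incomplete entry is reported as "partial" rather than rejected outright.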

mrabarnett closed this as completed Apr 10, 2014
