Request: (?(DEFINE)...) #152

Closed
mrabarnett opened this issue Sep 9, 2015 · 13 comments
Labels
enhancement New feature or request minor

Comments

@mrabarnett
Owner

Original report by boolbag NA (Bitbucket: boolbag, GitHub: boolbag).


Hi again Matthew,

This is the second in a series of posts to present a case for three features.
In this post, I'll focus on (?(DEFINE)...)

When crafting long expressions with repeated components, I find the (?(DEFINE)...) syntax immensely valuable. It is the key to writing modular regex. Some time ago I presented an example to show the value of such a modular regex here.

(?(DEFINE)...) allows you to drop short names in the pattern. These names expand to large sub-expressions. When sub-expressions are repeated in multiple places in the pattern, this lets you keep your sanity, because you don't have to change the pattern in multiple places.

I came up with a workaround that I explained here some time ago. Nevertheless, for compatibility with PCRE and Perl when translating large expressions, it would be wonderful to have the same (?(DEFINE)...) syntax in regex.

Thanks in advance for considering it.

@mrabarnett
Owner Author

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


In your webpages you give a workaround for implementations that don't support it:


(?:(?<foo> … )(?!))?

There's a shorter alternative:


(?=|(?<foo> … ))
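For anyone comparing the two workarounds, here is a minimal sketch of the shorter one run through the regex module. The group name `num` and the sample string are made up for illustration. The idea is that the empty first alternative lets the lookahead succeed immediately, so the group is defined without its body being attempted at that point.

```python
import regex

# Hypothetical example: the group name 'num' and the sample text are
# illustrative only and do not come from the original posts.
# The lookahead's empty first branch succeeds at once, so the second
# branch (which holds the definition) is not attempted there.
pattern = regex.compile(r'(?=|(?<num>\d+))(?&num)-(?&num)')

m = pattern.search('range: 12-345')
matched = m.group(0) if m else None
print(matched)
```

With (?(DEFINE)...) available, the workaround is mainly of interest for engines that lack it.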

@mrabarnett
Owner Author

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


Added in regex 2015.09.14.

@mrabarnett
Owner Author

Original comment by boolbag NA (Bitbucket: boolbag, GitHub: boolbag).


Wow, that is awesome, thank you so much!!! And thank you also for the "shorter workaround tip" two posts up.

I've already "advertised" the new features on the StackOverflow regex chat room, where many regex heads hang out. :)

Will also go update my pages to mention that this is now supported. I've been meaning for some time to do a real tut on your engine (at the time I first mentioned it, I wasn't doing much Python.) Need to make the time for this.

Here is a short example for anyone who would like to see the feature at work:


import regex as mrab
define_showcase = mrab.compile(r'''(?x)
                                (?(DEFINE)
                                   (?<price>
                                   \$\d+\.\d{2}\b
                                   ) # end price def
                                )

                                (?&price)[-+](?&price)=(?&price)
                                ''')
print(define_showcase.search('total: $1.99+$2.99=$4.98'))

# Output: <regex.Match object; span=(7, 24), match='$1.99+$2.99=$4.98'>

@asarkar

asarkar commented Jun 15, 2023

I can't get the above example to work with Python 3.11 and latest regex package.

import regex

p = r'''
    (?(DEFINE)
       (?<price>\$\d+\.\d{2}\b)
    )

    (?&price)[-+](?&price)=(?&price)
'''

m = regex.match(p, 'total: $1.99+$2.99=$4.98', flags=regex.VERBOSE)
print(m)

None

@5j9

5j9 commented Jun 15, 2023

@asarkar: Try regex.search instead of regex.match. regex.match only matches at the beginning of the string, and "total: " is not in the pattern.
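A short side-by-side sketch of the difference, reusing the pattern and input from the comment above (the variable names `at_start` and `anywhere` are only for illustration):

```python
import regex

pattern = r'''
    (?(DEFINE)
       (?<price>\$\d+\.\d{2}\b)
    )

    (?&price)[-+](?&price)=(?&price)
'''
text = 'total: $1.99+$2.99=$4.98'

# match() anchors at position 0, where the text starts with 'total: '.
at_start = regex.match(pattern, text, flags=regex.VERBOSE)
# search() scans forward until the pattern fits somewhere in the text.
anywhere = regex.search(pattern, text, flags=regex.VERBOSE)

print(at_start)         # None
print(anywhere.span())  # (7, 24)
```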

@asarkar

asarkar commented Jun 15, 2023

@5j9 search matches but m.group('price') is None.

<regex.Match object; span=(7, 24), match='$1.99+$2.99=$4.98'>

@asarkar

asarkar commented Jun 15, 2023

It seems this issue was brought up before in #250 and #452. Both tickets are closed, and the documentation doesn’t show how to extract the named groups, so, it’s not clear to me either.

@facelessuser

I'm not entirely sure why group('price') doesn't show the last capture, but captures('price') will show all the captures. I'm not sure if group() returning None in this case is intentional or not.

@mrabarnett
Owner Author

Calling a subroutine is not the same as capturing, even though a subroutine is defined as a group. Other regex implementations that support subroutines behave the same, except in Ruby, I've read. If you want to capture, define a separate capture group.
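As a rough sketch of what "define a separate capture group" could look like for the price example in this thread: each subroutine call is wrapped in its own named group. The names `first`, `second` and `total` are hypothetical and chosen only for this illustration.

```python
import regex

# 'first', 'second' and 'total' are hypothetical group names.
# Each (?&price) call is wrapped in an ordinary named group, and it is
# that wrapper group, not 'price' itself, that records the text.
pattern = regex.compile(r'''(?x)
    (?(DEFINE)
       (?<price>\$\d+\.\d{2}\b)
    )

    (?<first>(?&price))[-+](?<second>(?&price))=(?<total>(?&price))
''')

m = pattern.search('total: $1.99+$2.99=$4.98')
print(m.group('first'), m.group('second'), m.group('total'))
# $1.99 $2.99 $4.98
```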

.captures(...) should be empty. That's a bug.

@mrabarnett
Owner Author

It appears that in #296 I decided that .captures(...) not being empty was unexpected, but not a bug, because some users found it useful.

Should it change?

@facelessuser

🤷🏻 I'll say I found it confusing is all. I assumed if captures() was capturing it, then they were recognized as capture groups and would be found in group() (which they are not). Does that mean you can replace with them like groups via sub()? I honestly haven't played with these subroutines much, so I'm not sure I have specific opinions.

@asarkar

asarkar commented Jun 15, 2023

> captures(...) not being empty was unexpected

Based on the comments in the tickets I linked to earlier, it seems most users expect group() to return something other than None. IMO, if captures() is not empty, group() shouldn't be None either. Currently, the behavior is contradictory. If both are meant to be empty, then the documentation should show how to extract the named groups, which is what we are after.

@facelessuser

I guess if it were me, I'd either commit to them being capture groups or commit to them not being capture groups. Either way seems reasonable to me, but a half-capture that isn't really a capture seems odd and unintuitive. At least, that is my personal feeling. If the idea is to model the majority of implementations, then not being a capture group makes the most sense.
