implementing regex lookahead #95

Open
mgood7123 opened this issue Nov 26, 2018 · 4 comments

Comments

@mgood7123

mgood7123 commented Nov 26, 2018

To implement regex lookahead, I need to:

  1. Obtain the text captured by mpc_parens and the Regex parser.

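  2. Look for one of the following at the start of that text:

    ?! (negative lookahead: succeeds only if the group does not match at this position; similar in spirit to [^a], but it consumes no input)
    ?: (positive lookahead as used in this proposal: succeeds only if the group matches, again consuming no input; note that standard PCRE/Perl syntax spells positive lookahead ?= and uses ?: for a non-capturing group)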

  3. Save a copy of the current active mpc_input_t structure. Using mpc_input_rewind instead would segfault, or would probably fail in a multi-step parser, because mpc_input_rewind would have to be called once for every successive parser that modified the input. The copy can be taken like this:

	mpc_input_t ii = *i;  // save the current input state
	/* ... run the lookahead sub-parser on i ... */
	*i = ii;              // restore the original input state
4. If it is a lookahead, skip the two-character prefix (`?!` or `?:`); otherwise leave the text as is.
5. Run the remaining text through mpc_re_mode, reusing the Regex parser and the current mode while keeping the originals intact, so that nested groups are handled recursively.
6. If it is a lookahead, restore the mpc_input_t structure saved in step 3.
7. Fail or succeed depending on the lookahead mode (`?!` or `?:`).
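The save/run/restore/decide part of these steps (3, 5, 6 and 7) can be sketched outside mpc. This is a minimal sketch under stated assumptions: `input_t`, `matcher_fn`, `match_literal` and `lookahead` are hypothetical stand-ins, not mpc's API, and the input is a plain in-memory string, so copying the struct is enough to restore it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mpc_input_t: a string plus a cursor. */
typedef struct {
  const char *text;
  size_t pos;
} input_t;

/* Hypothetical sub-parser signature: advances in->pos on success. */
typedef bool (*matcher_fn)(input_t *in, const void *arg);

typedef enum { LOOK_POSITIVE, LOOK_NEGATIVE } look_mode;

/* Matches a literal string at the cursor, advancing past it on success. */
static bool match_literal(input_t *in, const void *arg) {
  const char *lit = arg;
  size_t n = strlen(lit);
  if (strncmp(in->text + in->pos, lit, n) != 0) return false;
  in->pos += n;
  return true;
}

/* Zero-width assertion: the input state is restored whatever the outcome. */
static bool lookahead(input_t *in, look_mode mode, matcher_fn m, const void *arg) {
  input_t saved = *in;         /* step 3: copy the whole input state */
  bool matched = m(in, arg);   /* step 5: run the group's sub-parser */
  *in = saved;                 /* step 6: restore, so nothing is consumed */
  return mode == LOOK_POSITIVE ? matched : !matched;  /* step 7 */
}
```

For example, on the input `"abcd"` at position 0, a positive lookahead for `"abc"` succeeds and a negative lookahead for `"e"` also succeeds, and in both cases `pos` stays 0. Whether a plain struct copy is equally safe for mpc's own mpc_input_t, which can also wrap files and pipes, is an assumption to check against mpc.c.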

The problem I am having is obtaining the text captured by the Regex parser, which I need in order to check for `?!` or `?:` at the start of the text inside the parentheses.
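One possible workaround, sketched here as an assumption rather than anything mpc provides, is to inspect the raw pattern string directly: find the group's matching `)` and classify it by its first two characters. `group_body_len`, `classify_group` and `group_kind` are hypothetical helpers. This sketch follows PCRE/Perl syntax, where positive lookahead is `?=` and `?:` is an ordinary non-capturing group; swap the character if you keep the `?:` spelling used above.

```c
#include <stddef.h>

typedef enum { GROUP_PLAIN, GROUP_LOOK_POS, GROUP_LOOK_NEG } group_kind;

/* Hypothetical helper: pat[open] must be '('. Returns the length of the
   text between it and its matching ')', or -1 if unbalanced. Backslash
   escapes and bracket classes such as [()] are skipped. */
static long group_body_len(const char *pat, size_t open) {
  int depth = 0;
  if (pat[open] != '(') return -1;
  for (size_t i = open; pat[i] != '\0'; i++) {
    char c = pat[i];
    if (c == '\\') {
      if (pat[i + 1] == '\0') return -1;
      i++;                               /* skip the escaped character */
    } else if (c == '[') {
      i++;
      if (pat[i] == '^') i++;
      if (pat[i] == ']') i++;            /* a leading ']' is literal */
      while (pat[i] != '\0' && pat[i] != ']') {
        if (pat[i] == '\\' && pat[i + 1] != '\0') i++;
        i++;
      }
      if (pat[i] == '\0') return -1;     /* unterminated class */
    } else if (c == '(') {
      depth++;
    } else if (c == ')') {
      if (--depth == 0) return (long)(i - open - 1);
    }
  }
  return -1;
}

/* Step 2: classify a group body by its prefix. *skip receives how many
   characters to drop before the sub-pattern starts (step 4). */
static group_kind classify_group(const char *body, size_t len, size_t *skip) {
  *skip = 0;
  if (len >= 2 && body[0] == '?') {
    if (body[1] == '!') { *skip = 2; return GROUP_LOOK_NEG; }
    if (body[1] == '=') { *skip = 2; return GROUP_LOOK_POS; }
  }
  return GROUP_PLAIN;
}
```

For `(?=abc(?!e)d)`, `group_body_len(pat, 0)` gives 11 (the length of `?=abc(?!e)d`), and classifying that body yields a positive lookahead with a skip of 2. The trade-off is that the pattern is scanned twice, once here and once by the regex parser.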

For example, given `(?:abc(?!e)d)`:

`(?:abc(?!e)d)` > `?:abc(?!e)d` {advance 2} > `abc(?!e)d` {save state}
    > `abc` {check if match returns true}
        > if false
            > {restore last saved state}
            > lookahead fails
        > if true
            > `(?!e)` > `?!e` {advance 2} > `e` {save state}
                > `e` {check if match returns false}
                > {restore last saved state}
                > if `e` matched
                    > {restore last saved state}
                    > lookahead fails
                > if `e` did not match
                    > `d` {check if match returns true}
                        > if true
                            > {restore last saved state}
                            > lookahead succeeds
                        > if false
                            > {restore last saved state}
                            > lookahead fails
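The trace above can be hand-expanded into plain C to check that the save/restore points line up. This is a minimal sketch, not mpc code: `input_t`, `lit` and `outer_lookahead` are hypothetical, and the outer group is treated as a positive lookahead (written `?:` in the trace, `?=` in PCRE syntax).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input: a string plus a cursor. */
typedef struct { const char *text; size_t pos; } input_t;

/* Match a literal at the cursor, advancing past it on success. */
static bool lit(input_t *in, const char *s) {
  size_t n = strlen(s);
  if (strncmp(in->text + in->pos, s, n) != 0) return false;
  in->pos += n;
  return true;
}

/* Hand-expanded evaluation of (?:abc(?!e)d) at in->pos, following the trace. */
static bool outer_lookahead(input_t *in) {
  input_t outer = *in;               /* `abc(?!e)d` {save state} */
  bool ok = false;
  if (lit(in, "abc")) {              /* `abc` {check if match returns true} */
    input_t inner = *in;             /* `e` {save state} */
    bool e_matched = lit(in, "e");   /* `e` {check if match returns false} */
    *in = inner;                     /* {restore last saved state} */
    if (!e_matched)                  /* negative lookahead passed */
      ok = lit(in, "d");             /* `d` {check if match returns true} */
  }
  *in = outer;                       /* every branch restores the outer state */
  return ok;
}
```

On `"abcd"` this succeeds, while `"abce"` (the inner `e` matches) and `"abcx"` (`d` fails) both fail; in every case the cursor ends where it started, which is what makes the outer group zero-width.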
@orangeduck
Owner

I think it is going to be too difficult to add lookahead, because I make lots of assumptions when parsing the regex that there will be no lookahead (for example, I disable backtracking). If I were you, I would just use a standard regex library such as PCRE (https://www.pcre.org/).

@mgood7123
Author

True, but isn't that unportable? mpc is meant to be fully portable, right?

@orangeduck
Owner

It depends on which platforms you are interested in. I imagine it will work okay on most platforms, but you'd have to test.

@skull-squadron

skull-squadron commented Mar 31, 2019

Hyperscan and RE/flex (C++) are much faster than PCRE1/PCRE2, though they leave out rarely used features. PCRE2 with its JIT compiler is also quite fast.
