Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative lookahead? #58

Open
acruise opened this issue Aug 19, 2022 · 9 comments
Open

Negative lookahead? #58

acruise opened this issue Aug 19, 2022 · 9 comments

Comments

@acruise
Copy link

acruise commented Aug 19, 2022

Hi there, I have a grammar with some negative lookahead rules. From what I can tell, some other PEG parsers have a ! operator that can help with this, but it appears this library doesn't. Is this something that would be easy to implement "in userspace"? I'm not a parsing expert. Thanks!

@acruise
Copy link
Author

acruise commented Aug 19, 2022

Maybe something like this?

operator fun <T> Parser<T>.not(): NotParser<T> {
    return NotParser(this)
}

class NotParser<T>(
    private val child: Parser<T>
) : Parser<T> {
    override fun tryParse(tokens: TokenMatchesSequence, fromPosition: Int): ParseResult<T> {
        return when (val res = child.tryParse(tokens, fromPosition)) {
            is Parsed<*> -> {
                val tokenMatch = tokens[fromPosition] ?: return UnexpectedEof(tokens.first().type)
                NegatedResult(tokenMatch)
            }
            else -> res
        }
    }
}

data class NegatedResult(val unexpected: TokenMatch) : ErrorResult()

@acruise
Copy link
Author

acruise commented Aug 22, 2022

I guess this won't work, I'd need some kind of "empty success" value on the positive branch 🤔

@h0tk3y
Copy link
Owner

h0tk3y commented Aug 22, 2022

I'd say you can use a Parser<Unit> and maybe wrap it into a SkipParser so that it's easier to use in and-chains. I tried with something like the code below, and it seems to work. This kind of a combinator looks like a good candidate for inclusion into the built-in combinators, feel free to submit a PR with a complete implementation, or let me do this at some point later. :)

class NegativeLookaheadCombinator(internal val lookaheadParser: Parser<*>) : Parser<Unit> {
    override fun tryParse(tokens: TokenMatchesSequence, fromPosition: Int): ParseResult<Unit> =
        when (lookaheadParser.tryParse(tokens, fromPosition)) {
            is Parsed -> LookaheadFoundInNegativeLookahead
            else -> ParsedValue(Unit, fromPosition)
        }
}

object LookaheadFoundInNegativeLookahead : ErrorResult()

fun not(parser: Parser<*>): SkipParser = 
    skip(NegativeLookaheadCombinator(parser))

class NegativeLookaheadCombinatorTest {
    @Test
    fun testNegativeLookahead() {
        val grammar = object : Grammar<Int>() {
            val digits by regexToken("[0-9]+")
            val dot by literalToken(".")
            val semi by literalToken(";")
            
            override val rootParser: Parser<Int>
                get() = (digits * not(dot * digits) * -semi).use { text.toInt() }
        }
        
        assertEquals(LookaheadFoundInNegativeLookahead, grammar.tryParseToEnd("1.0;"))
        assertEquals(ParsedValue(1, 2), grammar.tryParseToEnd("1;"))
    }
}

@acruise
Copy link
Author

acruise commented Aug 22, 2022

I'd say you can use a Parser<Unit> and wrap it into a SkipParser so that it's easier to use in and-chains.

Great, thanks! I'll give this a try and submit a PR if it looks good. :)

@acruise
Copy link
Author

acruise commented Aug 22, 2022

hmm, a few things:

  1. In order for my tests to pass I had to change else clause to fromPosition + 1 but that breaks your test 😿
  2. Is there some way to avoid discarding the non-A value that shows up in the not-A position? Maybe an additional special case rather than relying on SkipParser?
  3. Does it ever make sense to allow not() (or, for that matter, skip()) outside of an AND?

@h0tk3y
Copy link
Owner

h0tk3y commented Aug 22, 2022

In order for my tests to pass I had to change else clause to fromPosition + 1 but that breaks your test 😿

My idea was that this combinator should only test the specified parser for a match at the given position. It should not "advance" the position, as it only looks ahead. What does your test check?

Is there some way to avoid discarding the non-A value that shows up in the not-A position?

I'm not sure I'm getting it right. Should this non-A value rather get consumed and parsed by some of the following parsers that get invoked after the negative lookahead combinator returns the successful (Unit) result?

Does it ever make sense to allow not() (or, for that matter, skip()) outside of an AND?

skip(...), as well as the SkipParser, is designed for use only in the and-chains, and no other combinators accept them. I'm not sure about not, though. Maybe it makes some sense to use it outside and-chains, and if that's the case, not should not wrap the result into a SkipParser. We should judge by looking at the use cases, and unfortunately, I don't have any data about negative lookahead. We should gather some use cases to look at. If in doubt, we should go with the more flexible solution: make not return an ordinary Praser combinator, not a SkipParser.

@acruise
Copy link
Author

acruise commented Aug 22, 2022

What does your test check?

This is what I was going for:

val abAndNotDAndD by a and b and not(d) and d
    
val tokens = tokenizer.tokenize("abcd")

val res = abAndNotDAndD.parseToEnd(tokens)
assertSame(a, res.t1.type)
assertSame(b, res.t2.type)
assertSame(d, res.t3.type) // TODO can we avoid discarding the `c`?
//        assertSame(d, res.t4.type)

@acruise
Copy link
Author

acruise commented Aug 22, 2022

It looks like it's very uncommon for PEG parsers to return any value from a lookahead operator, so for now I'll get rid of my test case and maybe add positive lookahead while I'm at it. :)

@acruise
Copy link
Author

acruise commented Aug 22, 2022

Extremely WIP attempt in #59!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants