MismatchedToken error: `Token(EOF)` is expected, despite requesting the found token of `LiteralToken(' ')` #10

aSemy · 2023-06-16T17:05:27Z

I would like to parse the following (which is a simplified example of the full text I would like to parse).

Version: 1.2.3
  Features:
  Fixes:

Following Version is a list of category strings. Each category must be prefixed with two spaces and suffixed with a colon (:).

I'd like to parse it into this class:

data class Demo(
  val version: String,
  val categories: List<String>
)

I have written a parser (see full code below) that takes the leading whitespace into account.

  /** leading category-name whitespace, to be ignored */
  private val categoryNameIndent by literalToken("  ")
  private val categoryNameSuffix by literalToken(":")
  private val categoryName by -categoryNameIndent * text * -categoryNameSuffix

However, I get an error

MismatchedToken(expected=Token(EOF), found=TokenMatch(token=LiteralToken('  '), offset=15, length=2))

This error is very confusing because it seems to have swapped around the expected/found. I didn't expect EOF, while I did expect LiteralToken('  '). And even then, why did the parser not find the literal token? It's hard to figure out, even when debugging, so help would be appreciated.

fun main() {
  val demo = DemoGrammar.parseEntire(
    /* language=text */ """
Version: 1.2.3
  Features:
  Fixes:
""".trimIndent()
  )

  println("parsed demo: $demo")
}


object DemoGrammar : Grammar<Demo>(debugMode = true) {
  private val newline by regexToken("""\n|\r\n|\r""")
  private val text by regexToken(""".+""")

  private val versionTag by literalToken("Version: ")
  private val version by -versionTag * text

  private val categoryNameIndent by literalToken("  ")
  private val categoryNameSuffix by literalToken(":")
  private val categoryName by -categoryNameIndent * text * -categoryNameSuffix

  private val categorySection: Parser<String> by parser {
    println("parsing CategorySection")
    val name = categoryName().text
    println("  name: $name")
    println(newline())
    name
  }

  override val root: Parser<Demo> by parser {
    val version = version().text
    println("  version:$version")
    val categories = repeatZeroOrMore(categorySection)
    repeatZeroOrMore(newline)
    Demo(
      version = version,
      categories = categories,
    )
  }
}

Output:

  version:1.2.3
parsing CategorySection
parsed demo: MismatchedToken(expected=Token(EOF), found=TokenMatch(token=LiteralToken('  '), offset=15, length=2))


alllex · 2023-08-21T22:00:23Z

Sorry it took so long to respond.

This error message can indeed be cryptic, but in general, it would not be possible to improve it significantly. The "mismatched token" is a last resort kind of error, when the parser ran out of alternatives. Still, something could be improved, and I'll consider trying something out in this area.

Regarding your grammar, I think the main issue is the moment you expect a newline. In the grammar you provided, you try parsing things in the following order:

version
(then you are not trying to parse a newline)
zero or more of
- category name (that starts with indent)

The grammar misbehaves on your input because there is a newline immediately following the version. You can either add parsing of a newline inside the version parser, which I don't recommend, or require each category to start with at least one newline, which I would recommend:

  private val categorySection: Parser<String> by parser {
    println("parsing CategorySection")
    repeatOneOrMore(newline) // <---------------- parsing at least one newline
    val name = categoryName().text
    println("  name: $name")
    name
  }
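For reference, a small check that is independent of Parsus shows where the reported offset=15 falls in the input. It suggests one reading of the error, which is an interpretation and not something confirmed in this thread: the first category attempt fails on the newline at offset 14, the repeatZeroOrMore(newline) in root then consumes that newline, root finishes, and parseEntire expects EOF but finds the indent token at offset 15.

```kotlin
// Same input as in the report, built the same way.
val input = """
Version: 1.2.3
  Features:
  Fixes:
""".trimIndent()

fun main() {
    // Offset of the newline that directly follows the version.
    val newlineOffset = input.indexOf('\n')
    // Offset of the first two-space indent (start of the first category line).
    val indentOffset = input.indexOf("  ")
    println("first newline at offset $newlineOffset")
    println("first indent at offset $indentOffset")
}
```

The indent offset matches the offset in the MismatchedToken message, so the token named under "found" is simply the first input left unread once root has returned.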

There is another problem with your grammar that is not obvious because of a bug in Parsus.

You declare a wildcard regex .+ before you declare literal tokens, such as categoryNameIndent or categoryNameSuffix. Parsus is sensitive to the declaration order of tokens. It would try to parse the tokens in this order, stopping as soon as it finds a token that matches. Due to a bug, the regex tokens lose their declaration order priority. This will be fixed in the next release.

What you need to do to protect from the bug: move the text regex token declaration below all other token declarations. So, for your grammar, move it after categoryNameSuffix.
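To illustrate why declaration order matters under a first-match rule, here is a minimal sketch that does not use Parsus at all. TokenDef and firstMatch are hypothetical helpers written for this example, and the sketch only models "try tokens in declaration order, take the first that matches"; it says nothing about how Parsus implements tokenization internally.

```kotlin
// Hypothetical stand-in for a token definition; this is not a Parsus type.
data class TokenDef(val name: String, val regex: Regex)

// Try each definition in declaration order and return the name and text of
// the first one that matches exactly at `offset`, or null if none does.
fun firstMatch(defs: List<TokenDef>, input: String, offset: Int): Pair<String, String>? {
    for (def in defs) {
        val match = def.regex.find(input, offset) ?: continue
        if (match.range.first == offset) return def.name to match.value
    }
    return null
}

val indent = TokenDef("categoryNameIndent", Regex.fromLiteral("  "))
val suffix = TokenDef("categoryNameSuffix", Regex.fromLiteral(":"))
val text = TokenDef("text", Regex(".+"))

// Wildcard declared first, as in the original grammar.
val wildcardFirst = listOf(text, indent, suffix)
// Wildcard declared last, as suggested above.
val wildcardLast = listOf(indent, suffix, text)

fun main() {
    val line = "  Features:"
    // With the wildcard first, the whole line is claimed by `text`.
    println(firstMatch(wildcardFirst, line, 0))
    // With the wildcard last, the indent token wins at offset 0.
    println(firstMatch(wildcardLast, line, 0))
}
```

In this sketch the greedy `.+` still swallows the trailing colon when matched at offset 2, so a narrower pattern for the category name may also be worth considering; whether that applies to Parsus depends on how it tokenizes, which the sketch does not model.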

I hope that helps!

