Nested tokens #1268

Closed · JanBaklan opened this issue Oct 30, 2020 · 3 comments

Comments

@JanBaklan

Hello! I have a small problem and I cannot find an answer anywhere.

One of my tokens can match multiple values:
const ATTRIBUTE = createToken({ name: 'Attribute', pattern: /hostname|os/, label: 'Attribute' });

When I try to parse something like this:
consumer(hostn
I expect to get only 2 tokens, consumer and (, but in reality I get 3 tokens: consumer, ( and os.

I tried to change my ATTRIBUTE pattern to /hostname|(?<!.)os(?!.)/, but then I get an error: Error: Unable to use "first char" lexer optimizations:

How can I get only 2 tokens from this string?

@bd82
Member

bd82 commented Oct 30, 2020

Hello @JanBaklan

I only partially understand the question.
I suspect it may be related to longer_alt.

But I am not sure; if that does not help, please create a small reproducible example I can execute so I can understand the issue.
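
For context, longer_alt is typically wired up like this (a minimal sketch; the Identifier token here is a hypothetical catch-all, not something from this issue):

const { createToken } = require("chevrotain");

// Hypothetical catch-all token, for illustration only
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w*/ });

// "os" is a prefix of longer words; longer_alt tells the lexer to prefer
// Identifier when the match could be extended (e.g. "osVersion")
const Os = createToken({ name: "Os", pattern: /os/, longer_alt: Identifier });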

@bd82 bd82 added the Question label Oct 30, 2020
@JanBaklan
Author

@bd82 Thanks for your answer!
I created a repo with an example: https://github.com/JanBaklan/chevrotain-test

@bd82
Member

bd82 commented Oct 30, 2020

O.k. Firstly, in the future please provide a simpler reproduction repo, e.g. the React parts are irrelevant...
For example, this is the code I've used to reproduce the issue:

const { createToken, EmbeddedActionsParser, Lexer } = require("chevrotain");

const CONTEXT = createToken({ name: "Context", pattern: Lexer.NA, label: "Context" })

const ATTRIBUTE = createToken({
  name:    "Attribute",
  pattern: /hostname|os|os_family/
})

const WhiteSpace = createToken({
  name:    "WhiteSpace",
  pattern: /\s+/,
  group:   Lexer.SKIPPED
})

const Comma = createToken({
  name:    "Comma",
  pattern: /,/
})

const LeftPar = createToken({
  name:    "LeftPar",
  pattern: /\(/
})

const RightPar = createToken({
  name:    "RightPar",
  pattern: /\)/
})

// CONTEXT
const Consumer = createToken({
  name:       "Consumer",
  pattern:    /consumer/,
  categories: CONTEXT
})

const Remote = createToken({
  name:       "Remote",
  pattern:    /remote/,
  categories: CONTEXT
})

const allTokens = [
  WhiteSpace,

  Comma, LeftPar, RightPar,

  CONTEXT,
  Consumer, Remote,

  ATTRIBUTE
]

class AutocalcParser extends EmbeddedActionsParser {
  constructor() {
    super(allTokens)

    this.RULE("startRule", () => {
      this.SUBRULE(this.atomRule)
    })

    this.RULE("atomRule", () => {
      this.CONSUME(CONTEXT)
      this.CONSUME(LeftPar)
      this.CONSUME(ATTRIBUTE)
      this.CONSUME(RightPar)
    })

    this.performSelfAnalysis()
  }
}

const AutocalcLexer = new Lexer(allTokens);
const lexingResult = AutocalcLexer.tokenize("consumer(hostn")
console.log(lexingResult.tokens)

Your issue is that the lexer automatically attempts to perform error recovery by dropping characters until it can recognize a new token.

This feature is unfortunately not currently configurable.
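
For illustration, inspecting the lexing errors alongside the tokens makes the dropped characters visible (a small sketch reusing the AutocalcLexer defined above):

const result = AutocalcLexer.tokenize("consumer(hostn");
// The three tokens the lexer managed to recognize: "consumer", "(" and "os"
console.log(result.tokens.map((tok) => tok.image));
// One or more lexing errors describing the characters that were skipped
// during error recovery (the "h" before "os" and the trailing "tn")
console.log(result.errors.map((err) => err.message));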

You seem to be implementing some kind of content assist logic, which tends to make things more complex.

Possible Workaround

You may be able to work around the issue by only sending a subset of the tokens to your auto-complete request, e.g. by inspecting the lexer errors and only providing tokens up to the first error from the lexer.
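
A minimal sketch of this idea, assuming the content assist request simply takes an array of tokens:

const lexingResult = AutocalcLexer.tokenize("consumer(hostn");
const firstError = lexingResult.errors[0];
// Keep only the tokens that start before the first lexing error (if any)
const tokensForAssist = firstError === undefined
  ? lexingResult.tokens
  : lexingResult.tokens.filter((tok) => tok.startOffset < firstError.offset);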

Possible Workaround 2

You may be able to define an "UnknownKeyword" token with pattern /[a-zA-Z]+/ and put it last in the Lexer definition, so every erroneous token would be clearly identified, making it easier to produce the subset of relevant tokens for the content assist APIs.
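
A minimal sketch of this approach (the names are illustrative and reuse the tokens from the reproduction above):

const UnknownKeyword = createToken({
  name:    "UnknownKeyword",
  pattern: /[a-zA-Z]+/
})

// Listed last so it only matches when no other token pattern does
const allTokens = [
  WhiteSpace,
  Comma, LeftPar, RightPar,
  CONTEXT, Consumer, Remote,
  ATTRIBUTE,
  UnknownKeyword
]

// With this in place "hostn" lexes as a single UnknownKeyword token instead of
// triggering error recovery, so the token stream can simply be cut at the first
// UnknownKeyword before calling the content assist APIs.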

General Notes

  • I have some thoughts on deprecating the content assist support in future releases, as I believe a fully functional content assist should be implemented differently, with more semantic knowledge, see: Evaluate Deprecating Syntactic Content Assist Support #1165
  • The issue linked above also points to a content assist framework implementation for XML; I suggest you inspect it.

Cheers.
Shahar.
