Nested tokens #1268

Closed · JanBaklan opened this issue Oct 30, 2020 · 3 comments

Comments

@JanBaklan

Hello! I have a small problem and I cannot find an answer anywhere.

One of my tokens can match multiple values:
const ATTRIBUTE = createToken({ name: 'Attribute', pattern: /hostname|os/, label: 'Attribute' });

When I try to parse something like this:
consumer(hostn
I expect to get only 2 tokens, consumer and (, but in reality I get 3 tokens: consumer, ( and os.

I tried to change my ATTRIBUTE pattern to /hostname|(?<!.)os(?!.)/, but then I get an error: Error: Unable to use "first char" lexer optimizations:

How can I get only 2 tokens from this string?

@bd82
Member

bd82 commented Oct 30, 2020

Hello @JanBaklan

I only partially understand the question.
I suspect it may be related to longer_alt.

But I am not sure; if that does not help, please create a small reproducible example I can execute so I can understand the issue.
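
For context, longer_alt is typically wired up like this (a minimal sketch; the Identifier token here is a hypothetical catch-all, not something from this issue):

const { createToken } = require("chevrotain");

// Hypothetical catch-all token, for illustration only
const Identifier = createToken({ name: "Identifier", pattern: /[a-zA-Z]\w*/ });

// "os" is a prefix of longer words; longer_alt tells the lexer to prefer
// Identifier when the match could be extended (e.g. "osVersion")
const Os = createToken({ name: "Os", pattern: /os/, longer_alt: Identifier });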

@bd82 bd82 added the Question label Oct 30, 2020
@JanBaklan
Author

@bd82 Thanks for your answer!
I created a repo with an example: https://github.com/JanBaklan/chevrotain-test

@bd82
Member

bd82 commented Oct 30, 2020

O.k. Firstly, in the future please provide a simpler reproduction repo, e.g. the React parts are irrelevant...
For example, this is the code I've used to reproduce the issue:

const { createToken, EmbeddedActionsParser, Lexer } = require("chevrotain");

const CONTEXT = createToken({ name: "Context", pattern: Lexer.NA, label: "Context" })

const ATTRIBUTE = createToken({
  name:    "Attribute",
  pattern: /hostname|os|os_family/
})

const WhiteSpace = createToken({
  name:    "WhiteSpace",
  pattern: /\s+/,
  group:   Lexer.SKIPPED
})

const Comma = createToken({
  name:    "Comma",
  pattern: /,/
})

const LeftPar = createToken({
  name:    "LeftPar",
  pattern: /\(/
})

const RightPar = createToken({
  name:    "RightPar",
  pattern: /\)/
})

// CONTEXT
const Consumer = createToken({
  name:       "Consumer",
  pattern:    /consumer/,
  categories: CONTEXT
})

const Remote = createToken({
  name:       "Remote",
  pattern:    /remote/,
  categories: CONTEXT
})

const allTokens = [
  WhiteSpace,

  Comma, LeftPar, RightPar,

  CONTEXT,
  Consumer, Remote,

  ATTRIBUTE
]

class AutocalcParser extends EmbeddedActionsParser {
  constructor() {
    super(allTokens)

    this.RULE("startRule", () => {
      this.SUBRULE(this.atomRule)
    })

    this.RULE("atomRule", () => {
      this.CONSUME(CONTEXT)
      this.CONSUME(LeftPar)
      this.CONSUME(ATTRIBUTE)
      this.CONSUME(RightPar)
    })

    this.performSelfAnalysis()
  }
}

const AutocalcLexer = new Lexer(allTokens);
const lexingResult = AutocalcLexer.tokenize("consumer(hostn")
console.log(lexingResult.tokens)

Your issue is that the lexer automatically attempts to perform error recovery by dropping characters until it can recognize a new token.

This feature is unfortunately not currently configurable.
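
For illustration, inspecting the lexing errors alongside the tokens makes the dropped characters visible (a small sketch reusing the AutocalcLexer defined above):

const result = AutocalcLexer.tokenize("consumer(hostn");
// The three tokens the lexer managed to recognize: "consumer", "(" and "os"
console.log(result.tokens.map((tok) => tok.image));
// One or more lexing errors describing the characters that were skipped
// during error recovery (the "h" before "os" and the trailing "tn")
console.log(result.errors.map((err) => err.message));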

You seem to be implementing some kind of content assist logic, which tends to make things more complex.

Possible Workaround

You may be able to work around the issue by only sending a subset of the tokens to your auto-complete request, e.g. by inspecting the lexer errors and only providing tokens up to the first error from the lexer.
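
A minimal sketch of this idea, assuming the content assist request simply takes an array of tokens:

const lexingResult = AutocalcLexer.tokenize("consumer(hostn");
const firstError = lexingResult.errors[0];
// Keep only the tokens that start before the first lexing error (if any)
const tokensForAssist = firstError === undefined
  ? lexingResult.tokens
  : lexingResult.tokens.filter((tok) => tok.startOffset < firstError.offset);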

Possible Workaround 2

You may be able to define an "UnknownKeyword" token with pattern /[a-zA-Z]+/ and put it last in the Lexer definition, so every erroneous token would be clearly identified, making it easier to produce the subset of relevant tokens for the content assist APIs.
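
A minimal sketch of this approach (the names are illustrative and reuse the tokens from the reproduction above):

const UnknownKeyword = createToken({
  name:    "UnknownKeyword",
  pattern: /[a-zA-Z]+/
})

// Listed last so it only matches when no other token pattern does
const allTokens = [
  WhiteSpace,
  Comma, LeftPar, RightPar,
  CONTEXT, Consumer, Remote,
  ATTRIBUTE,
  UnknownKeyword
]

// With this in place "hostn" lexes as a single UnknownKeyword token instead of
// triggering error recovery, so the token stream can simply be cut at the first
// UnknownKeyword before calling the content assist APIs.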

General Notes

  • I have some thoughts on deprecating the content assist support in future releases, as I believe a fully functional content assist should be implemented differently, with more semantic knowledge, see: Evaluate Deprecating Syntactic Content Assist Support #1165
  • The issue linked above also points to a content assist framework implementation for XML; I suggest you inspect it.

Cheers.
Shahar.
