Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

createToken shortcut functions #1029

Closed
HoldYourWaffle opened this issue Sep 1, 2019 · 8 comments
Closed

createToken shortcut functions #1029

HoldYourWaffle opened this issue Sep 1, 2019 · 8 comments
Labels

Comments

@HoldYourWaffle
Copy link
Contributor

The current createToken function does exactly what it should do, but it's pretty clunky to use (especially when you have a lot of tokens).

I propose adding 2 extra helper functions to make creating your lexicon less painful:

createToken shorthand

Definition:

function createToken(name: string, pattern: RegExp, config?: Partial<ITokenConfig>) { ... }

As you can see it's just a simple shorthand for the 2 properties all tokens will need. Most tokens don't need any extra configuration, saving a lot of space and increasing readability. With this helper the full lexicon from the tutorial page can be shortened to this:

const Identifier = createToken("Identifier", /[a-zA-Z]\w*/ )
// We specify the "longer_alt" property to resolve keywords vs identifiers ambiguity.
// See: https://github.com/SAP/chevrotain/blob/master/examples/lexer/keywords_vs_identifiers/keywords_vs_identifiers.js
const Select = createToken("Select", /SELECT/, { longer_alt: Identifier })
const From = createToken("From", /FROM/, { longer_alt: Identifier })
const Where = createToken("Where", /WHERE/, { longer_alt: Identifier })

const Comma = createToken("Comma", /,/)
const Integer = createToken("Integer", /0|[1-9]\d*/ )
const GreaterThan = createToken("GreaterThan", />/ )
const LessThan = createToken("LessThan", /</ )

const WhiteSpace = createToken("WhiteSpace", /\s+/, { group: chevrotain.Lexer.SKIPPED })

I don't want to change the existing API, I just want to add an overload for convenience.

createKeyword

Almost every language has keywords, and all keywords have two things in common:

  1. The longer_alt option should be set to an Identifier-like token (if such a token exists)
  2. The name and pattern of the token are (almost always) the same

This repetition is ground for the other helper function I want to propose:

function createKeywordToken(keyword: string, identifierToken?: TokenType, caseSensitive = true) { ... }

The identifierToken parameter is needed to set the longer_alt option. It's optional because not every language might have such an Identifier-like token. There's also a caseSensitive option to toggle the i option for the generated regex pattern, SQL has caseinsensitive keywords for example.
The name property of the generated token would be the first-capitalized version of the keyword parameter, as per the 'standard'.

This extra helper allows the lexicon from the tutorial page to be shortened even further to just this:

const Identifier = createToken("Identifier", /[a-zA-Z]\w*/ )

const Select = createKeywordToken("select" Identifier, false)
const From = createKeywordToken("from", Identifier, false)
const Where = createKeywordToken("where", Identifier, false)

const Comma = createToken("Comma", /,/)
const Integer = createToken("Integer", /0|[1-9]\d*/ )
const GreaterThan = createToken("GreaterThan", />/ )
const LessThan = createToken("LessThan", /</ )

const WhiteSpace = createToken("WhiteSpace", /\s+/, { group: chevrotain.Lexer.SKIPPED })

I feel like there should be a way to set the Identifier-like token only once for all keywords, but I can't figure out how such a system would work. Perhaps someone with more JS experience can tune in on that.


The tutorial page is probably not the best example for the usefulness of these shorthands, the benefit grows massively the more tokens (especially keywords) you add (like when you want to put in a full SQL language).
I always define these shorthand functions myself, but I figured they could be useful for more people.

@HoldYourWaffle
Copy link
Contributor Author

Interesting.

So do you think it'd make sense to add some of these shortcuts to the main API?

@bd82
Copy link
Member

bd82 commented Sep 2, 2019

So do you think it'd make sense to add some of these shortcuts to the main API?

Not sure, I am not sure its a good idea to have multiple ways to do things, additionally it is more consistent to have most of the APIs using the JavaScript Config Pattern. and I don't think it is a huge advantage to save a few property names...

I think end users would benefit more from other upgrades to the tokenizer, for example:

@bd82
Copy link
Member

bd82 commented Sep 4, 2019

I will be closing this for now as I think it would be quite easy to any end user to implement this themselves if they want and it would also be confusing to provide multiple ways to accomplish the same thing so I would rather stick with the more verbose but also extensible and consistent config object style API.

Thanks for the suggestion and feedback, it is appreciated 👍

Also note that if you want a really concise API for token/lexer definitions it is always possible to build
this externally from Chevroain, e.g:

@HoldYourWaffle if you are interested in "good first issues" improvements in the API area
have a look at #1032, perhaps it would interest you? 😄

@bd82 bd82 closed this as completed Sep 4, 2019
@HoldYourWaffle
Copy link
Contributor Author

I agree that there isn't much value to adding this, I just figured "why not suggest it" 😄
Great to see you being open on this kind of stuff, makes contributing to this project a very pleasant experience.

Lexdoc looks really promising, maybe we could add a reference to it somewhere in the tutorial?
#1032 looks way out of my league unfortunately 😅

@bd82
Copy link
Member

bd82 commented Sep 6, 2019

#1032 looks way out of my league unfortunately 😅

It is just a similar tiny method that does return this.CST_STACK[this.CST_STACK.length - 1]
Most of the work is in APIs/docs/testing which is why it is a good first issue
as there is very little actual code / logic and it touches on full flow of implementing a new feature.

@HoldYourWaffle
Copy link
Contributor Author

Now that I've read the issue with a not-very-tired-head it indeed looks very simple. Unfortunately I don't have the time to help right now due to deadline, but if it's still open when I do have time I'll be sure to take a look 😄

@bd82
Copy link
Member

bd82 commented Sep 7, 2019

Now that I've read the issue with a not-very-tired-head it indeed looks very simple. Unfortunately I don't have the time to help right now due to deadline, but if it's still open when I do have time I'll be sure to take a look 😄

Great, take your time no rush 😄
I am intentionally leaving some simple non-urgent issues open to be used as "good first issues"...
To make it easier for new contributors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants