
GenericLexerTokenChannels

Olivier Duhart edited this page Jul 2, 2022 · 3 revisions

Generic Lexer Token Channels

By default, the generic lexer skips comment tokens and whitespace tokens.

This keeps the grammar from being cluttered with data the parser does not need.

But sometimes comments are meaningful and we want to get them back.

To solve this, CSLY borrows the channel concept from ANTLR. Tokens are never skipped; they are simply sent to different channels:

  • Main channel (0): the default channel, used by the parser to choose grammar rules
  • WhiteSpaces channel (1): the channel containing whitespace tokens
  • Comments channel (2): the channel containing all comments
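The channel idea can be illustrated with a small toy model. This is only a sketch: ToyToken, ToyChannels and ChannelDemo are hypothetical names invented for the illustration and are not part of CSLY, whose internal implementation may differ. It shows that every token is kept and tagged with a channel number, so a consumer reads the channel it cares about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy token: some text plus the channel it was sent to.
public record ToyToken(string Text, int Channel);

// Channel numbers taken from the list above.
public static class ToyChannels
{
    public const int Main = 0;
    public const int WhiteSpaces = 1;
    public const int Comments = 2;
}

public static class ChannelDemo
{
    // Nothing is thrown away: we only select the tokens tagged with one channel.
    public static List<ToyToken> OnChannel(IEnumerable<ToyToken> tokens, int channel) =>
        tokens.Where(t => t.Channel == channel).ToList();

    public static void Main()
    {
        var tokens = new List<ToyToken>
        {
            new("// first id", ToyChannels.Comments),
            new("\n", ToyChannels.WhiteSpaces),
            new("a", ToyChannels.Main),
            new(" ", ToyChannels.WhiteSpaces),
            new("b", ToyChannels.Main),
        };

        // What a parser would consume: only the main channel.
        var main = OnChannel(tokens, ToyChannels.Main);
        Console.WriteLine(string.Join(",", main.Select(t => t.Text))); // prints "a,b"

        // The comment is still available on its own channel.
        Console.WriteLine(OnChannel(tokens, ToyChannels.Comments).Count); // prints "1"
    }
}
```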

Channels are defined with an optional parameter on the [Lexeme] attribute (and on all short code lexeme attributes).

Channels are accessed through Token in production rule visitors. The Token class defines methods to access tokens on other channels:

  • Token Next(int channel): the token on the given channel that immediately follows the token.
  • Token Previous(int channel): the token on the given channel that immediately precedes the token.
  • List<Token> NextTokens(int channel): the list of tokens on the given channel that immediately follow the token.
  • List<Token> PreviousTokens(int channel): the list of tokens on the given channel that immediately precede the token.

Let's define a language that consists only of a list of IDs that may be commented. The parser's goal is to return the list of IDs, each with comment metadata if a comment precedes the ID.

The lexer will only have 2 lexemes: an ID and a comment.

public enum Lex {

        [SingleLineComment("//")]
        SINGLELINECOMMENT,

        [Lexeme(GenericToken.Identifier, IdentifierType.AlphaNumeric)]
        ID
}

The grammar is quite simple, as it is a mere list of IDs:

  program : ID*

Our AST is really simple :

    public class CommentedId {
        public string Id {get; set;}
        public string Comment {get; set;}
        // true when a comment was attached to the id
        public bool IsCommented => !string.IsNullOrEmpty(Comment);
    }

    public class CommentedProgram {
        public List<CommentedId> Ids {get; set;}
    }

And so the parser could be defined as :

using System.Collections.Generic;
using System.Linq;
using sly.lexer;
using sly.parser.generator;

public class Parse {

    [Production("program : ID*")]
    public CommentedProgram Program(List<Token<Lex>> ids)
    {

        var commentedIds = ids.Select(token =>
        {
            // get the token on the comments channel that precedes each id
            var previous = token.Previous(Channels.Comments);

            string comment = null;
            // the previous token may not be a comment, so we have to check
            if (previous != null && previous.TokenID == Lex.SINGLELINECOMMENT)
            {
                comment = previous.Value;
            }
            return new CommentedId()
            {
                Id = token.Value,
                Comment = comment
            };
        }).ToList();

        return new CommentedProgram() { Ids = commentedIds };
    }
}
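
To try the parser, it has to be built and run on some input. The following is a minimal sketch, assuming the Lex, CommentedId, CommentedProgram and Parse types above are compiled in the same project and that the CSLY package is referenced. The Demo class and the input string are hypothetical and only serve the illustration. The EBNF parser type is used because the rule relies on the * repetition.

```csharp
using System;
using sly.parser.generator;

public static class Demo
{
    public static void Main()
    {
        // Build an EBNF parser with "program" as the root rule.
        var builder = new ParserBuilder<Lex, CommentedProgram>();
        var build = builder.BuildParser(new Parse(), ParserType.EBNF_LL_RECURSIVE_DESCENT, "program");
        if (build.IsError)
        {
            Console.WriteLine("the parser could not be built");
            return;
        }

        // Hypothetical input: a comment line followed by two ids.
        var result = build.Result.Parse("// first id\na\nb");
        if (result.IsError)
        {
            Console.WriteLine("parsing failed");
            return;
        }

        // Print each id with the comment attached by the Program visitor, if any.
        foreach (var id in result.Result.Ids)
        {
            var comment = id.Comment ?? "(no comment)";
            Console.WriteLine($"{id.Id} : {comment}");
        }
    }
}
```

The comment text is read from the token's Value, so the exact string that ends up in Comment depends on how the lexer delimits comment tokens.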