lexer implementation notes

SinghCoder edited this page Feb 16, 2020 · 1 revision

Maintaining efficiency

  • Reduce # of I/O operations
    • Don't read char by char
    • Read block by block from disk
      • To test performance for now, build large test inputs by duplicating existing code
  • Use a twin buffer (two halves, refilled alternately)
  • Avoid modularity at the most basic (per-character) level
    • Don't call isalpha/isdigit-style functions for every character
    • Consider inline functions or macros instead
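
The points above (block reads, a twin buffer, macros instead of isalpha/isdigit) could be sketched roughly as follows. This is only an illustration under assumptions: the names TwinBuffer, tb_init, tb_next, IS_DIGIT, IS_ALPHA and the HALF_SIZE value are hypothetical, the macros assume an ASCII character set, and lexeme handling across the two halves is left out.

```c
#include <stdio.h>

#define HALF_SIZE 4096  /* assumed block size for each half of the buffer */

/* Macro classifiers (ASCII assumed). The argument is evaluated more than
 * once, so do not pass expressions with side effects. */
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')
#define IS_ALPHA(c) (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))

typedef struct {
    FILE  *fp;                  /* source file */
    char   buf[2][HALF_SIZE];   /* the two halves */
    size_t len[2];              /* number of valid bytes in each half */
    int    cur;                 /* half currently being scanned */
    size_t pos;                 /* forward index into buf[cur] */
} TwinBuffer;

/* Load the first half with one block read. */
static void tb_init(TwinBuffer *tb, FILE *fp) {
    tb->fp = fp;
    tb->cur = 0;
    tb->pos = 0;
    tb->len[0] = fread(tb->buf[0], 1, HALF_SIZE, fp);
    tb->len[1] = 0;
}

/* Return the next character, or EOF when the input is exhausted.
 * Disk is touched only once per HALF_SIZE characters. */
static int tb_next(TwinBuffer *tb) {
    if (tb->pos == tb->len[tb->cur]) {
        int other = 1 - tb->cur;
        /* Current half is used up: refill the other half and switch to it. */
        tb->len[other] = fread(tb->buf[other], 1, HALF_SIZE, tb->fp);
        if (tb->len[other] == 0) {
            return EOF;
        }
        tb->cur = other;
        tb->pos = 0;
    }
    return (unsigned char)tb->buf[tb->cur][tb->pos++];
}
```

A caller would open the source file, call tb_init once, and then pull characters with tb_next until it returns EOF. Because the previous half stays in memory while the other is being scanned, a lexeme that started there is still available when the forward pointer crosses the boundary.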

Return value of lexer

  • a token

    • name of token
      • i.e. which accept state was reached
    • lexeme recorded
      • prefer char [] over char * (avoid pointers as much as possible)
      • copy the characters from begin to forward_ptr - 1
      • then move the begin pointer to forward_ptr
    • line number
      • unsigned int
    • value
      • union {
            int
            float
        }
    • tag for value (records which union member is in use)
  • Right now, print lexer output as

    • token_name | value | line_num \n
    • printing token_name requires a mapping table (from the token's enum value to the corresponding string)
  • Parser retrieves tokens one by one.

    • get_next_token()
    • lexer stores a single record for each token
    • Lexing and parsing go hand in hand
    • On parser demand, the lexer creates a token and returns it to the parser
  • Implement a hash table for keyword lookup; it should be collision resistant.
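
One possible shape for the token record, the name mapping table, and the keyword hash table described above is sketched below. Everything named here is an assumption made for illustration: the token set (TK_ID, TK_NUM, TK_RNUM, TK_IF, TK_WHILE), MAX_LEXEME, KW_SLOTS, the union member names, the helper names, the djb2 hash with linear probing, and the choice to print the lexeme when a token carries no numeric value.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LEXEME 32  /* assumed maximum lexeme length */
#define KW_SLOTS   16  /* assumed table size: a power of two, kept partly empty */

typedef enum { TK_ID, TK_NUM, TK_RNUM, TK_IF, TK_WHILE, TK_COUNT } TokenName;

/* Mapping table: token enum value -> printable name. */
static const char *const token_names[TK_COUNT] = { "ID", "NUM", "RNUM", "IF", "WHILE" };

typedef enum { VAL_NONE, VAL_INT, VAL_FLOAT } ValueTag;

typedef struct {
    TokenName    name;                    /* which accept state was reached */
    char         lexeme[MAX_LEXEME + 1];  /* char [] rather than char * */
    unsigned int line_num;
    ValueTag     tag;                     /* which union member is valid */
    union { int i; float f; } value;
} Token;

typedef struct { const char *word; TokenName name; } KwEntry;
static KwEntry kw_table[KW_SLOTS];        /* zero-initialized: all slots empty */

/* djb2 string hash reduced to a slot index. */
static unsigned kw_hash(const char *s) {
    unsigned h = 5381u;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h & (KW_SLOTS - 1);
}

/* Insert with linear probing; assumes the table always has a free slot. */
static void kw_insert(const char *word, TokenName name) {
    unsigned i = kw_hash(word);
    while (kw_table[i].word != NULL) i = (i + 1) & (KW_SLOTS - 1);
    kw_table[i].word = word;
    kw_table[i].name = name;
}

/* Return the keyword's token name, or TK_ID if the lexeme is not a keyword. */
static TokenName kw_lookup(const char *lexeme) {
    unsigned i = kw_hash(lexeme);
    while (kw_table[i].word != NULL) {
        if (strcmp(kw_table[i].word, lexeme) == 0) return kw_table[i].name;
        i = (i + 1) & (KW_SLOTS - 1);
    }
    return TK_ID;
}

/* Build a token from the characters in [begin, forward). */
static Token make_token(TokenName name, const char *begin, const char *forward,
                        unsigned int line) {
    Token t;
    size_t len = (size_t)(forward - begin);
    if (len > MAX_LEXEME) len = MAX_LEXEME;  /* truncate over-long lexemes */
    memcpy(t.lexeme, begin, len);
    t.lexeme[len] = '\0';
    t.name = (name == TK_ID) ? kw_lookup(t.lexeme) : name;
    t.line_num = line;
    t.tag = VAL_NONE;
    t.value.i = 0;
    return t;
}

/* Format one output line as: token_name | value | line_num \n */
static void format_token(const Token *t, char *out, size_t n) {
    const char *nm = token_names[t->name];
    switch (t->tag) {
    case VAL_INT:   snprintf(out, n, "%s | %d | %u\n", nm, t->value.i, t->line_num); break;
    case VAL_FLOAT: snprintf(out, n, "%s | %f | %u\n", nm, t->value.f, t->line_num); break;
    default:        snprintf(out, n, "%s | %s | %u\n", nm, t->lexeme, t->line_num); break;
    }
}
```

In this sketch, get_next_token() would call make_token once the automaton reaches an accept state, then set tag and value for numeric tokens before handing the record to the parser. Keywords are inserted once at start-up with kw_insert, so each identifier costs a single hash and, in the common case, one string comparison.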
