Compiler has broadly 6 phases : Lexical Analyser, Syntax Analyser, Semantic Analyser, Intermediate Code Generator, Code Optimizer, Code Generator.
Lexical Analyzer : Responsible for generating a stream of tokens <name,value> issued for each lexeme ( chunk of text ). It is also responsible for constructing the symbol Table.
Tokens ( NFA ) : <name,value> ;
eg. <RELOP,EQ>, <num,80>,
For our purpose, we have considered 8 Lexical Categories. Actual Number of lexical categories, may vary and can be quite large.
List Order of Lexical Categories with the types of token's issued by them :
Here I have implemented NFA's using Switch-case statements. However, the NFA for Keywords is implemented using Trie data-structure.
Also I have implemenented a Symbol Table using Class and a Hash-Map. Symbol table stores the variables and issues token number for variables depending on the row number.
Also, for generating tokens and navigating through the source text file, we have two techniques : Serial and Parallel.
If for a lexemme, no token is issues a Lexical Error is declared !
- Relational Operators category : RELOP
- Variable
- Keywords
- Numbers
- Whitespaces
NFA which says yes first wins. ( NFA's are applied in decreasing order ).
Combine All NFA's in one common diagram.
In case of multiple Yes :
- Go for the longest lexeme in case of Multiple Yes
- Go for Order, if the length is also same
Parses the string to check if the rule of grammar were followed, and if followed - creates a parse tree for the input string.
Parsing : Constructing a tree for an input string using the CFG ( Context Free Grammar ) Rules. There are two types of parsing, Top-down parsing and bottom-up parsing.
To Compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε symbols can be added to any FIRST set.
To Compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
LL(1) is top-down parser, short for Left to Right Scanning, Left most derivation with one Look ahead.
It starts from the root or the Start Variable S and constructs the tree. Before applyinh make sure that there is no Left Recursion in the Grammar.
It Constructs a parsing table, and the performs a sequence of moves on the input string. Eg.
- Grammar
- Grammar After Removing Left-Recursion
Left recursion is such that A --> Aα
Replace the left recursive productions by non-left recusrsive Productions.
For our Grammar :
- Constructing Parsing Table from the Given Grammar
Then, we construct a Predictive Parsing Table :
- Sequence of Moves for input id + id * id
Moves made by the predictive parser on our input :
Outcome : ( $,$ ) : Yes or Accept ( No Syntax Error ).