diff --git a/design.md b/design.md
deleted file mode 100644
index 279cfbf..0000000
--- a/design.md
+++ /dev/null
@@ -1,158 +0,0 @@
-# Compiler Design
-
-## Frontend
-
-The compiler frontend is the initial stage of a compiler responsible for processing the source code input and transforming it into a form suitable for further analysis and processing. This document outlines the architecture and stages of the compiler frontend.
-
-### Overview
-
-At a high level, the compiler frontend consists of several interconnected stages that progressively refine and prepare the source code for deeper analysis by the compiler:
-
-1. **Preprocessing**: The initial step takes a stream of bytes as input, which could be a file on the filesystem, a buffer in a REPL, or any valid source code input. It runs a preprocessor that applies various transformations to the source code, resolving and expanding macros, handling conditional compilation, and emitting a modified stream of bytes. The output from preprocessing is a modified stream of bytes that is passed as input to the lexer.
-
-2. **Tokenization (Lexer)**: After preprocessing, the modified stream of bytes is passed to the lexer. The lexer tokenizes the input stream, converting it into a stream of tokens. A token typically consists of three parts: its kind (e.g., identifier, keyword, operator), its span (indicating its position in the source code), and the lexeme (the actual text of the token).
-
-3. **Parsing (Recursive Descent Parser)**: The token stream generated by the lexer is consumed by a handwritten recursive descent parser. This parser processes the tokens and constructs a parse tree, often referred to as a "concrete syntax tree" (CST). The CST is a one-to-one representation of the original source code and captures the syntax structure of the code. It is easily convertible to and from the original source code, aiding in error reporting and source-to-source transformations.
The parser is designed to be error-resilient and includes the beginnings of error recovery mechanisms.
-
-4. **AST Transformation**: Once the CST is constructed, we proceed to transform it into a more minimal Abstract Syntax Tree (AST). This transformation involves pruning or simplifying many of the interior nodes found within the CST, resulting in a more concise representation of the code. The AST serves as the foundation for further analysis.
-
-- Example: The CST may contain a node for a function call, which includes the function name, the arguments, and the parentheses. The AST, on the other hand, may contain a node for a function call, which includes the function name and the arguments. The parentheses are not included in the AST because they are not semantically meaningful.
-
-5. **Typechecking and Name Resolution**: With the AST in hand, the frontend begins the process of typechecking and name resolution. This stage involves verifying the correctness of types and resolving variable and function names within the code. It lays the groundwork for semantic analysis and code optimization in the subsequent compiler stages.
-
-### Pipeline Visualization
-
-Here is a simplified visualization of the compiler frontend pipeline:
-
-```
-Source Code (Bytes)
-        |
-        v
-Preprocessing
-        |
-        v
-Tokenization (Lexer)
-        |
-        v
-Parsing (Recursive Descent Parser)
-        |
-        v
-Concrete Syntax Tree (CST)
-        |
-        v
-AST Transformation
-        |
-        v
-Abstract Syntax Tree (AST)
-        |
-        v
-Typechecking and Name Resolution
-        |
-        v
-Intermediate Representation (IR)
-```
-
-### Conclusion
-
-The compiler frontend is the critical initial stage of a compiler that takes raw source code and refines it into a structured form suitable for further analysis and processing. It encompasses preprocessing, tokenization, parsing, AST transformation, and initial semantic analysis. A well-designed frontend lays the foundation for the subsequent phases of compilation, including optimization and code generation.
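The token shape described in the Tokenization step (kind, span, lexeme) could look roughly like the following in C. This is a minimal sketch, assuming half-open byte offsets for spans and a lexeme that is recovered by slicing the source buffer. The names `TokenKind`, `Span`, `Token`, and `next_token`, the keyword list, and the operator set are hypothetical and cover only the loop example used in this document; the actual lexer may be organized differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real lexer's set is not given in this document. */
typedef enum { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokenKind;

/* Half-open byte range [start, end) into the preprocessed source. */
typedef struct { size_t start, end; } Span;

/* A token stores its kind and span; the lexeme is src[span.start .. span.end). */
typedef struct { TokenKind kind; Span span; } Token;

/* Returns 1 if the n bytes at s spell one of the (assumed) keywords. */
static int is_keyword(const char *s, size_t n) {
    static const char *const keywords[] = { "int", "for", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && memcmp(keywords[i], s, n) == 0) return 1;
    }
    return 0;
}

/* Scans one token starting at *pos and advances *pos past it. */
Token next_token(const char *src, size_t *pos) {
    assert(src != NULL && pos != NULL);
    size_t i = *pos;
    while (isspace((unsigned char)src[i])) i++;      /* skip whitespace */

    Token tok = { TOK_EOF, { i, i } };
    if (src[i] == '\0') { *pos = i; return tok; }    /* end of input */

    if (isalpha((unsigned char)src[i]) || src[i] == '_') {
        while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
        tok.kind = is_keyword(src + tok.span.start, i - tok.span.start)
                       ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)src[i])) {
        while (isdigit((unsigned char)src[i])) i++;
        tok.kind = TOK_NUMBER;
    } else {
        /* Two-character operators from the example (<=, ++, +=), else one char. */
        int two = (src[i] == '<' && src[i + 1] == '=') ||
                  (src[i] == '+' && (src[i + 1] == '+' || src[i + 1] == '='));
        i += two ? 2 : 1;
        tok.kind = TOK_PUNCT;
    }
    tok.span.end = i;
    *pos = i;
    return tok;
}
```

Calling `next_token` repeatedly on `i <= 100;` yields an identifier, the operator `<=`, a number, `;`, and finally `TOK_EOF`. Storing only the span keeps tokens small, at the cost of needing the source buffer whenever a lexeme has to be printed.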
-
-### Passes and Examples
-
-Here's a visualization of what each subsequent pass accomplishes with a simple example:
-
-#### Source Code (Before Preprocessing)
-
-```c
-#define MAX 100
-
-int main() {
-    int sum = 0;
-    for (int i = 1; i <= MAX; ++i) {
-        sum += i;
-    }
-    return sum;
-}
-```
-
-#### Preprocessing
-
-```c
-int main() {
-    int sum = 0;
-    for (int i = 1; i <= 100; ++i) {
-        sum += i;
-    }
-    return sum;
-}
-```
-
-#### Tokenization (Lexer)
-
-```
-Token Stream:
-- [Keyword: int]
-- [Identifier: main]
-- [(]
-- [)]
-- [{]
-- [Keyword: int]
-- [Identifier: sum]
-- [=]
-- [Number: 0]
-- [;]
-- [Keyword: for]
-- [(]
-- [Keyword: int]
-- [Identifier: i]
-- [=]
-- [Number: 1]
-- [;]
-- [Identifier: i]
-- [<=]
-- [Number: 100]
-- [;]
-- [++]
-- [Identifier: i]
-- [)]
-- [{]
-- [Identifier: sum]
-- [+=]
-- [Identifier: i]
-- [;]
-- [}]
-- [Keyword: return]
-- [Identifier: sum]
-- [;]
-- [}]
-```
-
-#### Parsing (Recursive Descent Parser) TODO: Update this
-
-```
-Concrete Syntax Tree (CST):
-- [Function: main]
-  - [Declaration: int sum = 0;]
-  - [For Loop]
-    - [Declaration: int i = 1;]
-    - [Condition: i <= 100]
-    - [Increment: ++i]
-    - [Block]
-      - [Expression: sum += i;]
-  - [Return: sum;]
-```
-
-#### AST Transformation TODO: Update this
-
-```
-Abstract Syntax Tree (AST):
-- [Function: main]
-  - [Declaration: int sum = 0;]
-  - [For Loop]
-    - [Declaration: int i = 1;]
-    - [Condition: i <= 100]
-    - [Increment: ++i]
-    - [Assignment: sum += i;]
-  - [Return: sum;]
-```
-
-This illustrates the progression from the raw source code through preprocessing, tokenization, parsing, and AST transformation in the compiler frontend. The resulting AST is a more concise and structured representation of the code, ready for typechecking and further analysis.
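The pruning described under AST Transformation (a call node keeps its name and arguments but drops the parentheses) might be sketched in C as below. This is an illustration under assumptions, not the project's actual tree types: `Node`, `NodeKind`, `MAX_CHILDREN`, `lower`, and `free_tree` are hypothetical names, and the only simplification shown is dropping punctuation children such as `(`, `,`, and `)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CHILDREN 8  /* arbitrary bound chosen for this sketch */

/* Hypothetical node kinds shared by the CST and the AST in this sketch. */
typedef enum { NODE_CALL, NODE_NAME, NODE_NUMBER, NODE_PUNCT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                     /* lexeme for leaves, NULL otherwise */
    struct Node *children[MAX_CHILDREN];
    size_t n_children;
} Node;

/* Frees a heap-allocated tree produced by lower(). */
void free_tree(Node *node) {
    if (node == NULL) return;
    for (size_t i = 0; i < node->n_children; i++) free_tree(node->children[i]);
    free(node);
}

/* Builds a new AST from a CST, skipping every punctuation child.
 * The CST is left untouched, so it can still be used to reproduce the
 * original source text. Returns NULL if allocation fails. */
Node *lower(const Node *cst) {
    assert(cst != NULL && cst->n_children <= MAX_CHILDREN);
    Node *ast = calloc(1, sizeof *ast);
    if (ast == NULL) return NULL;
    ast->kind = cst->kind;
    ast->text = cst->text;
    for (size_t i = 0; i < cst->n_children; i++) {
        const Node *child = cst->children[i];
        if (child->kind == NODE_PUNCT) continue;  /* not semantically meaningful */
        Node *lowered = lower(child);
        if (lowered == NULL) { free_tree(ast); return NULL; }
        ast->children[ast->n_children++] = lowered;
    }
    return ast;
}
```

For a CST of the call `f(1, x)` with six children (`f`, `(`, `1`, `,`, `x`, `)`), `lower` returns a call node with three children: the name and the two arguments. Building a separate tree rather than mutating the CST is a design choice of this sketch; it keeps the lossless tree available for error reporting.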