Merge pull request #17 from mrLSD/feat/register-number
Feat: extend SemanticStack with registers
mrLSD authored Nov 18, 2023
2 parents 5ad44c2 + c662618 commit ef93e56
Showing 10 changed files with 429 additions and 122 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "semantic-analyzer"
version = "0.2.6"
version = "0.3.0"
authors = ["Evgeny Ukhanov <mrlsd@ya.ru>"]
description = "Semantic analyzer library for compilers written in Rust for semantic analysis of programming languages AST"
keywords = ["compiler", "semantic-analisis", "semantic-alalyzer", "compiler-design", "semantic"]
40 changes: 24 additions & 16 deletions README.md
@@ -11,56 +11,56 @@
Semantic analyzer is an open source semantic analyzer for programming languages
that makes it easy to build your own efficient compilers.

## What is the library for and what tasks does it solve
## 🌀 What is the library for and what tasks does it solve

Creating a compilers for a programming language is process that involves several key
stages. Most commonly it is:

- **Lexical Analysis (Lexer)**: This stage involves breaking down the input stream
▶️ **Lexical Analysis (Lexer)**: This stage involves breaking down the input stream
of characters into a series of tokens. Tokens are the atomic elements of the programming language, such as identifiers, keywords, operators, etc.

- **Syntax Analysis (Parsing)**: At this stage, the tokens obtained in the previous
▶️ **Syntax Analysis (Parsing)**: At this stage, the tokens obtained in the previous
stage are grouped according to the grammar rules of the programming language. The result
of this process is an **Abstract Syntax Tree (AST)**, which represents a hierarchical structure of the code.

- **Semantic Analysis**: This stage involves checking the semantic correctness of the code. This can include
**Semantic Analysis**: This stage involves checking the semantic correctness of the code. This can include
type checking, scope verification of variables, etc.

- **Intermediate Code Optimization**: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient.
▶️ **Intermediate Code Optimization**: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient.
This can include dead code elimination, expression simplification, etc.

- **Code Generation**: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into
▶️ **Code Generation**: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into
machine code specific to the target architecture.

This library represent **Semantic Analysis** stage.

### Features
### 🌻 Features

- **Name Binding and Scope Checking**: The analyzer verifies that all variables, constants, functions are declared before they're used,
**Name Binding and Scope Checking**: The analyzer verifies that all variables, constants, functions are declared before they're used,
and that they're used within their scope. It also checks for name collisions, where variables, constants, functions, types in the same scope have the same name.

- **Checking Function Calls**: The analyzer verifies that functions are called with the number of parameters and that the type of
**Checking Function Calls**: The analyzer verifies that functions are called with the number of parameters and that the type of
arguments matches the type expected by the function.

- **Scope Rules**: Checks that variables, functions, constants, types are used within their scope, and available in the visibility scope.
**Scope Rules**: Checks that variables, functions, constants, types are used within their scope, and available in the visibility scope.

- **Type Checking**: The analyzer checks that operations are performed on compatible types for expressions, functions, constant, bindings.
**Type Checking**: The analyzer checks that operations are performed on compatible types for expressions, functions, constant, bindings.
For operations in expressions. It is the process of verifying that the types of expressions are consistent with their usage in the context.

- **Flow Control Checking**: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly.
**Flow Control Checking**: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly.
Supported condition expressions and condition expression correctness check.

- **Building the Symbol Table**: For analyzing used the symbol table as data structure used by the semantic analyzer to keep track of
**Building the Symbol Table**: For analyzing used the symbol table as data structure used by the semantic analyzer to keep track of
symbols (variables, functions, constants) in the source code. Each entry in the symbol table contains the symbol's name, type, and scope related for block state, and other relevant information.

### Semantic State Tree
### 🌳 Semantic State Tree

The result of executing and passing stages of the semantic analyzer is: **Semantic State Tree**.

This can be used for Intermediate Code Generation, for further passes
semantic tree optimizations, linting, backend codegen (like LLVM) to target machine.

#### Structure of Semantic State Tree
#### 🌲 Structure of Semantic State Tree

- **blocks state** and related block state child branches. It's a basic
entity for scopes: variables, blocks (function, if, loop).
@@ -87,7 +87,7 @@ However, parent elements cannot access child elements, which effectively limits

All of that source data, that can be used for Intermediate Representation for next optimizations and compilers codegen.

### Subset of programming languages
### 🧺 Subset of programming languages

The input parameter for the analyzer is a predefined
AST (abstract syntax tree). As a library for building AST and the only dependency
@@ -104,4 +104,12 @@ analysis and source code parsing, it is recommended to use: [nom is a parser com

AST displays the **Turing complete** programming language and contains all the necessary elements for this.

## 🛋️ Examples

- 🔎 There is the example implementation separate project [💾 Toy Codegen](https://github.com/mrLSD/toy-codegen).
The project uses the `SemanticStack` results and converts them into **Code Generation** logic. Which clearly shows the
possibilities of using the results of the `semantic-analyzer-rs` `SemanticStackContext` results. LLVM is used as a
backend, [inkwell](https://github.com/TheDan64/inkwell) as a library for LLVM codegen, and compiled into an executable
program. The source of data is the AST structure itself.

## MIT [LICENSE](LICENSE)
105 changes: 72 additions & 33 deletions src/semantic.rs
@@ -533,18 +533,26 @@ impl State {
params.push(expr_result);
}

// Result of function call is stored to register
body_state.borrow_mut().inc_register();
let last_register_number = body_state.borrow().last_register_number;
// Store always result to register even for void result
body_state.borrow_mut().context.call(func_data, params);
body_state
.borrow_mut()
.context
.call(func_data, params, last_register_number);
Some(fn_type)
}

/// # condition-expression
/// Analyse condition operations.
/// Analyse condition operations.
/// ## Return
/// Return result register of `condition-expression` calculation.
pub fn condition_expression(
&mut self,
data: &ast::ExpressionLogicCondition<'_>,
function_body_state: &Rc<RefCell<BlockState>>,
) {
) -> u64 {
// Analyse left expression of left condition
let left_expr = &data.left.left;
let left_res = self.expression(left_expr, function_body_state);
@@ -553,12 +561,15 @@ impl State {
let right_expr = &data.left.right;
let right_res = self.expression(right_expr, function_body_state);

let (Some(left_res), Some(right_res)) = (left_res, right_res) else {
return;
// If some of the `left` or `right` expression is empty just return with error in the state
let (Some(left_res), Some(right_res)) = (left_res.clone(), right_res.clone()) else {
self.add_error(error::StateErrorResult::new(
error::StateErrorKind::ConditionIsEmpty,
format!("left={left_res:?}, right={right_res:?}"),
data.left.left.location(),
));
return function_body_state.borrow().last_register_number;
};
// Unwrap result only after analysing
// let left_res = left_res?;
// let right_res = right_res?;

// Currently strict type comparison
if left_res.expr_type != right_res.expr_type {
@@ -567,7 +578,7 @@ impl State {
left_res.expr_type.to_string(),
data.left.left.location(),
));
return;
return function_body_state.borrow().last_register_number;
}
match left_res.expr_type {
Type::Primitive(_) => (),
@@ -577,29 +588,46 @@ impl State {
left_res.expr_type.to_string(),
data.left.left.location(),
));
return;
return function_body_state.borrow().last_register_number;
}
}

// Increment register
function_body_state.borrow_mut().inc_register();

let register_number = function_body_state.borrow_mut().last_register_number;
// Codegen for left condition and set result to register
function_body_state
.borrow_mut()
.context
.condition_expression(left_res, right_res, data.left.condition.clone().into());
.condition_expression(
left_res,
right_res,
data.left.condition.clone().into(),
register_number,
);

// Analyze right condition
if let Some(right) = &data.right {
let left_register_result = function_body_state.borrow_mut().last_register_number;
// Analyse recursively right part of condition
self.condition_expression(&right.1, function_body_state);
let right_register_result = self.condition_expression(&right.1, function_body_state);

// Increment register
function_body_state.borrow_mut().inc_register();

let register_number = function_body_state.borrow_mut().last_register_number;
// Stategen for logical condition for: left [LOGIC-OP] right
// The result generated from registers, and stored to
// new register
function_body_state
.borrow_mut()
.context
.logic_condition(right.0.clone().into());
function_body_state.borrow_mut().context.logic_condition(
right.0.clone().into(),
left_register_result,
right_register_result,
register_number,
);
}
function_body_state.borrow_mut().last_register_number
}

/// # If-condition body
@@ -793,18 +821,20 @@ impl State {
// If condition contains logic condition expression
ast::IfCondition::Logic(expr_logic) => {
// Analyse if-condition logic
self.condition_expression(expr_logic, if_body_state);
let result_register = self.condition_expression(expr_logic, if_body_state);
// State for if-condition-logic with if-body start
if is_else {
if_body_state
.borrow_mut()
.context
.if_condition_logic(label_if_begin.clone(), label_if_else.clone());
if_body_state.borrow_mut().context.if_condition_logic(
label_if_begin.clone(),
label_if_else.clone(),
result_register,
);
} else {
if_body_state
.borrow_mut()
.context
.if_condition_logic(label_if_begin.clone(), label_if_end.clone());
if_body_state.borrow_mut().context.if_condition_logic(
label_if_begin.clone(),
label_if_end.clone(),
result_register,
);
}
}
}
@@ -1169,18 +1199,21 @@ impl State {
ast::ExpressionValue::ValueName(value) => {
// Get value from block state
let value_from_state = body_state.borrow_mut().get_value_name(&value.name().into());
// Register contains result
body_state.borrow_mut().inc_register();
let last_register_number = body_state.borrow().last_register_number;
// First check value in body state
let ty = if let Some(val) = value_from_state {
body_state
.borrow_mut()
.context
.expression_value(val.clone());
.expression_value(val.clone(), last_register_number);
val.inner_type
} else if let Some(const_val) = self.global.constants.get(&value.name().into()) {
body_state
.borrow_mut()
.context
.expression_const(const_val.clone());
.expression_const(const_val.clone(), last_register_number);
const_val.constant_type.clone()
} else {
// If value doesn't exist in State or as Constant
@@ -1192,7 +1225,6 @@ impl State {
return None;
};
// Return result as register
body_state.borrow_mut().inc_register();
ExpressionResult {
expr_type: ty,
expr_value: ExpressionResultValue::Register(
@@ -1274,10 +1306,14 @@ impl State {
})?
.clone();

body_state
.borrow_mut()
.context
.expression_struct_value(val.clone(), attributes.clone().attr_index);
// Register contains result
body_state.borrow_mut().inc_register();
let last_register_number = body_state.borrow().last_register_number;
body_state.borrow_mut().context.expression_struct_value(
val.clone(),
attributes.clone().attr_index,
last_register_number,
);

body_state.borrow_mut().inc_register();
ExpressionResult {
@@ -1303,14 +1339,17 @@ impl State {
// Do not fetch other expression flow if type is wrong
return None;
}
// Expression operation is set to register
body_state.borrow_mut().inc_register();
let last_register_number = body_state.borrow().last_register_number;
// Call expression operation for: OP(left_value, right_value)
body_state.borrow_mut().context.expression_operation(
op.clone().into(),
left_value.clone(),
right_value.clone(),
last_register_number,
);
// Expression result value for Operations is always should be "register"
body_state.borrow_mut().inc_register();
ExpressionResult {
expr_type: right_value.expr_type,
expr_value: ExpressionResultValue::Register(
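
Note on the `src/semantic.rs` changes: the recurring pattern in these hunks is to call `inc_register()` first, read `last_register_number`, and then pass that number to the context call that records the operation. `condition_expression` now also returns the register holding its result, so `logic_condition` can receive the left and right registers explicitly. The following is only a minimal sketch of that idea. `Block`, `Instr`, and `Cond` are hypothetical stand-ins, not the library's types, the sketch uses a plain `&mut` borrow instead of `Rc<RefCell<BlockState>>`, and it assumes the counter starts at 0 and grows by 1 per call.

```rust
/// Hypothetical stand-in for the operations recorded in the semantic context.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    /// Compare two operands and store the result in `register`.
    Condition { left: i64, right: i64, register: u64 },
    /// Combine two condition results and store the outcome in `register`.
    Logic { left_register: u64, right_register: u64, register: u64 },
}

/// Hypothetical, reduced block state: a register counter plus recorded operations.
#[derive(Debug, Default)]
pub struct Block {
    pub last_register_number: u64,
    pub context: Vec<Instr>,
}

impl Block {
    /// Assumption: each call reserves the next register number.
    pub fn inc_register(&mut self) {
        self.last_register_number += 1;
    }
}

/// A chain of comparisons: `left OP right [LOGIC-OP rest]`.
pub struct Cond {
    pub left: i64,
    pub right: i64,
    pub rest: Option<Box<Cond>>,
}

/// Records the condition and returns the register that holds its result.
pub fn condition_expression(cond: &Cond, block: &mut Block) -> u64 {
    // Reserve the register first, then record the operation that writes to it.
    block.inc_register();
    let register = block.last_register_number;
    block.context.push(Instr::Condition {
        left: cond.left,
        right: cond.right,
        register,
    });

    if let Some(rest) = &cond.rest {
        // Remember where the left result lives before analysing the right part.
        let left_register = block.last_register_number;
        let right_register = condition_expression(rest, block);

        // The logic operation gets a fresh register of its own.
        block.inc_register();
        let register = block.last_register_number;
        block.context.push(Instr::Logic {
            left_register,
            right_register,
            register,
        });
    }
    block.last_register_number
}
```

Under these assumptions, a two-part condition such as `a == b && c == d` records the two comparisons in registers 1 and 2 and the logic operation in register 3, which is also the value returned to the caller (for example, to an `if_condition_logic`-style consumer).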
1 change: 1 addition & 0 deletions src/types/error.rs
@@ -26,6 +26,7 @@ pub enum StateErrorKind {
TypeNotFound,
WrongReturnType,
ConditionExpressionWrongType,
ConditionIsEmpty,
ConditionExpressionNotSupported,
ForbiddenCodeAfterReturnDeprecated,
ForbiddenCodeAfterContinueDeprecated,
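
Note on `ConditionIsEmpty`: in `condition_expression` the new variant is reported from a `let ... else` block when either side of the comparison produced no result, and the function then returns the current register number instead of returning nothing. A reduced sketch of that control flow, assuming a hypothetical `Analyzer` type with a plain error list (the real code uses `add_error` with `StateErrorResult` and a source location, which are omitted here):

```rust
/// Hypothetical error kind mirroring the variant added in this commit.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    ConditionIsEmpty,
}

/// Hypothetical error record: kind plus a debug message.
#[derive(Debug, Clone, PartialEq)]
pub struct StateError {
    pub kind: ErrorKind,
    pub message: String,
}

/// Hypothetical analyzer state: collected errors and a register counter.
#[derive(Debug, Default)]
pub struct Analyzer {
    pub errors: Vec<StateError>,
    pub last_register_number: u64,
}

impl Analyzer {
    /// Returns the register that holds the comparison result.
    /// If either operand is missing, records an error and returns
    /// the current register number without reserving a new one.
    pub fn compare(&mut self, left: Option<i64>, right: Option<i64>) -> u64 {
        // `let ... else` requires the else branch to diverge (here: `return`).
        let (Some(_l), Some(_r)) = (left, right) else {
            self.errors.push(StateError {
                kind: ErrorKind::ConditionIsEmpty,
                message: format!("left={left:?}, right={right:?}"),
            });
            return self.last_register_number;
        };
        // Both operands are present: reserve a register for the result.
        self.last_register_number += 1;
        self.last_register_number
    }
}
```

The point of the design is that the caller always gets a register number back, while the failure is still visible in the collected errors rather than being silently dropped.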