This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

Uni project of building a Simple Compiler using ANTLR4

Stasky745/simpleCompiler

Basic High Level Language (LANS) Compiler

This is a university project that consisted of building a compiler with ANTLR4. It was originally written in Catalan for the course.

This compiler takes a program written in LANS (Llenguatge d'Alt Nivell Senzill, Basic High Level Language) and generates a bytecode file (.class) that can be executed on the Java Virtual Machine.

The compiler is written in Java and uses ANTLR4 for lexical and syntax analysis. In other words, ANTLR4 generates the Java code that performs the analysis.

1 Types of data

1.1 Basic Types

  • enter: Integer
  • real: Float. Scientific notation is accepted (with E). When scientific notation is used, the mantissa can only have a single digit before the decimal point.
    • Valid examples: 0.1, 2.10, 3.1416, 6.023E23, 1E-5
    • Invalid examples: 12E-5, 124.0E12
  • car: Character. Written within single quotes (e.g. 'a'). The only accepted characters are letters, digits, punctuation signs and other basic ASCII characters.
  • boolea: Boolean. Can only be cert (true) or fals (false).
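
The rule for real literals can be sketched as a regular expression. This is only an illustration in Python, not the ANTLR4 lexer rule from the project; it assumes that a plain decimal needs digits on both sides of the point, that the exponent marker is an uppercase E, and that the exponent may carry an optional + or - sign. The names REAL_LITERAL and is_real_literal are hypothetical.

```python
import re

# Assumed reading of the rule (not taken from the project's grammar):
#   scientific: one digit, optional fractional part, uppercase E, signed exponent
#   plain:      digits, a decimal point, digits
REAL_LITERAL = re.compile(r"(?:\d\.\d+|\d)E[+-]?\d+|\d+\.\d+")


def is_real_literal(text: str) -> bool:
    """Return True when the whole text is a valid real literal under these assumptions."""
    return REAL_LITERAL.fullmatch(text) is not None
```

Under these assumptions the sketch accepts every valid example listed above and rejects 12E-5 and 124.0E12, because both have more than one digit before the E or the decimal point.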

There are also string values, written inside double quotes (e.g. "hello world"). As with characters, strings can only contain basic ASCII characters. Strings are only used as parameters of the read/write instructions explained in section 6.6. There are no string variables and no operations on strings.

1.2 Defined Types

We can define new types of data: tuples, vectors or aliases for basic types. These can be used to define variables inside the main program or any void/function. Voids/functions can take defined types as parameters, but can only return basic types.

2 General Structure of LANS

A program in LANS is written in a single file (there are no imports) and always specifies a name for the program. The file generated by the compiler will have the same name. For example, a file named HelloWorld will be compiled into a HelloWorld.class file. The general structure of a LANS file contains the following elements (in this same order):

<Constant Declaration Block>
<Type Declaration Block>
<Function Declaration Block>
programa <program_name>
<Variable Declaration Block>
<Sentence>*
fiprograma
<Function Implementation Block>

2.1 Constants Declaration Block

It contains a list (can be empty) of constant declarations, with the syntax of:

const name_const : <basic type> := <constant basic type value>;

The constants are accessible from within the main program and any void or function, and they can't be modified.

2.2 Type Declaration Block

It contains a list (can be empty) of type declarations. Each declaration takes one of three forms:

  • An alias of a basic type:
tipus <type_name> : <basic type>;
  • One-dimensional vector of a basic type:
tipus <type_name> : vector de <basic type> inici <min_index> fi <max_index>;
  • Tuple with basic type fields. Tuples must contain at least 1 field:
tipus <type_name> : tupla
<field_id : <basic type>;>+
ftupla;
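
A vector type carries its own index bounds, so an access has to be translated relative to <min_index>. The following Python sketch models that idea. It assumes that both bounds are inclusive and that out-of-range accesses are errors; the README does not say how the compiler actually lays vectors out or checks bounds. LansVector and its methods are hypothetical names.

```python
class LansVector:
    """Hypothetical model of `vector de <basic type> inici <min> fi <max>`."""

    def __init__(self, min_index, max_index, default=0):
        if min_index > max_index:
            raise ValueError("min_index must not exceed max_index")
        self.min_index = min_index
        self.max_index = max_index
        # Assumption: both bounds are inclusive, hence the +1.
        self._cells = [default] * (max_index - min_index + 1)

    def _offset(self, index):
        # Translate a LANS index into a zero-based position.
        if not self.min_index <= index <= self.max_index:
            raise IndexError(f"index {index} outside {self.min_index}..{self.max_index}")
        return index - self.min_index

    def get(self, index):
        return self._cells[self._offset(index)]

    def set(self, index, value):
        self._cells[self._offset(index)] = value
```

For example, a vector declared with inici 5 fi 10 would hold six elements in this model, and index 5 would refer to the first one.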

2.3 Function Declaration Block

In this block we can declare as many voids/functions as we want (or none). The format is the following:

//void
accio <void_name>(<formal parameters>?);

//function
funcio <function_name>(<formal parameters>?) returns <basic type>;

The formal parameters' format is the following:

<pe|pes>? <parameter_id> : <type><,<pe|pes>? <parameter_id> : <type>>*

  • pe: Incoming parameter
  • pes: Incoming/outgoing parameter

By default, if not specified, it's an incoming parameter. Functions can only have incoming parameters.

Parameters can be of non-basic types.

2.4 Function Implementation Block

This block contains the implementation of the declared voids/functions. Every declared void/function must be implemented, and every implemented one must be declared. The order of the implementations doesn't have to match the order of the declarations.

The implementation format is the following:

//void
accio <void_name>(<formal parameters>?)
<Variable Declaration Block>
<Sentence>*
fiaccio

//function
funcio <function_name>(<formal parameters>?) returns <basic type>
<Variable Declaration Block>
<Sentence>*
retorna <expr basic type>;
fifuncio

2.5 Variable Declaration Block

Consists of a list (can be empty) with the following syntax:

var <variable_id> : <type>;

Variables can only be accessed from the main program, void or function they were declared in. A variable can be of a non-basic type.

3 Identifiers

Identifiers (names for constants, variables, types, voids, functions and tuple fields) must be a single word without spaces, made up only of the letters a to z (upper or lower case), the digits 0 to 9 and _. The following rules also apply:

  • An identifier cannot be made only of _.
  • If an identifier contains digits, a letter must appear before them:
    • Non-valid examples: 9, _9, _9_a
    • Valid examples: _a1, a23, A_5
  • It cannot be a keyword (finsque, accio, const...)
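
The identifier rules can be condensed into one pattern: any number of leading underscores, then a letter, then any mix of letters, digits and underscores. That shape guarantees at least one letter and keeps every digit after it. The Python sketch below is an illustration, not the project's lexer rule. The KEYWORDS set is an assumption: it only collects the reserved words that appear in this README and may be incomplete, and treating keywords as case-sensitive is also an assumption.

```python
import re

# Assumption: partial keyword list gathered from this README only.
KEYWORDS = {
    "const", "tipus", "vector", "de", "inici", "fi", "tupla", "ftupla",
    "accio", "fiaccio", "funcio", "fifuncio", "returns", "retorna",
    "programa", "fiprograma", "var", "pe", "pes",
    "si", "llavors", "altrament", "fisi",
    "per", "fins", "pas", "fes", "fiper",
    "repeteix", "finsque", "escert",
    "no", "cert", "fals", "enter", "real", "car", "boolea",
}

# Leading underscores, then a letter, then letters, digits or underscores.
IDENTIFIER = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Return True when text satisfies the identifier rules and is not a keyword."""
    return IDENTIFIER.fullmatch(text) is not None and text not in KEYWORDS
```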

4 Comments

LANS comments are like those in C. Single line comments start with // and finish when there's a linebreak. Multi-line comments start with /* and finish at the first occurrence of */.

5 Expressions

An expression can be:

  • Constant basic value: 3, 2.9, cert (true), 'a'
  • Constant (identifier)
  • Variable (identifier)
  • Access to tuple
  • Access to vector
  • Function call
  • Operation on one or more expressions

Accepted operators are:

  • +, -, *: addition, subtraction, multiplication. Defined for integers and floats.
  • /: division. Defined for integers and floats. The result is always a float.
  • \, %: integer division, modulo. Defined for integers; the result is always an integer.
  • ~: unary negation. Defined for integers and floats. It changes the sign of the expression to its right.
  • ==, !=: equal-to, not-equal-to. Defined for all basic types. The result is a boolean.
  • <, <=, >, >=: less-than, less-than-or-equal, greater-than, greater-than-or-equal. Defined for integers and floats. The result is a boolean.
  • no, &, |: not, and, or. Defined for booleans; the result is a boolean.

Conversion from integer to float is automatic where needed. For example, these expressions are correct: 3.5+1, 2==4.7. No other conversions between basic types are allowed.
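
The operator rules and the integer-to-float conversion can be summarised as a small type-checking function for binary operators. This Python sketch is illustrative and is not the compiler's code. It assumes that +, - and * yield an integer only when both operands are integers (the text above does not state the result type), and it takes the operand types as the LANS type names enter, real, car and boolea. The name result_type is hypothetical.

```python
NUMERIC = {"enter", "real"}


def result_type(op: str, left: str, right: str) -> str:
    """Return the type of `left op right`, or raise TypeError if it is not allowed."""
    both_numeric = left in NUMERIC and right in NUMERIC
    if op in {"+", "-", "*"} and both_numeric:
        # Assumption: mixing enter and real promotes the result to real.
        return "enter" if left == right == "enter" else "real"
    if op == "/" and both_numeric:
        return "real"  # division always yields a float
    if op in {"\\", "%"} and left == right == "enter":
        return "enter"  # integer division and modulo are integer-only
    if op in {"==", "!="} and (left == right or both_numeric):
        return "boolea"
    if op in {"<", "<=", ">", ">="} and both_numeric:
        return "boolea"
    if op in {"&", "|"} and left == right == "boolea":
        return "boolea"
    raise TypeError(f"operator {op} is not defined for {left} and {right}")
```

With this model, 3.5+1 has type real and 2==4.7 has type boolea, while 'a'+1 is rejected because no conversion from car exists.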

Vectors are indexed with []:

vector_id[<integer_expression>]

Tuple fields are accessed with a dot:

tuple_id.field_name

Function calls take their parameters between parentheses, separated by commas:

function_name(<<expr><,<expr>>*>?)

Priority of operators is the following:

  1. . []
  2. no ~
  3. * / \ %
  4. + -
  5. == != < <= > >=
  6. & |

If the priority is the same, operators are evaluated from left to right. The order can be changed using parentheses.
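
One way to see how the priority list groups an expression is a small precedence-climbing parser. The Python sketch below works on an already tokenized expression and returns nested tuples. It is an illustration of the priority list only: the project itself relies on the ANTLR4-generated parser, level 1 (. and []) is left out, and the names BINARY, UNARY and parse are hypothetical.

```python
# Binding strength per operator (higher binds tighter), from the priority list.
BINARY = {
    "*": 4, "/": 4, "\\": 4, "%": 4,
    "+": 3, "-": 3,
    "==": 2, "!=": 2, "<": 2, "<=": 2, ">": 2, ">=": 2,
    "&": 1, "|": 1,
}
UNARY = {"no", "~"}


def parse(tokens):
    """Parse a list of tokens into nested tuples such as (op, left, right)."""
    tree, rest = _expr(list(tokens), 1)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def _expr(tokens, min_level):
    left, tokens = _unary(tokens)
    while tokens and tokens[0] in BINARY and BINARY[tokens[0]] >= min_level:
        op = tokens[0]
        # Requiring a strictly higher level on the right makes operators
        # of equal priority group from left to right.
        right, tokens = _expr(tokens[1:], BINARY[op] + 1)
        left = (op, left, right)
    return left, tokens


def _unary(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of expression")
    head, rest = tokens[0], tokens[1:]
    if head in UNARY:
        operand, rest = _unary(rest)
        return (head, operand), rest
    if head == "(":
        inner, rest = _expr(rest, 1)
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return inner, rest[1:]
    if head in BINARY or head == ")":
        raise SyntaxError(f"unexpected token {head!r}")
    return head, rest
```

For instance, the tokens 1 + 2 * 3 group the multiplication first, and 8 - 3 - 2 groups as (8 - 3) - 2.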

6 Sentences

A sentence can be:

  • Variable assignment
  • Conditional (if)
  • per loop (for)
  • repeteix loop (do-until)
  • void call
  • read/write operation

6.1 Variable Assignment

Has the following syntax:

variable_name := <expr>;

We can assign to individual vector positions or tuple fields, but we can't assign a whole vector or tuple. An integer value can be assigned to a float variable, but a float value cannot be assigned to an integer variable.

6.2 Conditional (if)

Has the following syntax:

si <boolean_expr> llavors
<sentence if true>*
<altrament
<sentence if false>*
>?
fisi

6.3 Per (for)

This loop iterates an integer variable over an inclusive range:

per <integer_variable> de <int_expr>
fins <int_expr> <pas <int_expr>>? fes
<sentence>*
fiper

The variable used to iterate has to be declared beforehand. pas (step) is 1 by default.
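
The values taken by the loop variable can be sketched in Python as follows. The sketch assumes a positive step; the text above does not say how a zero or negative pas behaves, so those cases are rejected here rather than guessed. per_values is a hypothetical name.

```python
def per_values(start: int, end: int, step: int = 1) -> list:
    """Values of the loop variable in `per v de start fins end pas step`."""
    if step <= 0:
        # Assumption: only positive steps are modelled in this sketch.
        raise ValueError("this sketch only models positive steps")
    # The upper bound is inclusive, hence end + 1.
    return list(range(start, end + 1, step))
```

In this model per_values(1, 10, 3) gives 1, 4, 7 and 10, and a range whose start is greater than its end produces no iterations.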

6.4 Repeteix (do-until)

This loop stops executing instructions when the final condition is true. This means the instructions are always executed at least once.

repeteix
<sentence>*
finsque <boolean_expr> escert
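
Python has no do-until construct, so the same control flow is usually written as an endless loop with the test at the bottom. A minimal sketch, where repeteix, body and condition are hypothetical names used only for illustration:

```python
def repeteix(body, condition) -> int:
    """Run body() until condition() is true; return the number of iterations."""
    iterations = 0
    while True:
        body()  # the body runs before the condition is ever checked
        iterations += 1
        if condition():
            return iterations
```

Because the condition is checked after the body, the body runs once even when the condition is already true on entry.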

6.5 Void Call

A void is called the same way a function is:

void_name(<<expr><,<expr>>*>?);

IN/OUT parameters accept only variables (not constants, function calls, additions, multiplications...).

6.6 Read/Write Instructions

We have the read function llegeix and the write functions escriu and escriuln. These are already implemented and can be executed like voids.

llegeix gets a single OUT parameter, which is a basic type variable that will contain the value read from the keyboard:

llegeix(<variable_id>);

escriu takes at least one parameter and shows on screen the concatenation of the different values:

escriu(<expr><,<expr>>*);

escriuln is similar to escriu but adds a linebreak at the end. escriuln can be called without parameters, in which case it only prints a linebreak. The parameters of escriu and escriuln can be any kind of expression, as well as strings.
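
The difference between the two write instructions can be modelled in Python as below. The sketch returns the text instead of printing it so the behaviour is easy to check. It assumes that values are joined with no separator and formats them with Python's str, which may not match how the compiled program prints floats or booleans.

```python
def escriu(*values) -> str:
    """Model of escriu: concatenates at least one value."""
    if not values:
        raise TypeError("escriu needs at least one parameter")
    # Assumption: values are joined with no separator between them.
    return "".join(str(value) for value in values)


def escriuln(*values) -> str:
    """Model of escriuln: like escriu, plus a trailing linebreak; parameters optional."""
    return "".join(str(value) for value in values) + "\n"
```

Calling escriuln with no parameters yields just the linebreak, while escriu with no parameters is rejected.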
