This is a university project that consisted of building a compiler using ANTLR4. It was originally written in Catalan.
The compiler takes a program written in LANS (Llenguatge d'Alt Nivell Senzill, Basic High Level Language) and generates a bytecode file (.class) that can be executed on the Java Virtual Machine.
The compiler is written in Java and uses ANTLR4 to build the lexical and syntactic analysis. In other words, ANTLR4 generates the Java code that performs the analysis.
- enter: Integer
- real: Float. Scientific notation is accepted (with E). When scientific notation is used, there can only be a single digit before the decimal point.
  - Valid examples: `0.1`, `2.10`, `3.1416`, `6.023E23`, `1E-5`
  - Invalid examples: `12E-5`, `124.0E12`
- car: Character. It is written within single quotes (e.g. 'a'). Only letters, digits, punctuation marks and other basic ASCII characters are accepted.
- boolea: Boolean. Can only be cert (true) or fals (false).
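The rule for real literals can be sketched as a regular expression. The Java snippet below is only an illustration, not the grammar used by the compiler: the class name `RealLiteralSketch` is hypothetical, and it assumes that a plain decimal needs digits on both sides of the point and that the exponent sign is optional.

```java
import java.util.regex.Pattern;

final class RealLiteralSketch {
    // Either a plain decimal (digits '.' digits), or scientific notation with
    // exactly one digit before the optional fraction.
    // Assumptions (not stated in the text): digits are required on both sides
    // of the point, and the exponent may carry an optional '+' or '-'.
    private static final Pattern REAL =
            Pattern.compile("\\d+\\.\\d+|\\d(\\.\\d+)?E[+-]?\\d+");

    static boolean isReal(String text) {
        return REAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0.1", "2.10", "3.1416", "6.023E23", "1E-5", "12E-5", "124.0E12"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isReal(sample));
        }
    }
}
```

With these assumptions, the sketch accepts every valid example listed above and rejects both invalid ones.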
There are also string values, written inside double quotes (e.g. "hello world"). As with characters, strings can only contain basic ASCII characters. Strings can only be used as a parameter for stdin or stdout, as explained in section 6.6. There are no string variables and no operations on strings.
We can define new types of data: tuples, vectors or aliases for basic types. These can be used to define variables inside the main program or any void/function. Voids/functions can take defined types as parameters, but functions can only return basic types.
A program in LANS is written in a single file (there are no imports) and always specifies a name for the program. The file generated by the compiler will have the same name. For example, a file named HelloWorld will be compiled into a HelloWorld.class file. The general structure of a LANS file contains the following elements (in this same order):
<Constant Declaration Block>
<Type Declaration Block>
<Function Declaration Block>
programa <program_name>
<Variable Declaration Block>
<Sentence>*
fiprograma
<Function Implementation Block>
The constant declaration block contains a list (possibly empty) of constant declarations, with the following syntax:
const name_const : <basic type> := <constant basic type value>;
The constants are accessible from within the main program and any void or function, and they can't be modified.
The type declaration block contains a list (possibly empty) of type declarations. A declaration can take one of three forms:
- An alias of a basic type:
tipus <type_name> : <basic type>;
- One-dimensional vector of a basic type:
tipus <type_name> : vector de <basic type> inici <min_index> fi <max_index>;
- Tuple with basic type fields. Tuples must contain at least 1 field:
tipus <type_name> : tupla
<field_id : <basic type>;>+
ftupla;
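A vector type carries its own index range, so element access has to translate a LANS index into a position in the underlying storage. The Java sketch below illustrates one way to do that; it is not taken from the compiler. The class `VectorSketch` is hypothetical, and it assumes that both bounds are inclusive and that the elements are integers.

```java
final class VectorSketch {
    private final int minIndex;
    private final int maxIndex;
    private final int[] cells;

    // Assumption: both bounds are inclusive, so the vector holds
    // (maxIndex - minIndex + 1) elements.
    VectorSketch(int minIndex, int maxIndex) {
        if (maxIndex < minIndex) {
            throw new IllegalArgumentException("max index is below min index");
        }
        this.minIndex = minIndex;
        this.maxIndex = maxIndex;
        this.cells = new int[maxIndex - minIndex + 1];
    }

    int get(int index) {
        return cells[offset(index)];
    }

    void set(int index, int value) {
        cells[offset(index)] = value;
    }

    int length() {
        return cells.length;
    }

    // Translates a declared index into a zero-based array position.
    private int offset(int index) {
        if (index < minIndex || index > maxIndex) {
            throw new IndexOutOfBoundsException("index " + index + " is outside the declared range");
        }
        return index - minIndex;
    }

    public static void main(String[] args) {
        VectorSketch v = new VectorSketch(5, 9);
        v.set(5, 42);
        System.out.println(v.get(5) + " stored in a vector of length " + v.length());
    }
}
```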
In the function declaration block we can declare as many voids/functions as we want (or none). The format is the following:
//void
accio <void_name>(<formal parameters>?);
//function
funcio <function_name>(<formal parameters>?) returns <basic type>;
The formal parameters' format is the following:
<pe|pes>? <parameter_id> : <type><,<pe|pes>? <parameter_id> : <type>>*
pe: incoming parameter
pes: incoming/outgoing parameter
By default, if not specified, it's an incoming parameter. Functions can only have incoming parameters.
Parameters can be of non-basic types.
The function implementation block contains the implementation of the declared voids/functions. All declared voids/functions must be implemented and vice versa. The order doesn't have to be the same.
The implementation format is the following:
//void
accio <void_name>(<formal parameters>?)
<Variable Declaration Block>
<Sentence>*
fiaccio
//function
funcio <function_name>(<formal parameters>?) returns <basic type>
<Variable Declaration Block>
<Sentence>*
retorna <expr basic type>;
fifuncio
The variable declaration block consists of a list (possibly empty) with the following syntax:
var <variable_id> : <type>;
Variables can only be accessed from the main program or void or function they were declared in. A variable can be of a non-basic type.
The identifiers (names for constants, variables, types, voids, functions and tuple fields) have to be a single word without spaces, made only of the letters a to z (upper or lower case), the digits 0-9 and _. Additional rules:
- An identifier cannot consist only of _.
- If an identifier has digits, there has to be a letter before them:
  - Non-valid examples: `9`, `_9`, `_9_a`
  - Valid examples: `_a1`, `a23`, `A_5`
- It cannot be a keyword (`finsque`, `accio`, `const`, ...)
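The identifier rules above can be condensed into a single pattern plus a keyword check. The Java sketch below is illustrative only and is not the lexer rule used by the compiler. The class `IdentifierSketch` is hypothetical, the keyword set is deliberately partial (only the three keywords named above), and keyword matching is assumed to be case-sensitive.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class IdentifierSketch {
    // Optional leading underscores, then a letter, then any mix of letters,
    // digits and underscores. Requiring a letter before anything else rules
    // out underscore-only names and digits that appear before any letter.
    private static final Pattern SHAPE = Pattern.compile("_*[A-Za-z][A-Za-z0-9_]*");

    // Deliberately partial: only the keywords named in the text.
    private static final Set<String> KEYWORDS = Set.of("finsque", "accio", "const");

    static boolean isIdentifier(String text) {
        // Assumption: keywords are compared case-sensitively.
        return SHAPE.matcher(text).matches() && !KEYWORDS.contains(text);
    }

    public static void main(String[] args) {
        String[] samples = {"_a1", "a23", "A_5", "9", "_9", "_9_a", "___", "const"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isIdentifier(sample));
        }
    }
}
```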
LANS comments are like those in C. Single-line comments start with `//` and finish at the line break. Multi-line comments start with `/*` and finish at the first occurrence of `*/`.
An expression can be:
- Constant basic value: `3`, `2.9`, `cert` (true), `'a'`
- Constant (identifier)
- Variable (identifier)
- Access to tuple
- Access to vector
- Function call
- Operation on one or more expressions
Accepted operators are:
- `+`, `-`, `*`: addition, subtraction, multiplication. Defined between integers and floats.
- `/`: division. Defined between integers and floats. Result is always a float.
- `\`, `%`: integer division, mod. Defined between integers, result is always an integer.
- `~`: Defined for integers and floats. This operator changes the sign of the expression on its right.
- `==`, `!=`: equal-to, not-equal-to. Defined for all basic types. Result is a boolean.
- `<`, `<=`, `>`, `>=`: less-than, less-than-or-equal, greater-than, greater-than-or-equal. Defined for integers and floats. Returns a boolean.
- `no`, `&`, `|`: not, and, or. Defined for booleans, returns a boolean.
Conversion from integer to float is automatic where needed. For example, these expressions are correct: `3.5+1`, `2==4.7`. No other conversions between basic types are allowed.
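The typing rules for binary operators can be summarised as a small lookup. The Java sketch below is an illustration and not the compiler's type checker. The class `OperatorTypesSketch` and its method are hypothetical, and it assumes that `+`, `-` and `*` yield an integer when both operands are integers and a float otherwise, which the text does not state explicitly.

```java
import java.util.Optional;

final class OperatorTypesSketch {
    enum Type { ENTER, REAL, CAR, BOOLEA }

    private static boolean isNumeric(Type t) {
        return t == Type.ENTER || t == Type.REAL;
    }

    // Returns the result type of "left op right", or empty if the
    // combination is not allowed.
    static Optional<Type> binaryResult(String op, Type left, Type right) {
        boolean bothNumeric = isNumeric(left) && isNumeric(right);
        boolean bothInt = left == Type.ENTER && right == Type.ENTER;
        boolean bothBool = left == Type.BOOLEA && right == Type.BOOLEA;
        switch (op) {
            case "+": case "-": case "*":
                if (!bothNumeric) return Optional.empty();
                // Assumption: integer op integer stays an integer.
                return Optional.of(bothInt ? Type.ENTER : Type.REAL);
            case "/":
                return bothNumeric ? Optional.of(Type.REAL) : Optional.empty();
            case "\\": case "%":
                return bothInt ? Optional.of(Type.ENTER) : Optional.empty();
            case "==": case "!=":
                // Same type, or integer/float mixed through automatic conversion.
                return (left == right || bothNumeric) ? Optional.of(Type.BOOLEA) : Optional.empty();
            case "<": case "<=": case ">": case ">=":
                return bothNumeric ? Optional.of(Type.BOOLEA) : Optional.empty();
            case "&": case "|":
                return bothBool ? Optional.of(Type.BOOLEA) : Optional.empty();
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(binaryResult("+", Type.REAL, Type.ENTER));
        System.out.println(binaryResult("==", Type.ENTER, Type.REAL));
        System.out.println(binaryResult("%", Type.REAL, Type.ENTER));
    }
}
```

An empty result stands for a type error, for example `%` applied to a float.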
Vectors are indexed with `[]`:
vector_id[<integer_expression>]
Tuple fields are accessed with a dot:
tuple_id.field_name
Functions take their parameters between parentheses, separated by commas:
function_name(<<expr><,<expr>>*>?)
Priority of operators is the following:
.
[]
no
~
*
/
\
%
+
-
==
!=
<
<=
>
>=
&
|
If the priority is the same, operators are evaluated from left to right. We can change the order using parentheses.
A sentence can be:
- Variable assignment
- Conditional (if)
- `per` loop (for)
- `repeteix` loop (do-until)
- void call
- read/write operation
A variable assignment has the following syntax:
variable_name := <expr>;
We can assign to vector positions and tuple fields, but we can't assign a whole vector or tuple. We can assign an integer to a float, but not a float to an integer.
A conditional has the following syntax:
si <boolean_expr> llavors
<sentence if true>*
<altrament
<sentence if false>*
>?
fisi
The `per` loop iterates an integer variable over an inclusive range:
per <integer_variable> de <int_expr>
fins <int_expr> <pas <int_expr>>? fes
<sentence>*
fiper
The variable used to iterate has to be declared beforehand. `pas` (step) is 1 by default.
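For readers who know Java, the behaviour of the `per` loop can be approximated with an ordinary `for` loop over an inclusive range. The sketch below is illustrative and is not the code the compiler emits. The class `PerLoopSketch` is hypothetical, and it assumes a positive step and bounds that are evaluated once before the loop starts, neither of which is stated in the text.

```java
import java.util.ArrayList;
import java.util.List;

final class PerLoopSketch {
    // Collects the values the loop variable takes in
    // "per i de <from> fins <to> pas <step> fes ... fiper".
    static List<Integer> visited(int from, int to, int step) {
        if (step <= 0) {
            // Assumption: only positive steps are handled in this sketch.
            throw new IllegalArgumentException("this sketch only handles a positive step");
        }
        List<Integer> values = new ArrayList<>();
        // The upper bound is inclusive, hence "<=" rather than "<".
        for (int i = from; i <= to; i += step) {
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(visited(1, 5, 1));
        System.out.println(visited(1, 10, 4));
    }
}
```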
The `repeteix` loop stops executing its instructions when the final condition is true. This means the instructions are always executed at least once:
repeteix
<sentence>*
finsque <boolean_expr> escert
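In Java terms, `repeteix ... finsque` behaves like a `do`/`while` loop with the condition negated. The sketch below only illustrates that equivalence; the class `RepeteixSketch` and its counter example are hypothetical.

```java
final class RepeteixSketch {
    // Mirrors:
    //   repeteix
    //     x := x + 1;
    //   finsque x >= limit escert
    // and returns how many times the body ran.
    static int bodyExecutions(int start, int limit) {
        int x = start;
        int executions = 0;
        do {
            x = x + 1;
            executions++;
        } while (!(x >= limit)); // keep looping until the exit condition holds
        return executions;
    }

    public static void main(String[] args) {
        System.out.println(bodyExecutions(0, 3));
        System.out.println(bodyExecutions(10, 3));
    }
}
```

Even when the exit condition already holds before the loop starts, as in the second call, the body still runs once.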
A void is called the same way a function is:
void_name(<<expr><,<expr>>*>?);
IN/OUT parameters accept only variables (not constants, function calls, additions, multiplications...).
We have the read function `llegeix` and the write functions `escriu` and `escriuln`. These are already implemented and can be called like voids.
`llegeix` takes a single OUT parameter, which is a basic type variable that will contain the value read from the keyboard:
llegeix(<variable_id>);
`escriu` takes at least one parameter and shows on screen the concatenation of the different values:
escriu(<expr><,<expr>>*);
`escriuln` is similar to `escriu` but adds a line break at the end. `escriuln` doesn't require a parameter (it then prints just a line break). The parameters of `escriu` and `escriuln` can be any kind of expression, as well as strings.