Compiler
The preprocessor loads the source and calls the tokenizer for each file. Once a file has been separated into tokens, the preprocessor parses the tokens to find and apply all meta commands. This can result in a number of include files, each of which is then loaded and parsed by the preprocessor in turn.
The entry point for the preprocessor is the processFile function:
preProcessor.processFile({filename: projectFilename, token: null});
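The recursive loading of include files can be sketched as follows. This is a minimal illustration, not the actual preprocessor: the in-memory `files` table, the `#include "name"` syntax, the `.whl` file names and the function `processFileSketch` are all assumptions made for the example.

```javascript
// Hypothetical in-memory file table; a real preprocessor would load from disk.
const files = {
    "main.whl": '#include "lib.whl"\nnumber a',
    "lib.whl": '#include "math.whl"\nnumber b',
    "math.whl": "number c"
};

// Deliberately naive tokenizer: splits on whitespace only.
function tokenizeNaive(source) {
    return source.split(/\s+/).filter((lexeme) => lexeme !== "");
}

// Loads one file, tokenizes it, looks for the (assumed) #include meta command
// and recurses into every include file that has not been loaded yet.
function processFileSketch(filename, loaded = new Map()) {
    if (loaded.has(filename)) {
        return loaded; // Already processed, which also guards against include cycles.
    }
    const tokens = tokenizeNaive(files[filename]);
    loaded.set(filename, tokens);
    for (let i = 0; i < tokens.length - 1; i++) {
        if (tokens[i] === "#include") {
            const includeName = tokens[i + 1].replace(/"/g, "");
            processFileSketch(includeName, loaded);
        }
    }
    return loaded;
}

const loadedFiles = [...processFileSketch("main.whl").keys()];
console.log(loadedFiles); // [ 'main.whl', 'lib.whl', 'math.whl' ]
```

Keeping the loaded files in a `Map` keyed by filename means a file that is included twice is only tokenized once.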
The tokenizer separates the loaded source into tokens.
A token can consist of one or more lexemes.
For example, the lexemes `for`, `if` and `while` are keyword tokens.
Each token stores its position and the file it came from.
The compiler can use this information for error messages, and the IDE can use it for code hints.
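A token that carries its own position might look like the sketch below. The field names (`type`, `lexeme`, `filename`, `line`, `column`), the token categories other than keyword, and the helper `describeToken` are assumptions for illustration. The example also simplifies by mapping every lexeme to exactly one token.

```javascript
const KEYWORDS = new Set(["for", "if", "while"]);

// Splits the source into tokens and records where each one was found.
function tokenizeWithPositions(source, filename) {
    const tokens = [];
    source.split("\n").forEach((text, lineIndex) => {
        // Identifiers or keywords, integers, or any other single non-space character.
        const pattern = /[A-Za-z_]\w*|\d+|\S/g;
        let match;
        while ((match = pattern.exec(text)) !== null) {
            const lexeme = match[0];
            let type = "symbol";
            if (KEYWORDS.has(lexeme)) {
                type = "keyword";
            } else if (/^[A-Za-z_]/.test(lexeme)) {
                type = "identifier";
            } else if (/^\d/.test(lexeme)) {
                type = "number";
            }
            // Lines and columns are stored 1-based, as editors usually show them.
            tokens.push({type, lexeme, filename, line: lineIndex + 1, column: match.index + 1});
        }
    });
    return tokens;
}

// Formats a location prefix that an error message or code hint could use.
function describeToken(token) {
    return `${token.filename}:${token.line}:${token.column} ${token.type} "${token.lexeme}"`;
}

const sampleTokens = tokenizeWithPositions("if x\n  while 10", "main.whl");
console.log(sampleTokens.map(describeToken));
```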
After all files are loaded and tokenized, the syntax is validated. The syntax validator checks whether every lexeme is followed by a valid lexeme.
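One way to picture this kind of check is a table that lists, for each lexeme category, which categories may come next. The sketch below is an assumption about how such a validator could work; the categories, the `VALID_NEXT` table and the function names are hypothetical and far smaller than a real grammar.

```javascript
// Hypothetical follow table: for each category, the categories allowed next.
// "end" stands for the end of the input.
const VALID_NEXT = {
    keyword: ["identifier", "number", "open"],
    identifier: ["operator", "close", "end"],
    number: ["operator", "close", "end"],
    operator: ["identifier", "number", "open"],
    open: ["identifier", "number", "open"],
    close: ["operator", "close", "end"]
};

function categorize(lexeme) {
    if (["if", "while", "for"].includes(lexeme)) return "keyword";
    if (lexeme === "(") return "open";
    if (lexeme === ")") return "close";
    if (/^[-+*\/<>=]+$/.test(lexeme)) return "operator";
    if (/^\d+$/.test(lexeme)) return "number";
    return "identifier";
}

// Walks the lexemes pairwise and reports the first invalid successor.
function validateSyntax(lexemes) {
    for (let i = 0; i < lexemes.length; i++) {
        const current = categorize(lexemes[i]);
        const next = i + 1 < lexemes.length ? categorize(lexemes[i + 1]) : "end";
        if (!VALID_NEXT[current].includes(next)) {
            return {valid: false, index: i, current, next};
        }
    }
    return {valid: true};
}

const goodResult = validateSyntax(["if", "(", "a", "<", "10", ")"]);
const badResult = validateSyntax(["a", "+", "*", "b"]);
console.log(goodResult, badResult);
```

A check like this only looks at neighbouring lexemes, so it can reject `+ *` early but cannot detect problems such as unbalanced parentheses.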
If the syntax is correct, the next step is to compile a list of namespaces and apply a namespace prefix to every namespaced identifier found.
The tokens are then compiled with a recursive descent parser.
Boolean and math expressions are compiled to a tree structure first.
Constant expressions are optimized before the tree is converted to VM code.
For example, the expression `4 * 5` will be replaced with `20`.
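Constant folding on an expression tree can be sketched as below. The node shapes (`number`, `identifier`, `binary`) and the function `foldConstants` are assumptions for illustration, not the compiler's actual data structures.

```javascript
// Operators that the sketch knows how to evaluate at compile time.
const OPERATIONS = new Map([
    ["+", (a, b) => a + b],
    ["-", (a, b) => a - b],
    ["*", (a, b) => a * b]
]);

// Folds children first, then replaces a binary node by a number node
// when both of its operands turned out to be constants.
function foldConstants(node) {
    if (node.type !== "binary") {
        return node;
    }
    const left = foldConstants(node.left);
    const right = foldConstants(node.right);
    if (left.type === "number" && right.type === "number" && OPERATIONS.has(node.operator)) {
        return {type: "number", value: OPERATIONS.get(node.operator)(left.value, right.value)};
    }
    return {...node, left, right};
}

const num = (value) => ({type: "number", value});
const product = {type: "binary", operator: "*", left: num(4), right: num(5)};

// 4 * 5 becomes the constant 20.
const foldedProduct = foldConstants(product);

// x + 4 * 5 becomes x + 20; the non-constant part is left alone.
const foldedMixed = foldConstants({
    type: "binary",
    operator: "+",
    left: {type: "identifier", name: "x"},
    right: product
});
console.log(foldedProduct, foldedMixed);
```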
When a command is added to the program, it is checked against the last command(s) in the program for optimization.
For example, if the last command is `add [10], 5` and the command being added is `add [10], 9`, then no new command is added and the existing command is changed to `add [10], 14`.
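The `add` example above can be sketched as a small peephole check at the point where a command is appended. The command shape (`op`, `target`, `value`) and the function `addCommand` are assumptions for illustration; the real optimizer presumably handles more command types.

```javascript
// Appends a command, or merges it into the last command when both are
// "add" commands with a constant value and the same target.
function addCommand(program, command) {
    const last = program[program.length - 1];
    const mergeable = last !== undefined &&
        last.op === "add" && command.op === "add" &&
        last.target === command.target &&
        typeof last.value === "number" && typeof command.value === "number";
    if (mergeable) {
        last.value += command.value;
    } else {
        program.push({...command});
    }
    return program;
}

const peepholeProgram = [];
addCommand(peepholeProgram, {op: "add", target: "[10]", value: 5});
addCommand(peepholeProgram, {op: "add", target: "[10]", value: 9});  // Merged: add [10], 14
addCommand(peepholeProgram, {op: "add", target: "[11]", value: 1});  // Different target: appended
console.log(peepholeProgram);
```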
The compilation process has two passes. The first pass builds a database with reference counts and does not output any code. The second pass generates code, but only for the procedures that are actually called.
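The two passes can be illustrated with a toy model in which each procedure only lists the procedures it calls. Everything here is an assumption made for the sketch: the `procedures` table, counting references by walking the calls from an entry point, and the `proc name` output lines.

```javascript
// Hypothetical program model: each procedure lists the procedures it calls.
const procedures = {
    main: {calls: ["print", "square"]},
    square: {calls: []},
    print: {calls: []},
    unused: {calls: ["square"]}
};

// Pass 1: count references reachable from the entry point. No code is emitted.
function countReferences(entry) {
    const counts = new Map([[entry, 1]]);
    const pending = [entry];
    while (pending.length > 0) {
        const name = pending.pop();
        for (const callee of procedures[name].calls) {
            if (!counts.has(callee)) {
                pending.push(callee); // First time seen: visit its calls later.
            }
            counts.set(callee, (counts.get(callee) || 0) + 1);
        }
    }
    return counts;
}

// Pass 2: generate output only for procedures with at least one reference.
function generateCode(counts) {
    return Object.keys(procedures)
        .filter((name) => (counts.get(name) || 0) > 0)
        .map((name) => `proc ${name}`);
}

const referenceCounts = countReferences("main");
const generated = generateCode(referenceCounts);
console.log(generated); // [ 'proc main', 'proc square', 'proc print' ]
```

In this model `unused` is never reached from `main`, so the second pass skips it even though it calls `square` itself.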
The following piece of code is from a unit test. The number n1 is assigned a value, and
the address of n1 is assigned to the pointer n. The value of n is assigned to n2,
which involves implicit pointer dereferencing.
In the last two lines, addr is called on n2 and the value is logged.
The unoptimized output of this code is:
The same compiled code but optimized looks like this:
The first number on each line is the command number; the second number is a block id. The optimizer can combine commands that have the same block id. It will not combine commands that have different block ids, because doing so would remove functionally necessary code.
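The role of the block id can be sketched by adding it as a guard to a merge check. The command shape, the `blockId` field name, the function `addCommandInBlock` and the listing layout produced by `formatProgram` are assumptions for illustration and do not reproduce the compiler's real output format.

```javascript
// Merges two consecutive "add" commands only when they share a block id.
function addCommandInBlock(program, command) {
    const last = program[program.length - 1];
    const sameBlock = last !== undefined && last.blockId === command.blockId;
    if (sameBlock && last.op === "add" && command.op === "add" && last.target === command.target) {
        last.value += command.value;
    } else {
        program.push({...command});
    }
    return program;
}

// Illustrative listing: command number, block id, then the command itself.
function formatProgram(program) {
    return program.map((c, index) => `${index} ${c.blockId} ${c.op} ${c.target}, ${c.value}`);
}

const blockProgram = [];
addCommandInBlock(blockProgram, {blockId: 1, op: "add", target: "[10]", value: 5});
addCommandInBlock(blockProgram, {blockId: 1, op: "add", target: "[10]", value: 9}); // Same block: merged
addCommandInBlock(blockProgram, {blockId: 2, op: "add", target: "[10]", value: 2}); // Other block: kept
const listing = formatProgram(blockProgram);
console.log(listing.join("\n"));
```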