This is an optimizing compiler for a subset of the C99 language that produces LLVM assembly. It is my bachelor's diploma project.
Some examples of compilable code can be found in the directory with tests.
All tests can be checked via the `run_tests` script. To do so, set the path to the compiler executable in the `CC_tst` variable inside this script. The value of `CW39_LLC` in this script should also be adjusted (see below).
The following optimizations are implemented in this compiler:
- Memory to register conversion, SSA form construction
- Tail recursion elimination
- Function inlining (manual detection)
- Dead code elimination
- Copy propagation and constant folding
- Algebraic simplifications and common subexpression elimination
- Loop invariant code motion
- Intrinsics introduction (special optimization, CTZ only)
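To make a few of the items above more concrete, here is a small C99 sketch of input code containing patterns that tail recursion elimination, constant folding and common subexpression elimination typically target. It is illustrative only: the function names are hypothetical, and whether cw39 transforms exactly this code (and supports every construct used) is an assumption. The `--ir` output shows what the compiler actually does with it.

```c
/* Hypothetical input: patterns that some of the listed passes target.
   Whether cw39 rewrites exactly this code is an assumption. */

/* Tail recursion: the recursive call is the last action of the
   function, so it can be turned into a loop. */
int gcd(int a, int b) {
    if (b == 0)
        return a;
    return gcd(b, a % b);
}

/* (x + y) appears twice and can be computed once (common subexpression
   elimination); 2 * 3 can be folded to 6 at compile time. */
int cse_example(int x, int y) {
    int a = (x + y) * (2 * 3);
    int b = (x + y) - 1;
    return a + b;
}
```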
The following tools are required to build the compiler:
- CMake (v3.5)
- gcc-10 (or an equivalent compiler with C++20 support)
- LLVM (v13.0.1; earlier versions cannot be compiled with C++20)
- fmtlib (v8.1)
The following packages (flex, bison and gperf) are used for code generation. They are optional, because all the code they would generate is already included in this repository.
To build the compiler, run the following commands, which are common for most CMake projects. The executable binary (`cw39`) will be placed in the `build` directory.

```sh
mkdir build
cd build
cmake ..
make
```
By default, all code generators are enabled. If they are not installed, the project can still be built from the already generated source files. To do this, pass one or more of the following options to CMake:
- `-DCW39_NO_BISON=TRUE` to disable flex and bison
- `-DCW39_NO_GPERF=TRUE` to disable gperf
For example, the cmake command can be executed as shown below.

```sh
cmake -DCW39_NO_BISON=TRUE -DCW39_NO_GPERF=TRUE ..
```
If CMake has trouble communicating with the generators, the source code can also be generated manually using this Makefile.
The Docker container can be built using the `setup.sh` script. After the first run, the script only needs to be executed again if the `Dockerfile` has been changed.
The project can then be built using the `build.sh` script, which stores the final binary and build files in the `docker_build` directory created by the previous script.
Neither script expects arguments, and both should be executed from the project root (where the `Dockerfile` is located). Optionally, the number of threads used by make can be specified in the build script (default is 3).
```sh
cw39 [options] <input_file>
```
Option | Description |
---|---|
`--pproc` | Preprocessor output |
`--ast` | Abstract syntax tree with pseudo graphics |
`--ir` | IR in readable text format |
`--cfg` | CFG representation in the dot language |
`--llvm` | LLVM assembly code |
`--bc` | LLVM bitcode (binary output) |
`--asm` | Assembly code (only on Unix-based systems) |
Each of the listed options accepts an optional argument with a path to a file (e.g. `--llvm=./out.ll`). In this case, the output is written to the specified file. If the path is empty or `-`, the output is written to stdout.
Without any of these options, the compiler prints nothing but errors.
Option | Description |
---|---|
`-D <macro>` | Define a macro with optional value |
`-O <lvl>` | Set optimization level (0-2, default is 2) |
`--no-s1` | Disable special optimization 1 (intrinsics detector) |
`--no-s2` | |
`--llc-args <args>` | Specify arguments for llc program |
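The `--no-s1` switch disables the intrinsics detector, which, according to the optimization list, handles CTZ only. As a hedged illustration, the hypothetical function below counts trailing zero bits with a plain loop: the kind of code that could presumably be replaced by LLVM's `llvm.cttz` intrinsic. The exact loop shape cw39 recognizes is not documented here, so treat this as an assumption; comparing the `--llvm` output with and without `--no-s1` would show whether the pattern is detected.

```c
/* Hypothetical CTZ loop: a plausible target for the intrinsics detector.
   The exact pattern cw39 matches is an assumption. */
int count_trailing_zeros(unsigned int x) {
    int n = 0;
    if (x == 0)
        return 32;          /* avoids an infinite loop; assumes 32-bit unsigned int */
    while ((x & 1u) == 0) { /* shift right until the lowest set bit is reached */
        x >>= 1;
        n++;
    }
    return n;
}
```

The explicit zero check keeps the loop terminating for `x == 0`; a detector might or might not require that guard to match the pattern.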
The following optimization levels are available:
- 0: no optimizations
- 1: the most common optimizations (no loop optimizations)
- 2: all available optimizations
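Since level 1 excludes loop optimizations, loop invariant code motion is only expected at level 2. The hypothetical function below is a minimal sketch of code that pass could act on: `a * b` does not change between iterations and can be hoisted out of the loop. That this exact code triggers the pass is an assumption; comparing the `--ir` output at levels 1 and 2 is one way to check.

```c
/* Hypothetical LICM candidate: a * b is loop invariant and could be
   computed once before the loop. */
int sum_scaled(int n, int a, int b) {
    int sum = 0;
    int i;
    for (i = 0; i < n; i++) {
        sum += i * (a * b); /* a * b does not depend on i */
    }
    return sum;
}
```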
Note that specifying any llc argument disables the default arguments `-O0 -mcpu=native`.

Some llc arguments that may be of interest:
- `-march=<arch>`: specify the target architecture (e.g. `x86-64`)
- `-mcpu=<cpu>`: specify the target CPU (e.g. `native`, `skylake`)
- `-O<lvl>`: set the llc optimization level
Option | Description |
---|---|
`--times` | Print elapsed time of each step |
`--tr-scanner` | Enable scanner debug mode |
`--tr-parser` | Enable parser debug mode |
`--help` | Print help page |
`--version` | Print compiler version |
This compiler runs some external programs via fork-exec calls. This behaviour is available only on Unix-based systems, and correct operation is not guaranteed on other systems.
The names of the executables used can be specified via the environment variables listed below:
- `CW39_LLC`: name of the llc program from the LLVM toolchain (default: `llc`)
For example:

```sh
export CW39_LLC=llc-13
cw39 --asm test.c
# Or
CW39_LLC=llc-13 cw39 --asm test.c
```
Print LLVM code to the terminal:

```sh
cw39 --llvm test.c
```
Print LLVM code into the `out.ll` file:

```sh
cw39 --llvm=out.ll test.c
```
Execute the generated code immediately with arguments 1 and 2 (`lli-13` and others can also be used):

```sh
cw39 --llvm test.c | lli - 1 2
```
Draw the CFG into the `graph.svg` file:

```sh
cw39 --cfg test.c | dot -Tsvg -o graph.svg
```
Create an executable file from assembly code via clang:

```sh
cw39 --asm=test.s test.c
clang test.s
# Or
cw39 --asm test.c | clang -x assembler -
# Or
cw39 --llvm test.c | clang -x ir -
```
Create an executable with LLVM optimizations:

```sh
cw39 --asm --llc-args="-O3 -mcpu=native" test.c | clang -x assembler -
```