WARNING: mutability and side effects
- tag lox-chapter-07
- continue on branch lox-chapter-11 on semantic analysis
FIXME
- TODO part-3
./gradlew :modules:lox:test
./gradlew runLox --args=data/exampleN.lox
- scanning (or lexing, or lexical analysis)
  - in Greek, "lex" means word
  - a scanner (or lexer) takes in the linear stream of characters and chunks them together into a series of words, or tokens (a minimal sketch follows this list)
  - a token can be a single character or several characters (numbers, string literals, identifiers)
  - whitespace and comments are ignored
  - lexical grammar: the rules for how characters get grouped into tokens
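As a rough illustration (hypothetical names, not the actual jlox `Scanner`), a scanner in Java might look like this: walk the raw characters, skip whitespace, and group the rest into tokens.

```java
import java.util.ArrayList;
import java.util.List;

enum TokenType { NUMBER, PLUS, EOF }

record Token(TokenType type, String lexeme) {}

class MiniScanner {
  private final String source;
  private int current = 0;

  MiniScanner(String source) { this.source = source; }

  List<Token> scanTokens() {
    List<Token> tokens = new ArrayList<>();
    while (current < source.length()) {
      char c = source.charAt(current);
      if (c == ' ' || c == '\t' || c == '\n') {
        current++;                       // whitespace is ignored
      } else if (c == '+') {
        tokens.add(new Token(TokenType.PLUS, "+"));
        current++;
      } else if (Character.isDigit(c)) { // a token can span several characters
        int start = current;
        while (current < source.length() && Character.isDigit(source.charAt(current))) current++;
        tokens.add(new Token(TokenType.NUMBER, source.substring(start, current)));
      } else {
        throw new IllegalArgumentException("Unexpected character: " + c);
      }
    }
    tokens.add(new Token(TokenType.EOF, ""));
    return tokens;
  }
}
```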
- parsing
  - where a syntax gets a grammar: the ability to compose larger expressions and statements out of smaller parts
  - a parser takes the flat sequence of tokens and builds a tree structure that mirrors the nested nature of the grammar: a parse tree or abstract syntax tree (also called a syntax tree, AST, or just tree); a toy parser is sketched after this list
  - context-free grammar (CFG)
    - derivations use a finite set of rules to generate strings that are in the grammar
    - rules are called productions because they produce strings in the grammar
    - each production in a context-free grammar has a head (its name) and a body (a list of symbols describing what it generates)
    - symbols can be terminal, i.e. lexemes (tokens coming from the scanner), or nonterminal, i.e. a named reference to another rule in the grammar
  - BNF (Backus–Naur form) is a metasyntax notation for context-free grammars
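A minimal recursive-descent sketch (hypothetical; it reuses `Token`/`TokenType` from the scanner sketch above) showing how one BNF-style production maps to a parsing method that builds a tree:

```java
import java.util.List;

sealed interface Expr permits Literal, Binary {}
record Literal(int value) implements Expr {}
record Binary(Expr left, String operator, Expr right) implements Expr {}

class MiniParser {
  private final List<Token> tokens; // terminals coming from the scanner
  private int current = 0;

  MiniParser(List<Token> tokens) { this.tokens = tokens; }

  // production (head -> body):  expression -> NUMBER ( "+" NUMBER )*
  Expr expression() {
    Expr expr = new Literal(Integer.parseInt(tokens.get(current++).lexeme()));
    while (tokens.get(current).type() == TokenType.PLUS) {
      current++; // consume "+"
      Expr right = new Literal(Integer.parseInt(tokens.get(current++).lexeme()));
      expr = new Binary(expr, "+", right); // the nesting mirrors the grammar
    }
    return expr;
  }
}
```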
- static analysis
  - the first bit of analysis that most languages do is called binding or resolution: for each identifier, we find out where that name is defined and wire the two together. This is where scope comes into play: the region of source code where a certain name can be used to refer to a certain declaration
  - if the language is statically typed, this is when we type check and report type errors
  - all the semantic insight gained from analysis needs to be stored somewhere: often as attributes on the syntax tree itself, other times in a lookup table off to the side called a symbol table (sketched below)
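A sketch of the symbol-table idea, assuming a simple chain-of-scopes design (hypothetical names): resolving a name walks outward until some enclosing scope declares it.

```java
import java.util.HashMap;
import java.util.Map;

class Scope {
  private final Scope enclosing;               // null for the global scope
  private final Map<String, String> symbols = new HashMap<>(); // name -> declared type

  Scope(Scope enclosing) { this.enclosing = enclosing; }

  void declare(String name, String type) { symbols.put(name, type); }

  // binding/resolution: wire a use of a name to its declaration
  String resolve(String name) {
    for (Scope scope = this; scope != null; scope = scope.enclosing) {
      String type = scope.symbols.get(name);
      if (type != null) return type;
    }
    throw new RuntimeException("Undefined name '" + name + "'.");
  }
}
```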
- intermediate representations
  - well-known styles of IRs: control flow graph, static single-assignment, continuation-passing style, three-address code (example below)
  - a shared IR lets an implementation support multiple source languages and target platforms
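To make three-address code concrete, here's a hypothetical mini-example: each instruction holds at most one operator and three "addresses" (two operands, one destination), so `a = b + c * d` needs a compiler-generated temporary.

```java
// Hypothetical three-address code modeled as Java records.
record TacInstr(String result, String left, String op, String right) {
  @Override public String toString() {
    return result + " = " + left + " " + op + " " + right;
  }
}

class TacDemo {
  public static void main(String[] args) {
    // a = b + c * d  lowers to two instructions via the temporary t1:
    TacInstr t1 = new TacInstr("t1", "c", "*", "d");
    TacInstr a  = new TacInstr("a", "b", "+", "t1");
    System.out.println(t1); // t1 = c * d
    System.out.println(a);  // a = b + t1
  }
}
```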
- optimization
  - swap the user's program with a different one that has the same semantics but is implemented more efficiently
- e.g. constant folding: if some expression always evaluates to the exact same value, we can do the evaluation at compile time and replace the code for the expression with its result
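A constant-folding sketch (hypothetical), reusing the `Expr` types from the parser sketch above: if both operands of a `+` are literals, evaluate at compile time and replace the whole subtree with the result.

```java
class Folder {
  static Expr fold(Expr expr) {
    if (expr instanceof Binary binary) {
      Expr left = fold(binary.left());
      Expr right = fold(binary.right());
      if (binary.operator().equals("+")
          && left instanceof Literal l && right instanceof Literal r) {
        return new Literal(l.value() + r.value()); // evaluated at compile time
      }
      return new Binary(left, binary.operator(), right);
    }
    return expr; // a literal folds to itself
  }
}
```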
- code generation
  - generating code, where "code" here usually refers to the kind of primitive assembly-like instructions a CPU runs
  - do we generate instructions for a real CPU or a virtual one?
    - native machine code is lightning fast, but generating it is a lot of work and it isn't portable
    - virtual machine code is portable; it's generally called bytecode because each instruction is often a single byte long
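A hypothetical bytecode flavor for a tiny stack machine, to show why "each instruction is often a single byte": the expression `1 + 2` compiles to a handful of bytes.

```java
class Op {
  static final byte CONST = 0; // next byte is the literal value to push
  static final byte ADD   = 1; // pop two values, push their sum
  static final byte RET   = 2; // stop and yield the top of the stack
}

class BytecodeDemo {
  // 1 + 2  =>  CONST 1, CONST 2, ADD, RET
  static final byte[] PROGRAM = { Op.CONST, 1, Op.CONST, 2, Op.ADD, Op.RET };
}
```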
- virtual machine
  - a (language or process) virtual machine (VM) is a program that emulates a hypothetical chip supporting a virtual architecture at runtime
  - running bytecode in a VM is slower than translating it to native code ahead of time because every instruction must be simulated at runtime each time it executes; in return, you get simplicity and portability
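A minimal dispatch loop (hypothetical) that runs the bytecode sketched above; the per-instruction `switch` is exactly the simulation overhead mentioned.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniVm {
  static int run(byte[] code) {
    Deque<Integer> stack = new ArrayDeque<>();
    int ip = 0; // instruction pointer into the virtual chip's code
    while (true) {
      byte instruction = code[ip++];
      switch (instruction) {
        case Op.CONST -> stack.push((int) code[ip++]);
        case Op.ADD   -> stack.push(stack.pop() + stack.pop());
        case Op.RET   -> { return stack.pop(); }
        default       -> throw new IllegalStateException("Unknown opcode: " + instruction);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(run(BytecodeDemo.PROGRAM)); // prints 3
  }
}
```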
- runtime
  - if we compiled to machine code, we simply tell the operating system to load the executable and off it goes
  - if we compiled to bytecode, we need to start up the VM and load the program into it
- steps 1) to 3) (scanning, parsing, static analysis) are considered the front end of the implementation; from 6) (code generation) onward, it's the back end
- single-pass compilers: some simple compilers interleave parsing, analysis, and code generation so that they produce output code directly in the parser, without ever allocating any syntax trees or other IR
- tree-walk interpreters: begin executing code right after parsing it to an AST; to run the program, the interpreter traverses the syntax tree one branch and leaf at a time, evaluating each node as it goes (see the sketch after this list)
- transpilers (source-to-source compilers, or transcompilers): write a front end for your language; then, in the back end, instead of doing all the work to lower the semantics to some primitive target language, you produce a string of valid source code for some other language that's about as high level as yours
- just-in-time compilation: the fastest way to execute code is to compile it to machine code, but you may not know your end user's architecture ahead of time, so a JIT compiles to native code on the user's machine when the program is loaded or while it runs
- tools
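As referenced above, a tree-walk evaluator sketch (hypothetical), reusing the `Expr` types from the parser sketch: run the program by recursively evaluating each node of the syntax tree.

```java
class TreeWalker {
  static int evaluate(Expr expr) {
    if (expr instanceof Literal literal) {
      return literal.value();              // a leaf evaluates to itself
    }
    if (expr instanceof Binary binary && binary.operator().equals("+")) {
      return evaluate(binary.left()) + evaluate(binary.right());
    }
    throw new IllegalArgumentException("Unknown expression: " + expr);
  }
}
```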
What's the difference between a compiler and an interpreter?
- compiling is an implementation technique that involves translating a source language to some other—usually lower-level—form. When you generate bytecode or machine code, you are compiling. When you transpile to another high-level language, you are compiling too
- when we say a language implementation "is a compiler", we mean it translates source code to some other form but doesn't execute it. The user has to take the resulting output and run it themselves
- when we say an implementation "is an interpreter", we mean it takes in source code and executes it immediately. It runs programs "from source"
- TODO
Set theory and programming languages meet each other in type theory