Überwald is a simple Lisp dialect and interpreter I’m writing to learn about Lisps, interpreters, and writing C code that doesn’t explode.
object.c
- define the
ubw_obj
type.ubw_obj
is a tagged union and stores every possible primitive type; read.c
- read an object from a sequence of characters;
eval.c
- evaluate a lisp form, given an environment;
env.c
- provide an environment, ie recursive associative arrays for binding values to symbols;
store.c
- manage in-memory storage of objects of various types:
- regular objects (
ubw_obj
) on a preallocated (and extensible) heap; - string values;
- symbol names.
- regular objects (
Like most LISPs, Überwald is garbage collected. Marking of unreachable objects is performed in multiple ways:
- Garbage-collecting environments
- symbols are bound in environments;
- functions hold a reference to the environment they were defined in (because of lexical scoping);
- environments have a reference counter;
- when a function is garbage-collected, it decreases the reference count of the environment it was defined in by one;
- when the reference count of a given environment reaches zero, the environment is marked for garbage collection.
- Garbage-collecting symbols with their environments
- whenever an environment gets collected, all the values of symbols bound inside it are destroyed, recursively if needed.
- Garbage collecting overwritten values
- whenever a symbol is
(set)
, its previous value is recursively marked as destroyed.
- whenever a symbol is
The garbage collection phase in itself consists of updating the list of available heap space. This is an array of pair (index, size).
When Überwald code is (read)
, the AST is by default stored on a special space called ephemeral storage, which gets deleted each time a complete form has been evaluated. Some special forms, though, will instruct the reader to store emitted AST on the heap instead. Such special forms are (define)
and (set)
. This prevents unnecessary copies. Ephemeral storage is pre-allocated before reading start as a ubw_vector
of ubw_obj
.
Symbols and keywords are internally represented as integers, and a associative map is defined in store.h. When reading a symbol, the reader interns it by calling ubw_intern
with its name, and receives the integer that uniquely identifies it. Internally, one or two lookup tables are kept, one to associate names with identifiers, the other
Keywords are implemented in exactly the same way, the only difference is in evaluation (keywords evaluate to themselves).
- LISP-1 (a single namespace for functions and variables)
- Interpreted;
- Source code is UTF-8, unless implementation language doesn’t support it/makes it hard, in which case ASCII.
- Hygienic or unhygienic macros;
- Syntactic bindings by default.
- CL-like reader-macros (
set-macro-character
)
Name bindings are performed within a hierarchical set of environments. An environment is a structure holding a reference to its parent and a list of names bound to Lisp values. Looking up a symbol means looking it up into the bottommost environment, then its parent, until a binding for the symbol has been found or a parentless environment has been reached.
A function is stored as:
[type [args] body]
where:
- type is either
:closure
or:special
. The former is for a regular function, the latter defines a special form, which receives its arguments unevaluated. args
is an argument vector. It’s a vector of either:- symbols, for named argument;
- the
:optional
keyword, once, immediately after the last required argument; - the
:variadic
keyword, once, immediately before the last argument.
- boolean
- integers
- the usual notations
- floats
- same here
- keywords
- identified by a colon, as in
:keyword
. Implemented as a reader macro of the standard library::some-keyword
expands to (lispy.keywords.find-or-create)
Notice these functions aren’t special forms. Lispy has a concept of lazy functions.
(if [condition then & else])
- evaluate
condition
, thenthen
if condition is true, else either. (progn [& body])
- evaluate every sexp in
body
and return the value of the last one. (quote [& cdr])
- evaluate to
cdr
.
(read [s])
- Read
s
and return a S-expression. - =(eval [& cdr])
- Evaluates
cdr
as a Lispy AST.
- (bound? [symbol])
- returns
t
ifsymbol
is bound in the current environment (defined as the bottommost environment and its parents, up to the root.)
No surprise there:
(integer? [x])
(float? [x])
(bool? [x])
- (
number [x])
(string? [x])
(list? [x])
(vector? [x])
(null? [x])
(set [symbol value])
- bind
value
tosymbol
in the deepest environment where symbol is defined, or the root environment.
(fopen path mode)
- Opens
desc
withmode
and returns a file descriptor. The exact type of descriptors is system-dependant. (fclose desc)
- Closes file descriptor DESC.
(fread)
(fwrite desc bytes)
(fseek desc pos)
stdin
,stdout
,stderr
- File descriptors for standard input, output and error output.