Skip to content
/ uberwald Public

A simple Lisp interpreter to teach myself about Lisps and Interpreters.

License

Notifications You must be signed in to change notification settings

thblt/uberwald

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Überwald

Überwald is a simple Lisp dialect and interpreter I’m writing to learn about Lisps, interpreters, and writing C code that doesn’t explode.

The interpreter

Components

object.c
define the ubw_obj type. ubw_obj is a tagged union and stores every possible primitive type;
read.c
read an object from a sequence of characters;
eval.c
evaluate a lisp form, given an environment;
env.c
provide an environment, ie recursive associative arrays for binding values to symbols;
store.c
manage in-memory storage of objects of various types:
  • regular objects (ubw_obj) on a preallocated (and extensible) heap;
  • string values;
  • symbol names.

Memory management

Garbage collection

Like most LISPs, Überwald is garbage collected. Marking of unreachable objects is performed in multiple ways:

  1. Garbage-collecting environments
    • symbols are bound in environments;
    • functions hold a reference to the environment they were defined in (because of lexical scoping);
    • environments have a reference counter;
    • when a function is garbage-collected, it decreases the reference count of the environment it was defined in by one;
    • when the reference count of a given environment reaches zero, the environment is marked for garbage collection.
  2. Garbage-collecting symbols with their environments
    • whenever an environment gets collected, all the values of symbols bound inside it are destroyed, recursively if needed.
  3. Garbage collecting overwritten values
    • whenever a symbol is (set), its previous value is recursively marked as destroyed.

The garbage collection phase in itself consists of updating the list of available heap space. This is an array of pair (index, size).

AST allocation

When Überwald code is (read), the AST is by default stored on a special space called ephemeral storage, which gets deleted each time a complete form has been evaluated. Some special forms, though, will instruct the reader to store emitted AST on the heap instead. Such special forms are (define) and (set). This prevents unnecessary copies. Ephemeral storage is pre-allocated before reading start as a ubw_vector of ubw_obj.

Symbols and keywords

Symbols and keywords are internally represented as integers, and a associative map is defined in store.h. When reading a symbol, the reader interns it by calling ubw_intern with its name, and receives the integer that uniquely identifies it. Internally, one or two lookup tables are kept, one to associate names with identifiers, the other

Keywords are implemented in exactly the same way, the only difference is in evaluation (keywords evaluate to themselves).

The language

General specifications

  • LISP-1 (a single namespace for functions and variables)
  • Interpreted;
  • Source code is UTF-8, unless implementation language doesn’t support it/makes it hard, in which case ASCII.
  • Hygienic or unhygienic macros;
  • Syntactic bindings by default.
  • CL-like reader-macros (set-macro-character)

Language

Name bindings

Name bindings are performed within a hierarchical set of environments. An environment is a structure holding a reference to its parent and a list of names bound to Lisp values. Looking up a symbol means looking it up into the bottommost environment, then its parent, until a binding for the symbol has been found or a parentless environment has been reached.

Functions

A function is stored as:

[type [args] body]

where:

  • type is either :closure or :special. The former is for a regular function, the latter defines a special form, which receives its arguments unevaluated.
  • args is an argument vector. It’s a vector of either:
    • symbols, for named argument;
    • the :optional keyword, once, immediately after the last required argument;
    • the :variadic keyword, once, immediately before the last argument.

Atoms

boolean
integers
the usual notations
floats
same here
keywords
identified by a colon, as in :keyword. Implemented as a reader macro of the standard library: :some-keyword expands to (lispy.keywords.find-or-create)

Functions

Primitive functions

Notice these functions aren’t special forms. Lispy has a concept of lazy functions.

(if [condition then & else])
evaluate condition, then then if condition is true, else either.
(progn [& body])
evaluate every sexp in body and return the value of the last one.
(quote [& cdr])
evaluate to cdr.

Interpreter

(read [s])
Read s and return a S-expression.
=(eval [& cdr])
Evaluates cdr as a Lispy AST.

Inspecting environments

(bound? [symbol])
returns t if symbol is bound in the current environment (defined as the bottommost environment and its parents, up to the root.)

Inspecting values

No surprise there:

  • (integer? [x])
  • (float? [x])
  • (bool? [x])
  • (number [x])
  • (string? [x])
  • (list? [x])
  • (vector? [x])
  • (null? [x])

Manipulating symbols

(set [symbol value])
bind value to symbol in the deepest environment where symbol is defined, or the root environment.

File I/O

(fopen path mode)
Opens desc with mode and returns a file descriptor. The exact type of descriptors is system-dependant.
(fclose desc)
Closes file descriptor DESC.
(fread)
(fwrite desc bytes)
(fseek desc pos)
stdin, stdout, stderr
File descriptors for standard input, output and error output.

About

A simple Lisp interpreter to teach myself about Lisps and Interpreters.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages