Skip to content

A C interpreter for a purely functional language, bolted to a very unsafe FFI

License

Notifications You must be signed in to change notification settings

andrewbuss/crisp

Repository files navigation

crisp: a purely functional language interpreter bolted to a very unsafe FFI

Repo contents:

    crisp.h       : header file with core declarations
    crisp.c       : the core, including cell allocation and evaluation
    parse.c       : routines for converting between cells and strings
    ffi.c         : routines supporting the foreign function interface
    interpreter.c : REPL

    modules/std     : a module containing important functions which are
      |               not required to implement the minimal interpreter
      + std.c       : wrappers around zip, concat, apply, assoc which conform
      |               to the native function interface, plus hash and ispair,
      |               plus asc, sum, product, and modulus
      + std.crisp   : declare the above native functions in the global env
                      This also contains many useful lambda functions:
                      nil, not, and, nand, or, nor, neq, dec, inc,
                      void, makerec, defrec, test, testwith, list-equal,
                      c{a,d}{a,d}r, reduce, map, all, any, none, filter,
                      reverse-concat, reverse, reversed-range, range,
                      repeat, zip, len

    tests.crisp     : an assortment of tests and additional syntax examples
    libc_demo.crisp : a few examples using the FFI with libc
    bintree.crisp   : an implementation of a basic persistent binary tree map

    fuzz_parser.c   : a small program to exercise the parser and verify
                      round-trip stability. Only useful for fuzzing

Building:

    git submodule update --init
    ln -sr libatomic_ops bdwgc
    cmake .
    make

    # optionally, install with all modules; read Modules below
    sudo make install

Some syntax examples and testing code are in the tests file. Run tests:

    crisp < tests.crisp

Run as a REPL:

    crisp

Modules:

    crisp.c contains the minimal evaluation logic, but many common functions
    are implemented in the std module. A module is a crisp script, optionally
    combined with any number of compiled object files, linked into a shared
    library. When a module is loaded with the `import` function, the script is
    executed in an environment with a few extra mappings:

        this            : an FFI_LIBRARY, the module currently being loaded
        native-function : convert an FFI_SYMBOL into a NATIVE_FUNCTION

    In practice, a module will contain `def`'s to define new functions in the
    global environment. For example:

        def hash native-function this.hash

    will expose `hash` for use after the module is imported. Private functions
    need not be declared in this way.

    Modules are loaded at runtime using the `import` function. Modules may
    import other modules similarly, although crisp does not attempt to detect
    or resolve circular dependencies.

    Modules are installed by default to /usr/lib/ with filenames of the form
    `libstd.crisp.so`. In addition to the default library paths, `import`
    tries to find a library named `std` in:

        $CRISP_MODULE_PATH/modules/libstd.crisp.so
        $CRISP_MODULE_PATH/std/libstd.crisp.so

    where CRISP_MODULE_PATH is an environment variable that may be provided at
    runtime, and defaults to `./modules/` if not provided. If imports are
    failing, check that modules are reachable from the current directory.

Syntax notes:

    Whitespace is generally ignored. An exception, in order to work as a REPL,
    is that lines are evaluated after each newline, unless there are unclosed
    parentheses. Example:

        sum 1 2 3  ; result: 6
        4 5        ; result: 4 5

        (sum 1 2 3
             4 5)  ; result: 15

    The if function takes two required arguments: a predicate and a clause to
    be evaluated if the predicate is non-nil. If only two arguments are
    supplied and the predicate is nil, nil is returned. If the predicate is
    nil, everything after the second argument is evaluated. This allows an
    easier syntax for chaining if's. The lambda function works similarly;
    evaluating everything after the first argument as a referent. Example:

        def fizzbuzz (lambda x
            with fizz (equal (modulus x 3) 0)
            with buzz (equal (modulus x 5) 0)
            if (and fizz buzz)
                FizzBuzz
            if fizz
                Fizz
            if buzz
                Buzz
            x)

        map fizzbuzz (range 20)
        ; output:
        ; FizzBuzz 1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz
          11 Fizz 13 14 FizzBuzz 16 17 Fizz 19

Evaluation notes:

    A list is evaluated by examining its head. If the head is callable, and
    isn't a special function (examples include def, quote, lambda), which
    holds evaluation for its arguments, the arguments are evaluated and the
    function is applied. If the head is not callable, each element of the list
    is independently evaluated and the resulting list is returned.

    The behavior of native functions is unconstrained, although the interface
    is defined: a native function accepts a cell which may be either NIL or a
    PAIR, and returns a cell which may be of any type.

    Little is specified about the behavior of FFI functions - the interface
    is not stable enough to describe yet. Roughly, an FFI_FN can be
    applied to integer or symbol arguments. Symbol arguments are passed as
    pointers to the underlying strings, so (libc.puts foo) prints "foo\n".
    The return value of an FFI_FN is always assumed to be an integer.
    This means that (libc.malloc 4096) will return an integer pointer which
    can be passed to libc.gets or libc.free, for example.

    Finally, lambdas are evaluated by evaluating the body of a lambda in a new
    environment where the names in the lambda args list are mapped to the
    corresponding values to which the lambda is being applied. This new
    environment is stacked atop the environment in which the lambda was
    created, so variables may override other local variables, but global
    definitions are still accessible.

    For example, to evaluate

        with z 4 (def f lambda (x y) sum y (product z x))

        f 3 7

    the interpreter first pairs x with 3 and y with 7 to yield a new env
    concatenated with the existing environment:

        (x . 3) (y . 7) (z . 4) ... (f . (lambda (x y) sum y (product z x)))

    Then it evaluates the body of the lambda in this environment, obtaining:

        sum y (product z x) ->
            y -> 7
            product z x ->
                z -> 4
                x -> 3
            -> apply product 4 3 -> 12
        -> apply sum 7 12 -> 19

Implementation notes:

    At one point crisp was implemented in a purely functional style
    (as far as that's possible in C). Later work, particularly work on
    optimizing tail calls, has added enough complex control flow that this
    is no longer really true, but the goal of moving as much functionality
    from the interpreter into the language remains.

    There is no special "environment" struct. The environment is a normal list
    containing dotted-pair mappings between variable names and their values.
    Evaluation contexts are stacked by concatenating a list of name/value pairs
    to the previous environment.

    The intention is to implement only the minimum functionality in C required
    to implement higher-level functionality in the crisp language, such as
    recursion, map, filter, defrec, and, via the FFI, libc, and glib, I/O of
    some sort.

    At present, crisp uses the Boehm-Demers-Weiser (http://www.hboehm.info/gc/)
    garbage collector rather than including one of its own. A reference counted
    system is a goal of later work.

    The FFI functionality is exciting because it allows reuse of libraries from
    many other projects with very little extra C code in crisp. Using libc from
    crisp allows all sorts of low-level I/O. 

    Symbols are interned when they are parsed.

Fuzzing:

    Automated fuzzing is a fun way to catch bugs. CMake targets are included
    building crisp without FFI or GC. FFI yields a lot of false positives when
    a fuzzer discovers it can pass a garbage pointer to libc.free for example.
    Allowing the fuzzer to call arbitrary external code is also simply a
    worrying idea. GC tends to clog up the fuzzer with extra logic and control
    flow, and isn't needed for short executions.

    Build the slimmer crisp:

        cd fuzzing
        cmake .. -DCMAKE_C_COMPILER=afl-gcc
        make crisp_fuzz

    Prepare directories for afl:

        mkdir inputs afl_state
        echo "(lambda (x y) y x) 4 2" > inputs/simple

    Run afl with a 200 MB memory limit:

        afl-fuzz -m 200 -x afl_dict.txt -i inputs -o afl_state ./crisp_fuzz

About

A C interpreter for a purely functional language, bolted to a very unsafe FFI

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages