ARCUEID ROADMAP Copyright (C) 2013 Rafael R. Sevilla
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
- [X] Define virtual machine architecture
- [X] Design interpreter core
The interpreter core should be flexible enough that we can easily use switch threading, call threading, GCC labels as values, or even assembly language. Each instruction should be separately implementable.
- [X] Build code generation infrastructure
- [X] Implement virtual machine instructions [35/35]
- [X] nop
- [X] push
- [X] pop
- [X] ldl
- [X] ldi
- [X] ldg
- [X] stg
- [X] lde
- [X] ste
- [X] cont
- [X] env
- [X] envr
- [X] apply
- [X] ret
- [X] jmp
- [X] jt
- [X] jf
- [X] jbnd
- [X] true
- [X] nil
- [X] hlt
- [X] add
- [X] sub
- [X] mul
- [X] div
- [X] cons
- [X] car
- [X] cdr
- [X] scar
- [X] scdr
- [X] is
- [X] dup
- [X] cls
- [X] consr
- [X] imenv
- [X] Environments [2/2] We use what Clinger et. al. call the stack/heap strategy.
Environments and continuations normally go on the stack, but may be moved to the heap if required, e.g. when a closure or continuation is required to outlive its initial context. Environments can have their lives extended if one of these things happens:
- A function returns with a closure in its value register
- A closure is stored in the global environment
- A closure is stored in an environment other than at level 0.
- [X] Stack-based
- [X] Heap-based
- [X] Continuations [2/2]
Heap-based continuations are required to support ccc.
- [X] Stack-based
- [X] Heap-based
- [X] Foreign function interface [3/3]
- [X] Calling convention 1 (straight args)
- [X] Calling convention 2 (args as a C array)
- [X] Arcueid Foreign Functions (AFF)
- [X] Enable GCC’s labels as values for doing interpreter threading
- [X] Type invocations [5/5]
- [X] procedure
- [X] list
- [X] vector
- [X] table
- [X] string
- [X] memory allocator
A BiBOP-based memory allocator. Keep things simple first.
- [X] garbage collector [2/2]
A simple mark and sweep garbage collector for now. Keep things simple first.
- [X] Garbage collection of data types [14/14]
- [X] Boxed Numeric Types [4/4]
- [X] bignum
- [X] rational
- [X] flonum
- [X] complex
- [X] Characters
- [X] Strings
- [X] Symbols
- [X] Conses
- [X] Tables
- [X] Vectors
- [X] Tagged
- [X] Functions
- [X] String Ports
- [X] File Ports
- [X] Threads
- [X] Exceptions
- [X] Channels
- [X] Boxed Numeric Types [4/4]
- [X] Write barrier hooks
- [X] Lists
- [X] Improper lists
- [X] Bracketed functions
- [X] Quotes
- [X] Quasiquotes
- [X] Comma expressions
- [X] Strings
- [X] Characters
- [X] Comments
- [X] Symbols
- [X] Numbers [4/4]
- [X] Integer
- [X] Flonum
- [X] Rational
- [X] Complex
- [X] Regexps
- [X] Atstrings
- [X] Literal expressions [9/9]
- [X] nil
- [X] true (t)
- [X] character
- [X] string
- [X] fixnum
- [X] bignum
- [X] flonum
- [X] rational
- [X] complex
- [X] Symbols [2/2]
- [X] Environment symbols
- [X] Global symbols
- [X] Lists [3/3]
- [X] Special forms [8/8]
- [X] if
- [X] fn [5/5]
- [X] Special forms [8/8]
- [X] single symbol arguments
- [X] simple argument lists
- [X] optional arguments
- [X] rest arguments
- [X] destructuring binds
- [X] quote
- [X] quasiquote
- [X] assign
- [X] compose in a functional position
- [X] complement in a functional position
- [X] andf in a functional position
- [X] Inline functions [7/7]
- [X] cons
- [X] car
- [X] cdr
- [X] +
- [X] -
- [X] *
- [X] /
- [X] Function applications
- [X] Macros
- [X] Special Syntax [4/4]
- [X] Compose (:)
- [X] Complement (~)
- [X] Structure access (. and !)
- [X] And (&)
- [X] Nil
- [X] True
- [X] Numeric Types [5/5]
- [X] fixnum
- [X] bignum
- [X] flonum
- [X] complex
- [X] rational
- [X] Characters
- [X] Strings
- [X] Symbols
- [X] Conses
- [X] Tables [4/4]
- [X] Atomic keys [8/8]
- [X] Fixnum
- [X] Bignum
- [X] Flonum
- [X] Rational
- [X] Complex
- [X] Symbol
- [X] String
- [X] Character
- [X] Cons keys
- [X] Vector keys
- [X] Hash table keys
- [X] Atomic keys [8/8]
- [X] Vectors
- [X] Tagged
- [X] Functions
- [X] Input Ports
- [X] Output Ports
- [X] Threads
- [X] Exceptions
- [X] Channels
- [X] Regular Expressions
Consider whether or not to provide instructions for the asterisked functions, so as to make their use cheaper.
- [X] Initialization for binding runtime primitives to global symbols
- [X] Type handling [5/5]
- [X] coerce [11/11]
- [X] Fixnum conversions [9/9]
- [X] fixnum -> int (trivial)
- [X] fixnum -> num (trivial)
- [X] fixnum -> fixnum (trivial)
- [X] fixnum -> bignum (trivial)
- [X] fixnum -> rational (trivial)
- [X] fixnum -> flonum
- [X] fixnum -> complex (same as fixnum -> flonum)
- [X] fixnum -> char
limit to 0 - 0x10FFFF, also exclude 0xd800-0xdfff, invalid Unicode block.
- [X] fixnum -> string (has base as optional arg)
- [X] Bignum conversions [7/7]
- [X] bignum -> int (trivial)
- [X] bignum -> num (trivial)
- [X] bignum -> bignum (trivial)
- [X] bignum -> rational (trivial)
- [X] bignum -> flonum
- [X] bignum -> complex (same as conversion to flonum)
- [X] bignum -> str
- [X] Flonum conversions [7/7]
- [X] flonum -> fixnum
- [X] flonum -> bignum
- [X] flonum -> rational
- [X] flonum -> flonum (trivial)
- [X] flonum -> num (trivial)
- [X] flonum -> complex (trivial)
- [X] flonum -> string
- [X] Rational conversions [8/8]
- [X] Fixnum conversions [9/9]
- [X] coerce [11/11]
- [X] rational -> fixnum (rounds)
- [X] rational -> bignum (rounds)
- [X] rational -> rational (trivial)
- [X] rational -> num (trivial)
- [X] rational -> flonum
- [X] rational -> complex (same as flonum)
- [X] rational -> string
- [X] rational -> cons
- [X] Complex conversions [4/4]
- [X] complex -> complex (trivial)
- [X] complex -> num (trivial)
- [X] complex -> string
- [X] complex -> cons
- [X] Character conversions [5/5]
- [X] char -> char (trivial)
- [X] char -> int (results in a fixnum from 0 - 0x10FFFF)
- [X] char -> fixnum (same as char -> int)
- [X] char -> bignum (same as char -> int)
- [X] char -> string
- [X] String conversions [10/10]
- [X] string -> string (trivial)
- [X] string -> symbol
- [X] string -> cons
- [X] string -> fixnum
- [X] string -> bignum
- [X] string -> flonum
- [X] string -> complex
- [X] string -> rational
- [X] string -> int Note that unlike for the numeric types (coerce “…” ‘int) is not the same as using (coerce “…” ‘fixnum) or (coerce “…” ‘bignum). What it does amounts to
(coerce … ‘num) (see below) and then converts the result into an integer type of appropriate size.
- [X] string -> num (generic number conversion)
Converts any string into a number of the appropriate type. This should use the best available numeric type that is able to most accurately represent the value described by the string. Numeric base may be specified as an optional argument as before.
Basic algorithm makes the following tests:
- If string ends with ‘i’ or ‘j’, convert as complex
- If string contains ‘.’, convert as floating point.
- If base is less than 14 and the string contains ‘e/E’, convert as floating point.
- If base is less than 25 and the string contains
‘p/P’, convert as floating point.
- If string contains ‘/’, convert as rational.
- Otherwise, consider string as representing an integer
- [X] Symbol conversions [4/4]
- [X] symbol -> symbol (trivial)
- [X] symbol -> string
- [X] nil -> string (produces empty string)
- [X] t -> string
- [X] Cons conversions [4/4]
- [X] cons -> cons (trivial)
- [X] cons -> string
- [X] cons -> vector
- [X] cons -> table
- [X] Table conversions [2/2]
- [X] table -> table (trivial)
- [X] table -> cons
- [X] Vector conversions [2/2]
- [X] vector -> vector (trivial)
- [X] vector -> cons
- [X] type
- [X] annotate
- [X] rep
- [X] sym
- [X] Predicates [9/9]
- [X] Less-than (<) *
- [X] Greater-than (>) *
- [X] Less-than or equal (<=) *
- [X] Greater-than or equal (>=) *
- [X] spaceship operator (<=>) * (Arcueid extension)
- [X] bound
- [X] exact
- [X] is
- [X] iso
- [X] List operations [7/7]
- [X] car
- [X] cdr
- [X] cadr
- [X] cddr
- [X] cons
- [X] scar
- [X] scdr
- [-] Math operations [3/4]
- [X] Arithmetic [5/5]
- [X] * Multiplication
- [X] + Addition
- [X] - Subtraction
- [X] / Division
- [X] div - integer division (extension)
- [X] Complex arithmetic [4/4]
This is again an Arcueid extension. It’s rather annoying to have support for complex numbers but no functions to manipulate them.
- [X] real
- [X] imag
- [X] conj
- [X] arg – complex argument
- [X] Arc3-current functions [6/6]
- [X] expt
- [X] mod
- [X] rand
- [X] srand
- [X] sqrt
- [X] trunc
- [-] C99 math.h functions (Arcueid only) [3/37]
These functions should support complex arguments in as far as it makes sense to do so.
- [X] abs – works for all numeric types
- [ ] acos
- [ ] acosh
- [ ] asin
- [ ] asinh
- [ ] atan
- [ ] atan2
- [ ] atanh
- [ ] cbrt
- [ ] ceil
- [ ] cos
- [ ] cosh
- [ ] erf
- [ ] erfc
- [ ] exp
- [ ] expm1
- [ ] floor
- [ ] fmod
- [ ] frexp
- [ ] hypot
- [ ] ldexp
- [ ] lgamma
- [ ] log
- [ ] log10
- [ ] log2
- [ ] logb
- [ ] modf
- [ ] nan
- [ ] nearbyint
- [ ] pow (alias for expt)
- [ ] sin
- [ ] sinh
- [X] sqrt (also in arc3)
- [ ] tan
- [ ] tanh
- [ ] tgamma
- [X] trunc (also in arc3)
- [X] Arithmetic [5/5]
- [X] Table operations [2/2]
- [X] maptable
- [X] table
- [X] Evaluation [4/4]
- [X] eval
- [X] apply
- [X] ssexpand
- [X] ssyntax
- [X] Macros [4/4]
- [X] macex
- [X] macex1
- [X] sig
This is actually a global variable, and needs to be assigned at initialization.
- [X] uniq
- [X] Basic I/O primitives (src/io.c) [5/5]
These are the base I/O functions provided by the Arcueid C
runtime.
- [X] Input [5/5]
- [X] readb
- [X] readc
- [X] peekc
Implemented in terms of ungetc
- [X] ungetc - this is not part of standard Arc
Note that there is no ungetb function. This is proving a little tricky to implement. Maybe what we should do is simplify the semantics of ungetc so that it requires a character to be unget’d, and the next call to readc OR readb (yes, readb with a ‘b’!) will return this CHARACTER. This saves us the trouble of decoding Unicode all over again, and reinforces the maxim of never mixing the b functions with the c functions.
- [X] sread (see the Arc reader above)
- [X] Output [3/3]
- [X] writeb
- [X] writec
- [X] write
- [X] File I/O [3/3]
- [X] infile
- [X] outfile
- [X] close
- [X] String port I/O [3/3]
Note that doing readb/writeb or readc/writec on a string port have the same effect. Strings are made up of Unicode characters so it would be quite messy to implement a separate ‘byte index’ into what is made up of characters.
- [X] instring
- [X] outstring
- [X] inside
- [X] Seeking / telling [2/2]
Note that these essential functions are not available in
PG-Arc for some reason but will probably be necessary to
implement CIEL.
- [X] seek
- [X] tell
- [X] Input [5/5]
- [X] Additional I/O functions (src/io.c) [8/8]
These other I/O functions are defined in standard Arc but are not
necessary for CIEL or the reader, so we do them later.
- [X] pipe-from
- [X] stdin
- [X] stdout
- [X] stderr
- [X] call-w/stdin
- [X] call-w/stdout
- [X] disp
- [X] flushout
- [X] Threads [2/2]
- [X] Creating and managing threads [8/8]
- [X] new-thread (spawn)
- [X] break-thread
- [X] kill-thread
- [X] current-thread
- [X] dead
- [X] sleep
- [X] atomic-invoke - implemented using channels
- [X] join-thread (not in standard Arc)
- [X] Channels (cf. Limbo and CSP, Arcueid extension) [3/3]
- [X] chan
- [X] <- (recv-channel) *
- [X] <-= (send-channel) *
- [X] Creating and managing threads [8/8]
- [X] Networking [3/3]
- [X] open-socket
- [X] client-ip
- [X] socket-accept
- [ ] Networking Extensions (Arcueid extension) [0/8]
- [ ] getaddrinfo (Arcueid only)
- [ ] socket (Arcueid extension)
- [ ] socket-bind (Arcueid only)
- [ ] socket-listen (Arcueid only)
- [ ] socket-connect (Arcueid only)
- [ ] socket-sendto (Arcueid only)
- [ ] socket-recvfrom (Arcueid only)
- [ ] select (Arcueid only)
This should use epoll(7) on Linux or similar functions on systems that support them. Only fall back to standard POSIX.1-2001 select(2) only if no alternatives are available.
- [X] File system operations [5/5]
- [X] dir
- [X] dir-exists
- [X] file-exists
- [X] rmfile
- [X] mvfile
- [X] Error handling and continuations [6/6]
- [X] details
- [X] err
- [X] on-err
- [X] ccc
- [X] protect
- [X] dynamic-wind
- [X] Strings [1/1]
- [X] newstring
- [X] Time [5/5]
- [X] current-gc-milliseconds
- [X] current-process-milliseconds
- [X] msec
- [X] seconds
- [X] timedate
- [X] Regular Expressions (Arcueid extension) [3/3]
- [X] regular expression input in the reader
- [X] regular expression matching [2/2]
- [X] Basic matching
- [X] Substring captures
- [X] regcomp (compile a regular expression from a string)
- [X] Miscellaneous OS operations [4/4]
- [X] system
- [X] quit
- [X] setuid
- [X] memory
- [X] Miscellaneous [5/5]
- [X] sref *
- [X] len
- [X] bound
- [X] arcueid-code-setname
- [X] declare
- [X] Basic scheduling
- [X] Suspend threads on I/O
- [X] Synchronization
- [ ] Deadlock detection
- [X] Thread control
- [ ] alt mechanism
- [X] Load all arc.arc functions
- [X] Test behaviour of all arc.arc functions
- [X] Framework for disp and write
- [X] Printers for various types [17/17]
- [X] nil
- [X] t
- [X] Numeric Types [5/5]
- [X] Fixnums
- [X] Bignums
- [X] Rationals
- [X] Flonums
- [X] Complex numbers
- [X] Characters
- [X] Strings
- [X] Symbols
- [X] Conses
- [X] Tables
- [X] Vectors
- [X] Tagged
- [X] Functions
- [X] Input Ports
- [X] Output Ports
- [X] Threads
- [X] Exceptions
- [X] Channels
- [X] Regular Expressions
- [X] Simple non-readline REPL
- [X] Read in an initial file for REPL
- [X] Readline support
We don’t plan to provide complete compatibility with Perl or POSIX. Just enough.
- [X] Basic regular expression interface
- [ ] Macro wrapping for matches
- [-] Features [5/11]
- [X] Characters
- [ ] Escaped characters
- [-] Character classes [1/3]
- [X] Basic (e.g. [A-Z])
- [ ] Perl-style character classes (\d, \s, etc.)
- [ ] Unicode property character classes
- [-] Anchors [2/7]
- [X] ^ (beginning of line)
- [X] $ (end of line)
- [ ] \A (start of string)
- [ ] \Z (end of string)
- [ ] \z (absolute end)
- [ ] \b (beginning of word)
- [ ] \B (end of word)
- [X] Kleene star
- [X] Kleene plus
- [ ] Counted repetition
- [X] Alternation
- [X] Capture groups
- [ ] Non-capturing groups
- [ ] Case-insensitive matching
- [ ] Multi-line regexes
In addition to Arc standard prf, there will also be a printf function which can be used to output strings according to a format string specified. The usual conversion specifiers for standard C printf are available, with some additional non-standard ones:
- r or m : no argument required - print the output of strerror(errno).
- v : replace by the pretty-printed form of the argument.
This is also the same format specification used by the error handler function signal_error.
The CIEL dump/restore functionality allows Arcueid to save and load workspaces by tracing the global symbol table and threads and dumping those to a file.
- [ ] gnil
- [ ] gtrue
- [ ] gint
- [ ] gflo
- [ ] gchar
- [ ] gstr
- [ ] gsym
- [ ] gbstr - binary strings
- [ ] crat
- [ ] ccomplex
- [ ] ccons
- [ ] cannotate - this is for the moment limited to creating T_CODE objects from a cons consisting of the binary bytecode string and literals
- [ ] xdup
- [ ] xmst
- [ ] xmld
- [ ] gtab
- [ ] ctadd
- [ ] additional functionality for cannotate, so that it can, you
know, actually perform type annotations…
This is a valuable enhancement as efficent string handling for very long strings will be very useful.
The current interpreter is designed with green threads, scheduled by the virtual machine rather than native threads.
We do want to someday make a statically-typed, non-garbage collected dialect of Arc similar to Richard Kelsey’s PreScheme, so we can write the entire runtime in Arc.
We will provide for format strings similar to C, but with a few extensions that make sense for Arc.
Character/string comparisons, by default use the Unicode Collation algorithm (http://www.unicode.org/reports/tr10/)? Capitalization and decapitalization should also be locale-defined. An implementation of the algorithms for doing these things appears to be ICU4C (http://site.icu-project.org). See if we can adapt the code or use it as a library.