Today we’re learning functional programming from first principles, and the vehicle through which we’re doing it is Scheme. In the process, we’ll be discovering a whole other side of computer science from the one presented in your classes, but one which is just as rich, if not richer, in elegance, applicability, modularity, joy, and understandability than the stuff you’re learning.
Functional programming is an entirely different way of thinking about programs, and has an exceptionally rich literature in academia.
What’s more, the industry has recently been embracing functional programming, with things like React and FRP being a new way to do UIs (although technically it is anything but new, this kind of stuff was elaborated in the 70s, but nobody knows this).
However, functional programming isn’t only used on the frontend. The separation between code and data offers modularity when doing data transformations for stuff like data pipelines. Continuation passing offers wonderfully expressive and powerful abstractions (which we might learn about at a later date). Advanced type theory (which is a branch of functional proramming, effectively) is used to prove properties of programs to ensure they are memory safe (no leaks and/or use-after-frees and/or null dereference and/or aliasing). The list of applications of functional programming is too long to list, because it is the the same applications as any other paradigm, i.e., programming in its entirety.
Thinking functionally lets you be exceptionally productive while ensuring that the programs you write will be modular and effective. You won’t ever write spaghetti code, because spaghetti code is in essence a problem of shared state. Functional programming forces you to be explicit in all state that you use and doesn’t let you change things willy-nilly for no reason.
Scheme is a language that is nearly as old as computers themselves. Scheme is a dialect of LISP, a language introduced by John McCarthy in the late 1950s.
It can be argued whether he “invented” it, or whether he “discovered” it. LISP is a language with incredibly simple syntax (much simpler than Java, Python, or any other language that you might use day to day) because it has a single kind of expression though which everything is built.
The syntax of lisp is composed entirely of two things: Literal objects (such as numbers or strings), symbols (meaning, words, such as variable names), and statements called “S-expressions” meaning “symbolic-expressions”, but which are often abreviated to “s-exps”
Examples of objects are:
12345 ;; A number
12.52 ;; a floating point number
#\f ;; the character "f"
#(1 2 3 4) ;; a vector constant
#t #f ;; booleans
"string" ;; string
'(a b c d) ;; a literal list of symbols
In scheme, a s-exp can either signify the calling of a function, or
a literal list. If there’s no special '
symbol before the list,
then it’s a function or macro (we’ll get into these later) call.
A s-exp looks like the following:
(f a b)
This s-exp signifies the calling of the function “f” with arguments “a” and “b”.
The only rule you need to know about s-exps is this very easy transformation:
func(a1, a2, a3)
gets turned into
(func a1 a2 a3)
And that’s literally the only difference between reading scheme code and reading any other code.
Every function is called like this, including numeric functions, such as multiplication and addition. This can be a source of great joy when you no longer have errors due to order of operations (because the order of your operations are explicit).
Why do you want this? Spot the bug:
if (hamming_weight(c1 & 0b1010) + hamming_weight(c2 & 0b1010) % 2) \
== (hamming_weight(p) % 2):
key_biases[n1 | (n2 << 4)] += 1
In LISP, a list is written like this:
(a b c d)
And, calling the function a
with arguments b c d
is written
like this:
(a b c d)
See the difference? There is none.
Why is this important? Because in LISP, code is literally data. You can write functions which take s-exps, and output code. Those are called macros.
In return for writing your code in regular syntax, you get free exceptionally powerful metaprogramming, and all you need to be able to do to take advantage of it is use list-processing functions.
What is lambda? It is a notation for creating functions. In the 1930s by Alonzo Church as part of his research into the foundations of mathematics.
Lambda calculus is an exceptionally small language, but with it, we can write programs which are equivalent to a turing machine. In particular, infinite loops and stuff.
This language has only three components.
For one, there are variables, such as
For two, there are
And finally, there are “applications”, meaning the application of a function to some argument. The argument could be whatever you’d like.
This is directly supported in scheme via the lambda
function. It
looks like this.
(lambda (a) (* a a))
This function takes an argument a
and returns the square of a
.
A lambda in scheme takes any number of arguments, but they’re divided into a so called “lambda-list” meaning the list of arguments that the function takes, and then the rest of the function, which is its body.
(lambda (arg1 arg2)
(function-evaluated-first arg1)
(function-evaluated-second arg2)
(function-evaluated-third arg1 arg2))
The last expression of a lambda is the one that is returned, so, in
this case, it returns the result of the call
(function-evaluated-third arg1 arg2)
.
Once you have a function, you can pass it around, call it, wrap it in another function, put it in a list, a dictionary, etc.
LISP, the language, originally stood as an acronym for List Processing, but over time it has grown to simply be another name.
Lisp’s list facilities are fantastically easy to use (it’s almost
as if the language was built to accomodate this???). The only three
functions you need to remember are cons
, car
, and cdr
.
;; Creating a list
(cons 'a (cons 'b (cons 'c '())))
;; [whiteboard]
;; this is equivalent to:
(list 'a 'b 'c)
;; Also equivalent to
'(a b c)
When you have a list, you can use car
to get the first element of
the list, and cdr
to access the rest of the list.
(car '(a b c)) ;; => 'A
(cdr '(a b c)) ;; => '(B C)
What’s more, you can use ‘() to denote the empty list.
Let’s write a small function to return number ranges.
(define (range n)
(letrec
((range-aux
(lambda (n)
(cond
((< n 0) '())
(#t (cons n (range-aux (- n 1))))))))
(reverse (range-aux (- n 1)))))
In a scheme file, we use “define” in the same place as you’d use
def
in python.
Let’s look at some example code. The classic example, the fibonnaci numbers.
(define fib
(lambda (n)
(cond
((= n 0) 0)
((= n 1) 1)
(#t (+ (fib (- n 1))
(fib (- n 2)))))))
This defines fibonacci as you might imagine. But, just like you might also imagine, has exponential runtime.
(fib 1) $2 = 1 scheme@(guile-user)> (fib 2) $3 = 1 scheme@(guile-user)> (fib 3) $4 = 2 [snip] scheme@(guile-user)> (fib 15) $9 = 610 scheme@(guile-user)> (fib 30) $10 = 832040 scheme@(guile-user)> (fib 40) ERROR: In procedure scm-error: User interrupt Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue. scheme@(guile-user) [1]>
Let’s interactively do this on my laptop.
(define fib
(lambda (n)
(if (< n 1)
(error "invalid input"))
(letrec
((loop (lambda (i acc prev)
(if (= n i)
acc ;; if true
;; else
(loop (+ i 1)
(+ acc prev)
acc)))))
(loop 1 1 0))))
And this works as you’d expect.
scheme@(guile-user) [1]> (fib 1) $11 = 1 scheme@(guile-user) [1]> (fib 0) ERROR: In procedure scm-error: invalid input Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue. scheme@(guile-user) [2]> (fib 40) $16 = 102334155
Why is this faster? It uses something called tail-call optimization.
Tail-call optimization is what lets us write loops in a functional style which are just as fast as iteration.
The essential idea behind tail-call optimization is that if the last thing you do in a function is call another function, instead of calling that function, getting the value, and then returning it to your caller, LISP can trash all your state (because you’re done your function) and then call the next function overtop of the stack space you were using.
In our case, since the arguments to loop
are stored on the stack,
when we call loop
from inside loop, instead of pushing a new
frame onto the stack, we overwrite the variables on the stack and
then simply jump to where we were previously. In fact, let’s
disassemble the function interactively to see how it was compiled.
[aside about function-caling on the whiteboard]
Lisp’s killer feature, believe it or not, is the syntax, or, more specifically, lack thereof. Because scheme/lisp code is the same as a list of symbols, a macro can be thought of nothing but a piece of lisp code which takes as input a list of symbols and returns a list of symbols which gets inserted in the spot that the macro was originally called.
So what does a macro look like? Almost the same thing as a function. I’ll be showing you how to use unhygienic macros (macros that aren’t provably correct, so you can do wacky stuff, the sky’s the limit and there’s no verification)
Here’s a macro which takes a bunch of statements and returns a block with the statements reversed therein (the . in the argument list means the rest of the arguments to the function are passed as a list bound to ‘body’).
(define-macro (backwards . body)
(cons 'begin
(reverse body)))
Very simple, isn’t it. Nothing
(in-package :sb-graph)
(eval-when (:compile-toplevel :load-toplevel)
(defvar *hook-enabled* (make-hash-table)))
(defmacro hook (fun lambda-list &body body)
(let ((ll (gensym))
(f (gensym))
(orig (gensym)))
`(let ((,f ',fun))
(when (nth-value 1 (gethash ',f *hook-enabled*))
(unhook ,fun))
(setf (gethash ,f *hook-enabled*) t)
(sb-int::encapsulate ,f 'hook
(lambda (,orig &rest ,ll)
(when (hook-enabled ,fun)
(destructuring-bind ,lambda-list ,ll
(block hook
,@body)))
(apply ,orig ,ll))))))
(defmacro disable-hook (fun)
(let ((f (gensym)))
`(let ((,f ',fun))
(when (nth-value 1 (gethash ,f *hook-enabled*))
(setf (gethash ',fun *hook-enabled*) nil)))))
(defmacro enable-hook (fun)
(let ((f (gensym)))
`(let ((,f ',fun))
(when (nth-value 1 (gethash ,f *hook-enabled*))
(setf (gethash ',fun *hook-enabled*) t)))))
(defmacro unhook (fun)
(let ((f (gensym)))
`(let ((,f ',fun))
(when (nth-value 1 (gethash ,f *hook-enabled*))
(sb-int::unencapsulate ,f 'hook)
(remhash ,f *hook-enabled*)))))
(defmacro hook-enabled (fun)
`(gethash ',fun *hook-enabled*))
;; (defun test-hook (a b c &rest d)
;; (list (+ a b c) d))
;; (hook test-hook (a b c &rest d)
;; (format t "This is a hook! ~A ~A ~A ~A~%" a b c d))
;; (unhook test-hook)
;; (hook sb-c::compile-toplevel (lambdas load-time-value-p)
;; (format t "~%Hooking the compiler. compile-toplevel:~%lambdas: ~A~%load-time-value-p: ~A~%"
;; lambdas load-time-value-p))
;; (unhook sb-c::compile-toplevel)
;; (eval-when (:compile-toplevel :load-toplevel)
;; (defvar *acc* nil))
;; (hook sb-c::compile-component (component)
;; (push component *acc*))
(eval-when (:compile-toplevel :load-toplevel)
(defvar *trace-number* 0))
(hook sb-c::ir2-convert (component)
(disable-hook sb-c::ir2-convert)
(when (and (streamp sb-c::*compiler-trace-output*)
(find :sb-graph sb-c::*compile-trace-targets*))
(let* ((pn (pathname sb-c::*compiler-trace-output*))
(out-pn (make-pathname
:host (pathname-host pn)
:directory (pathname-directory pn)
:name
(format nil "trace-~A-~A" (incf *trace-number*)
(coerce (loop for char
across
(let ((cn (sb-c::component-name component)))
(cond
((symbolp cn) (symbol-name cn))
((stringp cn) cn)
((listp cn) (format nil "~{~a~}" cn))
(t "")))
when (or (alpha-char-p char)
(digit-char-p char)
(char= char #\-)
(char= char #\_))
collect char)
'string))
:type "dot")))
(save-graph (render-graph (make-and-dfs component 9999999)) out-pn)
(when sb-c::*compile-progress*
(format *debug-io* "~%; Wrote graphviz of component ~A to ~A.~%" component out-pn))))
(enable-hook sb-c::ir2-convert))
Let’s open the guile source tree, under
module/language/cps/with-cps
and talk about how with-cps works, and what it does.