-
Notifications
You must be signed in to change notification settings - Fork 63
Design Wiki
These are design notes roughly in the order that we would like to deal with the challenges.
Patterns can compile themselves into the required test.
(LiteralPattern. true) ; -> (= y true)
(SeqPattern. foo) ; -> (seq? x)
(VectorPattern. foo) ; -> (vector? x)
(PredicatePattern. even?) ; -> (even? x)
Quoted symbols are simply symbols, unquoted symbols are scoped.
(let [y "a"
x "a"]
(match [y]
[x] 1)) ;;; x is scoped
;=> 1
(let [y 'x]
(match [y]
['x] 1)) ;; 'x is 'x
;=> 1
(defn match-seq [x]
(match [x]
[()] 1
[(1)] 2
[(1 _)] 3
[(1 & _)] 4))
(match-seq [])
;=> 1
(match-seq [1])
;=> 2
(match-seq [1 2])
;=> 3
(match-seq [1 2 3])
;=> 4
(def matrix (build-matrix [x y z]
[(1 2) 3 4] 1))
(specialize matrix (seq-pattern '(1 2)))
;; should produce a new matrix with a row that looks like
;; [1 2 <crash> 3 4] :occurrences [x0 x1 xr y z]
This new matrix should have a constraint that enforces left right order until the seq occurrences are cleared. Simple way, we could put metadata on the seq occurence symbols ^:seq-occurrence, if the first occurrence has this metadata we always specialize the first column.
Perhaps we should also add metadata on the occurrence so we know how to bind this occurrence in the switch node.
^{:bind-expr '(if (nil? x)
(fail-node)
(let [x0 (first x)
x (rest x)]))}
Vectors support random access, there's no reason to not use the full power of lazy pattern matching semantics on them.
(defn match-seq [x]
(match [x]
[ [] ] 1
[ [1] ] 2
[ [1 _] ] 3
[ [1 & _] ] 4))
Note that seq patterns and vector pattern can work with either datatype, but using the correct datatype will deliver considerable performance advantages.
Exploring how to implement rest pattern has shown that we should probably re-assess our current approach to seq pattern matching. It will actually make things considerably simpler. The basic problem is that we were considering seq a multi-arity constructor instead of a always of the form (cons x seq)
.
This change means that we should carefully consider what rest pattern means when used with vectors. It seems like the constructor here is (concat vector seq)
. We can pattern match using lazy evaluation semantics for everything up to the first rest pattern. We can use subvec.
[1 2 3 4 5 & rest]
[1 2 3 4 6 & rest]
[1 2 & rest]
Then the pattern matrix might expand like this:
[ [1 2] [3 4 5 & rest] ]
[ [1 2] [3 4 6 & rest ] ]
[ [1 2] rest]
;; =>
[ [1 2] [3 4 5] rest]
[ [1 2] [3 4 5] rest]
[ [1 2] rest]
This approach seems interesting.
(defmatch simple-binding [x]
[b] b))
(simple-binding 1)
;=> 1
(defmatch seq-binding [x]
[ [1 _ b] ] b))
[ [b] ] b))
(seq-binding [1 2 3])
;=> 3
(seq-binding [4])
;=> 4
You should be allowed to write the above pattern match and similar patterns like so:
(defmatch seq-binding [x]
[1 _ b] b))
[b] b))
(defmatch map-match [x]
[{:a 'foo :only [:a]}] :a0
[{:a 'foo}] :a1
[{:a 'foo :b 'bar}] :a2
[{:a 'bar :b 'foo}] :a3)
(map-match {:a 'foo} 1)
;=> :a0
(map-match {:a 'foo :bar 'wob})
;=> :a1
(map-match {:a 'foo :b 'bar})
;=> :a2
(map-match {:a 'bar :b 'foo})
;=> :a3
Say we have the following match expression:
(match x
{1 :a _ :b} :a0
{_ :a 1 :c} :a1)
This should expand out into a pattern matrix that looks like the following:
k0 k1 k2
[1 _ _ ] :a0
[_ _ 1 ] :a1
The occurrences will have the appropriate binding expression attached to them via metadata.
Consider the case of only:
(match x
{1 :a _ :b :only [:a :b]} :a0
{_ :a 1 :c :only [:a :c]} :a1)
Then we have the following pattern matrix where % represents the crash pattern:
k0 k1 k2
[1 _ % ] :a0
[1 % _ ] :a1
The crash pattern will hold the constraint. crash-pattern should in general cause a column to be tested only after columns without crash-patterns have been tested first.
I think we need to solve the maximal sharing issue sooner rather than later as this effects compilation to SwitchNodes. At the moment we naively compute the decision tree of SwitchNodes. This means that when we compile to Clojure source we'll produce a very large amount of code.
What's our strategy to "collapse" tests together? Maranget's paper suggests implementing maximal sharing via hash-consing. Type-Safe Modular Hash-Consing
As far dealing with recur in the presence of maximal sharing I'm leaning towards assigning values into atoms. The performance hit seems acceptable. Definitely open to better ideas.
For example of where maximal sharing comes into play consider:
(match [x y]
[ [1 2 3] 4] :a0
[ [1 2 3] 5] :a1)
Imagine that the algorithm decides that y is the first column to test. We then have a switch node on y with multiway test for its value - 4 or 5.
(cond
(= y 4) ...
(= y 5) ...)
Now we'll see code duplication in both branches for testing x. Maximal sharing would avoid this.
Syntax proposal: or patterns are wrapped in an extra pair of parenthesis.
(match [x]
[ [10] ] :a0 ;; normal row
([ [1] ]
[ [4] ]
[ [2] ] :a1) ;; or'ed row
([ [2] ]
[ [5] ]
[ [3] ] :a2)) ;; or'ed row
Syntax Proposal: wrap row in parenthesis, and add :guard and vector of guards
(match [x]
[ [(1 :as m) (2 :as e)] ] :a0 ;; normal row
([ [(1 :as m) (2 :as e)] ] :guard [(even? m)
(odd? m)]
:a1) ;; row with guards
http://adam.chlipala.net/mlcomp/ http://learnyousomeerlang.com/syntax-in-functions http://learnyouahaskell.com/syntax-in-functions
(match [x]
[ [(1 :as m) (2 :as e)] ] :a0 ;; normal row
([ [(1 :as m) (2 :as e)] ] :guard [(even? m)
(odd? m)]
:a1) ;; normal guard
( [[2] ] :guard [(even? 1)
(odd? 2)]
[ [5] ]
[ [3] ] :a2) ;; 3 or-ed patterns, first is guarded
([ [9] ] :guard [(even? 1)
(odd? 2)]
[ [7] ]
[ [16] ] :guard [(even? 12)
(odd? 22)]
:a3)) ;; 3 or-ed patterns, first and 3rd guarded
Rough AST
<match> ::== ( match <vars> <patternRow>+ )
<patternRow> ::== <matchRow> <result>
| ( <matchOrGuardRow>+ <result> )
<matchOrGuardRow>
::== <matchRow>
| <matchRow> :guard <guardList>
This is really the static versus dynamic problem. Currently everything we're doing is very much static in nature - compile time. In order for match to be open we will need to convert the decision tree to a DAG which can be traversed at runtime (we could use eval, but we want to explore avoiding that)
If we do have a runtime DAG we'll need to consider how many of the pattern matching concepts translate. In particular new bindings. This seems to mean that we'll need to push an environment around that has very efficient lookup times.