Skip to content

Qi Compiler Sync Mar 10 2023

Siddhartha Kasivajhula edited this page Jul 28, 2023 · 6 revisions

Seek and Ye Shall Bind

Qi Compiler Sync Mar 10 2023

Adjacent meetings: Previous | Up | Next

Summary

We discussed the role of bindings in Qi.

Background

Bindings are being introduced to Qi as part of the compiler work. We wanted to understand more precisely the cases in which they are convenient and in which ones they lend more power to the language in a practical sense. We discussed the matter in relation to the initial design goals for bindings, and subsequent rationale that we had come to.

Scoping Models for Bindings

Proposed Rules

The original scoping rules discussed in the design phase could be summarized this way:

(~> A (-< B₁
          B₂) C)

For an identifier id bound by A,

  • A binds references in B₁, B₂ and C.
  • If B₁ binds id too, then B₂ is unaffected, but references in C become bound by B₁
  • If both B₁ and B₂ bind id, then references in C are bound by B₂. In other words, later tines shadow earlier tines w.r.t. downstream flows (not peers, which are unaffected).

Implemented Rules

The proposed rules imply a DAG (acyclic graph) scoping model rather than the more common tree model (e.g. binding precedence rules in let-style nested scopes are tree-structured). Since syntax-spec currently only supports tree-structured scope, we adopted the following rules instead:

(~> A (-< B₁
          B₂) C)
  • A binds references in B₁, B₂ and C.
  • If B₁ binds id too, then it binds references in B₂ as well as C.
  • If both B₁ and B₂ bind id, then references in C are bound by B₂. In other words, later tines shadow earlier tines w.r.t. both downstream flows as well as peers.

Assessment

We found it hard to come up with simple cases where the DAG-structured scoping rules would allow us to express something more effectively than the tree-structured rules. On the other hand, the tree-structured rules allow some simple cases to be expressed in ways that would, for better or worse, be disallowed by the DAG structure.

In the most common cases, we seemed to conclude that identifiers are unlikely to be rebound in different tines of the same form, and indeed, the flow-oriented paradigm seems to make it less tempting to rebind the same identifier at all. So the precedence rules over tines are in practice moot, and there is no difference between the two models. In cases where one is tempted to rebind identifiers, a simple change of variable name is generally an acceptable, and even less ambiguous, option.

Yet, there are some less common cases we considered where it's possible that there could be reasons to favor the DAG structure.

Examples

Basic binding

This shows a simple example of using bindings in Qi.

(~> (1 2)
    (-< (~> list (as vs))
        +)
    (~a "The sum of " vs " is " _))
;=> "The sum of (1 2) is 3"

This names the inputs before performing some operation on them, and finally reports on the calculation in a user-friendly way. This example demonstrates how earlier flows bind later flows, but it doesn't touch upon any considerations re: DAG vs tree structure. It also isn't a case that presents any problems for Qi without bindings.

Root-mean-square

(define-flow rms
  (-< (~> count (as l))
      (~> (>< sqr) + (/ l) sqrt)))

This is an example that is valid in the tree-structured model but invalid in the DAG structured one. In the latter, as l is bound in the first tine rather than in a temporally preceding flow, it would not be available in the second tine. In the DAG model, it could be expressed as:

(define-flow rms
  (~> (-< (~> count (as l))
          _)
      (>< sqr) + (/ l) sqrt))

This second version works with the tree model as well, and indeed, either of the above is valid in the current implementation.

This example seems to suggest that employing the tree model is not dramatically different from the DAG model in practice, with the latter typically (if this example is any indication) being more constraining. On the other hand, aesthetically, flows represent a (potentially nonlinear) temporal sequence. It seems odd that two contemporaneous flows should be able to bind references in one another (and that there is a precedence order among them). Yet, if there are few practical consequences, then employing the tree model certainly seems expedient.

Lisp Interpreter

Looking at the example of the metacircular Lisp evaluator in the docs, we considered this version that uses bindings:

[lambda? (~> (as _ env)
             (-< lambda-parameters
                 lambda-body)
             (make-procedure env))]

... which suggests a new syntax for as that allows us to selectively pass through values that are not bound. Likewise, the rule for application becomes:

[application? (~> (as _ env)
                  (-< (~> operator (eval env))
                      (~>> operands (△ (eval env))))
                  apply)]

... where we selectively translate nonlinearities to bindings, arguably improving clarity.

Crossed Wires

Flows entail a clear notion of indexing. Values occupy certain positions as inputs and as outputs, and this position is semantically relevant. That is, whether an input is in the first position or the second position may translate to the flow treating it as a value of one type or another. Qi encodes information traditionally signified by identifiers into positions.

Given this nature, one thing that is hard to do in Qi is join outputs together downstream in a way that can't be accomplished by simple concatenation but instead involves some multiplexing. That is, as long as the "wires" are all non-intersecting, things are easy. But as soon as we want to "cross wires," it becomes hard.

To illustrate, here is a modified version of the simple example used earlier. Say we want to compute the sum and product of a pair of numbers and display the results as "The sum of (a b) is c and the product of (a b) is d." An interesting feature of this example is that the input (a b) features twice in the output and these occurrences aren't next to each other. Qi's select form could be used to achieve this kind of crossing of wires.

(~> (2 3)
    (-< list
        +
        *)
    (select 1 2 1 3)
    (~a "The sum of " _ " is " _ " and the product of " _ " is " _))

With bindings, we could make things more explicit by naming the components of the computation:

(~> (2 3)
    (-< (~> list (as vs))
        (~> + (as sum))
        (~> * (as product)))
    (~a "The sum of " vs " is " sum " and the product of " vs " is " product))

In this case, the difference is not that stark although some may consider the second version more clear by virtue of being more explicit. The next example explores a more complex case where bindings are compelling.

"State Monad"

We looked at this example some time ago and the analysis there is sufficiently clear. But there are a few additional things to note for our purposes here. First, let's look at the code:

(~> (5)
    (-< _ (~> list (as S1)) (~> list (as S2)))
    (-< sqr (~>> list (append S1) (as S1)) (~>> list (append S2) (as S2)))
    (-< add1 (~>> list (append S1) (as S1)) (~>> list (append S2) (as S2)))
    (-< _ (gen S1) (gen S2)))

From the standpoint of understanding why someone might conceivably want to rebind the same variable name in different tines as opposed to using different names (i.e. one of the main differences we noted above re: DAG vs tree models), this example seems promising since the different columns all involve the same idea of "state" that we want to keep track of. In this case, we had to name the state variables S1 and S2 in order to avoid clobbering the input in the second and third columns. That proves to be useful in the end, however, since we are able to refer to the distinct outputs of columns 2 and 3 as S1 and S2. Otherwise, with something like this:

    (-< _ (~> list (as S)) (~> list (as S)))

... there would be some ambiguity about what S refers to when we finally need it here:

    (-< _ (gen S) (gen S)))

By the DAG binding rules, the S here would refer to the S bound by the last tine, so that in addition to the output of the original flow, this would produce "S2" twice.

Instead, another speculative possibility could be to allow the different tines of a tee junction to bind in different binding spaces. Then, by introducing a var form to refer to bindings, we could potentially support something like (var S some-space) to refer to the same binding in different spaces.

If we are explicitly declaring the space while binding, that seems like it wouldn't be much different than just using a different variable name like S1 and S2, so that suggests the requirement that the different tines should implicitly use different spaces in a deterministic way. This naturally implies an indexing, so that the different tines of a tee junction always use space qi-1, qi-2, ...

That would allow the example to be written as:

(~> (5)
    (-< _ (~> list (as S)) (~> list (as S)))
    (-< sqr (~>> list (append S) (as S)) (~>> list (append S) (as S)))
    (-< add1 (~>> list (append S) (as S)) (~>> list (append S) (as S)))
    (-< _ (var S 2) (var S 3)))

If we don't specify a space, by using (var S) or just (gen S) here, then it would use the default binding space. So in this code if we had used

    ...
    (-< _ (gen S) (gen S)))

... then assuming it is following the DAG binding rules above (so that the two columns are independent), it should produce two S2's. That is, even though the bindings are in separate spaces, in the default space, later tines still shadow earlier tines w.r.t. downstream flows. Alternatively, a reference to an identifier bound in more than one space that does not indicate the space could be considered an "ambiguous reference" and flagged as a compile-time error.

Would this behavior be desirable? One possibility is it might make it easier to write macros that do these kinds of things if we don't have to worry about shadowing.

Next Steps

(Some of these are carried over from last time)

  • Prepare the Organize benchmarks PR for review and merge it
  • Write validating non-local benchmarks alongside each compiler optimization that will be written.
  • Rebase the "first optimizations" branch and resume writing initial compiler optimizations to have the basic layout of optimizations in place.
  • Add docs for syntax-spec and put it on the package index.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Michael, Sid

Clone this wiki locally