Skip to content

Qi Compiler Sync Dec 1 2023

Siddhartha Kasivajhula edited this page Dec 19, 2023 · 6 revisions

Manic Macromancy

Qi Compiler Sync Dec 1 2023

Adjacent meetings: Previous | Up | Next

Summary

This was a long one! With the release of Qi 4 being on the horizon, there was a lot of ground to cover. We reviewed recent work that was merged that was related to deforestation, error reporting, reducing "chirality" to a core language form, and testing the compiler. We dove deeper into some of the broader issues touched by these, including error messages in Racket and elsewhere in Qi. We picked up on some recent discussions around generalizing deforestation of lists to abstract sequences, and how alternative semantics for flows could be organized around the idea of "compiling to categories." We reviewed the current latency of (require qi) to see if there has been any regression due to the deforestation work, fixed an issue with reporting expansion events in the compiler for debugging purposes, reviewed outstanding TODOs in the code, reviewed the use of side effects in Qi and uncovered a possible regression. We also reviewed the interaction of bindings with conditional form like switch and uncovered a possible blocker for release. Whew!

Background

Last time, we noted that deforestation specifically operated on right-chiral forms and did not operate on fine-grained or blanket templates. We also discussed some challenges with error reporting and with testing the compiler. In the interim, we worked on and merged many of these things:

Deforestation improvements

We discussed recent work improving the deforestation implementation to add support for fine grained templates. We also reduced chirality to an explicit use of a blanket template, and that simplified the patterns needed in deforestation. A noteworthy feature of the current implementation is that through some skillful "macromancy" in functions like generate-fused-expression, extending the core implementation to new list interfaces is just a matter of adding and populating a pattern in the appropriate syntax class -- either the producer, transformer, or consumer class -- and then, of course, providing a corresponding stream implementation.

Taking error messages seriously

With help from Sam Tobin-Hochstadt, Dominik extended the use of the low-level contract API to retain good error messages even with all of the new permutations of patterns introduced to handle fine templates in stream producers.

A typical error message in a deforested pipeline now resembles:

> (~>> (3 1) (range 5 10) (filter odd?) (map sqr))
contract violation
  received: 4 arguments
  expected: 1 to 3 non-keyword arguments
  in: (->* (real?) (real? real?) any)
  contract from: #<procedure:range>
  blaming: (~> (range 5 10 __) (filter odd? __) (map sqr __))
   (assuming the contract is correct)
  at: <pkgs>/qi-lib/flow/extended/forms.rkt:61:2
 [,bt for context]

In addition to using the low level contract API to insert an appropriate contract at the beginning of the deforested pipeline, this also leverages a "de-expander" to provide error messages in terms of a reconstructed version of the actual Qi expression typed by the user, rather than in terms of the Qi core language. There is some "lossiness" in this reconstruction, but it is sufficiently representative to help the user correctly identify the source of the error. In the future, we would hope to be able to preserve the original source expression through its expansion by syntax-spec, perhaps as a syntax property on the expanded expression.

Even better than Racket and the rest of Qi?

It turns out that this error is even better than what Racket natively provides for such a (mis-)use of range!

> (range 5 10 3 1)
range: arity mismatch;
 the expected number of arguments does not match the given number
  given: 4
 [,bt for context]

Here, the context of the error is missing, and even the expected arity of the function is not mentioned (Ryan made a note to investigate this as part of a broader survey of error messages in Racket). A boast was even heard that these errors are now better than those produced by the rest of (non-deforestation-related) Qi! Well! [Makes a note to review error messages in Qi 😄 ]

Chirality reduction

Recently, we had noted that of the three partial application styles supported by Qi -- implicit chirality, fine templates and blanket templates -- the first, chirality, could be reduced to a use of a blanket template. We felt that doing this as part of expansion would allow the compiler to reason in terms of an explicit core language rather than have to parse "subterranean" properties that are not visible in the surface syntax.

Reducing it did eliminate one form from the core language (#%partial-application) and ended up simplifying deforestation patterns as well.

A possible regression?

However, we noticed this:

(~>> () (range 10) (effect displayln (map sqr)) car)

(0 1 2 3 4 5 6 7 8 9)
; map: contract violation
;   expected: procedure?
;   given: '(0 1 2 3 4 5 6 7 8 9)
; Context (plain; to see better errortrace context, re-run with C-u prefix):
;   /Applications/Racket-Latest/collects/racket/private/map.rkt:257:2 gen-map
;   /Users/siddhartha/work/qi/qi-lib/flow/core/compiler.rkt:437:9

We assumed at first that this was a regression, and that the chirality, when reduced to a blanket template, wasn't being propagated to the nested (map sqr) form. But upon further reflection, this is likely already the case in the main branch, and relates to preserving the threading direction in nested forms, which we currently don't do.

The workaround here is to explicitly wrap the nested form with ~>>, which otherwise defaults to left-threading, or to simply use a template:

(~>> () (range 10) (effect displayln (map sqr _)) car)
(0 1 2 3 4 5 6 7 8 9)
0

Given that this has been apparent for a long time as something that might be desirable, and the fact that it obviously surprised us in this instance, it seems to be sufficient evidence that, indeed, we do want the threading direction to be preserved in a "flip-flop" manner, that is, the chirality of a threading form should be preserved through arbitrary nesting, unless overridden by an inner threading form. Chirality should resemble lexical scope in this regard.

The question is, what would it take to accomplish this? As the original issue mentioned, one way we could do it is to have every Qi form, including macros, propagate the syntax property to component forms. This seems gratuitous, and would also impose the requirement to handle chirality on writers of Qi macros. A better way that we had discussed some time ago could be to use a syntax parameter.

For now, since this isn't a regression, this can safely be addressed after the Qi 4 release. Additionally, as syntax parameters are not yet (as far as we know) supported in syntax-spec, we wouldn't be able to solve this the "right" way yet, anyway.

Literal vs Datum matching

Pattern matching on Qi forms in the compiler generally uses datum matching, since we don't have access to the core form literals defined by the expander. But it turned out that we are doing literal pattern matching in deforestation, for instance against thread, which it turns out is matching the multithreading-related thread from racket/base. Evidently this isn't causing any problems at the moment and is just "Interesting, with a capital I."

Reviewing Performance

Deforestation

 (filter-map 1.36)
 (composition 1)
 (range-map-sum 0.72)
 (long-functional-pipeline 0.68)
 (filter-map (large list) 0.43)
 (filter-map-foldl 0.29)
 (filter-map-foldr 0.26)
 (range-map-car 0.01))

With the addition of error reporting, performance is slower than before (e.g. filter-map (on a small list) was formerly around 0.3). But functional pipelines are still overall faster than Racket and will likely only get faster over time. We also consider good error messages here to be worth the performance cost.

There are a couple of specific contributors to slowness that could be further explored in the future, including arity checking that happens prior to the entry contracts being checked, which seems to account for the majority of the cost from the addition of contracts. There are also some uses of apply that could be revisited, to see if they could be avoided.

Require latency impact?

Some time ago, we spent a lot of time reducing the latency involved in a call to (require qi). In response to complaints from users, we managed to reduce the latency from 500+ ms down to around 170ms. Adding the syntax-spec dependency increased it to about 250ms but we were able to bring it back down to 189ms. We hadn't checked it in a while, and wondered if the deforestation changes might have had an impact.

$ raco require-latency qi
201 ms

Not too bad!

Improved testing

Continuing our ongoing explorations of good ways to test the compiler, we implemented a handy test-normalize macro, inspired by suggestions from the Racket community:

(define-syntax-parse-rule (test-normalize msg a b ...+)
  (begin
    (check-equal? (syntax->datum
                   (normalize-pass a))
                  (syntax->datum
                   (normalize-pass b))
                  msg)
    ...))

... which allows us to test normalization rules easily, like:

(test-normalize "left and right identity for ~>"
                #'(thread f _)
                #'(thread _ f)
                #'f)

It would be ideal if we could do this kind of testing by expanding the surface language on demand, something like:

(test-normalize "left and right identity for ~>"
                #'(~> f _))
                #'(~> _ f)
                #'f)

... where the test-normalize macro would be modified to use an expand-qi function to invoke the Qi expander to expand the test expressions prior to normalizing them. That would avoid the need to know the details of expansion (such as the thread core form) in these tests. But due to various phase interactions, we haven't been able to get something like this to work yet.

Having such an expand-qi function that could be invoked at runtime would also be valuable to develop a test suite for the expander, which we currently do not test directly.

check-equal? vs test-equal?

Qi's unit tests make heavy use of test-suite to organize collections of tests, and check-equal? for individual tests. But it turns out that check-equal? is a primitive assertion that isn't properly considered a "test case," and we should be using the test-equal? macro instead. The output from a failing test in each of these cases shows why:

check-equal?:

~> > routing forms > flow tests > Unnamed test
FAILURE
name:       check-equal?
location:   qi-test/tests/flow.rkt:439:5
message:    "basic threading"
actual:     9
expected:   10

test-equal?:

~> > routing forms > flow tests > basic threading
FAILURE
name:       check-equal?
location:   qi-test/tests/flow.rkt:439:5
actual:     9
expected:   10

With test-equal?, we are able to see exactly which test failed as it is considered a "named test case," although of course, we can still use the line number in either case. check-equal? does allow us to be lazy and leave out a name if we can't think of a good one, which, to be fair, isn't always easy.

One odd thing in either case here is that the organization of the test is really flow tests > routing forms > ~> > basic threading -- i.e. the hierarchy prior to the actual test itself is reversed. Perhaps we should file a bug report on rackunit.

Test Coverage

Running make cover reports that coverage is now at 72%! For comparison, it is currently at 95% on the main branch. The actual coverage report shows uncovered portions of the code that clearly are covered in our tests, so we're not sure what the issue is. We are hoping to reach 100% test coverage with this release, or at least soon after.

Generalizing Fusion and Typed Qi

Nia recently brought this up for discussion on Discord, that by thinking about producers, transformers and consumers in category theoretic terms, we could generalize our implementation of stream fusion to arbitrary sequences. We discussed this at a high level, and agreed that although we could explore this in standard Qi, it would be made much more tractable if we had Typed Qi. We agreed that that would perhaps be the most natural next step here. There has been interest in a Typed Qi in the community, so it's not unlikely that the momentum will be enough to coalesce into an actual effort on this front in the not too distant future, especially if we are able to exhibit tangible benefits to be reaped, such as this kind of generalized deforestation.

Compiler extension mechanism

But we'd really like to be able to do this even in regular Qi, without having to wait for Typed Qi! Maybe the approach wouldn't be truly general and abstract, but for every specific data type we'd like to deforest, it seems feasible to be able to provide a custom implementation that achieves it for that type. Recently we've talked about "compiler macros" and "towers of languages" as ways to achieve this.

There are numerous benefits of defining such an extension mechanism, including keeping the footprint of the main Qi language as small and general as possible. We don't want to include custom implementations for specialized data types in the main Qi compiler that are unlikely to be widely used.

We are not aware of any existing language that offers this kind of "compiler extensibility." That's understandable, since exposing a compiler to this kind of extension cannot be done as casually as, for instance, providing macros. Macros expand to a known surface language that is publicly exposed, and which is static in the sense that it comes with any backwards compatibility guarantees the language authors may provide.

But exposing the compiler to extension is different in that authors of such "compiler macros" would need to not only know details of expansion, but also at least one detail of compilation -- "Qi Normal Form." These are typically considered internal implementation details, and as such, no backwards compatibility guarantees are typically provided. Thus, users cannot reliably write such compiler macros.

We could take the position here that compiler macros aren't for regular users. They're for library authors. As such, they follow release cycles that could be coordinated with those of Qi, and in general, many of these packages are likely to be authored by core Qi developers, for instance, the proposed deforestation of racket/list could be such a library. By doing it this way, we gain the small footprint of the core while still being able to offer a broad swath of optimizations.

Sketch of an API

As this hasn't been done before (as far as we know), we aren't sure exactly what challenges we might run into, but we discussed some high level things the API would need to offer:

  • a user should be able to specify and register a compiler pass, which would then be performed as part of compiling Qi
  • they may need to indicate the placement of the pass in the full pipeline of compiler passes. This may take the form of a specific index, or a way to say whether it comes "before" or "after" certain named passes.
  • a way to indicate whether the pass is a "one-shot" optimization or one that should be recursively applied until it reaches a fixed point.

Tower of languages?

Yet, perhaps a simpler way to achieve it could be to use the tower of languages approach we've talked about before. Earlier, we had concluded that this approach would be better than offering compiler macros, but that they also don't necessarily conflict with one another so that offering both could be possible. But after discussing with Michael recently, it emerged that syntax-spec for now assumes that the language is compiling to Racket and not another language like Qi. This means that we cannot directly implement the tower of languages but only do so in an indirect way, as follows:

  1. Define L₀ (e.g. Qi) compiling to Racket.
  2. Define L₁ (e.g. qi/list) compiling to a use of L₀ in Racket (e.g. (flow ...)).

In this case, for instance, both Qi and qi/list would be languages that compile to Racket. Both would use syntax-spec to implement their expander and insert a compiler. In its compiler, qi/list would include deforestation of all of racket/list, but otherwise leave Qi source expressions unchanged. The result of compilation would be received by Qi's flow macro either as untouched Qi code or in fully compiled (for use of functions in racket/list) form wrapped by an esc.

The advantage of this is that this achieves our goals and can already be implemented, and as we talked about last time, it avoids the issues with exposing internal implementation details to library authors that afflict the "compiler macros" approach. The disadvantage is that the compilers would not compose in nontrivial ways. For example, if the higher level language happened to compile an expression to a threading form:

(~> qi ... (dsl-on-top-of-qi (~> x y)) qi ...)
→
(~> qi ... (esc (flow (~> (f x) (f y)))) qi ...)

... it would not compose with any optimizations we may already have for the threading form, i.e. it would not reduce to this:

(~> qi ... (f x) (f y) qi ...)

... (and then be further optimized) due to the use of esc.

In what cases would this be acceptable? Is it OK for qi/list? How about for the dataframe type that we discussed recently in this connection?

Would it be overstepping to detect literal use of (esc (flow ...)) and unwrap the flow in this case for the lower-level compiler to treat as a "native" expression that could be optimized?

These remain to be determined.

Alternative semantics for flows

We recently talked about alternative semantics for flows, and Noah and Nia brought up that categories could, once again, be the natural way of thinking about the scope of this idea. In particular, we discussed the organizing idea, "compiling to categories," that flows could be seen as morphisms, and then essentially every category corresponds to a distinct compiler for the Qi language.

"I/O Qi"

Some time ago, Dominik made some improvements to Racket that allowed ports to be composed in a similar way to functions. This came up in the discussion and we felt that this kind of composition of ports could underlie yet another dialect of Qi (in addition to the "process," "streaming," and "live" dialects, among others, that have already been catalogued).

This dialect could potentially resemble Unix's shell pipeline implemented within Racket. If it's very similar to this, then we could approximate this today by using Rash, and composing shell operations with Qi flows.

But more generally, the idea that even I/O, typically seen as bidirectional, could be composed in a directed, immutable way, is a fascinating prospect (it incidentally reminds me of Madhyamaka philosophy).

Reviewing TODOs in the code

As we are nearing release, we decided to review any TODOs noted in the code, and see if they were still relevant. Many were just outdated, but one was about our "expansion events" macro, and its use interfering with the ability to unit test the compiler:

   ;; TODO: move this to a common utils module for use in all
   ;; modules implementing optimization passes
   ;; Also, resolve
   ;;   "syntax-local-expand-observer: not currently expanding"
   ;; issue encountered in running compiler unit tests

We fixed it by just adding a level of indirection -- a core function that actually does the work and is unit tested (in the runtime phase), and a calling function that simply invokes it and also emits expansion events (in the expansion phase).

Another item was:

      [(_ n:number)
       ;; a slightly more efficient compile-time implementation
       ;; for literally indicated N
       ;; TODO: implement this as an optimization instead
       #`(λ args
           (apply values
                  (append #,@(make-list (syntax->datum #'n) #'args))))]

This happens in the code generation step (i.e. qi0->racket) for fanout and produces an optimized implementation in the case where there is a literally indicated number. We agreed that it doesn't really help to implement this as a compiler optimization, at least for now, since we don't have anything in our core language, or any intermediate representations, that would be appropriate to use here, aside from writing out the full Racket expression above. Doing that would amount to just moving this code to a different stage in compilation, which may even interfere with the ability to apply other optimizations. For instance, we might be able to define a rule where (~> (fanout 3) (select 1)) is deforested, but if fanout has already been rewritten to Racket, that would preclude this optimization.

  ;; TODO: use a box instead of set!

This is in the implementation of bindings in the compiler, from an old suggestion by Ben. We felt that set! serves the purpose for now, but it may be worth revisiting and benchmarking this somewhere down the line.

The ol' bind and switch

When we were defining the rules for bindings in Qi, we considered what would happen in some nonlinear flows, specifically, what should happen if variables were bound in some tines of a tee junction but not in others. We settled on tines binding subsequent peers as well as downstream flows, with subsequent tines shadowing preceding ones.

But what we didn't discuss (or at least, if we did discuss, then we promptly forgot about!) was what should happen in conditional flows like switch, which may bind in some cases but not in others. How should downstream references, or even references in peer clauses, behave? [In fact, we did discuss it here: Declaring Scoping Rules]

We tried it to see what would happen right now, and this is what we saw:

> (~> (-3)
      (switch
        [positive? (as v)]
        [else sqr])
      (+ v 5))
; v: undefined;
;  cannot reference an identifier before its definition

OK, that seems fine. What about the case when it is bound?

(~> (3)
    (switch
      [positive? (as v)]
      [else sqr])
    (+ v 5))
; v: undefined;
;  cannot reference an identifier before its definition

Yikes. Typically, when an as binding form is encountered during compilation, we identify the outermost threading form and then introduce a let form at that level, identifying the scope of the binding with that of this outermost threading form. It appears that we may not be doing that when the threading form occurs within a switch form, so that the scope of the binding is confined to the specific clause in which it appears.

Now in the case of core forms like -<, we enabled bindings to escape the scope of these forms to a containing threading form by notating that appropriately in the syntax-spec expander. But switch isn't a core form, so it will likely come down to doing the same thing but for the core form to which it expands, i.e. if.

While this isn't a regression on any existing functionality on the main branch, we agreed that releasing it this way would then represent behavior whose fix would constitute a "regression" on the erroneous behavior. On these grounds, we agreed that this should be considered a blocker for release.

Next Steps

(Some of these are carried over from last time)

  • Fix the "bind and switch" issue, and identify whether there are any other forms that bindings do not escape.
  • Review the use of explicit side effects in Qi wrt optimizations.
  • Comprehensively review error messages in Qi
  • Update the main documentation to reflect the compiler work
  • Update the developer documentation with details of the compiler implementation
  • Review compiler normalization pass for correctness
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion (e.g. for range).
  • Add test cases for the backwards incompatibility on datum vs literal matching, and review whether there is a way to avoid errors here by favoring the Qi binding space in the expander.
  • Prototype and then propose map1 for addition to Racket.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Jorge, Nia, Ryan, Sid

Clone this wiki locally