Skip to content

Qi Compiler Sync Sept 15 2023

Siddhartha Kasivajhula edited this page Sep 23, 2023 · 5 revisions

A Long Stream is Not Easily Exhausted

Qi Compiler Sync Sept 15 2023

Adjacent meetings: Previous | Up | Next

Summary

We discussed some useful research projects that could be done around Qi such as proofs of correctness of compiler transformations, and resumed working on deforestation and completed the incorporation of foldr as a fusable stream terminator and saw good performance. In general, we expect that the longer the stream, the greater will be the benefit from deforestation. We also found a problem where the deforestation optimization is being applied in a case where it shouldn't be. We also looked at some examples of syntactically "splicing" cases into conditional expressions, and also some previews of adding visualizations to flows (already implemented for Actor Basic).

Background

Last time, we began implementing foldr as a fusable stream terminator and got it working in isolation. We picked up on that this time. We also discussed a few other things.

Improved testing and benchmarking

Testing and validating compiler optimizations was proving to be cumbersome so we had discussed recently that it would be good to have three separate things:

  1. Unit tests to validate language semantics
  2. Benchmarks to validate performance
  3. Unit tests to validate compiler rewrite rules

Of these, we had the first two, but the absence of the last meant that we could test the code and run benchmarks and still not be sure that an optimization had actually been performed. We added a new starter test suite for the compiler, which checks specific rewrite rules:

(test-suite
 "compiler tests"

 (test-suite
  "deforestation"
  ;; (~>> values (filter odd?) (map sqr) values)
  (check-equal? (syntax->datum
                 (deforest-rewrite
                   #'(thread values
                             (#%partial-application
                              ((#%host-expression filter)
                               (#%host-expression odd?)))
                             (#%partial-application
                              ((#%host-expression map)
                               (#%host-expression sqr)))
                             values)))
                '(thread
                  values
                  (esc
                   (λ (lst)
                     ((cstream->list
                       (inline-compose1
                        (map-cstream-next
                         sqr)
                        (filter-cstream-next
                         odd?)
                        list->cstream-next))
                      lst)))
                  values)
                "deforestation in arbitrary positions")))

We agreed that it would take some thought to identify the simplest expressions that validate the rewrite rules while avoiding a combinatorial explosion of rewriting cases to test.

We also added a benchmark for a functional pipeline using foldr:

Qi:

(define-flow filter-map-foldr
  (~>> (filter odd?)
       (map sqr)
       (foldr + 0)))

Racket:

(define (filter-map-foldr lst)
  (foldr + 0 (map sqr (filter odd? lst))))

Folding streams

We added the foldr stream implementation that we wrote last time to the compiler:

(define-inline (foldr-cstream op init next)
  (λ (state)
    (let loop ([state state])
      ((next (λ () init)
             (λ (state) (loop state))
             (λ (value state)
               (op value (loop state))))
       state))))

... added a syntax class to match a use of foldr in a functional pipeline:

(define-syntax-class fusable-fold-operation
  #:attributes (op init end)
  #:datum-literals (#%host-expression #%partial-application)
  (pattern (#%partial-application
            ((#%host-expression (~literal foldr))
             (#%host-expression op)
             (#%host-expression init)))
    #:attr end #'(foldr-cstream op init)))

... added another match pattern and template to the generate-fused-operation compiler function:

[(g:fusable-fold-operation op:fusable-list-operation ...)
 #`(esc (λ (lst)
          ((#,@#'g.end
            (inline-compose1 [op.next op.f] ...
                             list->cstream-next))
           lst)))]

... and finally, a match pattern and template to the deforestation rewrite rule:

[((~datum thread) _0:non-fusable ...
                  f:fusable-list-operation ...+
                  g:fusable-fold-operation
                  _1 ...)
 #:with fused (generate-fused-operation (syntax->list #'(f ... g)))
 #'(thread _0 ... fused _1 ...)]

We ran tests and verified that everything was passing, including this new test:

(check-equal? ((☯ (~>> (filter odd?) (map sqr) (foldr + 0)))
               (list 1 2 3 4 5))
              35)

We ran the fold benchmark and got results like these:

$ ./report-competitive.rkt -s "filter-map-foldr"

Running competitive benchmarks...

Running Racket benchmarks...
filter-map-foldr:
1000 ms

Running Qi benchmarks...
filter-map-foldr:
264 ms

Performance relative to baseline:
((filter-map-foldr 0.26))

... a 4x improvement on large lists (and even more on small lists)!

This is in line with our expectation that the speedup will be greater with longer functional pipelines. For comparison, the performance improvement on large lists that we saw recently was about 3x for a 2-stage pipeline.

Qi Normal Form and Proving Correctness

We had talked about "Qi Normal Form" recently as a normalized form of Qi expressions that would be the first pass of the compiler, so that subsequent passes could operate from a common representation and avoid duplication. At the time we had used this name almost in jest, but we're realizing it could be a useful idea to develop more concretely, since in addition to simplifying the implementation of compiler optimizations, it could also be a starting point for formally reasoning about the compiler, and for proving correctness of compiler transformations, which would be valuable.

With respect to proving correctness, it would be helpful to think about there being two compilers:

  1. A reference compiler
  2. An optimizing compiler

The reference compiler for the most part is simply the one derived from what's in the main branch -- those are the semantics we seek to preserve, and which we assume to be the definition of the language. It would be a little more nuanced than this since as part of the compiler work, we might revise some of the reference semantics on the basis of our theory of optimization. For instance, although the current reference implementation provides certain guarantees on the order of side effects (e.g. in a functional pipeline, the filter function effects happen first, then the map function effects, ...), it is likely that we would not provide such guarantees once the compiler work is completed (the sequence of effects would follow the transformations of each individual element in the input list. But more precisely, we'd like to discourage the use of accidental side effects, i.e. effects not explicitly done using Qi's effect form, rather than guarantee a particular order of effects), and this would be an intentional change of the reference semantics. So, the reference compiler would be defined in relation to the original semantics of the language together with the theory of optimization.

The optimizing compiler is the one currently being developed on the lets-write-a-qi-compiler branch. We need to ensure that transformations we perform preserve the reference semantics. Currently, the only safeguard we have here is unit tests, but unit tests are something like sanity checks with regard to correctness, rather than proof. Ben has also talked about the need for proofs before. Some tools that might be relevant here are Coq, Rosette and Redex.

It will likely be simplest to prove results in terms of expressions in their reduced QNF representations as they would be compiled by the two compilers. Such results could include, for example, proofs of (1) correctness of individual rewrite rules, and (2) termination of sequences of rewrite rules.

Eager beaver deforestation!

It turns out that we are optimizing all uses of the threading form using deforestation, both right as well as left threading!

This benchmark that we wrote recently:

(define-flow filter-map
  (~>> values
       (~> (filter odd?)
           (map sqr))))

... is actually semantically wrong, since the inner threading form is left threading, meaning that the list should be passed in the first position instead of the second position. This is a runtime error, since filter and map expect the list in the second position.

This currently works, though, producing the result that we would have expected with right threading, because we simply match uses of thread in the compiler and don't verify whether it is right threading specifically.

We will likely need to check the chirality syntax property and verify that it is 'right, as part of the matching rule. We have also talked about alternative ways to implement right threading in the past that would allow the threading direction to propagate to nested forms, so that would probably also need to be considered here at some point. But for now, we just need to ensure that we only match right-threading.

Other points of discussion

The Need for Splicing Syntax in Dispatchers

Not directly related to the compiler but as more of a design consideration for conditional forms, Dominik showed an example (using case from Actor Basic, similar to Qi's switch) where we'd like to splice a set of conditions into a containing conditional form. This was following up from last time and the subsequent discussion on Discourse. One alternative suggested by Michael was that this could be accomplished procedurally at runtime. But that would require either that the conditional form handle the return value of this procedure and then continue checking conditions (which typically such conditional dispatchers do not do) or that the return value is checked prior to invoking a conditional expression containing "the remaining" conditions, which is probably the traditional way this would be accomplished. The implementation in the latter case would vary depending on the exact usage circumstances and not be generally applicable. It does appear that using such a splicing mechanism could avoid repetition in all cases where there is a dispatching form (e.g. conditionals like cond, case, switch, ...), and where there is a stock set of conditions and actions that is widely used.

Visualizing flows with flow graphs

We considered two possibilities for adding such visualizations down the line. Either as a stage in the compiler after reducing source expressions to Qi Normal Form, or implicitly by e.g. using a different implementation of the define-flow macro which would add the visualization. It would be nice to be able to edit the code and have the visualization update live, or even edit the code directly using a visual representation, similar to how hardware engineers lay out circuits using a GUI.

The use of bindings might present some challenges for such flow graphs. Currently, the implementations draws bounding boxes around flow elements, representing lexical scope. This may be sufficient.

Next Steps

(Some of these are carried over from last time)

  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr map filter foldl zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Sid

Clone this wiki locally