
Sweet Optimizations O' Mine

Qi Compiler Sync Sept 1 2023

Adjacent meetings: Previous | Up | Next

Summary

We improved the match pattern for stream fusion, and then discussed what safeguards we may need to put in place to ensure that the optimization passes do not enter an infinite loop. We implemented the overall layout of compiler passes as sequential stages that don't recur, while ensuring that each pass is applied recursively until the source expression reaches a fixed point where no further optimizations are possible, before moving on to the next pass. We manually verified via test expressions that this works end to end, i.e. that the existing optimizations are applied in any subexpression or nested position, and also considered better ways of testing this that could be incorporated into the development workflow. We also discussed the possibility of operating on lists instead of values in the Qi core and whether it would be appropriate to do that universally or only in specific cases. Michael also put up syntax-spec on the package index and we are now able to officially depend on it, so this opens the door to merging the compiler work whenever it is ready.

Background

Last time, we completed incorporating stream fusion into the Qi compiler so that it would optimize the specific input pattern that we used in our benchmark:

(define-flow filter-map
  (~>> (filter odd?) (map sqr)))
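
For instance, applying this flow to a small list keeps the odd elements and squares each of them:

(filter-map (list 1 2 3 4 5))  ; => '(1 9 25)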

We also had a basic layout of compiler passes in place that only optimized the top level expression.

Generalizing the pattern

We generalized the match pattern from this:

[((~datum thread) f:fusable-list-operation ...+)
 (generate-fused-operation (attribute f))]

... to this:

[((~datum thread) _0:non-fusable ... f:fusable-list-operation ...+ _1 ...)
  #:with fused (generate-fused-operation (attribute f))
  #'(thread _0 ... fused _1 ...)]

... to match more than just the specific expression we were using in benchmarking. We verified that the optimization was still applied by benchmarking this:

(define-flow filter-map
  (~>> values (filter odd?) (map sqr) values))

... which had the same good performance as before.
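
A quick, informal way to see the effect locally is to time the flow on a large input list (this is just a sketch, not the project's actual benchmarking harness):

(require qi racket/list racket/math)

(define-flow filter-map
  (~>> values (filter odd?) (map sqr) values))

(time (void (filter-map (range 1000000))))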

Traversing the syntax tree

Formerly, the compiler passes would recur until a fixed point was reached, but only at the top level of the expression:

;; apply the optimization pass repeatedly until the syntax object comes
;; back unchanged, i.e. until it reaches a fixed point
(define (optimize-flow stx)
  (let ([optimized (optimization-pass stx)])
    (if (eq? optimized stx)
        stx                           ; no change: fixed point reached
        (optimize-flow optimized))))  ; changed: try optimizing again

We had written a find-and-map utility some time back to traverse the syntax tree and rewrite (as ...) bindings to uses of set!. We reused that utility to traverse the syntax tree and map each subexpression under the rewrite rule for the optimization pass. This is a "top-down" traversal and is appropriate for the optimizations we have so far, but it may not always be applicable and we may need a "bottom-up" traversal in some cases.
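
A minimal sketch of that idea follows (Qi's actual find-and-map utility differs in its details, and apply-pass-to-fixed-point is a hypothetical stand-in for a single optimization pass run to a fixed point): rewrite the node itself first, then recur into each of its subforms.

(require syntax/parse)

(define (map-syntax rewrite stx)
  (define rewritten (rewrite stx))
  (syntax-parse rewritten
    [(sub ...)
     ;; recur into each subform of the (already rewritten) node
     (datum->syntax rewritten
                    (map (λ (s) (map-syntax rewrite s))
                         (attribute sub))
                    rewritten
                    rewritten)]
    [_ rewritten]))

(define (optimize-flow stx)
  (map-syntax apply-pass-to-fixed-point stx))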

We verified that nested positions are now optimized by benchmarking this expression:

(define-flow filter-map
  (~>> values
       (~> (filter odd?)
           (map sqr))))

... and found its performance on par with the unnested version.

Compiling...

Originally, our naive layout of compiler passes optimized only the top level expression, though it did apply the optimizations repeatedly until the expression reached a fixed point where no further optimizations applied.

This is fine for stream fusion, where we are rewriting to Racket via an esc form, so there is no chance that the expression would be further optimized after that. But with other optimizations where we may be rewriting Qi → Qi, there is the possibility of rewrite cycles, where a sequence of rewrites eventually leads back to an earlier expression so that the optimization loop never terminates. Ben had also brought up this possibility before.

One way to avoid this could be to limit the number of rewrites to some function of the depth of the syntax tree, keeping track of the number of recursions. Another way we considered was to optimize the deepest level first, and then on the subsequent pass, optimize one level shallower, and so on, until we optimize the top level expression. A third possibility could be to detect cycles by caching intermediate expressions.
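
A rough sketch of the caching idea (the names here are ours, purely illustrative): remember the datum of every intermediate expression, and stop rewriting if we ever see the same one twice.

(require racket/set)

(define (rewrite-until-fixed-point-or-cycle pass stx)
  (let loop ([stx stx] [seen (set)])
    (define key (syntax->datum stx))
    (define next (pass stx))
    (cond
      [(set-member? seen key) stx]          ; cycle detected: stop rewriting
      [(eq? next stx) stx]                  ; fixed point: nothing more to do
      [else (loop next (set-add seen key))])))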

Layout of Compiler Passes

For now, instead of allowing all optimizations to be applicable throughout the compilation process, we adopted a more conservative approach: each optimization pass runs as a distinct stage, and the stages run sequentially rather than with any possibility of recurrence.

Within each rewrite pass, the compiler still recurs until it reaches a fixed point, so it may still be necessary to explicitly rule out infinite loops at that level.
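
Concretely, the layout might look something like the following, where the pass names are hypothetical placeholders and rewrite-until-fixed-point-or-cycle is the helper sketched above:

(define passes
  (list normalize-pass deforestation-pass))

(define (run-passes stx)
  (for/fold ([stx stx])
            ([pass (in-list passes)])
    (rewrite-until-fixed-point-or-cycle pass stx)))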

This sequential approach won't always produce optimal code, since in some cases it may be beneficial to apply the optimizations in a different order than the one we hardcode, or even to recur on some of the stages. But it has the advantage of being simple and is a worthy place to start: it is likely to catch most cases that we'd like to optimize in practice, and when cases arise that elude this paradigm, we will have concrete examples around which to design alternative strategies.

And with this in place, the compiler seems to be past the proof-of-concept stage as it is able to apply nontrivial optimizations at any position in the source expression.

Better Ways of Validating Optimizations

Currently, we have unit tests that tell us if something is semantically wrong in the source code, and we have benchmarks that tell us how various expressions perform.

But for compiler optimizations, if the unit tests pass, that only tells us that the optimizations didn't break anything semantically -- it doesn't tell us that they were actually applied, or that they optimized the expression as we would expect. On the other hand, benchmarks tell us whether an optimization is performing as we expect, but not whether the resulting code is semantically correct, or even whether the expression was actually rewritten as expected. More importantly, benchmarks aren't deterministic, making them unreliable as a way to validate rewrite rules. Relying on a combination of the two would also be cumbersome: it would require writing benchmarks whose only role is validation, duplicating both the benchmarks that tell us the performance story we care about and the tests that validate semantics.

We considered two possibilities:

  1. Write a unit test suite for compiler optimizations that calls expand on various Qi expressions and then verifies certain things in the expanded syntax.
  2. Write unit tests for internal compiler utility functions.

(1) fits into the existing testing pattern where we only test the public interface of the language, i.e. the flow macro and above, without testing the internal implementation directly. But in this case it would be a roundabout way of validating what we are looking for: for instance, bindings might carry various annotations in the expansions that aren't relevant to our tests, but which we would need to be aware of and parse through. (2) is a more direct and cleaner approach, where we could test optimizations by passing in syntax objects and validating the returned syntax objects.

We seemed to agree that (2) would be a simpler and more robust option.
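
As a sketch of what such a test might look like with rackunit (deforestation-pass is a placeholder name for the internal function under test, and the check only asserts the shape of the rewrite rather than its exact output):

(require rackunit syntax/parse)

(test-case "adjacent filter/map stages are fused into a single stage"
  (let ([optimized (deforestation-pass #'(thread (filter odd?) (map sqr)))])
    (check-true
     (syntax-parse optimized
       [((~datum thread) fused) #t]   ; one fused stage instead of two
       [_ #f]))))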

Qi's Core Values?

Last time and at various times in the past, we've considered whether we could have Qi's core combinators operate in terms of lists rather than values, with conversions to and from values occurring only at the beginning and end of each flow expression.

We revisited this and seemed to converge on the idea that there are two distinct cases to consider here:

  1. Naturally multivalued operations, like call-with-values.
  2. List-like operations, like map, filter, or fold.

We discussed that in the former case, treating the values as a list could be counterproductive, since low-level optimizations such as passing function arguments in registers would no longer be applicable.

In the latter case of list-like operations, if we can statically rewrite these to list operations, that would naturally feed into the deforestation optimization pass, which would ensure that they are done efficiently.
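
Purely as an illustration (using Qi's >< to map a flow over values on one side, and ordinary list operations on the other), the hope is that a pipeline like the first form below could be recognized as list-like and rewritten into something like the second, which the deforestation pass already knows how to fuse:

(require qi racket/math)

(~> (1 2 3) (>< sqr) (>< add1))          ; three values: 2 5 10
(~>> (list 1 2 3) (map sqr) (map add1))  ; the same computation over a list: '(2 5 10)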

So it seems to come down to finding a way to reliably identify list-like operations. If we can do this, then we may not need to separately implement fusion for values-based pipelines. But if we are going to use a common stream implementation, then we will need to generalize it to produce more than one value at each step (the standard implementation, e.g. as found in St-Amour's writeup, yields zero or one value per step). This could be an interesting and novel variant, and hopefully it will perform just as well.
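
A rough sketch of what such a generalized stream could look like (our own illustration, not Qi's implementation or St-Amour's encoding): each step either finishes, skips, or emits a list of zero or more values.

(require racket/match)

;; a stream is a step function plus an initial state
(struct vstream (next state))

;; step results: emit carries a list, so a single step can produce
;; zero, one, or many values
(struct done ())
(struct skip (state))
(struct emit (vals state))

(define (list->vstream lst)
  (vstream (λ (s)
             (if (null? s)
                 (done)
                 (emit (list (car s)) (cdr s))))
           lst))

(define (vstream-map f st)
  (vstream (λ (s)
             (match ((vstream-next st) s)
               [(done) (done)]
               [(skip s*) (skip s*)]
               [(emit vs s*) (emit (map f vs) s*)]))
           (vstream-state st)))

(define (vstream-fold op init st)
  (let loop ([acc init] [s (vstream-state st)])
    (match ((vstream-next st) s)
      [(done) acc]
      [(skip s*) (loop acc s*)]
      [(emit vs s*) (loop (foldl op acc vs) s*)])))

;; e.g. (vstream-fold + 0 (vstream-map add1 (list->vstream '(1 2 3)))) ; => 9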

Syntax Spec v1!

Michael officially released the syntax-spec-v1 package on the package index 👯‍♂️ 🎉. We successfully migrated to depending on it (after the meeting was over). The package points to a v1 branch since there could be backwards-incompatible changes down the line, but as the package will always remain available, it shouldn't be a problem for Qi to officially depend on it until a stable version arrives. This opens the door to merging the compiler work whenever we feel it's ready. Follow further development at the source repo.

Next Steps

(Some of these are carried over from last time)

  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Write unit tests for compiler passes to ensure expressions are rewritten as expected.
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr foldr map filter foldl zip average upto).
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Michael, Nia, Sid
