Qi Compiler Sync Nov 17 2023

Siddhartha Kasivajhula edited this page Nov 28, 2023 · 7 revisions

Rogue's Gambit

Summary

Dominik the Racketeer gave a demo of new developments in the Racket Roguelike Library (RRLL) in the opening act. After that, Brad the Gambiteer gave an overview of a "delayed evaluation" alternative to deforestation that he used for arrays in SRFI 231. We then went over our current deforestation implementation and reviewed some recent changes, including refactorings, benchmarks, and tests, before taking stock of outstanding items for release.

Background

Last time, we talked about modularizing compiler passes, extending deforestation to car, and writing more benchmarks (among other things). We worked on these in the interim and made progress.

Flow-oriented Actors

Dominik showed off some new developments in the Racket Roguelike Library, where you can switch between 2D and 3D representations of a common underlying, dynamic maze. The maze is "filled with flow-oriented actors," including doors, keys, and the player themselves, and all of the evolution of play is accomplished with no mutable state by using Actor Basic, a flow-oriented language similar to Qi with special support for continuations.

We discussed some aspects of Actor Basic's implementation of continuations for consideration in the design of continuations that may be added to Qi.

We also observed that AB's implementation of "actors" as functions instead of objects was a simple and elegant approach (reminiscent, a little, of this old koan).

Fueling "reptilian" conspiracy theories, Dominik let slip at one point that the method for achieving the 3D effects in a 2D rasterization in RRLL "[...] is a bit confusing to the human mind."

Some of us were also privately wondering if RRLL could be combined with an earlier demo Dominik had shown that included "walkers" guided by various algorithms, to create a 3D version of Pacman, realizing the dream of 3-Demon.

Generalized Arrays

Brad gave an overview of SRFI 231 which is about a general "array mechanism for Scheme." The way that this mechanism "doesn't compute subvolumes until they are needed" has some parallels to our current implementation of deforestation, and it's possible there are some ideas there that can be adopted.

Some ways in which this as-needed computation is accomplished include distinguishing the "specification" of operations to be performed from the actual doing of them.

Specification

There are three main such operations:

  1. map
  2. inner product
  3. outer product

These aren't performed immediately; instead, they specify work to be done later.
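To make the idea concrete, here is a minimal Python sketch of delayed specification. The names (ArraySpec, spec_map) are made up for illustration and are not SRFI 231's actual API; the point is only that "map" records work rather than doing it.

```python
# Hypothetical sketch (not SRFI 231's API): an operation like "map"
# builds a specification object instead of computing immediately.

class ArraySpec:
    """Records an element-access function; no elements are computed
    until an index is actually requested."""
    def __init__(self, getter, length):
        self.getter = getter
        self.length = length

    def __getitem__(self, i):
        return self.getter(i)

def spec_map(f, spec):
    # Specifies a mapped array by composing f with the existing
    # getter. Nothing is computed at this point.
    return ArraySpec(lambda i: f(spec.getter(i)), spec.length)

base = ArraySpec(lambda i: i, 5)          # the array [0, 1, 2, 3, 4]
squared = spec_map(lambda x: x * x, base)
# Only now, on access, is any element computed:
print(squared[3])  # → 9
```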

Transformation

These are things like ordering, permuting, slicing. These are accomplished not by actually changing the data, but instead by wrapping the data (with a lambda?) to yield a kind of "view" into the underlying array, a view that modifies how that data is accessed. For instance, array-reverse may simply change the indexing, or modify the getters and setters in the interface to the data, rather than actually mutate or copy the underlying array.
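As a rough illustration of the "view" idea, the following Python sketch (array_reverse here is a stand-in, not SRFI 231's array-reverse) reverses an array purely by remapping indices in the getter:

```python
# Hedged sketch: a "view" wraps the getter so that only the indexing
# changes; the underlying data is neither mutated nor copied.

def make_array(data):
    return {"get": lambda i: data[i], "length": len(data)}

def array_reverse(arr):
    n = arr["length"]
    # Accessing index i of the view reads index (n - 1 - i) of the
    # original array.
    return {"get": lambda i: arr["get"](n - 1 - i), "length": n}

a = make_array([10, 20, 30])
r = array_reverse(a)
print(r["get"](0))  # → 30
```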

Action

These are things like copying and folding over arrays, or converting them to a different type, which need to be performed immediately. At the time when these are undertaken, they look for certain properties on the data and the operations to be performed, where each such property implies a way to perform the computation that is efficient. Among other things, this avoids building intermediate representations of the data, as sequences of operations on elements of the array may be composed in a single step.
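A simple way to see how composing per-element operations avoids intermediate data is this Python sketch (fold_mapped is an illustrative name, not part of any library), where a pending "map" is absorbed into the fold step:

```python
from functools import reduce

# Sketch: an "action" like folding can compose the pending per-element
# operation into a single pass, so an intermediate array of mapped
# values is never materialized.

def fold_mapped(f, op, init, data):
    # Equivalent to reduce(op, map(f, data), init), but f is applied
    # inside the fold step rather than in a separate map pass.
    return reduce(lambda acc, x: op(acc, f(x)), data, init)

# Sum of squares of [1, 2, 3], computed in one pass:
print(fold_mapped(lambda x: x * x, lambda a, b: a + b, 0, [1, 2, 3]))  # → 14
```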

More

Read the actual SRFI for the full details.

Hidden Arrays?

In the past, we've talked about how it might be worthwhile to invisibly use an array-like data structure internally in the compiler to optimize certain operations, especially those involving multiple values. Multiple values do not naively perform well when there are large numbers of them, since "values" passed between functions in Racket (and Chez Scheme) are intended to model a small number of arguments rather than collections.

Some people have also drawn parallels in the past between Qi and APL, which happens to use (publicly) a multidimensional array data structure as its core data type.

Another more speculative possibility that we've considered in the past is adding a new concept of "channels" or "conduits" to Qi, where flows could in a sense happen in parallel dimensions to the main flow. This vague metaphor also suggests an array.

Noah has in the past mentioned that the core model of Qi from a mathematical perspective is a Cartesian Closed Category, which once again calls to mind an array.

So, it's not hard to imagine that Qi (or a dialect of Qi) could someday leverage an array data structure to optimize certain patterns, maybe even the dominant "multiple values" pattern. And in that case, this SRFI seems very relevant.

Even aside from that, it's possible we can use some general ideas, such as "views" and separating specifications from actions, in other areas of the compiler.

Business as Usual

We merged a PR that refactors compiler optimizations into modules (just deforestation, for now), and reviewed another that fixes some compiler tests and adds benchmarks.

Inlining needs encouragement?

Some time ago, we defined all of the functions involved in the deforestation implementation using define-inline, and had also, at an earlier stage, wrapped them in begin-encourage-inline. Our assumption had been that the latter was a safer and gentler compiler hint, while the former would always cause inlining, which should be done with care since it can make performance worse in some cases.

This lingering redundancy in the implementation was pointed out on Discord recently, and so we removed begin-encourage-inline at the time. But running benchmarks now showed that Qi's performance was merely on par with Racket, where formerly it had been much faster.

$ make profile-competitive

...

 (filter-map 1)
 (filter-map-foldl 1)
 (filter-map (large list) 1)
 (filter-map-foldr 0.7))

Indeed, it turned out that restoring begin-encourage-inline restored Qi's performance to about 3x that of Racket.

$ make profile-competitive

...

 (filter-map (large list) 0.36)
 (filter-map 0.29)
 (filter-map-foldl 0.25)
 (filter-map-foldr 0.24))

It's unclear why mere "encouragement" to inline, when we have already mandated inlining, should make a difference, but maybe everyone just needs a little encouragement sometimes, even the Chez Scheme compiler!

Range deforestation

We recently added support for deforesting range. We added a range-map benchmark to validate it:

(define-flow range-map
  (~>> (range 0)
       (map sqr)))

We verified that, when called with large numbers like 1000, the performance with range deforestation was on par with Racket.

$ ./report-competitive.rkt -s "range-map"

Running competitive benchmarks...

Running Racket benchmarks...
range-map:
1119 ms

Running Qi benchmarks...
range-map:
1137 ms

Performance relative to baseline:
((range-map 1))

Now, range, as a stream producer, will only ever be used once at the beginning of a functional pipeline, and as this benchmark shows, it doesn't necessarily improve performance over Racket. So why bother incorporating it into our implementation of deforestation?

The reason is that doing so allows deforestation to begin at an earlier stage than it would if we used the built-in range. In an extreme case where we do something like:

(~> () (range 10000000) (filter odd?) (map sqr) car)

... with both range and car incorporated as a stream producer and consumer, respectively, this would involve only a handful of computations instead of millions!
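To see why early fusion matters, here is a rough Python analogue of that fused pipeline (the name fused_first is made up, and this is not Qi's actual implementation): with range as the producer and car (take the first element) as the consumer, demand-driven evaluation touches only the first couple of elements out of ten million.

```python
# Sketch: producer (range), filter, map, and consumer (first element)
# fused into one demand-driven loop, with a counter showing how many
# elements are actually examined.

def fused_first(n):
    produced = 0
    for i in range(n):              # producer: (range n)
        produced += 1
        if i % 2 == 1:              # (filter odd?)
            return i * i, produced  # (map sqr), then car stops here
    return None, produced

value, touched = fused_first(10_000_000)
print(value, touched)  # → 1 2  (only two elements examined)
```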

Avoid conflating testing and benchmarking

In the past we've looked for good ways to test the compiler, and there were many good suggestions from the community. This time, with the release being within sight, we discussed returning to this aspect and more clearly distinguishing the scope of benchmarking and testing. For instance, one of our benchmarks was written this way:

(define-flow filter-map
  (~>> values
       (~>> (filter odd?)
            (map sqr))))

... to simultaneously benchmark our implementation of deforestation and also verify that it happens even if the functional pipeline happens to be in a nested position. The former is a benchmarking concern while the latter is a testing concern, so we simplified the benchmark to do just its job:

(define-flow filter-map
  (~>> (filter odd?)
       (map sqr)))

... and agreed that the simplification of (~>> f (~>> g h)) to (~>> f g h) should be a concern of a unit test for the normalization pass of the compiler.

Additionally, we noted that our unit tests for the deforestation compiler pass are fairly complicated, and necessarily involve specifics of expansion as well as of the deforestation implementation. For the normalization pass, however, we could instead use the strategy suggested by community members in that Discourse thread: independently compile two input expressions and simply verify that they compile to the same thing (without worrying about what that thing is).

This "check f(a) == f(b)" approach can't always be applied (e.g. we can't use it for deforestation), but it's a simple and useful technique where applicable (like in normalization).
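The strategy can be sketched in Python with a toy normalizer standing in for the compiler's normalization pass (the normalize function below is illustrative, not Qi's implementation):

```python
# Sketch of the "check f(a) == f(b)" testing strategy: flatten nested
# ("~>>", ...) compositions, analogous to rewriting
# (~>> f (~>> g h)) to (~>> f g h).

def normalize(expr):
    if isinstance(expr, tuple) and expr and expr[0] == "~>>":
        flat = ["~>>"]
        for sub in expr[1:]:
            norm = normalize(sub)
            if isinstance(norm, tuple) and norm and norm[0] == "~>>":
                flat.extend(norm[1:])   # splice nested pipeline in place
            else:
                flat.append(norm)
        return tuple(flat)
    return expr

a = ("~>>", "f", ("~>>", "g", "h"))
b = ("~>>", "f", "g", "h")
# The test asserts only that both inputs reach the same normal form,
# without asserting what that form is.
assert normalize(a) == normalize(b)
```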

Taking Stock for Release

At RacketCon, we announced that we plan to release the compiler work "before Christmas." Matthew pointed out that we didn't say which Christmas, so we still have an out 😂. Right now, it doesn't look impossible, but it's also going to be tight!

We reviewed what remains to be done before we can merge the compiler work into main branch, and agreed that these were the remaining items:

  1. Good error messages.
  2. Tests and benchmarks.
  3. Documentation.

For (1), some combination of contracts and ad hoc runtime checks seems necessary, likely with some way to pass the input syntax to the implementation functions.

Looking beyond release

This release isn't about writing a perfect compiler and leaving nothing unoptimized. It's just about proving that useful optimizations are possible, and we are pretty close on that (although a lot of testing and review remains to be done). There are already a lot of possibilities for optimization that we aren't yet going to act on, and many other avenues of exploration that have become apparent that we're leaving for later (such as most of the "next steps" enumerated below!). But we talked about some of these things.

No accidental side effects

We agreed that Qi's "no accidental side effects" policy is huge as far as the potential for optimization is concerned, and opens up a lot of doors, starting with the recent suggestion to deforest car, and including much of racket/list (e.g. first, second, take, index-of, member, ... anything that can be considered a "consumer").

apply and fold?

We may be able to rewrite (apply f ...), where f is known to be variadic (e.g. +), to a fold, avoiding construction of an intermediate list at the penultimate step.
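The idea can be illustrated in Python (the names apply_style and fold_style are made up, and this is a stand-in for the proposed Racket rewrite, not actual Qi code):

```python
from functools import reduce
import operator

# Sketch of the apply-to-fold rewrite: instead of materializing the
# full argument list and invoking the variadic function on it, fold
# the underlying binary operation over the stream directly.

def apply_style(xs):
    args = list(xs)         # intermediate list built here
    return sum(args)        # variadic-style application

def fold_style(xs):
    # No intermediate list: elements are consumed one at a time.
    return reduce(operator.add, xs, 0)

# Both compute the same result:
assert apply_style(range(5)) == fold_style(range(5)) == 10
```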

Deforesting values

The original idea with deforestation was to apply it to pure values (rather than lists), especially since passing values between functions has proven inefficient when large numbers of them are involved. Although we have been focusing on lists, it is possible that there will be uses of values that we can identify as "list-like" and then rewrite those expressions to use lists, so that they are picked up by our existing implementation of deforestation for lists.

A big part of writing a compiler is building an arsenal of underlying general-purpose optimizations, and then transforming surface code into forms that those optimizations will pick up.

Next Steps

(Some of these are carried over from last time)

  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion (e.g. for range).
  • Add test cases for the backwards incompatibility on datum vs literal matching, and review whether there is a way to avoid errors here by favoring the Qi binding space in the expander.
  • Prototype and then propose map1 for addition to Racket.
  • Add a "long functional pipeline" benchmark.
  • Continue to integrate macroexpansion events into compiler rewrite rules to afford visibility in e.g. the Macro Stepper.
  • Provide good error messages in optimized (e.g. deforested) expressions in terms of source expressions.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr, zip, average). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Brad, Dominik, Jair, Sam, Sid
