
Qi Compiler Sync Nov 10 2023

Siddhartha Kasivajhula edited this page Nov 28, 2023 · 6 revisions

Deforesting Vindaloo


Adjacent meetings: Previous | Up | Next

Summary

We discussed various aspects of the deforestation optimization, and whether there could be ways for users to extend it to custom datatypes.

Background

We've been working on deforestation for some time now, and the implementation has gotten more nuanced over time. Since last time, Dominik got a prototype of a stream producer (range) working. Sam has been working on a data science module and wanted to see if deforestation could be applied there.

Deforestation for bespoke datatypes?

The current implementation of deforestation operates specifically on lists. But the idea behind it applies much more generally -- for instance, vectors or other simple collections have analogous semantics to lists in this regard. A more specialized example could be a bitmap, where a series of graphical transformations are applied that may result in the redundant construction of a series of bitmaps instead of just one bitmap. Likewise, the dataframe is a popular datatype used in data science applications, a kind of multi-dimensional array that supports any number of independent ways of indexing into that data. Some time ago, Hazel, author of the Sawzall library, had expressed that fresh construction of dataframes at each step of a data pipeline represented a major performance bottleneck for that library.
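As an illustration of the general idea (not Qi's implementation, and in Python rather than Racket), here is a minimal sketch contrasting an unfused pipeline, which allocates an intermediate collection at each stage, with a fused single-pass version:

```python
def unfused(xs):
    squares = [x * x for x in xs]               # intermediate list 1
    evens = [s for s in squares if s % 2 == 0]  # intermediate list 2
    return sum(evens)

def fused(xs):
    # One pass over the data; no intermediate collections are ever built.
    return sum(s for s in (x * x for x in xs) if s % 2 == 0)

assert unfused(range(10)) == fused(range(10)) == 120
```

The same redundant-construction pattern appears whether the intermediate structure is a list, a vector, a bitmap, or a dataframe; only the cost of each intermediate differs.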

Sam has been working on a functional dataframe library for which all copies of the structure would be "shallow" rather than "deep," by leveraging immutability of the underlying representation. This could address some of the aforementioned performance issues, but it would still be desirable to be able to leverage "deforestation" on sequences of transformations of the dataframe, such as adding "series" (columns) to the dataframe, or adding indexes.

We considered some possibilities for how deforestation in Qi could be applied in these diverse cases.

Supporting custom implementations in the core

We could have some way to detect datatypes "literally" and apply a tailored deforestation implementation. This would work, but it is clearly undesirable: it would bloat the Qi compiler with implementations for custom datatypes that typical users are unlikely to encounter.

"Compiler Macros"

This was a very interesting option we considered. Could we provide a macro or a series of macros that would allow users to specify compiler optimizations that would be applied in the Qi compiler? We could potentially use some kind of compile-time datatype to detect, in the Qi compiler, when a user-provided optimization is available, and then apply that optimization prior to any of the built-in compiler passes. This could be a relatively lightweight way to achieve custom compilation of bespoke datatypes.
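A rough sketch of the mechanism, in Python and with entirely hypothetical names (`register_rule`, `compile_expr` are illustrative, not Qi API): user-registered rewrite rules are applied before any of the built-in passes.

```python
USER_RULES = []

def register_rule(rule):
    # A "compiler macro" form could expand to a registration like this.
    USER_RULES.append(rule)

def compile_expr(expr, builtin_passes):
    # User-provided optimizations run first, then the built-in passes.
    for rule in USER_RULES:
        expr = rule(expr)
    for p in builtin_passes:
        expr = p(expr)
    return expr

# A toy rule that fuses a known composition into a single operation.
register_rule(lambda e: e.replace("(compose f g)", "h"))
assert compile_expr("(map (compose f g) xs)", []) == "(map h xs)"
```

In the real compiler the rules would of course operate on syntax objects rather than strings, but the ordering (user rules before built-in passes) is the essential point.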

In a way, this idea is a generalization of things like define-inline and begin-encourage-inline which give users ways to hook into the operation of the Racket compiler. With these "compiler macros," we give users a way to define the operations of the Qi compiler.

Of course, allowing users to extend the compiler is in one sense nothing unusual. Macros are already effectively a way to do this. However, macros are applied prior to expansion, whereas the "compiler macros" we are talking about here are applied after expansion and in the early stages of compilation. Perhaps it would take the form of a "user" compiler pass that would be performed right after the normalization pass (and before list deforestation, and other passes that we may add in the future).

Yet, it is somewhat limited in that the optimization rules would need to be specified in terms of the normalized Qi core language, and so the user would need to know exactly what the expansion is going to look like. This may be OK in practice, but in principle, it may be vulnerable to regressions over the course of Qi development, since there could be changes in the expander or in normalization that would result in slightly different expansions over time.

Tower of Languages

Libraries desiring to use the deforestation optimization could provide a tailored implementation for their datatype, but not in the Qi compiler; instead, they would do so in their own compiler at a higher level. This would mean that the library would have a similar architecture to Qi itself (and Racket), that is, exhibiting a stratified expander and compiler. The target language for this library could either be Qi or Racket. The former would be preferable if the language is flow-like, as it would then benefit from seamless integration with the rest of Qi, including any flow-specific optimizations that Racket would not perform.

This option seems the most robust and flexible. Syntax-spec is under active development (by Michael and others) and has already added functionality to support specific needs that became apparent in the course of work on the Qi compiler. Using it to implement a language on top of Qi which itself sits on top of Racket would be a very interesting success story for language oriented programming and a potentially valuable collaboration between early adopters and syntax-spec authors.

Assessment

While the "tower of languages" approach seems to be the best one, it does not necessarily conflict with also providing a "compiler macros" facility, except that doing so may encourage overuse of this pattern over a more robust available solution.

Other Compiler Hooks

In the past, we've talked about an "arity" compiler pass, where we could maintain a table of known arities for built-in Racket functions and then annotate source expressions with this known arity so that arity-optimized versions of user code could be used instead of implementations assuming arbitrary arity (e.g. requiring packing values into a list via a "rest args" parameter).

We discussed this time the possibility of allowing such an arity pass to be user-extensible, essentially by providing another kind of "compiler macro" like this:

(define-known-arity my-function 3)

Racket also provides a built-in procedure-arity function, but it may be that this can only operate on resolved identifiers rather than syntax. So it may be necessary to have a define-known-arity macro to allow us to have this information at compile time.
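One way such a registry could work, sketched in Python with illustrative names (`register_known_arity` and `known_arity` are assumptions, not Qi or Racket API): a `define-known-arity` form would simply record the arity in a compile-time table that the arity pass consults.

```python
KNOWN_ARITIES = {}

def register_known_arity(name, arity):
    # A define-known-arity form could expand to a registration like this.
    KNOWN_ARITIES[name] = arity

def known_arity(name):
    # None means the arity is unknown, so the compiler must fall back to
    # assuming arbitrary arity (e.g. packing values via rest args).
    return KNOWN_ARITIES.get(name)

register_known_arity("my-function", 3)
assert known_arity("my-function") == 3
assert known_arity("some-other-function") is None
```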

Designing the Compiler with Extensibility in Mind

Generally, for each compiler pass, we could consider what information a user could conceivably provide that would help us optimize better in that pass. If such information exists, then we could provide corresponding macros to allow users to supply information relevant for that pass (e.g. define-known-arity above for the arity pass).

A stream producer

Dominik completed the range stream producer that we started on last time, handling cases where no arguments, some arguments, and all arguments are presupplied (range accepts up to 3 arguments). These arguments needed to be placed at the right positions based on the current chirality, using either right or left currying, and so he named this "vindaloo" since it's a particularly spicy form of currying.
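The chirality-dependent placement can be sketched in Python (the `curry_left`/`curry_right` names are hypothetical, chosen only to illustrate the two directions):

```python
def curry_left(f, *pre):
    # Presupplied arguments go to the left of the flowing arguments.
    return lambda *args: f(*pre, *args)

def curry_right(f, *pre):
    # Presupplied arguments go to the right of the flowing arguments.
    return lambda *args: f(*args, *pre)

# Presupplying `start` on the left, or `stop` on the right, both
# reconstruct the same call: range(2, 5).
assert list(curry_left(range, 2)(5)) == [2, 3, 4]
assert list(curry_right(range, 5)(2)) == [2, 3, 4]
```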

More stream consumers

So far we've talked about foldl and foldr as standard stream consumers. We discussed car as another, simpler example (and also, more generally, take). Indeed, deforesting this could be accomplished by simply yielding the first value and terminating the iteration there, instead of processing the entire sequence.
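A generator-based sketch in Python (not Qi's implementation) shows the early termination: a fused `car` demands only the first value, so downstream transformations never touch the rest of the sequence.

```python
def producer(xs):
    for x in xs:
        yield x

calls = []

def transform(stream):
    for x in stream:
        calls.append(x)  # record which elements are actually processed
        yield x * x

def car(stream):
    # Pull just the first value and stop; the rest is never demanded.
    return next(stream)

assert car(transform(producer([3, 4, 5]))) == 9
assert calls == [3]  # only the first element was ever processed
```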

Novel implications for effects

In the past we've said that Qi will not preserve guarantees on order of side effects, since it will assume that flows are pure (i.e. free of side effects). If a side effect is desired, it should be accomplished at the Qi level using effect (e.g. (~> (effect displayln sqr) add1)) and not hidden within a flow implementation.

But so far, we have observed that despite this declaration regarding side effects, we still provide a sensible order of effects. That changes with something like car, however, since in this case we are not just exhibiting a different order of effects but in fact truncating some of those effects. Side effects of operations on subsequent elements of the sequence would not be performed at all, rather than merely in a different order.

Our guarantees effectively boil down to:

  1. The effects may be performed in a different order, or
  2. only a subset of them may be performed.

We still consider this "sensible" from Qi's perspective, since we do not ever produce different effects than what the user intended, which we could consider a true semantic deviation.

Modular compiler passes

Initially, the deforestation optimization was simply about detecting "fusable" list operations and then applying stream fusion. But now there is a lot more nuance, with distinct handling of stream producers, transformers, and consumers, and specialized applicability of the optimization in only certain cases to avoid complexities with multiple values (at least initially). Currently, all the deforestation code is in the main compiler.rkt module, along with the code for the normalization pass (the only other pass, at the moment).

We felt that it made sense to extract all deforestation-related code to a distinct module. This would also include the actual target code in the impl.rkt ("implementation") module. It will likely make sense to also handle other compiler passes in this modular way, regardless of the complexity of the actual optimization.

Avoiding duplication in patterns and templates

The main entry point to deforestation is the deforest-rewrite function. This matches patterns in the input syntax and dispatches to generate-fused-operation to produce the appropriate template based on the presence of producers, transformers, and consumers. The latter function matches the annotated syntax and produces the appropriate optimized code.

(define (deforest-rewrite stx)
  (syntax-parse stx
    [((~datum thread) _0:non-fusable ...
                      p:fusable-stream-producer
                      f:fusable-stream-transformer ...+
                      g:fusable-stream-consumer
                      _1 ...)
     #:with fused (generate-fused-operation (syntax->list #'(p f ... g)))
     #'(thread _0 ... fused _1 ...)]
    [((~datum thread) _0:non-fusable ...
                      p:fusable-stream-producer
                      g:fusable-stream-consumer
                      _1 ...)
     #:with fused (generate-fused-operation (syntax->list #'(p g)))
     #'(thread _0 ... fused _1 ...)]
    ;; remaining clauses elided
    ))

(define (generate-fused-operation ops)
  (syntax-parse (reverse ops)
    [(g:fusable-stream-consumer
      op:fusable-stream-transformer ...
      p:fusable-stream-producer)
     #`(esc (λ args
              ((#,@#'g.end
                (inline-compose1 [op.next op.f] ...
                                 p.next))
               (apply p.prepare args))))]
    [(g:fusable-stream-consumer
      p:fusable-stream-producer)
     #`(esc (λ args
              ((#,@#'g.end p.next)
               (apply p.prepare args))))]
    ;; remaining clauses elided
    ))

We realized that if we ensured that there was always a producer, a series of transformers, and a consumer, then the latter function could employ a single template (i.e. the first, most general, one) instead of duplicating the patterns in the former function. This could be accomplished by just annotating with dummy syntax as placeholders if these components happen to be absent, and ensuring that the syntax classes supply appropriate values of attributes like next that are used in the template.
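The padding idea can be sketched in Python (names like `identity_transformer` and `normalize` are illustrative, not Qi's): if every pipeline is first normalized to the full producer/transformers/consumer shape, one general template handles every case.

```python
def identity_transformer(stream):
    # Placeholder supplied when a pipeline has no transformers, so the
    # general template still applies.
    return stream

def normalize(producer, transformers, consumer):
    # Pad to the general (producer, transformers ..., consumer) shape.
    return (producer, transformers or [identity_transformer], consumer)

def run(producer, transformers, consumer, data):
    stream = producer(data)
    for t in transformers:
        stream = t(stream)
    return consumer(stream)

p, ts, c = normalize(iter, [], list)
assert run(p, ts, c, [1, 2, 3]) == [1, 2, 3]
```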

No "Vegas Style" Error Messages

One of the big concerns that had been raised early on in the compiler work is that rewriting user code runs the risk of obscuring errors by translating them to a different (optimized) implementation that the user isn't familiar with. Without additional care on our part, this transformation could lose that source context making the resulting error messages inscrutable. In particular, we'd like the errors reported by the implementation to connect to the source context rather than stay in the target context. Unlike proverbial Vegas, what happens in the target implementation should not stay in the target implementation.

Currently, with the refactoring to have a prepare step in all stream producers, it should be possible to employ a simple contract in this implementation function that would report the appropriate errors, but naively, this would implicate the target context rather than the source context. We will need to investigate whether there would be a way to pass the source syntax down to the implementation function so that the error can indicate it, and, if we use contracts, whether they support a way to specify the error context.

Miscellaneous Discussion

define-inline considered harmful

This is only good to use in very specific cases where we know the functions won't ever be passed as arguments [any other cases?]. If an inlined function is passed as an argument, the compiler cannot apply other optimizations (e.g. those that rely on matching identifiers) that would otherwise have been possible, since the function arrives in fully-inlined form.

We use this in our implementation of deforestation, and it's OK there as these functions aren't being provided for arbitrary use, and we know exactly how they're used.

Naming Convention for Meaningless Bindings?

In eliminating duplication in deforestation, we had to employ an identifier as a dummy piece of syntax that could be matched against. As it looks like any other identifier, it could confuse future developers as it could be assumed to be referring to a function or variable defined somewhere. So to avoid this confusion, we felt it would be better to develop a naming convention for such dummy identifiers used purely in pattern matching. E.g. sentinel-list->cstream or :symbol:list->cstream instead of list->cstream.

We agreed that with names, sometimes it is easier to know in retrospect when there is some emergent coherence than in advance, so we left this for later.

Racket Rogue-Like Library

Dominik demoed some new ways in RRLL in which the underlying maze generation and consistency algorithms are reused across many different graphical visualizations -- from 2D (a la "Dangerous Dave") to isometric (like more recent Roguelikes) to 3D (like Doom) representations. Try it out and pester Dominik with all of your questions.

Next Steps

(Some of these are carried over from last time)

  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion (e.g. for range).
  • Add test cases for the backwards incompatibility on datum vs literal matching, and review whether there is a way to avoid errors here by favoring the Qi binding space in the expander.
  • Prototype and then propose map1 for addition to Racket.
  • Add a "long functional pipeline" benchmark.
  • Continue to integrate macroexpansion events into compiler rewrite rules to afford visibility in e.g. the Macro Stepper.
  • Provide good error messages in optimized (e.g. deforested) expressions in terms of source expressions.
  • Check whether optimization passes terminate prematurely when they don't return false on inputs they don't transform (and fix this if so).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Sam, Sid
