Skip to content

Qi Compiler Sync Oct 6 2023

Siddhartha Kasivajhula edited this page Dec 14, 2023 · 5 revisions

Validly Verifying that We're Compiling Correctly

Qi Compiler Sync Oct 6 2023

Adjacent meetings: Previous | Up | Next

Summary

We discussed theoretical and practical approaches to testing the Qi compiler, and ways to expose optimization passes to IDE tools for ease of debugging. We also came up with organizing ideas for deforestation, and triaged remaining issues to identify blockers for merging the initial compiler work into the main branch of development (we're close but we've still a little ways to go!).

Background

Last time, we continued the work on deforestation by implementing it for foldl. In the interim, we added a benchmark for it and validated that the performance impact was about the same as for foldr.

Testing the Compiler

The compiler is a complex piece of code whose responsibility is to transform the code written by the user into totally different code! Whether the transformation results in code that is semantically equivalent to the original is a complex problem that is undecidable in the general case. But in the specific transformations we choose to apply to the source code, we want to be sure that any resulting differences in meaning are intentional (and thus presumably communicated to the user). There are two broad kinds of guarantees we can derive here: (1) theoretical guarantees of correctness of transformations, and of termination of such sequences of transformations, and (2) practical guarantees that transformations are applied and produce the same results on sample input.

Theoretical Guarantees

Although having theoretical guarantees of soundness would be valuable (as we've talked about before), it could be claimed that this value is primarily academic rather than something that users would directly experience. Yet, we expect that progress here will surely pay unexpected dividends on the practical side as well, since theoretical exploration will help us understand at a deeper level why these optimizations do or don't work, allowing us to identify any further avenues for development. In addition, as a recent proof of the Four Color Theorem showed, a purely academic exercise of finding a different proof for the theorem also as a side effect provided a quadratic-time algorithm for achieving the coloring in general graphs. The quest for theoretical results could reveal optimizations we haven't thought of before, even outside of its academic value.

We discussed to what extent the methodologies that could be applied here (e.g. specifying the relationship between passes, recognizing aspects within a pass that may interfere with another pass, asserting properties that should hold on composed passes) might be common to approaches used for other, well-known programming languages like Scheme and Haskell, and whether proving results for a flow-oriented language like Qi might present any novel features. We agreed that it would be good to learn from and reuse as much as possible, and if anything, it's likely that the methodologies would be simpler and more theoretically tractable for Qi than for these other languages due to the high level of composability (and decomposability) of flows. In the past we've considered how the "visual" nature of the flow-oriented paradigm could allow Qi to even be a good notation for thinking about certain branches of mathematics (perhaps functional analysis or computability theory -- e.g. the "composition operator" in primitive recursive functions can be expressed simply as (~> (-< g ...) h)), and along these lines, maybe there is a convenient mathematical notation for reasoning about compiler transformations that could be intuitive here?

We felt that formalizing these checks in code (e.g. proof assistant) that could be committed into the repository would be one useful approach (perhaps something like this example from Ben), and also discussed the need to formalize the guarantees provided by the language on order of effects.

Practical Testing Approaches

We asked for input on Discourse from others who might be working on compilers, for approaches to unit testing a compiler.

Some initial feedback here (see the post for more): Racket has "end to end" compiler unit tests that verify that two different source expressions expand and compile to the same thing. With Qi we have been considering only testing individual rewrite rules. The latter of these may scale better with many different permutations of input expressions, but the former avoids all specifics of expansion and compilation besides the high level property we seek to check.

We seemed to be leaning towards having some of both of these types of tests, and in addition, to also introduce tests for the expander in order to ensure that the input to the compiler is what we expect.

We'll continue to discuss feedback from community members and other projects as we work out how to test the compiler.

Debugging

We want the experience of writing new features and debugging errors to be as pleasant as possible for developers and users of Qi.

Visibility into Compiler Passes

As we noted last time, currently, following the expansion of a Qi expression in the Macro Stepper shows the result of expansion and compilation as a single step, with no visibility of intermediate stages of expansion and compilation. We saw that the reason for this is that the compiler is currently implemented as compile-time functions operating on syntax, rather than as macros. As a result, IDE tools don't see these as intermediate stages but as the implementation details of a single expansion pass.

We agreed that rewriting these as macros would be a good next step, as having visibility into intermediate stages would greatly improve the developer experience.

Some functions that may present some challenges here are the handling of bindings, which is a nonlocal transformation of syntax, and the use of find-and-map/qi for tree traversal and mapping.

Good Error Messages

Another aspect of debugging, but for users rather than developers of Qi, is having good error messages. We realized (and others including Ben had expressed concerns on this point a while ago as well) that runtime errors would naively implicate expressions not in the source code written by the user but in the code it compiles to. This would make error messages hard for the user to understand.

As a general approach here, by virtue of optimizations being implemented as macros that match patterns in the user code, we already have access to the source code in the context of writing the optimized version. We should therefore use what we know about the code to include the appropriate checks in the templates.

For instance, in the deforestation optimization, if a user supplies a number instead of a list, then naively, the code would break and complain that car expected a list in the implementation of the compiler utility cstream->list, which, obviously, the user would have no clue what this is referring to. Instead, we would add a contract to the expansion here that should validate that the input is a list. In this particular case, the contract would only need to be present in the stream producer function rather than in other functions that implement stream fusion in the compiler, and so we'd expect the performance impact to be minimal.

We also felt that it would be ideal if we could reuse contracts on functions written by the user (for instance, the contract on the map function), without having to rewrite it extrinsically in the compiler. We aren't sure if Racket provides a way to extract a contract from an identifier such as a function name, which could allow us to "unwrap" and then apply the contract in a way that we control. But in any case, it should be straightforward to implement the checks independently ourselves.

Producers, Transformers and Consumers

So far in our implementation of deforestation, we've implemented map, filter, foldr and foldl as fusable streams. We still have to implement unfoldr, upto, zip and average. We realized that we've implemented two classes of streams so far: transformers like map and filter, and consumers like foldl and foldr. We don't yet have any producers, and implementing them would be different from what we already have. We decided to start by introducing these as explicit categories in the code to help us think about these consistently in the codebase, renaming our syntax classes to fusable-stream-transformer and fusable-stream-consumer, and introducing a new class called fusable-stream-producer that we will flesh out the implementation of, next time.

upto (or range, in Racket) and unfoldr are producers, and it's likely that zip and average would be expressible as some combination of these (in the case of zip, it's likely just a matter of supporting multiple input values in the map transformer).

On the Road to Main Branch

We agreed that the following are the blockers that should be completed before we can merge the compiler work into the main branch:

  1. Implementing at least one fusable stream producer.
  2. Supporting multiple input values in fusable streams.
  3. Provide good error messages.
  4. Documenting the new work (both for users as well as developers), including the modified semantic guarantees.
  5. Validating that optimization passes don't prematurely terminate during tree traversal.

We agreed that adding continuations as a new core form(s) is not a requirement for release since it could get complicated and we don't have a clear usecase in mind yet, besides conceptual simplicity of having a primitive dynamic control mechanism as a core form (instead of try which feels more high-level).

Next Steps

(Some of these are carried over from last time)

  • Attempt to rewrite compiler passes to macros to afford visibility in e.g. the Macro Stepper.
  • Provide good error messages in optimized (e.g. deforested) expressions in terms of source expressions.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Sid

Clone this wiki locally